Practice with geopandas
Here I have written out the code to load in the SARP 2020 at home data. If you’d like to use your own dataset for this practice, the questions are generic enough to try with any spatial dataset.
Hints are below
import pandas as pd
filepath = '../data/SARP 2020 final.xlsx'
SARP_2020 = pd.read_excel(filepath)[:929]
What are the column names for latitude and longitude?
Use the latitude and longitude columns to create a list of geometry objects that represents the data location of this dataframe.
Create a GeoDataFrame for the SARP_2020 data. Call the dataset geoSARP_2020.
Since you know you will likely want to do some distance or area calculations, reproject your dataframe to epsg:3857 and save the reprojected dataframe to the variable geoSARP_2020_meters.
The world’s largest ball of twine is located in Cawker City, Kansas, at 39.5094° N, 98.4344° W. For reasons which are scientifically ambiguous, find the distance from each location to the world’s largest ball of twine.
Which sampling location is closest to the world’s largest ball of twine? Which is the furthest away?
(process not covered in lecture) What is the total area covered by all of the points in the dataset?
Google hint: “shapely calculate area of set of points” or this article
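Since this process was not covered in lecture, here is a minimal sketch of one common way to read “area covered by a set of points”: take the convex hull of the points and ask for its area. The convex hull is an assumption on my part (the linked article may use a different method), and the coordinates below are made up rather than taken from the SARP data.

```python
from shapely.geometry import MultiPoint

# Hypothetical projected coordinates in meters (not SARP data).
coords = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]

# The convex hull is the smallest convex polygon that contains every point.
hull = MultiPoint(coords).convex_hull

# .area is in squared units of the coordinate system (here, square meters).
area_m2 = hull.area
```

MultiPoint also accepts a list of shapely Point objects, so the geometries from a GeoDataFrame could be passed in the same way.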
Here I have imported the shape of California from a geojson.
from shapely.geometry import shape
import json
with open('../data/california.geojson') as f:
    ca_geom_json = json.load(f)
ca_geom = shape(ca_geom_json['geometry'])
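To see what shape() is doing in the cell above, here is a small sketch on an inline dictionary. The feature below is hypothetical (a 2 x 2 degree box standing in for the contents of california.geojson); it only assumes the file holds a single GeoJSON Feature with a 'geometry' key, as the loading code implies.

```python
from shapely.geometry import shape

# Hypothetical GeoJSON Feature, standing in for the california.geojson contents.
feature = {
    "type": "Feature",
    "properties": {"name": "toy box"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]],
    },
}

# shape() turns the GeoJSON geometry dictionary into a shapely object.
geom = shape(feature["geometry"])

geom_type = geom.geom_type  # kind of shapely geometry that was built
bounds = geom.bounds        # (minx, miny, maxx, maxy) in the file's units
```

GeoJSON coordinates are longitude and latitude in degrees, which is worth keeping in mind when deciding which version of your dataframe to compare against ca_geom.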
Create a boolean Series showing whether each point intersects the state of California.
Use that boolean series to filter your dataframe to only the samples which were taken in California
Filter your dataset to only samples which were taken in California and also had a 'CO (ppbv)' value over 102.
Hints
Try using DATAFRAME.columns.values
gpd.points_from_xy(), with an example in the lecture notebook
gpd.GeoDataFrame(DATAFRAME, geometry=LIST_OF_GEOMETRIES)
Don’t forget to set the crs of the dataframe to epsg=4326 before reprojecting
Remember to use the dataframe in meters
First answer in the Stack Overflow article. Calculate area with SHAPELY_OBJ.area
Which version of the SARP_2020 data should you be using? The one in meters or in degrees?
The spatial operation that corresponds to “taken in California” is .intersects()
Here we are using multiple statements in a filter. Look back at lesson 4 for a reminder on this syntax
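The hints above can be sketched end to end on a toy table. Everything in this example is hypothetical: the column names, the three sample rows, the reference point, and the rectangular region are all made up so that the pattern is visible without giving away the answers for the SARP data.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point, box

# Hypothetical stand-in for SARP_2020; the column names are assumptions.
df = pd.DataFrame({
    "Latitude": [34.0, 36.5, 40.0],
    "Longitude": [-118.0, -120.0, -100.0],
    "CO (ppbv)": [150.0, 95.0, 130.0],
})

# Longitude is x and latitude is y. Degrees of lat/lon are EPSG:4326,
# set here in the constructor (set_crs on the dataframe also works).
geometry = gpd.points_from_xy(df["Longitude"], df["Latitude"])
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs="EPSG:4326")

# Reproject so that distances come out in meters instead of degrees.
gdf_m = gdf.to_crs(epsg=3857)

# A reference point has to be reprojected the same way before measuring.
ref = gpd.GeoSeries([Point(-119.0, 35.0)], crs="EPSG:4326").to_crs(epsg=3857).iloc[0]
dist_m = gdf_m.distance(ref)

# Hypothetical region given in lon/lat, so compare it with the dataframe in degrees.
region = box(-125.0, 32.0, -114.0, 42.0)
in_region = gdf.intersects(region)

# Multiple statements in a filter: join with & and wrap each comparison in parentheses.
selected = gdf[in_region & (gdf["CO (ppbv)"] > 102)]
```

From here, dist_m.idxmin() and dist_m.idxmax() give the index labels of the nearest and farthest rows, and selected holds only the rows that pass both conditions.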