Practice with geopandas

Here I have written out the code to load the SARP 2020 at-home data. If you’d like to use your own dataset for this practice, the questions are generic enough to try with any spatial dataset.

Hints are below

import pandas as pd

# Load the spreadsheet and keep only the first 929 rows
filepath = '../data/SARP 2020 final.xlsx'
SARP_2020 = pd.read_excel(filepath)[:929]
  1. What are the column names for latitude and longitude?

  1. Use the latitude and longitude columns to create a list of geometry objects that represents the data location of this dataframe.

  1. Create a GeoDataFrame for the SARP_2020 data. Call the dataset geoSARP_2020.

  1. Since you will likely want to do some distance or area calculations, reproject your dataframe to epsg:3857 and save the reprojected dataframe to the variable geoSARP_2020_meters.

  1. The world’s largest ball of twine is located in Cawker City, Kansas, at 39.5094° N, 98.4344° W. For reasons which are scientifically ambiguous, find the distance from each location to the world’s largest ball of twine.

  1. Which sampling location is closest to the world’s largest ball of twine? Which is the furthest away?

  1. (process not covered in lecture) What is the total area covered by all of the points in the dataset?

Google hint: “shapely calculate area of set of points” or this article

  1. Here I have imported the shape of California from a geojson.

from shapely.geometry import shape
import json
with open('../data/california.geojson') as f:
    ca_geom_json = json.load(f)
ca_geom = shape(ca_geom_json['geometry'])

Create a boolean Series showing whether each point intersects the state of California.

  1. Use that boolean Series to filter your dataframe to only the samples which were taken in California.

  1. Filter your dataset to only samples which were taken in California and also had a 'CO (ppbv)' value over 102.
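
If the overall workflow in the first several questions feels abstract, here is a minimal sketch of it on made-up data rather than the SARP dataset. The column names 'site', 'lat' and 'lon', the three points, and the reference location are all placeholders chosen for illustration, and the convex hull is just one possible reading of "area covered by the points".

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import MultiPoint, Point

# Toy data: 'site', 'lat' and 'lon' are placeholder names, not the SARP columns
toy = pd.DataFrame({
    "site": ["a", "b", "c"],
    "lat": [0.0, 0.0, 1.0],
    "lon": [0.0, 2.0, 0.0],
})

# points_from_xy takes x (longitude) first, then y (latitude)
geometry = gpd.points_from_xy(toy["lon"], toy["lat"])

# Build the GeoDataFrame and declare that the coordinates are degrees (EPSG:4326)
geo_toy = gpd.GeoDataFrame(toy, geometry=geometry, crs="EPSG:4326")

# Reproject to EPSG:3857 so that the coordinates are in meters
geo_toy_meters = geo_toy.to_crs(epsg=3857)

# A made-up reference location, reprojected the same way as the data
reference = (
    gpd.GeoSeries([Point(0.1, 0.1)], crs="EPSG:4326").to_crs(epsg=3857).iloc[0]
)

# Distance from every row to the reference point, in projected meters
geo_toy_meters["distance_m"] = geo_toy_meters.distance(reference)

# Row labels of the smallest and largest distances
closest = geo_toy_meters.loc[geo_toy_meters["distance_m"].idxmin(), "site"]
farthest = geo_toy_meters.loc[geo_toy_meters["distance_m"].idxmax(), "site"]

# Area of the convex hull: the smallest convex shape containing every point
hull = MultiPoint(list(geo_toy_meters.geometry)).convex_hull
hull_area_m2 = hull.area
```

Note that the reference point goes through the same reprojection as the data; measuring between a point in degrees and a dataframe in meters would give meaningless numbers.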

Hints

  1. Try using DATAFRAME.columns.values

  1. gpd.points_from_xy(), with an example in the lecture notebook

  1. gpd.GeoDataFrame(DATAFRAME, geometry=LIST_OF_GEOMETRIES)

  1. Don’t forget to set the crs of the dataframe to epsg=4326 before reprojecting

  1. Remember to use the dataframe in meters

  1. First answer in the Stack Overflow article. Calculate area with SHAPELY_OBJ.area

  1. Which version of the SARP_2020 data should you be using? The one in meters or in degrees?

  1. The spatial operation that corresponds to “taken in California” is .intersects()

  1. Here we are using multiple statements in a filter. Look back at lesson 4 for a reminder on this syntax.
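
The last few hints can be sketched on toy data as well. In the example below the column names 'lat', 'lon' and 'value', the three points, and the square region are all invented for illustration; the region simply stands in for a shape such as ca_geom, and it is assumed to be in degrees like the points.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Toy data: 'lat', 'lon' and 'value' are placeholder column names
toy = pd.DataFrame({
    "lat": [0.5, 0.5, 5.0],
    "lon": [0.5, 0.6, 5.0],
    "value": [90, 110, 120],
})
geo_toy = gpd.GeoDataFrame(
    toy, geometry=gpd.points_from_xy(toy["lon"], toy["lat"]), crs="EPSG:4326"
)

# A hypothetical region in degrees (the same units as the points)
region = box(0, 0, 1, 1)

# Boolean Series: True where a point intersects the region
in_region = geo_toy.intersects(region)

# Filter with the boolean Series alone
inside = geo_toy[in_region]

# Combine two conditions with & and parentheses around the comparison
inside_and_high = geo_toy[in_region & (geo_toy["value"] > 100)]
```

The points and the region need to share a coordinate system for .intersects() to be meaningful, which is why the version of the data in degrees is used here.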