Accessing Data#

Context#

Today we are going to revisit our AVIRIS dataset and look again at the metadata. This time instead of focusing on properties like height and width we are going to focus on the spatial metadata, like crs bounds and transform. We will also created some numpy masked arrays to start considering nodata values or ROIs.

Accessing Raster Coordinates#

So to figure out where a particular point is on the earth we need two things:

  1. the EPSG code to tell us which grid we are looking at

  2. our coordinates

Example with AVIRIS#

Since our AVIRIS data uses UTM grids our coordinates will be an easting and a northing.

import rasterio
filepath_rad = './data/subset_f180628t01p00r02_corr_v1k1_img'
with rasterio.open(filepath_rad, 'r') as src:
    bbox = src.bounds
    src_crs = src.crs
    
print('crs is ', src_crs)
print('bbox is ', bbox)
crs is  EPSG:32610
bbox is  BoundingBox(left=769855.79, bottom=3751257.6, right=774030.79, top=3757520.1)

Looking at the output of the above cell we see that the EPSG code is 32610. The 6 in this code tells us that we are looking at the northern hemisphere, and the 10 tells us that we are looking at UTM zone 10.

Reprojecting a vector to epsg:4326#

Depending on why you want these coordinates the easting/northing still might not be very useful unless we can get them to latitude and longitude. Reprojecting a whole raster is a multistep process, but in the second part of the practice yesterday we saw how to reproject a vector object, like a point or a polygon.

from pyproj import Transformer
from shapely.ops import transform
Transformer.from_crs(INPUT_PROJ, OUTPUT_PROJ, always_xy=True)
t = Transformer.from_crs('epsg:4326', 'epsg:3857', always_xy=True).transform
transform(t, SHAPELY_POINT)

Example 1#

Reproject a point

Example 2#

Reproject the bounding box

📝 Check your understanding

How can we figure out where our raster is located in space?

Affine Transformations#

We can get the coordinates of the corners of our bounding box from the metadata, but what if we want the coordinates of some other location on our raster? We know how to find a pixel value using rows and columns, but what about using coordinates?

Affine transformations are the way that we move from row, column notation (pixel space) to the notation of a coordinate reference system (usually either latitude & longitude or easting & northing)

My favorite affine transformation article is this one written by Matthew Perry.

Here is what an affine tranformation looks like:

with rasterio.open(filepath_rad, 'r') as src:
    print(src.transform)
| 0.00, 16.70, 769855.79|
| 16.70,-0.00, 3751257.60|
| 0.00, 0.00, 1.00|

Where each number means something different:

a

b

c

d

e

f

0

0

1

  • a = width of a pixel in units of the crs

  • b = row rotation (typically zero)

  • c = x-coordinate of the upper-left corner of the upper-left pixel

  • d = column rotation (typically zero)

  • e = height of a pixel in units of the crs (typically negative)

  • f = y-coordinate of the of the upper-left corner of the upper-left pixel

The three numbers in the bottom row are always 0, 0, 1 (since we are working on a 2 dimensional plane).

So looking again at our affine transform from above we see that the size of an single pixel is 17.10 meters square and that the upper left corner of the image is located at 475,785.77, 3,350,578.50.

Using the affine to get the coordinates of a particular grid cell#

The affine transform gives us some information just by looking at it, but it can also be used to convert a row, column coordiante to a grid coordinate.

# Define which pixel we want to convert
row, col = 40, 567

# Extract the affine transform
with rasterio.open(filepath_rad, 'r') as src:
    affine = src.transform

# Use the transform to convert our input pixel row/column to coordinates
x, y = affine * (col, row)
print('pixel ', row, col, ' corresponds to easting, northing of ', x, y)
pixel  40 567  corresponds to easting, northing of  770523.79 3760726.5

Using the inverse affine to get the pixel of a coordinate#

If you have a grid coordinate and want to find out what the row, column location is you need the inverse affine transform. The inverse transform is calculated with the ~ operator.

# Define the coordinates we want to convert
xcoord, ycoord = 475900, 3377000

# Extract the affine transform
with rasterio.open(filepath_rad, 'r') as src:
    affine = src.transform

# Use the transform to convert our input coordinates to pixel row/column
# col, row = ~a * (x, y)
col, row = ~affine * (xcoord, ycoord)
print('row/column of my coordinates: ', row, col)
row/column of my coordinates:  -17602.143113772454 -22410.634730538935

If you get a negative number here or your pixel row/column are bigger than the height or width of your raster it means you have asked for the pixel location of a point which is not located in your raster.

with rasterio.open(filepath_rad, 'r') as src:
    print('shape of my raster ', src.height, src.width)
shape of my raster  250 375

📝 Check your understanding

What is the purpose of an affine transform?

Subsetting Data#

Subsetting your data refers to selecting some amount of data to work with that is less than the whole image. In Python I see two main ways to do this:

  1. manually by dropping data from your numpy array

  2. using rasterio’s windowed reading methods

The manual option is simplier, but has the limitations that

  • it is a little harder to save the output data to a new file

  • you still have to read the whole dataset into memory

The rasterio option doesn’t have these limitations, but is a bit more complicated.

Memory vs. Storage#

Memory

Storage

Definition

how much data you can have open at one time

how much data you can have saved on your computer

Persistence

goes away when you close the program

stays unless you delete it

Size

volume in 10s (Ex. 32 GB RAM)

volume in hundreds (ex. 256 GB)

Using Indexes to subset data#

filepath_rad = './data/subset_f180628t01p00r02_corr_v1k1_img'
with rasterio.open(filepath_rad, 'r') as src:
    full_dataset = src.read()

Using Window to subset data#

A lot of the code from this section comes from the rasterio docs.

Windowed reading requires a bit more work up front, but it allows you to keep your transform geospatially updated in the event you want to save out your data. This method involves using a rasterio object called Window to access your subset instead of indexing it directly.

The steps to this method are:

  1. create the window object

  2. read the data from src using the window

Creating the Window object#

Sytnax #1#

The syntax for a the subset chunk in rasterio is Window and the sytnax to use the Window object looks like this:

Window(COLUMN_OFFSET, ROW_OFFSET, WIDTH, HEIGHT)

The OFFSETs specify the row and column numbers of the upper left corner of your window.

So getting the 1000-1800th row and 200-420th column with a window object would look like Window(200, 1000, 220, 800).

from rasterio.windows import Window
Window(200, 1000, 220, 800)
Window(col_off=200, row_off=1000, width=220, height=800)

Syntax #2#

Another way to create the window is to use the .from_slices() method.

Window.from_slices((ROW_START, ROW_STOP), (COLUMN_START, COLUMNS_STOP))

Window.from_slices((1000, 1800), (200, 420))
Window(col_off=200, row_off=1000, width=220, height=800)

These two are equivalent ways to do the same thing, so we can can see in the output that this gives us the same window as above.

Reading the data with the Window#

To read the data out we use our regular src.read() method but this time we specify window.

import rasterio
with rasterio.open(filepath_rad, 'r') as src:
    my_window = Window(200, 1000, 220, 800)
    window_data = src.read(window=my_window)
window_data.shape
(224, 0, 175)

📝 Check your understanding

If using the syntax

Window(COLUMN_OFFSET, ROW_OFFSET, WIDTH, HEIGHT)

how many pixels would the code Window(15, 30, 10, 10) return?

Language Agnostic: What’s my workflow?#

At this point you’ve seen how to do many tasks in ENVI and in Python. For example:

  • viewing specific pixel reflectance values

  • getting a z transect

  • statistics on the full raster

  • statistics on a subset of a raster

  • defining an ROI

When it comes time for you to do this in your reseach, how might you choose which tool to use?

Questions to consider#

  • What would the steps look like in the different options?

  • What are the technical pros/cons to each option?

  • Which options do I know exactly what to do and in which options would I need to be experimenting?

  • Is there a skill I am interested in gaining that makes it worth extra time?

  • Who are the resources I have to support me if I get stuck?

Coding in Python in notebooks#

Pros

  • highly customizable, most flexibilty

  • easily repeated if scalaing is needed

Cons

  • slower

  • visualizations are not interactive