Accessing Data#
Lesson Content
Accessing raster coordinates
Affine Transformations
Subsetting data with
Window
Context#
Today we are going to revisit our AVIRIS dataset and look again at the metadata. This time instead of focusing on properties like height and width we are going to focus on the spatial metadata, like crs
bounds
and transform
. We will also created some numpy masked arrays to start considering nodata values or ROIs.
Accessing Raster Coordinates#
So to figure out where a particular point is on the earth we need two things:
the EPSG code to tell us which grid we are looking at
our coordinates
Example with AVIRIS#
Since our AVIRIS data uses UTM grids our coordinates will be an easting and a northing.
import rasterio
filepath_rad = './data/subset_f180628t01p00r02_corr_v1k1_img'
with rasterio.open(filepath_rad, 'r') as src:
bbox = src.bounds
src_crs = src.crs
print('crs is ', src_crs)
print('bbox is ', bbox)
crs is EPSG:32610
bbox is BoundingBox(left=769855.79, bottom=3751257.6, right=774030.79, top=3757520.1)
Looking at the output of the above cell we see that the EPSG code is 32610
. The 6 in this code tells us that we are looking at the northern hemisphere, and the 10
tells us that we are looking at UTM zone 10.
Reprojecting a vector to epsg:4326#
Depending on why you want these coordinates the easting/northing still might not be very useful unless we can get them to latitude and longitude. Reprojecting a whole raster is a multistep process, but in the second part of the practice yesterday we saw how to reproject a vector object, like a point or a polygon.
from pyproj import Transformer
from shapely.ops import transform
Transformer.from_crs(INPUT_PROJ, OUTPUT_PROJ, always_xy=True)
t = Transformer.from_crs('epsg:4326', 'epsg:3857', always_xy=True).transform
transform(t, SHAPELY_POINT)
Example 1#
Reproject a point
Example 2#
Reproject the bounding box
📝 Check your understanding
How can we figure out where our raster is located in space?
Affine Transformations#
We can get the coordinates of the corners of our bounding box from the metadata, but what if we want the coordinates of some other location on our raster? We know how to find a pixel value using rows and columns, but what about using coordinates?
Affine transformations are the way that we move from row, column notation (pixel space) to the notation of a coordinate reference system (usually either latitude & longitude or easting & northing)
My favorite affine transformation article is this one written by Matthew Perry.
Here is what an affine tranformation looks like:
with rasterio.open(filepath_rad, 'r') as src:
print(src.transform)
| 0.00, 16.70, 769855.79|
| 16.70,-0.00, 3751257.60|
| 0.00, 0.00, 1.00|
Where each number means something different:
a |
b |
c |
d |
e |
f |
0 |
0 |
1 |
a = width of a pixel in units of the crs
b = row rotation (typically zero)
c = x-coordinate of the upper-left corner of the upper-left pixel
d = column rotation (typically zero)
e = height of a pixel in units of the crs (typically negative)
f = y-coordinate of the of the upper-left corner of the upper-left pixel
The three numbers in the bottom row are always 0, 0, 1 (since we are working on a 2 dimensional plane).
So looking again at our affine transform from above we see that the size of an single pixel is 17.10 meters square and that the upper left corner of the image is located at 475,785.77, 3,350,578.50.
Using the affine to get the coordinates of a particular grid cell#
The affine transform gives us some information just by looking at it, but it can also be used to convert a row, column coordiante to a grid coordinate.
# Define which pixel we want to convert
row, col = 40, 567
# Extract the affine transform
with rasterio.open(filepath_rad, 'r') as src:
affine = src.transform
# Use the transform to convert our input pixel row/column to coordinates
x, y = affine * (col, row)
print('pixel ', row, col, ' corresponds to easting, northing of ', x, y)
pixel 40 567 corresponds to easting, northing of 770523.79 3760726.5
Using the inverse affine to get the pixel of a coordinate#
If you have a grid coordinate and want to find out what the row, column location is you need the inverse affine transform. The inverse transform is calculated with the ~
operator.
# Define the coordinates we want to convert
xcoord, ycoord = 475900, 3377000
# Extract the affine transform
with rasterio.open(filepath_rad, 'r') as src:
affine = src.transform
# Use the transform to convert our input coordinates to pixel row/column
# col, row = ~a * (x, y)
col, row = ~affine * (xcoord, ycoord)
print('row/column of my coordinates: ', row, col)
row/column of my coordinates: -17602.143113772454 -22410.634730538935
If you get a negative number here or your pixel row/column are bigger than the height or width of your raster it means you have asked for the pixel location of a point which is not located in your raster.
with rasterio.open(filepath_rad, 'r') as src:
print('shape of my raster ', src.height, src.width)
shape of my raster 250 375
📝 Check your understanding
What is the purpose of an affine transform?
Subsetting Data#
Subsetting your data refers to selecting some amount of data to work with that is less than the whole image. In Python I see two main ways to do this:
manually by dropping data from your numpy array
using rasterio’s windowed reading methods
The manual option is simplier, but has the limitations that
it is a little harder to save the output data to a new file
you still have to read the whole dataset into memory
The rasterio option doesn’t have these limitations, but is a bit more complicated.
Memory vs. Storage#
Memory |
Storage |
|
---|---|---|
Definition |
how much data you can have open at one time |
how much data you can have saved on your computer |
Persistence |
goes away when you close the program |
stays unless you delete it |
Size |
volume in 10s (Ex. 32 GB RAM) |
volume in hundreds (ex. 256 GB) |
Using Indexes to subset data#
filepath_rad = './data/subset_f180628t01p00r02_corr_v1k1_img'
with rasterio.open(filepath_rad, 'r') as src:
full_dataset = src.read()
Using Window
to subset data#
A lot of the code from this section comes from the rasterio docs.
Windowed reading requires a bit more work up front, but it allows you to keep your transform geospatially updated in the event you want to save out your data. This method involves using a rasterio object called Window
to access your subset instead of indexing it directly.
The steps to this method are:
create the window object
read the data from
src
using the window
Creating the Window
object#
Sytnax #1#
The syntax for a the subset chunk in rasterio is Window
and the sytnax to use the Window
object looks like this:
Window(COLUMN_OFFSET, ROW_OFFSET, WIDTH, HEIGHT)
The OFFSETs specify the row and column numbers of the upper left corner of your window.
So getting the 1000-1800th row and 200-420th column with a window object would look like Window(200, 1000, 220, 800)
.
from rasterio.windows import Window
Window(200, 1000, 220, 800)
Window(col_off=200, row_off=1000, width=220, height=800)
Syntax #2#
Another way to create the window is to use the .from_slices()
method.
Window.from_slices((ROW_START, ROW_STOP), (COLUMN_START, COLUMNS_STOP))
Window.from_slices((1000, 1800), (200, 420))
Window(col_off=200, row_off=1000, width=220, height=800)
These two are equivalent ways to do the same thing, so we can can see in the output that this gives us the same window as above.
Reading the data with the Window
#
To read the data out we use our regular src.read()
method but this time we specify window
.
import rasterio
with rasterio.open(filepath_rad, 'r') as src:
my_window = Window(200, 1000, 220, 800)
window_data = src.read(window=my_window)
window_data.shape
(224, 0, 175)
📝 Check your understanding
If using the syntax
Window(COLUMN_OFFSET, ROW_OFFSET, WIDTH, HEIGHT)
how many pixels would the code Window(15, 30, 10, 10)
return?
Language Agnostic: What’s my workflow?#
At this point you’ve seen how to do many tasks in ENVI and in Python. For example:
viewing specific pixel reflectance values
getting a z transect
statistics on the full raster
statistics on a subset of a raster
defining an ROI
When it comes time for you to do this in your reseach, how might you choose which tool to use?
Questions to consider#
What would the steps look like in the different options?
What are the technical pros/cons to each option?
Which options do I know exactly what to do and in which options would I need to be experimenting?
Is there a skill I am interested in gaining that makes it worth extra time?
Who are the resources I have to support me if I get stuck?
Coding in Python in notebooks#
Pros
highly customizable, most flexibilty
easily repeated if scalaing is needed
Cons
slower
visualizations are not interactive