Spatial Operations and rio#

Today we are going to use the spatial background we learned yesterday and apply it to dataset exploration of an AVIRIS image. We are going to use the rio module of xarray.

# Libraries for image visualization
from IPython.display import Image
from IPython.core.display import HTML 
import xarray as xr

First let’s open the datafile. We are going to use the argument engine='rasterio' because we are using an ENVI file type. This is also the recommended method for opening TIF files.

aviris_ds = xr.open_dataset('./data/subset_f131205t01p00r10rdn_e_sc01_ort_img', engine='rasterio')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 aviris_ds = xr.open_dataset('./data/subset_f131205t01p00r10rdn_e_sc01_ort_img', engine='rasterio')

File ~/miniconda3/envs/sarp/lib/python3.10/site-packages/xarray/backends/api.py:556, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    553 if from_array_kwargs is None:
    554     from_array_kwargs = {}
--> 556 backend = plugins.get_backend(engine)
    558 decoders = _resolve_decoders_kwargs(
    559     decode_cf,
    560     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    566     decode_coords=decode_coords,
    567 )
    569 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)

File ~/miniconda3/envs/sarp/lib/python3.10/site-packages/xarray/backends/plugins.py:205, in get_backend(engine)
    203     engines = list_engines()
    204     if engine not in engines:
--> 205         raise ValueError(
    206             f"unrecognized engine {engine} must be one of: {list(engines)}"
    207         )
    208     backend = engines[engine]
    209 elif isinstance(engine, type) and issubclass(engine, BackendEntrypoint):

ValueError: unrecognized engine rasterio must be one of: ['netcdf4', 'scipy', 'store']
# Extract just the band DataArray from the dataset
aviris = aviris_ds['band_data']
aviris
<xarray.DataArray 'band_data' (band: 224, y: 300, x: 100)>
[6720000 values with dtype=float32]
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 ... 218 219 220 221 222 223 224
    xc           (y, x) float64 5.777e+05 5.777e+05 ... 5.814e+05 5.814e+05
    yc           (y, x) float64 4.151e+06 4.151e+06 ... 4.148e+06 4.148e+06
    spatial_ref  int64 0
Dimensions without coordinates: y, x
Attributes:
    long_name:                 ('Band 1', 'Band 2', 'Band 3', 'Band 4', 'Band...
    bands:                     224
    band_names:                Band 1,Band 2,Band 3,Band 4,Band 5,Band 6,Band...
    byte_order:                0
    coordinate_system_string:  PROJCS["unnamed",GEOGCS["GCS_WGS_1984",DATUM["...
    data_type:                 2
    description:               ./output_data/subset_f131205t01p00r10rdn_e_sc0...
    file_type:                 ENVI Standard
    header_offset:             0
    interleave:                bsq
    lines:                     300
    samples:                   100

When we are opening TIF or ENVI files using the rasterio engine is that we get an additional set of attributes and methods that give us information about the image. We access this using .rio

print(aviris.rio.bounds())  # bounding box
print(aviris.rio.crs)  # coordinate reference system
print(aviris.rio.height)  # number of pixels tall
print(aviris.rio.width)  # number of pixels wide
print(aviris.rio.nodata)  # a nodata fill value, if one has been set
print(aviris.rio.transform())  # A CRS property called an affine transformation
(577666.150342955, 4155829.70969225, 579276.150342955, 4150999.70969225)
EPSG:32610
300
100
None
| 14.08, 7.81, 577666.15|
| 7.81,-14.08, 4150999.71|
| 0.00, 0.00, 1.00|

These metadata can be really helpful for orienting you to your data.

One field I want to make a note about is the nodata field. The nodata value is sometimes not set, but that doesn’t mean that there isn’t a nodata value in the dataset. There still may be a nodata value present in the dataset even if one is not set.

📝 Check your understanding

Look at the EPSG code of this dataset. Reference the figure below from the spatial data lecture to figure out approximately where in the world the dataset is located. Is the image in the northern or southern hemisphere?

Image(url= "http://www.dmap.co.uk/utmworld.gif")

Although the EPSG code is useful for general orientation and an overall sanity check, we usually want to know more specifically where our data is located. We can do this by extracting the bounds and converting the coordinates to EPSG 4326.

The general syntax for that is:

t = Transformer.from_crs(INPUT_PROJ, OUTPUT_PROJ, always_xy=True).transform
transform(t, SHAPELY_POINT)

Where everything in ALL_CAPS is a variable you will be inserting yourself.

Here’s an example

from pyproj import Transformer
from shapely.ops import transform
# Get the bounding box
bbox = aviris.rio.bounds()
from shapely.geometry import box
# Create a shapely object with the bounding box
bbox_geom = box(bbox[0], bbox[3], bbox[2], bbox[1])
# Consider this as a copy-and-paste able code snippet that you can use if you ever need to
# transform coordinates
t = Transformer.from_crs('epsg:32610', 'epsg:4326', always_xy=True).transform
bbox_4326 = transform(t, bbox_geom)
# View the output as a wkt
bbox_4326.wkt
'POLYGON ((-122.10308958299767 37.50252918542879, -122.10256826050976 37.5460596647619, -122.12079199682005 37.54619677057729, -122.12130273640575 37.50266607712935, -122.10308958299767 37.50252918542879))'