Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example Zarr file containing spatiotemporal remote sensing imagery #10

Closed
kapadia opened this issue Mar 7, 2023 · 3 comments
Closed

Comments

@kapadia
Copy link

kapadia commented Mar 7, 2023

Per request from our last meeting, I’m sharing a Zarr file containing spatiotemporal observations of remote sensing imagery. Following GDAL's Zarr documentation, this file contains a _CRS attribute in WKT format. Additionally, an affine transform is also appended allowing for pixel to world coordinate conversions. Following GDAL’s current behavior, these attributes are stored on the Zarr Array rather than the Zarr Group.

https://storage.googleapis.com/pl-amit-public/geozarr/planet-fusion.zarr.zip

Two issues were observed in this process:

  • Running gdalinfo on the prepared file fails
    • This may be due to my installation of GDAL which lacks support for blosc
  • Opening a Zarr file created by gdal_translate results in a UnicodeDecodeError when opening with xarray or accessing attributes with python-zarr

If you'd like to observe the second issue, it can be reproduced using the following commands.

gdal_translate /vsicurl/https://github.com/rasterio/rasterio/blob/4c01600b0e8c8841c661cbaf1297ac52f601fc3a/tests/data/RGB.byte.tif?raw=true RGB.byte.zarr -of Zarr
python -c "import xarray; xarray.open_zarr('RGB.byte.zarr')"

I hope the file and these observations are useful to this group.

Edit:

This is the metadata associated with each array in the file above

<xarray.DataArray 'sr' (datetime: 365, band: 4, row: 256, col: 256)>
dask.array<open_dataset-e9e64732163986119bc61b23a21b10a4sr, shape=(365, 4, 256, 256), dtype=int16, chunksize=(365, 4, 32, 32), chunktype=numpy.ndarray>
Coordinates:
  * band      (band) <U5 'blue' 'green' 'red' 'nir'
  * datetime  (datetime) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Dimensions without coordinates: row, col
Attributes:
    _CRS:        PROJCS["WGS 84 / UTM zone 14N",GEOGCS["WGS 84",DATUM["WGS_19...
    _TRANSFORM:  [3.0, 0.0, 696000.0, 0.0, -3.0, 4536000.0]
@rabernat
Copy link

rabernat commented Mar 29, 2023

I was able to open this Zarr file successfully using Xarray, with no errors. I'm on Zarr v2.14.2 and Xarray v2023.2.0.

<xarray.Dataset>
Dimensions:  (Y: 718, X: 791)
Coordinates:
  * X        (X) float64 1.021e+05 1.024e+05 1.027e+05 ... 3.389e+05 3.392e+05
  * Y        (Y) float64 2.827e+06 2.826e+06 2.826e+06 ... 2.612e+06 2.612e+06
Data variables:
    Band1    (Y, X) float32 ...
    Band2    (Y, X) float32 ...
    Band3    (Y, X) float32 ...

@kapadia
Copy link
Author

kapadia commented Apr 26, 2023

After discussion around this example, here is the code used to generate the example file:

import requests

import pandas as pd
import numpy as np
import dask.array as da
import dask
from dask.distributed import Client
import zarr
from numcodecs import Zstd
import xarray as xr
import rasterio as rio
from rasterio.windows import Window


def read_raster(uri, window):

    with rio.open(uri) as src:
        arr = src.read(window=window)

    return arr


def read_stac_item(uri):
    r = requests.get(uri)
    feature = r.json()

    datetime = pd.to_datetime(feature['properties']['datetime'])
    asset = feature['assets']['sr']
    bands = [item['common_name'] for item in asset['eo:bands']]
    band_count = len(bands)
    image_uri = asset['href']
    height = 256
    width = 256
    dtype = np.int16
    window = Window(col_off=0, row_off=0, width=width, height=height)

    image_arr = da.from_delayed(
        dask.delayed(read_raster)(image_uri, window),
        shape=(band_count, height, width),
        dtype=dtype)

    return xr.DataArray(image_arr[None, :],
        coords=dict(
            datetime=[datetime],
            band=bands
        ),
        dims=["datetime", "band", "row", "col"],
    )


def main():

    client = Client(processes=False)

    base_url = "https://www.planet.com/data/stac/fusion/14N/29E-188N"
    collection_url = f"{base_url}/collection.json"
    r = requests.get(collection_url)
    collection = r.json()
    item_uris = [
        f"{base_url}/{link['href']}" for link in collection['links'] if link['rel'] == 'item']

    # Probe one image to get CRS
    r_item = requests.get(item_uris[0])
    image_uri = r_item.json()['assets']['sr']['href']
    with rio.open(image_uri) as src:
        crs = src.crs.to_wkt()
        transform = list(src.transform)[0:6]

    futures = client.map(read_stac_item, item_uris)
    arrays = client.gather(futures)
    ds = xr.concat(arrays, dim='datetime', combine_attrs="identical").chunk(
        datetime=-1, band=-1, row=32, col=32)
    ds['datetime'] = pd.to_datetime(ds.datetime)
    ds.attrs = {
        **ds.attrs,
        "_CRS": crs,
        "_TRANSFORM": transform
    }

    dataset = xr.Dataset({
        "sr": ds
    })

    destination = "planet-fusion.zarr"
    dataset.to_zarr(
        destination,
        encoding={
            variable: {
                "compressor": zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
            } for variable in dataset
    })

main()

@kapadia
Copy link
Author

kapadia commented May 10, 2023

A new file is available that uses the proposal in #19.

  • This changes the the transform attribute from _TRANSFORM to _GeoTransform

https://storage.googleapis.com/pl-amit-public/geozarr/planet-fusion-20230510.zarr.zip

In [1]: import xarray as xr

In [2]: ds = xr.open_zarr('planet-fusion.zarr')

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:   (band: 4, datetime: 365, row: 256, col: 256)
Coordinates:
  * band      (band) <U5 'blue' 'green' 'red' 'nir'
  * datetime  (datetime) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Dimensions without coordinates: row, col
Data variables:
    sr        (datetime, band, row, col) int16 dask.array<chunksize=(365, 4, 32, 32), meta=np.ndarray>

In [4]: ds.sr
Out[4]:
<xarray.DataArray 'sr' (datetime: 365, band: 4, row: 256, col: 256)>
dask.array<open_dataset-78e64952e25134fc0fcda392329a7e9bsr, shape=(365, 4, 256, 256), dtype=int16, chunksize=(365, 4, 32, 32), chunktype=numpy.ndarray>
Coordinates:
  * band      (band) <U5 'blue' 'green' 'red' 'nir'
  * datetime  (datetime) datetime64[ns] 2021-01-01 2021-01-02 ... 2021-12-31
Dimensions without coordinates: row, col
Attributes:
    GeoTransform:  [3.0, 0.0, 696000.0, 0.0, -3.0, 4536000.0]
    _CRS:          PROJCS["WGS 84 / UTM zone 14N",GEOGCS["WGS 84",DATUM["WGS_...

In [5]: xr.__version__
Out[5]: '2023.4.2'

cc. @briannapagan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants