Solution: Raster data needs reprojection

📂 Open the source data (in EPSG:3413)

%matplotlib widget
# Set up matplotlib interactive "magic"
import rioxarray
import matplotlib.pyplot as plt
import numpy as np
from rasterio.enums import Resampling
from cartopy import crs as ccrs

epsg_3413 = rioxarray.open_rasterio('../dem_with_altered_metadata.tif')
epsg_3413
<xarray.DataArray (band: 1, y: 561, x: 301)>
[168861 values with dtype=float32]
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -8e+05 -7.95e+05 -7.9e+05 ... 6.95e+05 7e+05
  * y            (y) float64 -6e+05 -6.05e+05 -6.1e+05 ... -3.395e+06 -3.4e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:        Area
    TIFFTAG_XRESOLUTION:  1
    TIFFTAG_YRESOLUTION:  1
    scale_factor:         1.0
    add_offset:           0.0
    long_name:            band_data

Plot the source data

plt.figure()
plot = plt.imshow(epsg_3413[0])

plot.axes.set_title('EPSG:3413')
plt.colorbar()
plt.show()

🌐 Reproject the dataset

epsg_4326 = epsg_3413.rio.reproject(
    "EPSG:4326",
    resampling=Resampling.bilinear,
)

Plot it again!

plt.figure(figsize=(10, 2))

plot = plt.imshow(epsg_4326[0])
plot.axes.set_title('EPSG:4326')
plt.colorbar()
plt.show()
/home/mfisher/.miniconda3/envs/geotools/lib/python3.11/site-packages/matplotlib/image.py:505: RuntimeWarning: overflow encountered in multiply
  A_resampled *= ((a_max - a_min) / frac)

⁉️ What’s going on?

This result is clearly wrong. What is this shape? Why is the data “gone”, and what does the yellow color represent?

Let’s take a look at some of these values to get a better idea of what we are seeing.

print(f"===Pre-reprojection===")
print(f"Shape: {epsg_3413[0].shape}")
print(f"Top-left corner value: {epsg_3413[0][0][0].data}")
print(f"Metadata NoData value: {epsg_3413[0].rio.nodata}")
print()
print(f"===Post-reprojection===")
print(f"Shape: {epsg_4326[0].shape}")
print(f"Top-left corner value: {epsg_4326[0][0][0].data}")
print(f"Top-center value: {epsg_4326[0][0][int(epsg_4326[0].shape[1] / 2)].data}")
print(f"Metadata NoData value: {epsg_4326[0].rio.nodata}")
===Pre-reprojection===
Shape: (561, 301)
Top-left corner value: 9999.0
Metadata NoData value: None

===Post-reprojection===
Shape: (240, 955)
Top-left corner value: 3.4028234663852886e+38
Top-center value: 9999.0
Metadata NoData value: 3.4028234663852886e+38
/home/mfisher/.miniconda3/envs/geotools/lib/python3.11/site-packages/rioxarray/raster_writer.py:130: UserWarning: The nodata value (3.402823466e+38) has been automatically changed to (3.4028234663852886e+38) to match the dtype of the data.
  warnings.warn(

We are seeing the shape of the extent of the original data in purple. The original rectangular shape had to be warped to align to the geographic projection grid, creating this “funnel” shape. Outside that area, the yellow color is a very large number being used as a NoData value.

It turns out that your colleagues’ modification to this dataset was to set the NoData pixels to 9999, but they forgot to correctly specify that 9999 is this dataset’s NoData flag value in the metadata.

Above, we can see that 9999 is being used as the NoData value for the source dataset by inspecting the top-left pixel. We can also see that the NoData value is not set in the metadata. After reprojecting we have a new NoData value (the maximum possible Float32 value) being added and being set in the metadata.

expected_nodata = 9999

🥸 Masking the NoData values

We expect the NoData value to be 9999, so let’s mask it out.

epsg_4326_da = epsg_4326[0]
epsg_4326_da_masked_9999 = epsg_4326_da.where(epsg_4326_da != expected_nodata)

Plot!

plt.figure(figsize=(10, 2))
plot = plt.imshow(epsg_4326_da_masked_9999, interpolation='nearest')
plot.axes.set_title('EPSG:4326')
plot.axes.grid()
plt.colorbar()
plt.show()

⁉️ What’s going on?

We’ve successfully masked out the original NoData values, but that only applied to the NoData values used in the original dataset. This plot is showing two NoData values: 9999 from the original dataset, and the maximum Float32 value for pixels outside of the original dataset.

We can mask those out, too:

epsg_4326_da_masked_all_nodata = epsg_4326_da_masked_9999.where(epsg_4326_da_masked_9999 <= expected_nodata)
plt.figure(figsize=(10, 2))
plot = plt.imshow(epsg_4326_da_masked_all_nodata)
plot.axes.set_title('EPSG:4326')
plot.axes.grid()
plt.colorbar()
plt.show()

At a glance, things look OK, but on closer inspection this has uncovered another issue! Zoom in on the coastline of Greenland. The edge pixels should not have higher values than interior pixels; the ice sheet is thinner at the edges. These edge pixels were interpolated during the original reprojection because the NoData value was not appropriately set – normally, NoData pixels are not used during interpolation. Looks like we need to go back even further to fix this, rather than masking the problem.

🎉 Reproject the dataset; 2nd try

First, correctly set the NoData value on the source and mask out the nodata values. Then reproject.

# Replace the nodata value 
epsg_3413_fixed = epsg_3413.where(epsg_3413 != expected_nodata)
epsg_3413_fixed = epsg_3413_fixed.rio.write_nodata(expected_nodata, encoded=True)
print(f"NoData encoded as: {epsg_3413_fixed.rio.encoded_nodata}")

epsg_4326_fixed = epsg_3413_fixed.rio.reproject(
    "EPSG:4326",
    resampling=Resampling.bilinear,
)
NoData encoded as: 9999.0
plt.figure(figsize=(10, 2))
plot = plt.imshow(epsg_4326_fixed[0])
plot.axes.set_title('EPSG:4326')
plot.axes.grid()
plt.colorbar()
plt.show()