Fixing common data/metadata issues

QGreenland Researcher Workshop 2023

Common data/metadata issues?

It’s common for data to be missing metadata. We often see:

  • Coordinate reference system (CRS) information missing
  • Flag values (e.g., NoData) not set
  • Data stored in a format without metadata support

It’s also not uncommon for data or metadata to be incorrect or in conflict with the dataset’s user guide. Fixing these issues requires thought and judgment calls.

What tool should I use?

The best one for the job. Explore the alternatives available in the ecosystem you want to work in!

  • GUI-based GIS tools (QGIS!) are useful for visualization of data, especially in comparison with other layers like a basemap.
  • Command-line tools are especially useful for getting a quick answer.
  • Language-specific (e.g., Python) tools are good for automation or research code.

GDAL/OGR drivers

It’s worth looking at the list of drivers for your data type (raster or vector) and reading the documentation for the formats you use. We’ll revisit this in a later slide.

Data scenario: Raster missing geospatial metadata

QGreenland screenshot of layer with missing metadata

gdalinfo dem_without_metadata.tif
Driver: GTiff/GeoTIFF
Files: dem_without_metadata.tif
Size is 301, 561
Metadata:
  TIFFTAG_XRESOLUTION=1
  TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (    0.0,    0.0)
Lower Left  (    0.0,  561.0)
Upper Right (  301.0,    0.0)
Lower Right (  301.0,  561.0)
Center      (  150.5,  280.5)
Band 1 Block=301x6 Type=Float32, ColorInterp=Gray

Raster missing geospatial metadata: Solution

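Use gdal_translate to assign the missing metadata using the information on the mock dataset landing page: -a_srs sets the CRS and -a_ullr sets the georeferenced bounds (upper-left x, upper-left y, lower-right x, lower-right y). Neither option reprojects or resamples the pixel values.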

gdal_translate \
  -a_srs "+proj=stere +lat_0=90 +lat_ts=71 +lon_0=-39 +x_0=0 +y_0=0 +a=6378137 +rf=298.257024882273 +units=m +no_defs" \
  -a_ullr -802500.000 -597500.000 702500.000 -3402500.000 \
  dem_without_metadata.tif dem_with_metadata.tif
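
A quick way to sanity-check the -a_ullr values before trusting them is to divide the extent by the raster size and see whether the pixel size comes out as a sensible round number. The sketch below assumes the corner values from the command above and the 301 x 561 size reported by gdalinfo; the variable names are only illustrative.

```shell
# Values copied from the gdal_translate call and the gdalinfo "Size is" line.
ulx=-802500.000; uly=-597500.000    # upper-left corner (first pair of -a_ullr)
lrx=702500.000;  lry=-3402500.000   # lower-right corner (second pair of -a_ullr)
cols=301; rows=561                  # raster width and height in pixels

# Pixel size = extent along each axis / number of pixels along that axis.
pixel_size=$(LC_ALL=C awk \
  -v ulx="$ulx" -v uly="$uly" -v lrx="$lrx" -v lry="$lry" \
  -v cols="$cols" -v rows="$rows" \
  'BEGIN { printf "%.3f %.3f", (lrx - ulx) / cols, (lry - uly) / rows }')

echo "pixel size (x y): $pixel_size"
```

The result, 5000.000 -5000.000, should agree with the Pixel Size line that gdalinfo reports for the fixed file. The negative y value reflects rows running from north to south.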

Raster missing geospatial metadata: All better!

gdalinfo dem_with_metadata.tif
Driver: GTiff/GeoTIFF
Files: dem_with_metadata.tif
Size is 301, 561
Coordinate System is:
PROJCRS["unknown",
    BASEGEOGCRS["unknown",
        DATUM["unknown",
            ELLIPSOID["unknown",6378137,298.257024882273,
                LENGTHUNIT["metre",1,
                    ID["EPSG",9001]]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433,
                ID["EPSG",9122]]]],
    CONVERSION["Polar Stereographic (variant B)",
        METHOD["Polar Stereographic (variant B)",
            ID["EPSG",9829]],
        PARAMETER["Latitude of standard parallel",71,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8832]],
        PARAMETER["Longitude of origin",-39,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8833]],
        PARAMETER["False easting",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",south,
            MERIDIAN[90,
                ANGLEUNIT["degree",0.0174532925199433,
                    ID["EPSG",9122]]],
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",south,
            MERIDIAN[180,
                ANGLEUNIT["degree",0.0174532925199433,
                    ID["EPSG",9122]]],
            ORDER[2],
            LENGTHUNIT["metre",1]]]
Data axis to CRS axis mapping: 1,2
Origin = (-802500.000000000000000,-597500.000000000000000)
Pixel Size = (5000.000000000000000,-5000.000000000000000)
Metadata:
  AREA_OR_POINT=Area
  TIFFTAG_XRESOLUTION=1
  TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( -802500.000, -597500.000) ( 92d19'49.93"W, 80d48'38.29"N)
Lower Left  ( -802500.000,-3402500.000) ( 52d16'15.67"W, 58d36'12.78"N)
Upper Right (  702500.000, -597500.000) ( 10d37' 3.76"E, 81d31'36.98"N)
Lower Right (  702500.000,-3402500.000) ( 27d20' 3.47"W, 58d47'17.98"N)
Center      (  -50000.000,-2000000.000) ( 40d25'55.55"W, 71d44'12.84"N)
Band 1 Block=301x6 Type=Float32, ColorInterp=Gray

Raster missing geospatial metadata: All better!

QGreenland screenshot of layer with fixed metadata

Data scenario: Vector data needs reformatting

View the full scenario

XLSX vector data

ogrinfo -al -so \
  kcbcc_DS4_final_v2_final.xlsx
INFO: Open of `kcbcc_DS4_final_v2_final.xlsx'
      using driver `XLSX' successful.

Layer name: Sheet1
Geometry: None
Feature Count: 12
Layer SRS WKT:
(unknown)
x: Real (0.0)
y: Real (0.0)
cuteness_rating: Real (0.0)
blood_acetone_grams_per_ml: Real (0.0)
blood_acetylcholine_grams_per_ml: Real (0.0)

Vector data needs reformatting: Solution

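GDAL's OGR VRT (virtual format) driver can describe the spreadsheet as a point layer, building geometries from the x and y columns without modifying the original file.

Create a file named kcbcc_DS4_final_v2_final.vrt with the following content: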

<OGRVRTDataSource>
    <OGRVRTLayer name="Sheet1">
        <SrcDataSource>kcbcc_DS4_final_v2_final.xlsx</SrcDataSource>
        <SrcLayer>Sheet1</SrcLayer>
        <GeometryType>wkbPoint</GeometryType>
        <LayerSRS>EPSG:4326</LayerSRS>
        <GeometryField encoding="PointFromColumns" x="x" y="y" reportSrcColumn="FALSE" />
    </OGRVRTLayer>
</OGRVRTDataSource>
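
Because the VRT is itself a readable data source, it can be inspected before anything is converted. The following is a minimal sketch that reuses the ogrinfo flags shown earlier; the guard and the vrt_check variable are illustrative additions so the snippet does nothing harmful when GDAL or the file is absent.

```shell
vrt=kcbcc_DS4_final_v2_final.vrt

# Inspect the virtual layer only when ogrinfo and the VRT file are both present.
if command -v ogrinfo >/dev/null 2>&1 && [ -f "$vrt" ]; then
  # -al reports on all layers, -so prints the summary only (no individual features).
  ogrinfo -al -so "$vrt"
  vrt_check=ran
else
  echo "ogrinfo or $vrt not found; skipping the check" >&2
  vrt_check=skipped
fi
```

If the VRT is set up as intended, the summary should now report a point geometry and a layer SRS rather than "Geometry: None".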

Then use ogr2ogr to convert the VRT to GeoJSON (-nln sets the output layer name):

ogr2ogr \
  -nln "kcbcc" \
  output.geojson \
  kcbcc_DS4_final_v2_final.vrt

Vector data needs reformatting: All better!

jq . output.geojson
{
  "type": "FeatureCollection",
  "name": "kcbcc",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:OGC:1.3:CRS84"
    }
  },
  "features": [
    {
      "type": "Feature",
      "properties": {
        "cuteness_rating": 9.226,
        "blood_acetone_grams_per_ml": 1.945e-05,
        "blood_acetylcholine_grams_per_ml": 7.22e-08
      },
      "geometry": {
        "type": "Point",
        "coordinates": [
          -63.3070952173288,
          76.9630048613197
        ]
      }
    },
    ...<clipped for brevity>...
  ]
}

Vector data needs reformatting: All better!

ogrinfo -al -so output.geojson
INFO: Open of `output.geojson'
      using driver `GeoJSON' successful.

Layer name: kcbcc
Geometry: Point
Feature Count: 12
Extent: (-63.307095, 64.081793) - (-22.181734, 76.963005)
Layer SRS WKT:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
cuteness_rating: Real (0.0)
blood_acetone_grams_per_ml: Real (0.0)
blood_acetylcholine_grams_per_ml: Real (0.0)
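
Declaring EPSG:4326 in the VRT was a judgment call: it assumes the x column holds longitudes and the y column holds latitudes, both in degrees. One rough check, sketched here with the Extent values copied from the ogrinfo output above, is that the extent lies within the valid longitude and latitude ranges. The variable names are illustrative, and passing this check does not prove the CRS is right; it only rules out obviously projected coordinates.

```shell
# Extent as reported by ogrinfo: (xmin, ymin) - (xmax, ymax).
xmin=-63.307095; ymin=64.081793
xmax=-22.181734; ymax=76.963005

# In degrees, x must be a longitude in [-180, 180] and y a latitude in [-90, 90].
extent_ok=$(LC_ALL=C awk \
  -v xmin="$xmin" -v ymin="$ymin" -v xmax="$xmax" -v ymax="$ymax" \
  'BEGIN {
     ok = (xmin >= -180 && xmax <= 180 && ymin >= -90 && ymax <= 90 &&
           xmin <= xmax && ymin <= ymax)
     print (ok ? "yes" : "no")
   }')

echo "extent plausible for lon/lat degrees: $extent_ok"
```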

Vector data needs reformatting: All better!

QGreenland screenshot of reformatted vector layer

What if I cannot find a way to fix my data?

Metadata may be undocumented or incorrectly documented; contact the data producer!

Warning

It’s possible nobody living knows how to fix the problem!

See the continued learning material for more on how georegistration could help.

Exercise

💪 Fixing an issue with data/metadata

References

Bamber, J. 2001. “Greenland 5 Km DEM, Ice Thickness, and Bedrock Elevation Grids, Version 1.” NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/01A10Z9BM7KP.