Assignment 03

Kriging interpolation

Deadline is 2022-12-20 17:00

Late submission? 10% will be removed for each day that you are late.

This is an individual assignment.



The aim of this assignment is to put the kriging theory (Chapter 6) you learned in practice. For this, you will use rasterio to read an input data set and Pyinterpolate to create an experimental variogram, apply kriging and IDW and compare your results with the different methods.

You should show the results of your work in a simple report with some plots. In order to create these plots, I suggest you use either: (i) rasterio to write your results and QGIS to display them, or (ii) matplotlib to plot them directly. If you want to use something else to create them, it’s also fine.

Getting ready

After hw02, you should already have rasterio installed, so you only need to install Pyinterpolate:

pip install pyinterpolate

Since matplotlib is a Pyinterpolate dependency, it should be installed automatically. The sames applies to other dependencies, like numpy. However, bear in mind that the specific versions of the libraries installed by pip install pyinterpolate have issues with Python 3.11, so please stick to Python 3.7 to 3.10.

If you need help reading a dataset with rasterio, check its Python Quickstart tutorial.

For matplotlib, you’ll likely want to use the scatter or imshow plots.

Basically all of the kriging code you need is already in the Pyinterpolate tutorials (but you do need to figure out the relevant parts and adjust them to work with your data). In particular, I advise you to have a quick look at these 4 tutorials:

  1. Variogram Point Cloud
  2. Semivariogram estimation
  3. Ordinary and Simple Kriging
  4. How good is my Kriging model? - tests with IDW algorithm

Finally, you should download a 30m SRTM tile (we will discuss the SRTM dataset in lesson 5.2) of a region of your interest (eg your hometown or a hilly region of your country) through this website and use it for your assignment. Choose a region that isn’t too flat, because otherwise it will be difficult to adequately compare and evaluate the different methods (since fitting a plane is easy for any method).

What to do?

First, you will do some preprocessing of your data. Because the SRTM tiles are in lat-long (EPSG:4326 - WGS 84) with 1-arcsecond resolution, you should reproject your dataset to an adequate CRS with uniform units (eg meters). For the purposes of this assignment, you can use something like a properly selected UTM zone or the national CRS for the area of your dataset. You can do this reprojection in the software of your choice, but my suggestion is to use QGIS or GDAL.

Then, load your reprojected input dataset into memory using rasterio. Since the entire SRTM tile is a bit too large to use with kriging in Python, you will extract two subsets of points:

  1. A random sample of 1000 points, which you will treat as the data points to be interpolated. Because the SRTM data you downloaded has already been cleaned, we will simulate the raw measurements by adding some noise. For this, you will add normally distributed noise with a standard deviation of 10 meters to the x and y axes and of 16 meters to the z axis of each point. I took these numbers from the SRTM specs. The numpy function random.normal will help you to do this. Note: the noise along each axes shouldn’t be the same, so generate it independently!

  2. Another random sample of 10000 points, which you will use as the ground truth to compare to the value of your interpolation at that location. Don’t add any noise to this dataset.

Visualise both of these datasets to get a better understanding of them. Tip: be careful with how you will handle points with NODATA values in your input. You don’t want to use these in your interpolation.

Then, use your two subsets in Pyinterpolate and do the following steps:

  1. view the variogram cloud,
  2. create the experimental variogram,
  3. obtain a theoretical variogram function that fits your data well (check the different models available in Pyinterpolate and compute a good nugget, sill and range for them),
  4. apply simple kriging (using your theoretical variogram function) on your 1000 data points to interpolate the value at the 10000 ground truth point locations,
  5. apply ordinary kriging (using your theoretical variogram function) on your 1000 data points to interpolate the value at the 10000 ground truth point locations,
  6. apply IDW on your 1000 data points to interpolate the value at the 10000 ground truth point locations, and
  7. compare the results of the 3 interpolation methods with respect to the 10000 ground truth point values.

As in the Pyinterpolate tutorials, test different values for the number of neighbours in simple kriging and ordinary kriging, as well as different powers for IDW.

Report and marking

In order to show your work, you are asked to prepare a simple report outlining what you did. The report should contain the 10 items listed in the table below, but there’s no need to make it long. Each item can be summarised in one short paragraph, some key Python commands, and one plot (with one-sentence caption). For each plot, include a short caption to help interpret what is visible/interesting in it.

For the results section of each interpolation method (SK, OK and IDW), make sure you include a numerical measure of how good it is (eg RMSE).

Finally, in the last section of the report, analyse all of these numbers and compare them to each other. Provide your assessment of the pros and cons of the different methods.

Criterion Points
reprojection of chosen tile 1
creation of the 1000 data points with random noise 1
creation of the 10000 ground truth points 1
variogram cloud 1
experimental variogram 1
theoretical variogram function 1
simple kriging results 1
ordinary kriging results 1
IDW results 1
comparison and analysis of results 1

What to submit and how to submit it

You have to submit a ZIP file (or another compressed format) with:

  1. Your report as a PDF, and
  2. all the Python files of your source code.

Submit your files by uploading them here.