Feedback hw03

First, I want to say that I was generally impressed by the quality of the reports, the code, and the work you did for hw03. I don’t think you were given an easy task, and you all (well, almost all) delivered code that ran out-of-the-box and generated the datasets as requested, often with very good quality.

Most of you managed GitHub well (except two teams that made problematic commits at the end; no marks were lost for those, as I mentioned), and almost all submissions were on time with the correct structure.

I learned many things reading some reports, which is a testament to the quality of what you did 👍

The marks reflect this: the average is 75% and the median 80%. One team provided results that I didn’t know could be achieved with only geometric methods, and their report was better written than many scientific articles out there, so they got 100% (bravo to them 👏).


To mark the assignments, I used the AHN5 file I had given you and a tile of AHN6 (this one: https://surfdrive.surf.nl/s/cb7fjoqrirsjpyM).

Then, for each team, I used their default parameters, except for the CSF, where I set the grid resolution to 2.0m and the epsilon to 1.0 for the classification.
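
For reference, a minimal sketch of how those two parameters map onto the CSF Python bindings (pip install cloth-simulation-filter); I’m assuming here that “epsilon” corresponds to the library’s class_threshold, and the input file name is hypothetical:

```python
import CSF
import laspy
import numpy as np

las = laspy.read("ahn_tile.laz")  # hypothetical input file
xyz = np.vstack((las.x, las.y, las.z)).T

csf = CSF.CSF()
csf.params.cloth_resolution = 2.0  # grid resolution used for marking
csf.params.class_threshold = 1.0   # the "epsilon" distance for ground classification
csf.setPointCloud(xyz)

ground = CSF.VecInt()      # receives indices of ground points
non_ground = CSF.VecInt()  # receives indices of non-ground points
csf.do_filtering(ground, non_ground)
```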

I then checked whether your DTM was close to mine by performing a map-algebra subtraction and inspecting the values.
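
A minimal sketch of that check, assuming both DTMs are GeoTIFFs on the same grid (extent, resolution, CRS) with nodata stored as NaN; the file names are hypothetical:

```python
import numpy as np
import rasterio

with rasterio.open("dtm_reference.tif") as ref, rasterio.open("dtm_submitted.tif") as sub:
    diff = ref.read(1) - sub.read(1)  # map-algebra subtraction, cell by cell

print("mean |diff|:", np.nanmean(np.abs(diff)))
print("max  |diff|:", np.nanmax(np.abs(diff)))
```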

I then visually reviewed the results to see whether, and how well, the classification performed.

Marking was complex because you optimised/calibrated your values for specific datasets, and those parameters often work best together (so if I changed, say, the cloth resolution, it might break your whole workflow…). Since you all started with the ground first (I think, though it wasn’t always clear!), if that step is not performed correctly then the other classes are affected too. I was aware of this, so marking was based on general observations over 2 randomly chosen datasets and a careful study of the results.

For each group, I have added an issue to your GitHub repository, where I provided a summary of my comments and a link to the PDF with annotations.

If you have comments, it’s best to ask me during the exam session on Tuesday 2025-02-10 @ 11:00—12:30 in room BG.West.670 (where you can also see your exam copy).

Some comments, in no particular order:

  1. Your teams were a real mix of nationalities/genders/backgrounds, and if I had had to form teams to maximise diversity, I couldn’t have done better. It’s nice to see!
  2. A picture is worth a thousand words, they say. Well, many of you often wrote extensively where a simple figure would have helped me, the poor reader, who struggled to understand what you meant.
  3. Figures showing an overview of your trees from 300m away, printed in a PDF, mean nothing. You need to zoom in to discover that 20% of the points inside the trees are class=1, or that everything under the tree is buildings (class=6).
  4. As is the case for most processing of geographical datasets: buffer the dataset to be processed and crop at the end, to avoid edge artifacts. This means you should have selected 10m/20m more than the 500m×500m to make sure you avoid artifacts. When you’re done processing, you then crop, not before (see the crop sketch after this list).
  5. In a scientific report, please do not describe in chronological order everything you tried. It is confusing for the reader. Explain what works and what you implemented, and then you can add a section explaining that you tried RANSAC but finally opted for region-growing. If you state at first that you use RANSAC and then reveal on p.8 that it wasn’t working so you dropped it, it’s really confusing.
  6. CSF and unmovable points:

    p.118 of the terrainbook:

    For each particle 𝑝, we need to define the lowest elevation to which it can move; once it reaches it, it is labelled as unmovable (line 5). The lowest elevation of one particle 𝑝 is defined as the original elevation of the closest sample point S after projecting both to the 2D plane.

    A few teams seem to have skipped this part (or it wasn’t clear to them), and they got weird results in the end, partly because of this. All points needed to be projected to the 2D plane, and the closest point in 2D space (not in 3D!) was used for the lowest elevation; see the nearest-neighbour sketch after this list.

  7. Some teams classified buildings before trees, others the other way around; all classified the ground first, though. What is the correct order? I would intuitively process buildings before trees, and the team that got 100% did exactly this and obtained excellent results.
  8. You all used eigenvalues, and some, but not all, exploited return_number. Using return_number was a double-edged sword: you find more trees, but you omit a lot of the leaves on the periphery of the tree…
  9. Calculating characteristics/features (based on eigenvalues) at different scales is probably the way to go to capture the context around a tree or a building (a single-scale sketch follows this list). And since carefully crafting a decision tree for the values to use is tedious, using machine learning with a random forest is probably the better option (you’ll learn this in GEO5017). But ultimately, deep learning methods are the ones people will use in the future; at the moment, as I said during a lecture, it’s not clear what the companies classifying the AHN are doing (but one thing is sure: they use humans to verify/fix the errors made by the computer).
  10. The overall algorithm should be presented clearly. Often you had only 3 sections, but were those performed in the order you presented them? As a reader, I should be told.
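
On point 4, a minimal sketch of crop-at-the-end with laspy; the bounds and file names are hypothetical:

```python
import laspy

# hypothetical bounds of the original 500m x 500m tile
XMIN, YMIN, XMAX, YMAX = 85000.0, 446000.0, 85500.0, 446500.0

las = laspy.read("tile_buffered_classified.laz")  # processed with a ~20m buffer
inside = ((las.x >= XMIN) & (las.x < XMAX) &
          (las.y >= YMIN) & (las.y < YMAX))
las.points = las.points[inside]  # crop only now, after all processing is done
las.write("tile_final.laz")
```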
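On point 6, a minimal sketch of the lowest-elevation lookup; the key is that the KD-tree is built on x,y only, so the query is a 2D nearest neighbour (the data here are random stand-ins):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
samples = rng.random((1000, 3)) * [500.0, 500.0, 30.0]  # stand-in (n,3) sample points
particles_xy = rng.random((200, 2)) * 500.0             # stand-in cloth particles, in 2D

tree = cKDTree(samples[:, :2])          # project samples to the 2D plane: x,y only
_, idx = tree.query(particles_xy, k=1)  # closest sample point in 2D space, not 3D
lowest_z = samples[idx, 2]              # its original elevation = the particle's lowest elevation
```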
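And on points 8–9, a minimal sketch of per-point eigenvalue features at a single scale; running it with several neighbourhood sizes k (or radii) gives the multi-scale features I mentioned. The function name and k are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

def eigen_features(points, k=20):
    """Linearity, planarity, sphericity from the k-NN covariance eigenvalues."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    feats = np.empty((len(points), 3))
    for i, nb in enumerate(idx):
        cov = np.cov(points[nb].T)  # 3x3 covariance of the neighbourhood
        l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]  # l1 >= l2 >= l3
        l1 = max(l1, 1e-12)         # guard against degenerate neighbourhoods
        feats[i] = ((l1 - l2) / l1,  # linearity
                    (l2 - l3) / l1,  # planarity
                    l3 / l1)         # sphericity
    return feats
```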