Grades and feedback assignment 03

The marks for assignment 3 are online. The statistics for this assignment are as follows:

avg:    67%
max:    88%
min:    37%
stddev: 18%
median: 72%

The marks are generally lower than the previous two, probably because the assignment gave you more liberty and was generally more challenging.

I must say that most of you did great; I enjoyed reading your solutions, which were almost all completely different.

For each group, I’ve created a folder in Dropbox where I’ve unzipped your submission and added a few files, e.g. b.json, t.json, and potentially some screenshots. Your PDF report contains comments I wrote directly in it; some of them were notes to myself, to remember later. I added one page at the end of your report and scribbled the marks there. I think it’s readable; if not, let me know.

I have sent the link to the Dropbox folder to one member of each team (by Discord DM or email); please forward it to the other members of your team.

If you have questions about the marking or want more details, please email or send me a Discord DM (hledoux#8017); we can also set up a video call.

How it was marked

The marking was complex because you all had completely different workflows and focused on different aspects. I took the marking scheme I had given in the description, and refined it as follows:

CityJSON file--Buildings
  - geometry valid? solid used? inner rings? 5%
  - realistic results? justification of building height? 5%
  - semantics? 5%
  - schema valid? 5%
  - ground elevation? 5%
CityJSON file--Trees
  - solid valid? 5%
  - realistic results overall? 5%
  - separation efforts 10%
  - trunks: efforts to make them look OK, integration with the crown 5%
Report
  - overview of the methodology used: clarity, "good" decisions, efforts to make it good, etc. 25%
  - pros/cons of the methodology: critical assessment 5%
  - contribution of members 5%
  - improvements to CityJSON 5%

After reading them all, I decided that the 100% mark for each category would correspond to the best among the solutions.

For marking, I took your files and visualised them with azul, ran cjio’s validate(), and val3dity. I randomly cropped 2 buildings and 2 trees (cjio subset --random 2); those are the files in your folder.

The report was marked on clarity (did I have to read it 3 times to get it?) and on the criteria that were in the description of the assignment.

Some feedback, in no particular order

  1. A report is not a chronological account of what you did! Don’t write about all the hurdles and your early failures! The reader wants to read first what your solution is. You could add some discussion later about other parameters/algorithms you tried, but please do not start with a story about what you did.
  2. The structure of a scientific report should be TOP-DOWN. You start with an overview of the method and the results, and then you explain how you got there by adding more details. I know you might have learned otherwise in a writing or communication course. Why is this taught, you might ask? I’d like to know too. Look at the good papers you have read (or will read next year for your thesis): they all start with the results, give a global overview, and then focus on the details. This makes the life of the reader way easier, and they will not have to read the text 3 times to get what was done.
  3. Most of you (but not all) put the ground at 0m as an arbitrary elevation. While this coincidentally makes some sense (the elevation of the Mekelpark is about -1m), in practice doing this has a lot of influence and shouldn’t be done. The measuredHeight attribute you put in was the elevation of the house w.r.t. NAP, a vertical datum; it wasn’t the height of the building… Also, many of you filtered the PC for the trees based on their height; you surely meant height above the ground, but you picked the z-values (which are w.r.t. NAP). If the ground is at -2m, that makes a huge difference. 5% was subtracted for not taking the ground into account. The solution was to create a DTM from the ground points (a grid could have been used, or a TIN) and then subtract it from the z-values everywhere. Some of you did this.
  4. The AHN3 dataset was big, and doing the point-in-polygon step brute-force wasn’t advised. Spatially indexing the footprints with an R-tree and then iterating over the points was the right thing to do (other solutions, like the virtual grids some of you used, are also great). Some of you indexed the PC with an R-tree though, and I wonder why. Notice that an R-tree uses the bbox of the object to index it, but the bbox of a point is TWO POINTS at the same location. You basically tripled the size of the PC, and I don’t know what it brings you. Is it faster? I doubt it, but let me know if I missed something.
  5. Thinning is a good idea for testing and debugging, but ideally the end result should be without it, especially for the trees, where you’d like to have their full shapes. I understand that thinning=10 was good enough for this assignment in Python, and it was accepted with full marks. Some of you thinned by a factor of 50,000, which is really not a proper solution!
  6. cjio has 2 operators that could have been used: remove_duplicates() and merge(). The former removes the duplicates in the "vertices" property; notice that this was only a warning, not an error, in the cjio validation. The worst consequence of having duplicates is that the file is larger. Can a file have duplicates but still be valid in val3dity? Yes, since val3dity merges duplicates when it reads a file (based on a tolerance, 1mm by default).
  7. CityJSON allows you to have 1+ geometries for a City Object, so for the trees, if you didn’t feel like merging the crown and the trunk into a CompositeSolid (kudos to the teams who tried this!), you could have had 2 geometries. Or a MultiSolid, where the topological relationships between the solids are not mandated: anything goes.
  8. One team made me realise that I gave you an older version of the BAG. However, as many of you noticed, there are many discrepancies between the AHN3 dataset (~2015 I reckon) and the BAG, especially since a lot has changed on the campus in the last few years, in the southern part. No points were removed for those cases; it’s just interesting to highlight.
  9. Speed is not everything! Many of you seem literally obsessed with having the fastest solution, often at the expense of a good solution! You are using Python, so speed cannot be awesome; just admit it. Fast is great, but having good results is more important in my opinion. Especially for a task like computing the median elevation of a building: you do it once and then save the result, so there is no point in optimising it. It’s not as if it’ll be done on the fly all the time, no?
  10. I too wish cjio validate() gave better granularity of errors; at this moment the errors you get are not very helpful, I admit. The reason is that JSON Schemas are used, and cjio simply passes on to the user the errors that are raised… Modifying the code to validate generic JSON Schemas seems overkill to me.
  11. AHN3 has points that are already classified! Some of you didn’t seem to notice!? It’s a nice exercise to start from an unclassified PC, but here that wasn’t the idea. Should you trust the classification of AHN3? Good question; I guess here yes, totally. Is it perfect in general? No, but it’s a pretty good start.
  12. Why are trees “unclassified” in AHN3, and not in the ASPRS vegetation classes? It has to do with how AHN3 is financed: Het Waterschapshuis pays for it, and the clients do not care about trees but about objects that will have an effect on flooding/runoff, and trees do not…
  13. “An image is worth a thousand words” –> put more figures in your reports! Often you write-write-write about complex things (“the normal points towards this and that”) when a figure would make it so much easier for the reader. Draw it by hand on paper and take a picture with your phone; it’s fine, the goal is to help the reader, not to have something beautiful.
  14. cjio save() was potentially compressing your files further because all spaces/tabs/CRs are removed from the file.
  15. “What was the best way to separate the trees?” Hmmm, I can’t answer this. There are some research papers about the topic; one team found one and its implementation: PyCrown. But one simple method is similar to what many of you did: create clusters based on a tolerance/distance to the other points. This can be done in 2D. Notice though that many of you seem to have started with an arbitrary seed and then grew their clusters; why not start with the highest point (the top of a tree)? Some of you used DBSCAN and obtained nice results.
  16. Some of you got lower marks simply because you submitted files that were invalid, both for the schema and the geometry! These were generally “easy” to solve, and on the forum/Discord we helped you. In general I’d say you should read those; they are public because we want everyone to benefit from the answers, not because we don’t like emails.
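To make point 3 above concrete, here is a minimal sketch of the grid-DTM approach. It assumes the point cloud is a plain list of (x, y, z, classification) tuples (AHN uses ASPRS class 2 for ground); the function name and cell size are illustrative, not from anyone’s submission.

```python
import math

def heights_above_ground(points, ground_class=2, cell=5.0):
    """Normalise z-values: subtract a coarse grid-DTM built from ground points.

    points: list of (x, y, z, classification) tuples.
    Returns a list of (x, y, height-above-ground) tuples; points falling in
    a cell without any ground points are skipped.
    """
    # 1. build the DTM: lowest ground z per grid cell
    dtm = {}
    for x, y, z, c in points:
        if c == ground_class:
            key = (math.floor(x / cell), math.floor(y / cell))
            dtm[key] = min(z, dtm.get(key, float("inf")))
    # 2. subtract the local ground elevation from every point
    out = []
    for x, y, z, c in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        if key in dtm:
            out.append((x, y, z - dtm[key]))
    return out
```

With the ground at -2m (w.r.t. NAP), a point at z = 8m correctly becomes 10m above ground, instead of being filtered at 8m.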
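For point 4, this is a sketch of the virtual-grid variant: footprint bboxes are binned into grid cells so that each point is only tested against nearby footprints. The function names and the simple ray-casting test are my own illustration, not any team’s code.

```python
import math

def point_in_polygon(pt, poly):
    """Ray casting; poly is a list of (x, y) vertices (last vertex not repeated)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through pt
            xcross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < xcross:
                inside = not inside
    return inside

def assign_points(points, footprints, cell=50.0):
    """Map footprint-id -> points inside it, using a virtual grid on the bboxes."""
    grid = {}
    for fid, poly in footprints.items():
        xs = [p[0] for p in poly]
        ys = [p[1] for p in poly]
        for i in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
            for j in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
                grid.setdefault((i, j), []).append(fid)
    result = {fid: [] for fid in footprints}
    for x, y, z in points:
        # only test the footprints whose bbox overlaps this point's cell
        for fid in grid.get((math.floor(x / cell), math.floor(y / cell)), []):
            if point_in_polygon((x, y), footprints[fid]):
                result[fid].append((x, y, z))
                break
    return result
```

The key property is the same as with an R-tree on the footprints: the expensive point-in-polygon test runs only on a handful of candidates per point, not on every footprint.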
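Point 6, what remove_duplicates() does, boils down to the following idea. This is a simplified sketch of vertex deduplication on a CityJSON-like dict, not cjio’s actual implementation:

```python
def remove_duplicate_vertices(cj):
    """Deduplicate the "vertices" array of a (simplified) CityJSON dict and
    remap the boundary indices of every geometry accordingly."""
    old2new, newverts, seen = [], [], {}
    for v in cj["vertices"]:
        key = tuple(v)
        if key not in seen:
            seen[key] = len(newverts)
            newverts.append(v)
        old2new.append(seen[key])

    def remap(b):  # boundaries are arbitrarily nested lists of vertex indices
        return [remap(x) if isinstance(x, list) else old2new[x] for x in b]

    for co in cj["CityObjects"].values():
        for g in co["geometry"]:
            g["boundaries"] = remap(g["boundaries"])
    cj["vertices"] = newverts
    return cj
```

The geometry is unchanged; only the "vertices" array shrinks, which is exactly why duplicates are a warning (a bigger file) and not an error.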
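Point 14 is easy to see with plain json: writing without indentation and with compact separators (what a minifying save does) always produces a smaller file. This illustrates the effect, it is not cjio’s code:

```python
import json

# a (tiny, empty) CityJSON-shaped dict, just to compare serialisations
city = {"type": "CityJSON", "version": "1.0", "CityObjects": {}, "vertices": []}

pretty = json.dumps(city, indent=2)                 # human-readable, lots of whitespace
compact = json.dumps(city, separators=(",", ":"))   # all optional whitespace stripped
print(len(pretty), len(compact))
```

On a real city model with millions of vertex coordinates, the saved whitespace adds up quickly.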
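And a toy version of the clustering idea in point 15, seeding each cluster from the highest remaining point (a likely tree top). This is a quadratic-time sketch for illustration; a real workflow would use a spatial index or DBSCAN:

```python
import math

def cluster_trees(points, tol=2.0):
    """Greedy 2D clustering of (x, y, z) points: repeatedly seed from the
    highest remaining point and grow the cluster with every point within
    `tol` (in x,y) of a point already in the cluster."""
    remaining = sorted(points, key=lambda p: p[2])  # ascending z
    clusters = []
    while remaining:
        seed = remaining.pop()  # highest point left: likely a tree top
        cluster = [seed]
        changed = True
        while changed:
            changed = False
            for p in remaining[:]:  # iterate over a copy while removing
                if any(math.hypot(p[0] - q[0], p[1] - q[1]) <= tol for q in cluster):
                    cluster.append(p)
                    remaining.remove(p)
                    changed = True
        clusters.append(cluster)
    return clusters
```

Starting from the top of a tree makes the seed meaningful; an arbitrary seed works too, but a point on the edge between two crowns is a worse anchor than a local maximum.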