
25 September, 2023

14 September, 2023


Read the article here: Crap: WASMtoy. (

This blogspot site is dead! 

Update your links (and RSS!) to my new blog at

09 September, 2023

20x1000 Use(r)net Archive.

 Read the following article here: 20x1000 Use(r)net Archive. (


Below you will find a draft version of the post, all images, formatting and links will be missing here as I moved to my new system.

20x1000 Use(r)net Archive.

An investigation of the old web.

This website is a manifestation of an interest I've acquired over the past couple of years in internet communities, creative spaces and human-centric technology. Yes, my landing back on the "small web" is not just a reaction to seeing what happens when someone like Moron acquires a social network like they did with Twitter...

Part of it is that I consider Roblox itself (my employer at the time of writing, in case you don't know) to be part of the more "humanistic" web, a social experience not driven by ads, algorithms and passive feeds, but by creativity, agency, active participation.

As part of this exploration, I wanted to go back and see what we had when posting online was not subject to an algorithm, was not driven to maximize engagement to be able to monetize ads and the like... I downloaded a few archives of old usenet postings (i.e. when newsgroups were still used for discussions, and not as they later devolved, exclusively as a way to distribute binary files of dubious legality) and wrote a small script to convert them to HTML.

The conversion process is far from... good. As far as I could tell, there is no encoding of the comment trees in usenet, it's just a linear stream of email-like messages as received by the server. 

There does not even seem to be a standard for dates or... anything regarding the headers, so whilst I did write a parser that is robust enough to guess a date for each post in the archive, the date itself is not reliable, as I've seen a ton of different encodings, timezone formats and so on. 
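A minimal sketch of that kind of best-effort date guessing, using only the Python standard library. The function name `guess_date`, the list of fallback formats and the "assume UTC when no timezone is given" rule are all assumptions for illustration, not the script used for the archive.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Extra formats to try when the email-style parser gives up (illustrative, incomplete).
FALLBACK_FORMATS = ["%d %b %y %H:%M:%S", "%Y-%m-%d %H:%M:%S", "%a %b %d %H:%M:%S %Y"]

def guess_date(raw):
    """Return a best-effort UTC datetime for a Date header value, or None."""
    raw = raw.strip()
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):       # the exception type depends on the Python version
        parsed = None
    if parsed is None:
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(raw, fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        return None
    if parsed.tzinfo is None:             # no timezone in the header: assume UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Anything that returns None here would still need a manual guess, which is why the resulting dates should be treated as approximate.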

Even the post subject is not entirely reliable, because people change it, sometimes by mistake (misspelling, corrections, truncation), sometimes adding chains of "re:" or "was:" and so on, which again, I tried somewhat to account for, but succeeded only partially.

For each archive I converted only the top 1000 posts by number of replies, and no other filtering was done, so you will see the occasional spam, and a ton of less than politically correct stuff. Proceed at your peril, you have been warned.
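The subject clean-up and the "top 1000 by replies" selection could look roughly like the sketch below. The regular expressions, `thread_key` and `top_threads` are hypothetical names and rules, and they only cover the simplest "re:" and "was:" patterns. Where a post carries a References header, that would be a more reliable thread key; this sketch follows the subject-based approach described above.

```python
import re
from collections import Counter

# Leading "Re:" chains, including "Re[2]:" and "Re^2:" variants.
_RE_PREFIX = re.compile(r"^\s*(re(\[\d+\]|\^\d+)?\s*:\s*)+", re.IGNORECASE)
# Trailing "(was: old subject)" or "[was: old subject]" notes.
_WAS_SUFFIX = re.compile(r"\s*[\(\[]\s*was\s*:.*$", re.IGNORECASE)

def thread_key(subject):
    """Normalise a subject line so replies group with the original post."""
    s = _RE_PREFIX.sub("", subject)
    s = _WAS_SUFFIX.sub("", s)
    return " ".join(s.lower().split())    # case-fold and collapse whitespace

def top_threads(subjects, n=1000):
    """Return the n most replied-to threads as (key, reply count) pairs."""
    counts = Counter(thread_key(s) for s in subjects)
    # A thread with k messages has k - 1 replies; the ordering is the same either way.
    return [(key, count - 1) for key, count in counts.most_common(n)]
```

Misspellings and truncated subjects would still end up as separate threads with this approach.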

And now without further ado, here are a few archives for your perusal.

01 [FILE:EXTERNAL/news/alt.philosophy/index_main.htm alt.philosophy]

02 [FILE:EXTERNAL/news/alt.postmodern/index_main.htm alt.postmodern]

03 [FILE:EXTERNAL/news/]

04 [FILE:EXTERNAL/news/]

05 [FILE:EXTERNAL/news/]

06 [FILE:EXTERNAL/news/]

07 [FILE:EXTERNAL/news/comp.arch/index_main.htm comp.arch]

08 [FILE:EXTERNAL/news/comp.compilers/index_main.htm comp.compilers]

09 [FILE:EXTERNAL/news/ comp.development.industry]

10 [FILE:EXTERNAL/news/ comp.development.programming.algorithms]

11 [FILE:EXTERNAL/news/]

12 [FILE:EXTERNAL/news/]

13 [FILE:EXTERNAL/news/comp.lang.forth/index_main.htm comp.lang.forth]

14 [FILE:EXTERNAL/news/comp.lang.functional/index_main.htm comp.lang.functional]

15 [FILE:EXTERNAL/news/comp.lang.lisp/index_main.htm comp.lang.lisp]

16 [FILE:EXTERNAL/news/]

17 [FILE:EXTERNAL/news/comp.society.futures/index_main.htm comp.society.futures]

18 [FILE:EXTERNAL/news/]

19 [FILE:EXTERNAL/news/comp.sys.apple2/index_main.htm comp.sys.apple2]

20 [FILE:EXTERNAL/news/]

Better times? Worse times?

07 September, 2023

Notes: Reversing Revopoint Scanner.

Read the following article here: Notes: Reversing Revopoint Scanner. ( This blog is dead! Update your links (and RSS!) to


I have to admit, I bought my (...checking the settings...) iPhone 13 pro back in the day mostly because of its 3d scanning abilities, I wanted to have fun with acquisition of 3d scenes. It turns out that the lidar camera is not that strong: it's still good fun, but for "serious" uses photogrammetry is better (RealityScan or the nerf-based but I digress...

[IMG:sitescape.jpg SiteScape iOS app]

[IMG:nerds.jpg No NERFs, only nerds.]

Point is, I have been fascinated with 3d scanning for quite a while, so when [LINK: revopoint] came out with a new kickstarter for its "range" scanner, I bit the bullet and got me one.
Unfortunately, as it often happens with new companies and products, albeit the hardware in the scanner is quite competent, the software side is still lacking. A fact that is often brought up in the support forums, the most annoying issue being its propensity to lose tracking of the object being scanned, and thus failing to align frames.

[IMG:revoscan.png I assure you, there is no Toshiba Libretto with a keyboard that large...]

This is especially infuriating as in theory one could run a more expensive alignment algorithm on the captured frames offline, but the software only works with realtime alignment, and it is not good enough to actually succeed at that.

Well, this is where knowing a bit of (python) programming, a bit about 3d and a dash of numerical optimization can come to the rescue.

Luckily, revoscan saves a "cache" of raw frames in a trivial to load format. The output of the color camera is stored straight as images, while the depth camera is saved in ".dph" files - all being the same size: 500kb.

Now... 640*400 is 256000... so it seems that the depth is saved in a raw 2-byte per pixel format, which indeed is the case. Depth appears to be encoded as a 16 bit integer, with actual range going in the frames I've dumped from circa 3000 to 7000, with zero signaling an invalid pixel.
This seems close enough to the spec sheet, which describes the scanner as able to go from 300 to 800mm with a 0.1mm precision. So far so good!
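The arithmetic above, written out as a small sketch. Reading "500kb" as 500 KiB and treating one raw unit as 0.1mm are assumptions taken from the file size and the spec sheet precision, not from any documentation of the .dph format.

```python
WIDTH, HEIGHT = 640, 400
FILE_SIZE = 500 * 1024                 # reported ".dph" size, read here as 500 KiB

bytes_per_pixel = FILE_SIZE / (WIDTH * HEIGHT)   # 2.0, i.e. one uint16 per pixel

def raw_to_mm(raw, units_per_mm=10.0):
    """Convert a raw 16-bit depth sample to millimetres; 0 marks an invalid pixel."""
    return None if raw == 0 else raw / units_per_mm
```

Under those assumptions the observed raw range of about 3000 to 7000 maps to 300mm to 700mm, inside the advertised 300 to 800mm.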

[IMG:specs.png From the revopoint website.]

I don't want to make this too long, but suffice to say that trying to guess the right projection entirely from the specs I saw, didn't work. In fact, it seems to me the measurements they give (picture above) do not really make for a straight frustum.

[IMG:stretch.png Trying to do some math on pen and paper, from the specs - clearly wrong.]

One idea could be to just scan a simple scene with the included software, either capturing just a single frame (turns out the easiest is to delete all other frames in the "cache" folder, then reopen the scan) or using the included tripod to get a static scan, then convert it to a point cloud with as minimal processing as possible, and try to deduce the projection from there.

Well... that's exactly what I've done.

[IMG:calibration.jpg Trying to create a scene with a good, smooth depth range and some nice details.]

[IMG:revoscan2.jpg How it looks like in RevoScan.]

Point clouds are a well known thing, so of course you can find packages to handle them. For this I chose to work with [LINK: open3d] in Python/Jupyter (I use the Anaconda distribution), which is nowadays my go-to setup for lots of quick experiments. 
Open3d provides a lot of functionality, but what I was interested in for this is that it has a simple interface to load and visualize point clouds, to find alignment between two clouds and estimate the distance between clouds.

Now, here is where a lot of elbow grease was wasted. It's trivial enough to write code to do numerical optimization for this problem, especially as open3d provides a fast enough distance metric that can be directly plugged in as an error term. The problem is to decide what parameters to optimize and what the model should look like. Do we assume everything is linear? Is there going to be any sort of lens distortion to compensate for? Do we allow for a translation term? A rotation term? How to best formulate all of these parameters in order to help the numerical optimization routine?

I tried a bunch of different options, I went through using quaternions, I tried optimizing first with some rigid transform compensation by having open3d align the point clouds before computing the error, to isolate just the projection parameters, and then fixing the projection and optimizing for translation and rotation (as unfortunately I did not find a way to constrain open3d alignment to an orthogonal transform) and so on.

At the beginning I was using differential evolution for a global search, followed by Nelder-Mead to refine the best candidate found, but I quickly moved to just using NM as a local optimizer and just "eyeballing" good starting parameters for a given model. I did restart NM by hand, by feeding it the best solution it found if the error seemed still large - this is a common trick as there is a phenomenon called "simplex collapse" that scipy does not seem to account for.
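The manual restart trick can be automated along these lines. This is a sketch, not the code used for the scanner: `nelder_mead_with_restarts`, its stopping rule and the Rosenbrock stand-in error function are assumptions; in the real setting the error function would be the point cloud distance.

```python
import numpy as np
from scipy.optimize import minimize

def nelder_mead_with_restarts(error_fn, x0, max_restarts=10, tol=1e-9):
    """Rerun Nelder-Mead from its own best point until the error stops improving."""
    best = minimize(error_fn, x0, method="Nelder-Mead")
    for _ in range(max_restarts):
        # Each restart builds a fresh, full-sized simplex around the current best point.
        candidate = minimize(error_fn, best.x, method="Nelder-Mead")
        improvement = best.fun - candidate.fun
        if candidate.fun < best.fun:
            best = candidate
        if improvement < tol:
            break
    return best

def rosenbrock(p):
    """Stand-in error function for the example."""
    x, y = p
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2

result = nelder_mead_with_restarts(rosenbrock, np.array([-1.2, 1.0]))
```

Restarting costs a few extra runs of the optimizer, but it is a cheap guard against the simplex shrinking onto a point that is not yet a minimum.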

In the end, I just gave up trying to be "smart" and optimized a 3x4 matrix... yielding this:

[IMG:opt.png Eureka! Cyan is the RevoScan .ply exported cloud, Yellow is my own decoding of .dph files]

In python:
import numpy as np

opt_M = [0.,-1.,0.,0., -1.,0.,0.,0. ,0.,0.,-1.,0.] # Initial guess
opt_M = [ 0.00007,-5.20327,0.09691,0.0727 , -3.25187,-0.00033,0.97579,-0.02795,  0.00015,0.00075,-5.00007,0.01569] # Optimized result
#opt_M = [ 0.,-5.2,0.1,0. ,-3.25,0.,0.976,0., 0.,0.,-5.,0.] # Same, rounded

def img_to_world_M(ix,iy,d,P=opt_M):
    # ix,iy are pixel coordinates, d = raw uint16 depth at that pixel location.
    # As called below, ix runs over the 640-pixel axis and iy over the 400-pixel one, so the
    # 400/640 divisors here do not match those ranges. The matrix was fitted against exactly
    # this expression and absorbs the mismatch: do not swap the divisors without refitting.
    d = d/50. # could have avoided this but I didn't want to look at large numbers in the matrix
    return np.matmul(np.array(P).reshape(3,4), np.array([(ix/400.0-0.5)*d,(iy/640.0-0.5)*d,d,1]))

# dph_file_path: path to one .dph frame from the RevoScan "cache" folder
with open(dph_file_path, 'rb') as f:
    depth_image = np.fromfile(f, dtype=np.uint16)
    print(depth_image.min(), depth_image.max(), depth_image[depth_image != 0].min())
    depth_image = depth_image.reshape(400,640) # 400 rows of 640 pixels

subset = [(iy,ix) for iy,ix in np.ndindex(depth_image.shape) if depth_image[iy,ix]!=0] # skip invalid (zero) pixels
points = [img_to_world_M(ix,iy,depth_image[iy,ix]) for iy, ix in subset]

Surprisingly... the correct matrix is not orthogonal! To be honest, I would not have imagined that, and this in the end is why all my other fancy attempts failed. I tried with a couple of different scenes, and the results were always the same, so this seems to be the correct function to use.
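A quick arithmetic check on the fitted numbers, as a sketch. The rows of the 3x3 part have very different lengths, so it is indeed far from a rotation. Dividing the two image-plane coefficients by the divisors used in img_to_world_M gives nearly the same value, which would be consistent with square pixels and with the scale and image-centre offsets being folded into the matrix. That reading is an interpretation of the numbers, not something verified against the scanner.

```python
opt_M = [ 0.00007,-5.20327,0.09691,0.0727 , -3.25187,-0.00033,0.97579,-0.02795,  0.00015,0.00075,-5.00007,0.01569]
rows = [opt_M[0:4], opt_M[4:8], opt_M[8:12]]

def dot3(a, b):
    """Dot product of the first three components (ignores the translation column)."""
    return sum(x * y for x, y in zip(a[:3], b[:3]))

# Row lengths of the 3x3 part: a pure rotation would give 1.0 for all three.
row_norms = [dot3(r, r) ** 0.5 for r in rows]

# World units per pixel (per unit of scaled depth), using the divisors in img_to_world_M.
scale_first = abs(rows[1][0]) / 400.0    # the first input is divided by 400
scale_second = abs(rows[0][1]) / 640.0   # the second input is divided by 640
```

Both scales come out at roughly 0.00813, while the row lengths are roughly 5.2, 3.4 and 5.0.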

Now, armed with this, I can write my own offline alignment system, or hack the scanner to produce for example an animated point cloud! Fun!

[IMG:offline_align.png Several frames aligned offline.]


- In RevoScan 5, the settings that seemed the best are: "accurate" scanning mode, set the range to the maximum 300 to 1200, fuse the point cloud with the "standard" algorithm set at the minimum distance of 0.1. This still does not produce, even for a single frame, the same exact points as decoding the .dph with my method, as RevoScan seems always to drop/average some points.

- The minimum and maximum scanning distance seem to be mostly limited by the IR illumination, more than parallax? Too far, the IR won't reach, too near, it seems to saturate the depth cameras. This would explain also why the scanner does better with objects with a simple, diffuse, white albedo, and why it won't work as well in the sun.

[IMG:sls.jpg This is probably about ten years old now, around the time Alex Evans (see was toying with structured light scanning, I was doing the same. Sadly, the hard drives with these scans broke and I lost all this :/]

03 September, 2023

How does this work?

Read the following article here: This blog is dead! Update your links (and RSS!) to


(tl;dr: badly)

The common wisdom when starting a personal website nowadays is to go for a static generator. [LINK: Hugo] seems particularly popular and touted as a simple, fast, no-brainer solution.

OMG! If that's what simplicity looks like these days, we are really off the deep end. Now, I don't want to badmouth what is likely an amazing feat of engineering, I don't know enough about anything to say that... But, tradeoffs, right? Let's not just adopt some tech stack because it's "so hot right now". Right? [LINK: Overengineering is the root of all evil].


I had to interact for the first time with hugo for REAC2023, as I was trying to style a bit more our homepage with the graphic design I made this year, and that was enough to persuade me it's not made for my use-cases. I can imagine that if you are running a bigger shop, a "serious" website, handled by professionals, perhaps it makes sense? But for personal use I felt, quite literally, I could be more efficient using raw HTML. And I don't know HTML, at all!

Indeed in most cases for a blog like this, [LINK: raw HTML is all you need] (exhibit [LINK: B]). But I'm a programmer, first and foremost, and thus trained to waste time in futile efforts if they promise vague efficiency improvements "down the line" (perhaps, in the next life).

Bikeshedding, what can go wrong? In all seriousness though, this is a hobby, and so, everything goes. Plus, I love Python, but I don't know much about it (that's probably why I still love it), so more exercise can only help.

From the get go, I had a few requirements. Or anti-requirements, really:

1) I don't want to build a site generator, i.e. my own version of hugo et al. I'll write some code that generates the website, but the code and the website are one and the same, everything hardcoded/ad-hoc for it.
2) I don't want to write "much" code. Ideally I aim at fewer lines in total than the average Hugo configuration/template script.
3) I don't want to use markdown. Markdown is great, everyone loves it, but it's already overengineering for me. I just need plain text, plus the ability to put links and images.
4) I don't want to spin up a webserver just to preview a dumb static website! Why that's a requirement is puzzling to me.
5) I want to be able to easily work on my articles anywhere, without having to install anything.
6) No javascript required. Might add some JS in the future for fun stuff, but the website will always work without.

This is actually how I used to write my blog anyways. Most of my posts are textfiles, I don't write my drafts in the horrible blogspot editor, that would be insane. The textfiles are littered with informal "tags" (e.g. "TODO" or "add IMAGE here" etc) that I can search and replace when publishing. So why not just formalize that!

That's about it. "txt2web" is a python script that scans a folder for .txt files, and converts them mechanically to HTML, mostly dealing with adding "br" tags and "nbsp". It prepends a small CSS inline file to them for "styling", and it understands how to make links, add images... and nothing else! Oh, yeah, I can **bold** text too, this is another thing I actually use in my writing.
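A minimal sketch of that kind of mechanical conversion, not the actual txt2web code. The `CSS` string and `txt_to_html` are stand-ins, the [IMG:file caption] pattern is taken from the tags visible in these drafts, and link tags are left out because their exact syntax is not shown here.

```python
import html
import re

CSS = "<style>body{max-width:40em;margin:auto}</style>"   # stand-in for the inline CSS

def txt_to_html(text):
    """Convert plain text with **bold** and [IMG:file caption] tags to simple HTML."""
    body = html.escape(text)
    # **bold** spans
    body = re.sub(r"\*\*(.+?)\*\*", r"<b>\1</b>", body)
    # [IMG:file caption] tags
    body = re.sub(r"\[IMG:(\S+) ([^\]]*)\]", r'<img src="\1" alt="\2">', body)
    # keep runs of spaces and line breaks as typed
    body = body.replace("  ", " &nbsp;").replace("\n", "<br>\n")
    return CSS + "\n" + body
```

For example, `txt_to_html("a **b** c")` wraps the `b` in a bold tag and leaves the rest of the text alone.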

Then it generates an index file, which is mostly the same flow converting an "index.txt" to web, but appending at the end a list of links to all other pages it found. And because I felt extra-fancy, I also record modification dates, so I can put them next to posts.

Yet, in its simplicity it has a few features that are important to me, and that I could not find in "off the shelf" website builders. As of "v0.1":

- It checks links for validity, so I can know if a link expired. Maybe one day I could automatically link via Internet Archive, but I don't know if that's even wise (might confuse google or something?).
- It parses image size so the page does not need to reflow on load. Maybe one day I'll generate thumbnails as well. In general, the pages it generates are the fastest thing you'll ever see on the web.
- It reminds me of leftover "TODO"s in the page.
- The 10-liner CSS I added should correctly support day/night modes, and it should be mobile-friendly.
- It generates a good old RSS feed! I personally use Feedly/Reeder (iOS app) daily, after google killed its reader product.
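Two of the features above are small enough to sketch with the standard library alone. Both helpers are hypothetical illustrations rather than the real script: `png_size` only handles PNG (it reads the width and height stored in the IHDR chunk), and `leftover_todos` is a plain substring search.

```python
import struct

def png_size(data):
    """Return (width, height) from the IHDR chunk of PNG bytes, or None if not a PNG."""
    if len(data) < 24 or data[:8] != b"\x89PNG\r\n\x1a\n" or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])   # two big-endian 32-bit integers

def leftover_todos(text):
    """Return (line number, line) pairs for every line still containing "TODO"."""
    return [(i, line) for i, line in enumerate(text.splitlines(), start=1) if "TODO" in line]
```

Knowing the size up front lets the generated img tag carry explicit width and height attributes, which is what avoids the reflow on load.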

If you want to check out the code (beware, it's horrible, I always forget how to write good "pythonic" code as I use it rarely), you'll find it [ here.]

Also, for each .htm there should be on the server the source .txt, as I upload everything (the source and the "production" website are one and the same). For example [FILE:001_txt2web.txt]!



What about gopher/the tildeverse/smol-net/permacomputing?
I like the idea. A lot. I believe there is more value to the individuals in being in smaller communities than in "megascale" ones. I believe that there is more value in content that is harder to digest than in the current "junkfood for the brain" homogenized crap we are currently serving.

I suspect Twitter and TikTok "won" because they are exploiting evolutionary biases - which make sense and we have to accept, but that do not necessarily serve us the best anymore. And I suspect that the most value of world-scale anything is extracted by celebrities and advertisers, to have a platform with a wide reach, not by most of the people on the platform.

But, needless to say, this is a bigger topic for another time! BTW, if you don't know what I'm talking about, let me save you some google: [LINK:], [LINK:], [LINK:]

What's relevant to this post is that yes, the fact I have control over the website and I chose a minimalistic, text-based format, would allow me to output to other representations as well... Maybe one day I'll have a gopher page for work-in-progress stuff, for the few people who care to lurk on that kind of thing.

[IMG:libretto.jpg Achievement unlocked?]

[IMG:cafe.jpg Hipster coffee, hipster writing.]

31 August, 2023

A new Blog: Reinventing the wheel.

Read the following article here: A new Blog: Reinventing the wheel. ( This blog is dead! Update your links (and RSS!) to 


 A new Blog: Reinventing the wheel.

I made my first website in high school, must not have been long after I discovered the internet and signed a contract with the first provider of my town. Remember Microsoft Frontpage and GeoCities? Photoshop web export? That!

It was nothing much, the kind of things that later would find home on MySpace: music, friends, some drawings and 3d art I was making at the time, demoscene, a bit of photography, animated gifs of course. I think later on it even had some java effects on it. All in all, teenager stuff.

Realizing that nobody in the world would care about my crappy art page, it was not long lived, in fact I don't even think I saved a copy in my archives. But it introduced me to this idea of the web and using it for personal spaces.

So, soon after I started another web project, this time focusing on mainstream subjects such as a teenager's view of philosophy, politics, fountain pens and lisp... This time, it was going to use cutting-edge, newfangled technology. It was going to be a blog! 

[IMG:lemon.png Celebrity! Somehow a "famous" lisp website noticed me back in the days...]

[NOTE: do not link... but it's still up ->]

And yes, that was on blogspot, where my main blog is/used to be until today!

It was truly exciting, even if in retrospect, dumb. See, the idea of keeping an online journal and sharing it is great. What's not to like. Writing - great. Journaling - great. Sharing - great. Even if you don't get any visitors, just the feeling of being part of a community, and a cutting-edge one at that, exploring the cyberspace, joining webrings... Why not?

Dumb... because, well, I already knew how to write websites, and blogspot offered... nothing. The value is zero, and it has always been zero. It had and has a crappy editor - and we already had frontpage and geocities, you didn't need to know HTML. It was not a social network. Even basic stuff like visitor count and so on had to be brought from external providers. 

True, it does allow for comments, and back in the days these were a bit better, but they were never great - today they are only spam. We felt good using it, even if it was truly never good. And... we pay a price, a quite high price at that.

We got nothing, nothing of value anyways. And in return we locked ourselves in a platform - one that happens to be dying nowadays, but in general, we gave our creativity for free to an entity that gave us nothing in return.

Big whoop you say! This is the deal of the modern internet, didn't you hear? "If you are not paying for it, you're not the customer - you are the product being sold". Yes, yes, I'm not that naive. There is a nuance - at least for me. The trade is not per-se bad. But it is a trade, and you have to understand how much value you are getting.

This is true for everything, really, in tech, perhaps in life. Tradeoffs. I made a few bad deals, and it's time to rectify them. Blogspot has no value. I even used to host my presentations and files on Scribd for it - and boy was that a mistake. 

We should talk about Twitter and similar communities as well... But that will be for another time...

I abandoned my first blog when I started working professionally in gaming. I didn't want to have my real name associated with it as I was navigating my first jobs, and I didn't want to have to discuss with my employer the nuances of what's good to post or not on a personal, but technical blog. 

Eventually the blog became "famous" enough that people knew it was me behind it, so I dropped the pretense of anonymity - but that came many years after its inception.

And here we are now. So, this is going to be my new homepage. I hope you enjoy it! It has many features Blogspot never supported, both for you as a viewer and certainly for me as a writer.

You can understand why I went with my own website instead of simply moving to the next "great for now" platform. I've looked around a bit, and found nothing that provided any value to me. 

Medium is about the same as Blogspot. CoHost - I don't need to tangle my writing with the social media I use to advertise and discuss about it. Substack? I don't care about getting paid... Github pages? Why on earth?

I just want a place to share random crap.

The old blog will stay up and for a while I plan to cross-post on both. Currently, I have no plans to take the old blog down, but I have scraped its contents in a few different ways "just in case".

- Angelo Pesce, a.k.a. deadc0de on c0de517e, a.k.a. "kenpex"


[IMG:1stweb.png Quirky, unprofessional web "design", wasn't life more fun when we were not using all the same cookie molds?]

[IMG:1stweb_2.png Yeah, the entrance featured my first car, cruising on the Salerno coast highway, with bad Photoshop effects!]

[IMG:engblog.png Teenage problems on display. And lisp.]

[IMG:itblog.png Even more personal, even more random, and of course, more bad Photoshop!]

12 April, 2023

Half baked and a half: A small update.

Previously: C0DE517E: Half baked: Dynamic Occlusion Culling

Trying the idea of using the (incrementally accumulated) voxel data to augment the reprojection of the previous depth buffer.

Actually, I use here a depth from five frames ago (storing them in a ring buffer) - to simulate the (really worst-case) delay we would expect from CPU readbacks.
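The five-frame delay can be mimicked with a fixed-size ring buffer, sketched here in Python for illustration only. The `DepthHistory` class and the convention of reading the delayed buffer at the start of a frame, before the current frame's depth is pushed, are assumptions.

```python
from collections import deque

class DepthHistory:
    """Ring buffer holding the last `delay` depth buffers."""

    def __init__(self, delay=5):
        self.frames = deque(maxlen=delay)   # the oldest entry is dropped automatically

    def delayed(self):
        """Oldest stored depth: `delay` frames old once the buffer has filled up."""
        return self.frames[0]

    def push(self, depth_buffer):
        """Store the depth produced at the end of the current frame."""
        self.frames.append(depth_buffer)
```

While the buffer is still filling up, `delayed()` simply returns the oldest frame available.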

Scene and the final occlusion buffer (quarter res):

Here is the occlusion buffer, generated with different techniques. Top: without median, Bottom: with. Left to right: depth reprojection only, voxel only, both. 

Note that the camera was undergoing fast rotation, you can see that the reprojected depth has a large area along the bottom and left edges where there is no information.

Debug views: accumulated voxel data. 256x256x128 (8mb) 8bit voxels, each voxel stores a 2x2x2 binary sub-voxel. 
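The memory figure and the sub-voxel packing, as a small sketch. The bit order inside the byte (x in bit 0, y in bit 1, z in bit 2 of the index) is an assumption; any fixed order works as long as the writer and the renderer agree.

```python
GRID = (256, 256, 128)                        # voxels along x, y, z
total_bytes = GRID[0] * GRID[1] * GRID[2]     # one byte per voxel

def subvoxel_bit(sx, sy, sz):
    """Bit index of sub-voxel (sx, sy, sz), each coordinate being 0 or 1."""
    return sx | (sy << 1) | (sz << 2)

def set_subvoxel(voxel_byte, sx, sy, sz):
    """Mark one of the 2x2x2 sub-voxels as occupied."""
    return voxel_byte | (1 << subvoxel_bit(sx, sy, sz))

def is_occupied(voxel_byte, sx, sy, sz):
    """Test one of the 2x2x2 sub-voxels."""
    return bool(voxel_byte & (1 << subvoxel_bit(sx, sy, sz)))
```

With this layout a voxel is empty exactly when its byte is zero, so the coarse occupancy test stays a single comparison.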

The sub-voxels are rendered only "up close", they are a simple LOD scheme. In practice, we can LOD more, render (splat) only up close and only in areas where the depth reprojection has holes.

Note that my voxel renderer (point splatter) right now is just a brute-force compute shader that iterates over the entire 3d texture (doesn't even try to frustum cull). 
Of course that's bad, but it's not useful for me to improve performance, only to test LOD ideas, memory requirements and so on, as the real implementation would need to be on the CPU anyways.

Let's go step by step now, to further illustrate the idea thus far.

Naive Z reprojection (bottom left) and the ring buffer of five quarter-res depth buffers:

Note the three main issues with the depth reprojection:
  1. It cannot cover the entire frame, there is a gap (in this case on the bottom left) where we had no data due to camera movement/rotation.
  2. The point reprojection undersamples the areas of the frame that get "stretched", creating small holes (look around the right edge of the image). Fixing this is the primary job of the median filter, albeit I suspect that this step can be fast enough that we could also supersample a bit (say, reproject a half-res depth into the quarter res buffer...)
  3. Disocclusion "holes" (see around the poles on the left half of the frame)

After the median filter (2x magnification). On the left, a debug image showing the absolute error compared to the real (end of frame) z-buffer. 

The error scale goes from yellow (negative error - false occlusion) to black (no error) to cyan (positive error - false disocclusion). Also, there is a faint yellow dot pattern marking the areas that were not written at all by the reprojection.

Note how all the error right now is "positive" - which is good:

My current hole-filling median algorithm does not fix all the small reprojection gaps, it could be more aggressive, but in practice right now it didn't seem to be a problem.
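One plausible shape for such a hole-filling median, sketched on the CPU in Python for readability. The actual filter is not shown in the post, so the 3x3 window, the `EMPTY` marker and the `min_valid` threshold are all assumptions.

```python
from statistics import median

EMPTY = None   # marker for pixels the reprojection never wrote

def fill_holes_median(depth, min_valid=3):
    """Fill empty pixels with the median of their valid 3x3 neighbours."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] is not EMPTY:
                continue                       # written pixels are left untouched
            neighbours = [depth[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if depth[ny][nx] is not EMPTY]
            if len(neighbours) >= min_valid:   # too few samples: leave the hole for later passes
                out[y][x] = median(neighbours)
    return out
```

Lowering `min_valid` is one way to make the filter more aggressive, at the cost of trusting fewer samples.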

Now let's start adding in the voxel point splats:

And finally, only in the areas that still are "empty" from either pass, we do a further dilation (this time, a larger filter, starting from 3x3 but going up to 5x5, taking the farthest sample)
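The final dilation could be sketched like this, again on the CPU and only as an illustration. It assumes larger depth values are farther away (a reversed-Z buffer would need `min` instead) and that empty pixels are marked with None; `dilate_farthest` is a hypothetical name.

```python
def dilate_farthest(depth, empty=None, radii=(1, 2)):
    """Fill still-empty pixels with the farthest valid depth in a 3x3, then 5x5, window."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            if depth[y][x] is not empty:
                continue
            for r in radii:                    # r=1 is the 3x3 window, r=2 the 5x5 one
                window = [depth[ny][nx]
                          for ny in range(max(0, y - r), min(h, y + r + 1))
                          for nx in range(max(0, x - r), min(w, x + r + 1))
                          if depth[ny][nx] is not empty]
                if window:
                    out[y][x] = max(window)    # farthest sample: the conservative choice for occlusion
                    break
    return out
```

Taking the farthest sample errs on the side of false disocclusion, which matches the "positive" error being the acceptable kind.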

We get the entire frame reconstructed, with an error that is surprisingly decent.

A cute trick: it's cheap to use the subvoxel data, when we don't render the 2x2x2, to bias the position of the voxel point. Just a simple lookup[256] to a float3 with the average position of the corresponding full subvoxels for that given encoded byte.
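The 256-entry table can be precomputed in a few lines. This sketch assumes voxel-local coordinates in 0..1, sub-voxel centres at 0.25 and 0.75, and a bit order of x, y, z from the lowest bit up; the real layout may differ.

```python
def build_bias_lookup():
    """Average position of the set sub-voxels for each of the 256 possible bytes."""
    lookup = []
    for byte in range(256):
        centres = [((b & 1) * 0.5 + 0.25, ((b >> 1) & 1) * 0.5 + 0.25, ((b >> 2) & 1) * 0.5 + 0.25)
                   for b in range(8) if byte & (1 << b)]
        if not centres:
            lookup.append((0.5, 0.5, 0.5))     # empty voxel: fall back to the voxel centre
        else:
            n = len(centres)
            lookup.append(tuple(sum(c[i] for c in centres) / n for i in range(3)))
    return lookup

BIAS = build_bias_lookup()
```

At splat time the point position becomes the voxel origin plus `BIAS[byte]` scaled by the voxel size.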

This reasoning could be extended to "supervoxels", 64 bits could and should (data should be in Morton order, which would result in an implicit, full octree) encode 2x2x2 8 bit voxels... then far away we could splat only one point per 64bit supervoxels, and position it with the same bias logic (create an 8bit mask from the 64bits, then use the lookup).
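Building the 8-bit mask from a 64-bit supervoxel is a short loop. The sketch assumes each of the eight child voxels occupies one byte of the 64-bit word, in child order; `supervoxel_mask` is a hypothetical name.

```python
def supervoxel_mask(supervoxel):
    """Collapse a 64-bit supervoxel (eight 8-bit voxels) into an 8-bit occupancy mask."""
    mask = 0
    for child in range(8):
        child_byte = (supervoxel >> (8 * child)) & 0xFF
        if child_byte:                         # any sub-voxel set means the child voxel is occupied
            mask |= 1 << child
    return mask
```

The resulting mask has the same meaning as a voxel byte one level down, so the same 256-entry bias table can be reused to position the single far-away point.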

10 April, 2023

From the archive: Notes on GGX parallax correction.

As for all my "series" - this might very well be the first and last post about it, we'll see. I have a reasonable trove of solutions on my hard-drive that were either shipped but never published, not even shipped, or shipped and "published" but with minimal details, as a side note of bigger presentations. Wouldn't it be a shame if they spoiled?

Warning! All of what I'm going to talk about next probably is not very meaningful if you haven't been implementing parallax-corrected cubemaps before (or rather, recently), but if you did, it will (hopefully) all make sense.
This is not going to be a gentle introduction to the topic, just a dump of some notes...

Preconvoluted specular cubemaps come with all kinds of errors, but in the past decade or so we invented a better technique, where we improve the spatial locality of the cubemap by using a proxy geometry and raycasting. 

Typically the proxy geometry is rectangular, and the technique is known as parallax-corrected specular cubemaps. This better technique comes with even more errors built-in, I did a summary of all of the problems here, back in 2015.

From Seb. Lagarde (link above)

The following is an attempt to solve one of the defects parallax correction introduces, by retrofitting some math I did for area lights to see if we can come up with a good solution.

Setup is the following: We have a cubemap specular reflection probe somewhere, and we want to use that to get the specular from a location different from the cube center. 
In order to do so, we trace a reflection ray from the surface to be shaded to the scene geometry, represented via some proxies that are easy to intersect, then we look up the reflection baked in the probe towards the intersection point.
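For a box proxy, the lookup just described reduces to a ray versus axis-aligned box exit test. The sketch below is written in Python for readability (a real implementation would live in a shader) and assumes the shaded point is inside the box; the function and parameter names are hypothetical.

```python
def parallax_corrected_direction(position, reflection, box_min, box_max, probe_centre):
    """Direction to look up in the cubemap for a reflection ray starting inside the box."""
    t_exit = float("inf")
    for axis in range(3):
        d = reflection[axis]
        if d > 0.0:
            t_exit = min(t_exit, (box_max[axis] - position[axis]) / d)
        elif d < 0.0:
            t_exit = min(t_exit, (box_min[axis] - position[axis]) / d)
    hit = [position[i] + t_exit * reflection[i] for i in range(3)]
    # Look up the probe towards the hit point, not along the original reflection vector.
    return [hit[i] - probe_centre[i] for i in range(3)]
```

The `min` over the three slabs is where the ray switches plane, which is the source of the derivative discontinuities discussed further down.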

The problem with this setup is illustrated below. If you think of the specular lobe as projecting its intensity on the surfaces of the scene, you get a given footprint, which will be in general discontinuous (due to visibility) and stretched.

Think of our specular lobe like shining light from a torch on a surface.

Clearly, when we baked the cubemap, we were moving the torch in a given way, from the cubemap center all around. When we query though, we are looking for the lobe that a torch would create on the scene from the shaded point, towards the reflection direction (or well, technically not as a BRDF is not a lobe around the mirror reflection direction but you know that with preconvolved cubemaps we always approximate with "Phong"-like lobes).

By using the cubemap information, we get a given projected kernel which in general doesn't match -at all- the kernel that our specular lobe on the surface projects.
There is no guarantee that they are even closely related, because they can be at different distances, at different angles and "looking" at different scene surfaces (due to discontinuities).

Now, geometry is the worst offender here. 

Even if the parallax proxy geometry is not the real scene, and we use proxies that are convex (boxes, k-dops...), naively intersecting planes to get a "corrected" reflection lookup clearly shows in shading at higher roughness, due to discontinuities in the derivatives.

From youtube - note how the reflected corners of the room appear sharp, are not correctly blurred by the rough floor material.

The proxy geometry becomes "visible" in the reflection: as the ray changes plane, it changes the ratio of correction, and the plane discontinuity becomes obvious in the final image. 

This is why in practice intersecting boxes is not great, and you'd have to find some smoother proxy geometry or "fade" out the parallax correction at high roughness. To my knowledge, everyone (??) does this "by eye", I'm not aware of a scientific approach, motivated in approximations and errors.

Honestly today I cannot recall what ended up shipping at the time, I think we initially had the idea of "fading" the parallax correction, then I added a weighting scheme to "blend" the intersection (ray parameter) between planes, and I also "pushed away" the parallax planes if we are too near them.

In theory you could intersect something like a rounded box primitive, control the rounding with the roughness parameter, and reason about Jacobians (derivatives, continuity of the resulting filtering kernel, distortion...) but that sounds expensive and harder to generalize to k-dops.

The second worst "offender" with parallax correction is the difference in shape of the specular lobes, the precomputed one versus the "ideal" one we want to reconstruct, that happens even when both are projected on the same plane (i.e. in absence of visibility discontinuities).

The simplest correction to make is in the case where the two lobes are both perpendicular to a surface, the only difference being the distance to it.

This is relatively easy as increasing the distance looks close enough to increasing the roughness. Not exactly the same, but close enough to fit a simple correction formula that tweaks the roughness we fetch from the cubemap based on the ratio between the cubemap-to-intersection distance and the surface-to-intersection one:

From this observation we know we can use numerical fitting and precomputation to find a correction factor from one model to another. 
Then, we can take that fitted data and either use a lookup for the conversion or find an analytic function that approximates it.

This methodology is what I described at Siggraph 2015 and have used many times since. Formulate a hypothesis: this can be approximated with that. Use brute force to optimize free parameters. Visualize the fitting and end results versus ground truth to understand if the process worked or if not, why not (where are the errors). Rinse and repeat.

Here you can see the first step. For every roughness (alpha) and distance, I fit a GGX D lobe with a new alpha', here adding a multiplicative scaling factor and an additive offset (subtractive, really, as the fitting will show).
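A minimal sketch of this kind of brute-force fit, under loud assumptions: it is a toy model, not my original code. The lobe footprint is taken to be the peak-normalized GGX D term evaluated at tan(theta) = r / distance on the plane, and only alpha' is fitted (no scale, no offset). `projected_ggx`, `fit_alpha` and their parameters are names made up for this example.

```python
import math

def projected_ggx(r, dist, alpha):
    """Peak-normalized GGX D of a lobe perpendicular to a plane at distance
    `dist`, sampled at radial offset `r` on that plane (toy model)."""
    cos2 = dist * dist / (dist * dist + r * r)
    denom = cos2 * (alpha * alpha - 1.0) + 1.0
    d = alpha * alpha / (math.pi * denom * denom)
    peak = 1.0 / (math.pi * alpha * alpha)  # value of D at r = 0
    return d / peak

def fit_alpha(alpha, surface_dist, cube_dist, samples=200, extent=2.0):
    """Brute-force search for the alpha' that, seen from `cube_dist`, best
    matches (least squares) the lobe `alpha` seen from `surface_dist`."""
    rs = [extent * i / (samples - 1) for i in range(samples)]
    target = [projected_ggx(r, surface_dist, alpha) for r in rs]
    best_alpha, best_err = None, math.inf
    for step in range(1, 1001):
        cand = step / 1000.0  # alpha' in (0, 1]
        err = sum((projected_ggx(r, cube_dist, cand) - t) ** 2
                  for r, t in zip(rs, target))
        if err < best_err:
            best_alpha, best_err = cand, err
    return best_alpha
```

In this toy setup, for small alpha the footprint depends almost only on alpha times distance, so `fit_alpha(0.1, 2.0, 1.0)` should land near 0.2, i.e. alpha scaled by the distance ratio. The interesting part is where it departs from that, which is what the fitted tables capture.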

Why do we use an additive offset? Well, it helps with the fitting, and it should be clear why, if we look at the previous grid. GGX at high roughness has a long tail that turns "omnidirectional", whilst a low roughness lobe that is shining far away from a plane does not exhibit that omnidirectional factor.

We cannot use it though, we employ it only to help the fitting process find a good match. Why? Well, first, because we can't express it with a single fetch in a preconvolved cubemap mip hierarchy (we can only change the preconvolved lobe by a multiplicative factor), but also note that it is non-zero only in the area where the roughness maxes out (we cannot get rougher than alpha=1), and in that area there is nothing really that we can do.

Of course, next we'd want to find an analytic approximation, but also make sure everything is done in whatever exact association there is from cubemap mip level to alpha, ending up with a function that goes from GGX mip selection to adjusted GGX mip selection (given the distance). 
This is really engine-dependent, and left as an exercise to the reader (in all honesty, I don't even have the final formulas/code anymore)

Next up is to consider the case where the cubemap and the surface are not perpendicular to the intersection plane (even keeping that to be just a plane, so again, no discontinuities). Can we account for that as well?

To illustrate the problem, the following shows the absolute value of the cosine of the angle of the intersection between the reflection direction and the proxy planes in a scene.

This is much harder to fit a correction factor for. The problem is that the two different directions (the precomputed one and the actual one) can be quite different.
Same distance, one kernel hits at polar angle Pi/3,0, the second -Pi/3,Pi/3. How do you adjust the mip (roughness) to make one match the other?

One possible idea is to consider how different the intersection at an angle is from the corresponding perpendicular one.
If we have a function that goes from angle,distance -> an isotropic, perpendicular kernel (roughness', angle=0, same distance) then we could maybe go from the real footprint we need for specular to an isotropic footprint, and from the real footprints that we have in the cubemap mips to the isotropic and search for the closest match between the two isotropic projections.

The problem here is that really, with a single fetch/isotropic kernel, it doesn't seem that there is a lot to gain by changing the roughness as a function of the angle.

In the following, I graph projections at an angle compared to a perpendicular lobe (GGX D term only).
All graphs are with alpha = 0.1, distance = plane size (so it's equivalent to the kernel at the center of a prefiltered cubemap when you ignore the slant). 

Pi/6 - the two lobes seem "visually" very close:

At Pi/2.5 we get a very long "tail" but note that the width of the central part of the kernel seems still to fit the isotropic fetch without any change of roughness.

Now here "seems to fit" really doesn't mean much. What we should do is to look at rendered results, compare to ground truth / best effort (i.e. using sampling instead of prefiltering, whilst still using the assumption of representing radiance with the baked, localized cubemap), and if we want to then use numerical methods, do so with an error measure based on some perceptual metric.

And this is what I did, but failed to find any reasonable correction, keeping the limitation of a single fetch. The only hope is to turn to multiple fetches, and optimize the preconvolution specifically to bake data that is useful for the reconstruction, not using a GGX prefiltering necessarily.

I suspect that actually the long anisotropic tail created by the BRDF specular lobe is not, visually, a huge issue.
The problem is that what we get is (also) the opposite: from the point of view of the reconstruction, we get tails "baked" into the prefiltered cube at arbitrary angles (compared to the angles we need for specular on surfaces), and these long tails create artifacts.

To account for that, the prefiltering step should probably take directly into account the proxy geometry shape. I.e. if these observations are correct, they point towards the idea that parallax-corrected cubemaps should be filtered by a fixed distance (relative to projected texel size), perpendicular to the proxy plane kernel. 

That way when we query the cubemap we have only to convert the projected specular kernel to a kernel perpendicular to the surface (which would be ~ the same kernel we get at that roughness and same distance, just perpendicular), and then look in the mip chain for the roughness that gives us a similar prefiltered image, by doing a distance-ratio-to-roughness adjustment as described in the first part of this text.

15 March, 2023

Half baked: Dynamic Occlusion Culling

The following doesn't work (yet), but I wanted to write something down both to put it to rest for now, as I prepare for GDC, and perhaps to show the application of some of the ideas I recently wrote about here.

A bit of context. Occlusion culling (visibility determination) per se is far from a solved problem in any setting, but for us (Roblox) it poses a few extra complications:

  1. We don't allow authoring of "technical details" - so no artist-crafted occluders, cells and portals, and the like.
  2. Everything might move - even if we can reasonably guess what is dynamic in a scene, anything can be changed by a Luau script.
  3. We scale down to very low-power and older devices - albeit this might not necessarily be a hard constraint here, as we could always limit the draw distance on low-end to such degrees that occlusion culling would become less relevant. But it's not ideal, of course.

That said, let's start and find some ideas on how we could solve this problem, by trying to imagine our design landscape and its possible branches. 


Real-time "vs" Incremental

I'd say we have a first obvious choice, given the dynamic nature of the world. Either we try to do most of the work in real-time, or we try to incrementally compute and cache some auxiliary data structures, and we'd have then to be prepared to invalidate them when things move.

For the real-time side of things everything (that I can think of) revolves around some form of testing the depth buffer, and the decisions lie in where and when to generate it, and when and where to test it. 

Depth could be generated on the GPU and read-back, typically a frame or more late, to be tested on CPU, it could be generated and tested on GPU, if our bottlenecks are not in the command buffer generation (either because we're that fast, or because we're doing GPU-driven rendering), or it could be both generated and tested on CPU, via a software raster. Delving deeper into the details reveals even more choices. 

On GPU you could use occlusion queries, predicated rendering, or a "software" implementation (shader) of the same concepts, on CPU you would need to have a heuristic to select a small set of triangles as occluders, make sure the occluders themselves are not occluded by "better" ones and so on.

All of the above have found use in games, so on one hand they are techniques that we know could work, and we could guess the performance implications, upsides, and downsides, and at the same time there is a lot that can still be improved compared to the state of the art... but, improvements at this point probably lie in relatively low-level implementation ideas.

E.g. trying to implement a raster that works "conservatively" in the sense of occlusion culling is still hard (no, it's not the same as conservative triangle rasterization), or trying to write a parallelized raster that still allows doing occlusion tests while updating it, to be able to occlude-the-occluders while rendering them, in the same frame, things of that nature. 

As I wanted to explore more things that might reveal "bigger" surprises, I "shelved" this branch...

Let's then switch to thinking about incremental computation and caching.

Caching results or caching data to generate them?

The first thing that comes to mind, honestly, is just to cache the results of our visibility queries. If we had a way to test the visibility of an object, even after the fact, then we could use that to incrementally build a PVS. Divide the world into cells of some sort, maybe divide the cells per viewing direction, and start accumulating the list of invisible objects.

All of this sounds great, and I think the biggest obstacle would be to know when the results are valid. Even offline, computing a PVS from raster visibility is not easy, you are sampling the space (camera positions, angles) and the raster results are not exact themselves, so, you can't know that your data structure is absolutely right, you just trust that you sampled enough that no object was skipped. For an incremental data structure, we'd need to have a notion of "probability" of it being valid.

You can see a pattern here by now, a way of "dividing and conquering" the idea landscape, the more you think about it, the more you find branches and decide which ones to follow, which ones to prune, and which ones to shelve. 

Pruning happens either because a branch seems too unlikely to work out, or because it seems obvious enough (perhaps it's already well known or we can guess with low risk) that it does not need to be investigated more deeply (prototyping and so on). 

Shelving happens when we think something needs more attention, but we might want to context-switch for a bit to check other areas before sorting out the order of exploration...

So, going a bit further here, I imagined that visibility could be the property of an object - a visibility function over all directions, for each direction the maximum distance at which it would be unoccluded - or the property of the world, i.e. from a given region, what can that region see. The object perspective, even if intriguing, seems a mismatch both in terms of storage and in terms of computation, as it thinks of visibility as a function - which it is, but one that is full of discontinuities that are just hard to encode.

If we think about the world, then we can imagine either associating a "validity" score to the PVS cells, associating a probability to the list of visible objects (instead of being binary), or trying to dynamically create cells. We know we could query, after rendering, for a given camera the list of visible objects, so, for an infinitesimal point in 5d space, we can create a perfect PVS. From there we could cast the problem as how to "enlarge" our PVS cells, from infinitesimal points to regions in space.
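As a rough illustration of the "cache the query results" branch, here is a minimal sketch of an incremental PVS keyed by grid cell. Everything in it is an assumption made for the example: the class name, the position-only cells (no viewing direction), and especially the use of a plain sample count as a crude stand-in for a real "validity" score.

```python
import math
from collections import defaultdict

class IncrementalPVS:
    """Accumulates per-cell visible sets from after-the-fact visibility results."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.visible = defaultdict(set)   # cell -> ids seen from that cell
        self.samples = defaultdict(int)   # cell -> number of recorded frames

    def cell_of(self, position):
        return tuple(math.floor(c / self.cell_size) for c in position)

    def record(self, camera_pos, visible_ids):
        cell = self.cell_of(camera_pos)
        self.visible[cell] |= set(visible_ids)
        self.samples[cell] += 1

    def query(self, camera_pos, min_samples):
        """Returns the cached set, or None when the cell is not trusted yet
        (the caller should then fall back to rendering everything)."""
        cell = self.cell_of(camera_pos)
        if self.samples.get(cell, 0) < min_samples:
            return None
        return set(self.visible[cell])

    def invalidate(self, position):
        """Drop a cell, e.g. when something moved inside or in view of it."""
        cell = self.cell_of(position)
        self.visible.pop(cell, None)
        self.samples.pop(cell, None)
```

The sample count says nothing about how well the cell was covered in position and direction, which is exactly the open problem: a real version would need a better notion of confidence before daring to cull anything.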

This to me, seems like a viable idea or at least, one worth exploring in actual algorithms and prototypes. Perhaps there is even some literature about things of this nature I am not aware of. Would be worth some research, so for now, let's shelve it and look elsewhere!


Caching results can be also thought of as caching visibility, so the immediate reaction would be to think in terms of occluder generation as the other side of the branch... but it's not necessarily true. In general, in a visibility data structure, we can encode the occluded space, or the opposite, the open space. 

We know of a popular technique for the latter, portals, and we can imagine these could be generated with minimal user intervention, as Umbra 3 introduced many years ago the idea of deriving them through scene voxelization.

Introduction to Occlusion Culling | by Umbra 3D | Medium

It's realistic to imagine that the process could be made incremental, realistic enough that we will shelve this idea as well...

Thinking about occluders also seems a bit more natural for an incremental algorithm. It's not a big difference, but portals make sense when most of the scene is occluded (e.g. indoors). As we are starting with no information, we are in the opposite situation: at first the entire scene is disoccluded, and we might progressively start discovering occlusion, but hardly "in the amount" that would make most natural sense to encode with something like portals. There might be other options there, it's definitely not a dead branch, but it feels unlikely enough that we might want to prune it.

Here is where I started going from "pen and paper" reasoning to some prototypes. I still think the PVS idea that we "shelved" might get here as well, but I chose to get to the next level on occluder generation for now.

From here on the process is still the same, but of course writing code takes more time than rambling about ideas, so we will stay a bit longer on one path before considering switching. 

When prototyping I want to think of what the real risks and open questions are, and from there find the shortest path to an answer, hopefully via a proxy. I don't need at all to write code that implements the way I think the idea will work out if I don't need to - a prototype is not a bad/slow/ugly version of the final product, it can be an entirely different thing from which we can nonetheless answer the questions we have.

With this in mind, let's proceed. What are occluders? A simplified version of the scene, that guarantees (or at least tries) to be "inside" the real geometry, i.e. to never occlude surfaces that the real scene would not have occluded. 

Obviously, we need a simplified representation, because otherwise solving visibility would be identical to rendering, minus shading, in other words, way too expensive. Also obvious that the guarantee we seek cannot hold in general in a view-independent way, i.e. there's no way to compute a set of simplified occluders for a polygon soup from any point of view, because polygon soups do not have well-defined inside/outside regions.

So, we need to simplify the scene, and either accept some errors or accept that the simplification is view-dependent.  How? Let's talk about spaces and data structures. As we are working on geometry, the first instinct would be to somehow do computation on the meshes themselves, in object and world space. 

It is also something that I would try to avoid, pruning that entire branch of reasoning, because geometric algorithms are among the hardest things known to mankind, and I personally try to avoid writing them as much as I can. I also don't have much hope for them to be able to scale as the scene complexity increases, to be robust, and so on (albeit I have to say, wizards at Roblox working on our real-time CSG systems have cracked many of these problems, but I'm not them).

World-space versus screen-space makes sense to consider. For data structures, I can imagine point clouds and voxels of some sort to be attractive.

First prototype: Screen-space depth reprojection

Took a looong and winding road to get here, but this is one of the most obvious ideas as CryEngine 3 showed it to be working more than ten years ago. 

Secrets of CryEngine 3

I don't want to miscredit this, but I think it was Anton Kaplanyan's work (if I'm wrong let me know and I'll edit), and back then it was dubbed "coverage buffer", albeit I'd discourage the use of the word as it already had a different meaning (the c-buffer is a simpler version of the span-buffer, a way to accelerate software rasterization by avoiding storing a depth value per pixel).

They simply took the scene depth after rendering, downsampled it, and reprojected - by point splatting - from the viewpoint of the next frame's camera. This creates holes, due to disocclusion, due to lack of information at the edges of the frame, and due to gaps between points. CryEngine solved the latter by running a dilation filter, able to eliminate pixel-sized holes, while just accepting that many draws will be false positives due to the other holes - thus not having the best possible performance, but still rendering a correct frame.
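The reprojection step can be sketched in a few lines. This is a toy CPU version under stated assumptions, not CryEngine's implementation: a pinhole camera with a hypothetical `focal` length in pixels, a camera that only translates between frames, view-space linear depth, and a hole filter that uses the farthest neighbor so that filled pixels stay conservative.

```python
import math

HOLE = math.inf  # no occluder information: nothing is culled behind a hole

def reproject_depth(depth, focal, cam_offset):
    """Point-splat `depth` (list of rows of view-space z) into the view of a
    camera translated by `cam_offset` = (x, y, z) relative to the old one."""
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    out = [[HOLE] * w for _ in range(h)]
    ox, oy, oz = cam_offset
    for y in range(h):
        for x in range(w):
            z = depth[y][x]
            if z == HOLE:
                continue
            # Unproject the pixel to a view-space point of the old camera.
            px = (x - cx) * z / focal
            py = (y - cy) * z / focal
            # Same point relative to the new camera (translation only).
            nx, ny, nz = px - ox, py - oy, z - oz
            if nz <= 0.0:
                continue  # behind the new camera
            sx = round(nx * focal / nz + cx)
            sy = round(ny * focal / nz + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[sy][sx] = min(out[sy][sx], nz)  # regular depth test
    return out

def fill_small_holes(buf):
    """Fill holes enclosed by valid pixels horizontally or vertically, using
    the farthest of the two neighbors to avoid over-occluding."""
    h, w = len(buf), len(buf[0])
    out = [row[:] for row in buf]
    for y in range(h):
        for x in range(w):
            if buf[y][x] != HOLE:
                continue
            left = buf[y][x - 1] if x > 0 else HOLE
            right = buf[y][x + 1] if x < w - 1 else HOLE
            up = buf[y - 1][x] if y > 0 else HOLE
            down = buf[y + 1][x] if y < h - 1 else HOLE
            if left != HOLE and right != HOLE:
                out[y][x] = max(left, right)
            elif up != HOLE and down != HOLE:
                out[y][x] = max(up, down)
    return out
```

Moving the camera sideways in front of a flat wall leaves a band of holes along one frame edge, which the small-hole filter (correctly) does not touch: those are the holes we have to live with, or fill from other data.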

Holes, in red, due to disocclusions and frame edges.

This is squarely in the realm of real-time solutions though, what are we thinking? 

Well, I was wondering if this general idea of having occluders from a camera depthbuffer could be generalized a bit more. First, we could think of generating actual meshes - world-space occluders, from depth-buffer information. 

As we said above, these would not be valid from all view directions, but we could associate the generated occluders from a set of views where we think they should hold up.

Second, we could keep things as point clouds and use point splatting, but construct a database from multiple viewpoints so we have more data to render occluders and fill the holes that any single viewpoint would create.

For prototyping, I decided to use Unity, I typically like to mix things up when I write throwaway code, and I know Unity enough that I could see a path to implement things there. I started by capturing the camera depth buffer, downsampling, and producing a screen-aligned quad-mesh I could displace, effectively like a heightfield. This allowed me to write everything via simple shaders, which is handy due to Unity's hot reloading.

Test scene, and a naive "shrink-wrap" mesh generated from a given viewpoint

Clearly, this results in a "shrink-wrap" effect, and the generated mesh will be a terrible occluder from novel viewpoints, so we will want to cut it around discontinuities instead. In the beginning, I thought about doing this by detecting, as I'm downsampling the depth buffer, which tiles can be well approximated by a plane, and which contain "complex" areas that would require multiple planes. 

This is a similar reasoning to how hardware depth-buffer compression typically works, but in the end, proved to be silly.

An easier idea is to do an edge-detection pass in screen-space, and then simply observe which tiles contain edges and which do not. For edge detection, I first generated normals from depth (and here I took a digression trying and failing to improve on the state of the art), then did two tests.

A digression...

First, if neighboring pixels are close in 3d space, we consider them connected and do not generate an edge. If they are not close, we do a second test by forming a plane with the center pixel and its normal and looking at the point-to-plane distance. This avoids creating edges on connected geometry that just happens to be at a glancing angle (high slope) in the current camera view.
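The two tests fit in a handful of lines. A minimal sketch, assuming view-space positions reconstructed from depth, a unit-length normal, and two hypothetical thresholds (`near_dist`, `plane_dist`) that would need tuning per scene scale:

```python
import math

def is_edge(center_pos, center_normal, neighbor_pos, near_dist, plane_dist):
    """True when the neighbor should be treated as disconnected from the center."""
    delta = tuple(n - c for n, c in zip(neighbor_pos, center_pos))
    # Test 1: points that are close in 3d are considered connected.
    if math.sqrt(sum(d * d for d in delta)) <= near_dist:
        return False
    # Test 2: far apart, but still on the plane through the center pixel?
    # (center_normal is assumed to be normalized)
    point_to_plane = abs(sum(n * d for n, d in zip(center_normal, delta)))
    return point_to_plane > plane_dist
```

For a floor seen at a glancing angle, two adjacent pixels can be meters apart in depth and still lie on the same plane, so the second test keeps them connected.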

Depth, estimated normals, estimated edge discontinuities.

As I'm working with simple shaders, I employ a simple trick. Each vertex of each quad in my mesh has two UVs, one corresponding to the vertex location - which would sample across texels in the heightmap, and one corresponding to the center of the quad, which would sample a single texel in the heightmap. 
In the vertex shader, if a vertex is hitting an "edge" texel when sampling the first UV set, it checks the quad center UV sample as well. If this is still on an edge texel, then the whole quad is part of an edge, and I send the vertex to NaN to kill the triangles. Otherwise, I just use the height from the second sample.

In practice this is overly conservative as it generates large holes, we could instead push the "edge" quads to the farthest depth in the tile, which would hold for many viewpoints, or do something much more sophisticated to actually cut the mesh precisely, instead of relying on just quads. The farthest depth idea is also somewhat related to how small holes are filled in Crytek's algorithm if one squints enough...

What seems interesting, anyhow, is that even with this rudimentary system we can find good, large occluders - and the storage space needed is minimal, we could easily hold hundreds of these small heightfields in memory...

Combining multiple (three) viewpoints

So right now what I think would be possible is:

  • Keep the last depth and reproject plus close small holes from that, à la Crytek.
  • Then try to fill the remaining holes by using data from other viewpoints. 
  • For each view we can have a bounding hierarchy by just creating min-max depth mips (a pyramid), so we can test the volumes against the current reprojection buffer. And we need only to "stencil" test, to see how much of a hole we could cover and with what point density.
  • Rinse and repeat until happy...
  • Test visibility the usual way (mip pyramid, software raster of bounding volumes...)
  • Lastly, if the current viewpoint was novel enough (position and look-at direction) compared to the ones already in the database, consider adding its downsampled depth to the persistent database.
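The min-max depth pyramid mentioned in the list is simple to build. A minimal sketch, assuming a square, power-of-two depth buffer stored as a list of rows; the `is_occluded` helper and its parameters are hypothetical names for the usual conservative test against the farthest stored depth:

```python
def build_minmax_pyramid(depth):
    """Level 0 holds (z, z) per texel; each coarser level stores the
    (min, max) depth of the corresponding 2x2 block below it."""
    level = [[(z, z) for z in row] for row in depth]
    pyramid = [level]
    while len(level) > 1:
        half = len(level) // 2
        coarser = []
        for y in range(half):
            row = []
            for x in range(half):
                block = [level[2 * y + dy][2 * x + dx]
                         for dy in (0, 1) for dx in (0, 1)]
                row.append((min(b[0] for b in block),
                            max(b[1] for b in block)))
            coarser.append(row)
        pyramid.append(coarser)
        level = coarser
    return pyramid

def is_occluded(pyramid, level, x, y, box_near_depth):
    """Conservative test: a box is hidden only if its nearest depth is behind
    the farthest occluder depth stored in the texel that covers it."""
    return box_near_depth > pyramid[level][y][x][1]
```

The min side gives the near bound of the volume each texel represents, which is what we would test against the current reprojection buffer to decide if a stored view can cover a hole at all.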

As all viewpoints are approximate, it's important not to try to merge them with a conventional depthbuffer approach, but to prioritize first the "best" viewpoint (the previous frame's one), and then use the other stored views only to fill holes, prioritizing views closer to the current camera.
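The difference from a depth-buffer merge is easiest to see in code. A minimal sketch, assuming every view has already been reprojected to the current camera, holes are marked with infinity, and the list is ordered from most to least trusted (`merge_by_priority` is a name made up for this example):

```python
import math

HOLE = math.inf

def merge_by_priority(views):
    """views[0] is the previous frame's reprojection; later views only ever
    fill pixels that are still holes, they never win a depth test."""
    result = [row[:] for row in views[0]]
    for view in views[1:]:
        for y, row in enumerate(view):
            for x, z in enumerate(row):
                if result[y][x] == HOLE and z != HOLE:
                    result[y][x] = z
    return result
```

With `[[5, HOLE, HOLE]]`, `[[1, 7, HOLE]]` and `[[2, 3, 9]]` as inputs the result is `[[5, 7, 9]]`, where a conventional nearest-depth merge would have produced `[[1, 3, 9]]` and let the least reliable views override the best one.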

If objects move (that we did not exclude from occluder generation), we can intersect their bounding box with the various camera frustums, and either completely evict these points of view from the database, or go down the bounding hierarchy / min-max pyramid and invalidate only certain texels - so dynamic geometry could also be handled.

The idea of generating actual geometry from depth probably also has some merit, especially for regions with simple occlusion like buildings and so on. The naive quad mesh I'm using for visualization could be simplified after displacement to reduce the number of triangles, and the cuts along the edges could be done precisely, instead of on the tiles. 

But it doesn't seem worth the time mostly because we would still have very partial occluders with big "holes" along the cuts, and merging real geometry from multiple points of view seems complex - at that point, we'd rather work in world-space, which brings us to...

Second prototype: Voxels

Why all the complications about viewpoints and databases, if in the end, we are working with point sets? Could we store these directly in world-space instead? Maybe in a voxel grid?

Of course, we can! In fact, we could even just voxelize the scene in a separate process, incrementally, generating point clouds, signed distance fields, implicit surfaces, and so on... That's all interesting, but for this particular case, as we're working incrementally anyways, using the depth buffer is a particularly good idea. 

Going from depth to voxels is trivial, and we are not even limited to using the main camera depth, we could generate an ad-hoc projection from any view, using a subset of the scene objects, and just keep accumulating points / marking voxels.

Incidentally, working on this made me notice an equivalence that I didn't think of before. Storing a binary voxelization is the same as storing a point cloud if we assume (reasonably) that the point coordinates are integers. A point at a given integer x,y,z is equivalent to marking the voxel at x,y,z as occupied, but more interestingly, when you store points you probably want to compress them, and the obvious way to compress would be to cluster them in grid cells, and store grid-local coordinates at a reduced precision. This is exactly equivalent then again to storing binary voxels in a sparse representation. 
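The equivalence is easy to show on the CPU. A minimal sketch, with assumptions: 4x4x4 bricks (so one brick is a 64-bit mask), a Python dict as the sparse structure, and integer point coordinates. My actual prototype is a compute shader, this is only the same idea in miniature.

```python
BRICK = 4  # 4 * 4 * 4 = 64 voxels, one bit each

class SparseBinaryVoxels:
    def __init__(self):
        self.bricks = {}  # brick coordinate -> 64-bit occupancy mask

    @staticmethod
    def _split(x, y, z):
        """Integer point -> (brick coordinate, bit index inside the brick)."""
        key = (x // BRICK, y // BRICK, z // BRICK)
        lx, ly, lz = x % BRICK, y % BRICK, z % BRICK
        return key, lx + BRICK * (ly + BRICK * lz)

    def add_point(self, x, y, z):
        key, bit = self._split(x, y, z)
        self.bricks[key] = self.bricks.get(key, 0) | (1 << bit)

    def contains(self, x, y, z):
        key, bit = self._split(x, y, z)
        return (self.bricks.get(key, 0) >> bit) & 1 == 1

    def points(self):
        """Decode the masks back into the (deduplicated) point cloud."""
        for (bx, by, bz), mask in self.bricks.items():
            for bit in range(BRICK ** 3):
                if (mask >> bit) & 1:
                    lx = bit % BRICK
                    ly = (bit // BRICK) % BRICK
                    lz = bit // (BRICK * BRICK)
                    yield (bx * BRICK + lx, by * BRICK + ly, bz * BRICK + lz)
```

The brick key is the "grid cell", the bit index is the "grid-local coordinate at reduced precision": adding a point and marking a voxel are the same operation.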

It is obvious, but it was important to notice for me because for a while I was thinking of how to store things "smartly", maybe allow for a fixed number of points/surfels/planes per grid and find ways to merge when adding new ones, all possible and fun to think about, but binary is so much easier. 

In my compute shader, I am a rebel: I bit-pack without even InterlockedOr, because I always wanted to write code with data races that still converge to the correct result!

As the camera moves (left) the scene voxelization is updated (right)

If needed, one could then take the binary voxel data and compute from it a coarser representation that encodes planes or SDFs, etc! This made me happy enough that even if it would be cute to figure out other representations, they all went into a shelve-mode. 

I spent some time thinking about how to efficiently write a sparse binary voxel, or how to render from it in parallel (load balancing the parallel work), how to render front-to-back if needed, all interesting problems but in practice, not worth yet solving. Shelve!

The main problem with a world-space representation is that the error in screenspace is not bounded, obviously. If we get near the points, we see through them, and they will be arbitrarily spaced apart. We can easily use fewer points farther from the camera, but we have a fixed maximum density.

The solution? Will need another blog post, because this is getting long... and here is where I'm at right now anyways!

I see a few options I want to spend more time on:

1) Draw points as "quads" or ellipsoids etc. This can be done efficiently in parallel for arbitrary sizes, it's similar to tile-based GPU particle rendering.

We could even be clever, under the assumption that splats do not overlap much: we can send them to different tiles based on their size - forming a mipmap hierarchy of buckets. In that case, we know that for each bucket there is only a small fixed number of splats that could land. Then, walking per each pixel the hierarchy from the biggest splats/fewer tiles to the smallest, you even get approximate depth sorting!
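A minimal sketch of the bucketing part only (not the per-pixel walk), with assumptions: splats are given as (center x, center y, diameter) in pixels, level 0 tiles are `base_tile` pixels wide, and each splat goes to the finest level whose tile is at least as large as the splat. The function names are made up for this example.

```python
def bucket_level(diameter, base_tile):
    """Finest pyramid level whose tile size is >= the splat diameter."""
    level, tile = 0, base_tile
    while tile < diameter:
        tile *= 2
        level += 1
    return level

def bucket_splats(splats, base_tile=4):
    """Group splats by (level, tile x, tile y) of the tile containing their center."""
    buckets = {}
    for cx, cy, diameter in splats:
        level = bucket_level(diameter, base_tile)
        tile = base_tile * 2 ** level
        key = (level, int(cx // tile), int(cy // tile))
        buckets.setdefault(key, []).append((cx, cy, diameter))
    return buckets
```

Since a splat in a given level is never larger than its tile, a pixel only has to look at its own tile and the immediate neighbors on each level, and if splats of similar size rarely overlap each of those buckets stays small.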

2) We could do something more elaborate to reconstruct a surface in screen-space / fill holes.

Imperfect Shadow Maps used a push-pull pyramid to fill arbitrary-sized holes for example. In our case though we would need to be more careful to only join points that are supposed to be on the same surface, and not holes that were actually present in the scene... 

A related problem would be how to perform visibility on the point cloud itself, as clearly points farther away will poke in between closer points. That could be addressed with some kind of depth layers or a similar heuristic, allowing a near point to "occlude" a large number of background points, farther than a few voxels from it...
These ideas have some research in the point cloud literature, but none is tailored to occlusion, which has different requirements.

3) We could reconstruct a surface for near voxels, either by producing an actual mesh (which we could cache, and optimize) or by raymarching (gives the advantage of being able to stop at first intersection). 

We'd still use points at a distance, when we know they would be dense enough for simple dilation filters to work, and switch to the more expensive representation only for voxels that are too close to the camera to be treated as points.

Inspired by MagicaVoxel's binary MC (see here a shadertoy version) - made a hack that could be called "binary surface nets". Note that this is at half the resolution of the previous voxel/point clouds images, and still holds up decently.

4) We could hybridize with the previous idea, and use the depth from the last frame as an initial reprojection, while then fetching from the point cloud/voxel representation for hole-filling (we'd still need some way of dealing with variable point density, but it might matter less if it's only for a few holes).

I think this is the most promising direction, it makes caching trivial, while side-stepping the biggest issues with world-space occluders, which is the fact that even a tiny error (say, 1 centimeter) if seen up close enough (in front of your virtual nose) would cause huge mis-occlusions. 

If we used the previous screenspace Z as an initial occlusion buffer, and then augment that with the world-space point cloud, we could render the latter with a near plane that is pushed far enough for the approximation error not to be problematic, while still filling the holes that the reprojection would have. Yes, the holes will still miss some occluders, as now we're not using the cache until a given distance, and worst case we could peek behind a wall causing lots of objects to be rendered... but realtime rendering is the art of finding the best compromises...