It never ceases to amaze me how slow compute shaders are compared to a plain old fullscreen pass. Granted, I'm probably missing something, but every time I've tried to get a performance boost by porting a standard VS-PS to compute, the result was not great. The latest example was a relatively simple temporal filter, where we need to sample the entire neighborhood, 9 texels, from the current buffer. I was being very smart and decided that in compute we could store these samples in shared memory instead of fetching them again and again, rewrote the shader and… it became 1.5× slower. With no rasterization, with fewer taps, with the same math. Mystery.

Steve posted a video of what we’ve been working on recently =)

We were heavily inspired by this awesome talk by Josh Hobson, and the first test of this technology was based on pretty much a straightforward implementation of what they did on God of War =)

If you google "X3676: typed UAV loads are only allowed for single-component 32-bit element types", the first links refer to it as the most annoying DX11 compute error. I'd been lucky and avoided hitting it before, but nothing, including luck, lasts forever, so now I kind of see why the search results are like that. It seems that buffers with manual packing/unpacking are the only alternative that works with all formats (typeless r32 and r11g11b10 apparently don't get along well).
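For reference, here's a minimal Python sketch of the kind of manual packing/unpacking you end up writing for r11g11b10 data in a raw buffer. It goes through the float32 bit pattern, flushes out-of-range small values to zero on encode, and ignores Inf/NaN, so treat it as an illustration rather than a spec-complete implementation:

```python
import struct

def f32_to_small_float(x, exp_bits=5, man_bits=6):
    """Truncate a non-negative float32 to an unsigned small float (no sign bit)."""
    if x <= 0.0:
        return 0
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    exp = (bits >> 23) & 0xFF            # float32 exponent, biased by 127
    man = bits & 0x7FFFFF                # 23-bit mantissa
    new_exp = exp - 127 + 15             # rebias to 15 (5-bit exponent)
    if new_exp <= 0:
        return 0                         # too small: flush to zero for brevity
    max_exp = (1 << exp_bits) - 1
    if new_exp >= max_exp:               # too large: clamp to max finite value
        new_exp, man = max_exp - 1, 0x7FFFFF
    return (new_exp << man_bits) | (man >> (23 - man_bits))

def small_float_to_f32(v, exp_bits=5, man_bits=6):
    """Decode an unsigned small float back to a Python float."""
    exp = v >> man_bits
    man = v & ((1 << man_bits) - 1)
    if exp == 0:
        return man * 2.0 ** (-14 - man_bits)        # denormal range
    return (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - 15)

def pack_r11g11b10(r, g, b):
    """Pack three floats into one uint32: 11 bits R, 11 bits G, 10 bits B."""
    return (f32_to_small_float(r, 5, 6)
            | (f32_to_small_float(g, 5, 6) << 11)
            | (f32_to_small_float(b, 5, 5) << 22))

def unpack_r11g11b10(bits):
    return (small_float_to_f32(bits & 0x7FF, 5, 6),
            small_float_to_f32((bits >> 11) & 0x7FF, 5, 6),
            small_float_to_f32(bits >> 22, 5, 5))
```

Powers of two survive the round trip exactly; everything else loses the mantissa bits below bit 6 (or bit 5 for blue), which is the usual price of this format.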

This GBuffer normals encoding deserves more attention than it gets:

I did some quick tests and it looks compact, relatively cheap in terms of encoding and decoding, and (tada!) it's blendable! 10:10 is not much precision, of course, but I didn't notice severe artefacts. Not everything is great: hardware blending support is awesome, for sure, but apparently decals will have to use alpha blending, simply because other blend modes would break the unpacking (we can't normalize XY correctly). Plus, we need to read the basis bits when rendering decals, and I don't know another way of doing that except copying that basis info into a separate texture.
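I won't reproduce the exact scheme from the link, but a toy version of the general flavor (quantize the normal's XY into 10:10 bits, reconstruct Z from the unit-length constraint) looks something like this. The names are mine, and I'm assuming z >= 0 here; a real GBuffer encoding handles the full sphere with the basis bits mentioned above:

```python
import math

def encode_normal(n):
    """Quantize the XY of a unit normal (z >= 0 assumed) into 10:10 bits."""
    x, y, _ = n
    qx = round((x * 0.5 + 0.5) * 1023)   # [-1,1] -> [0,1023]
    qy = round((y * 0.5 + 0.5) * 1023)
    return (qx << 10) | qy

def decode_normal(bits):
    """Reconstruct Z from the unit-length constraint: z = sqrt(1 - x^2 - y^2)."""
    x = ((bits >> 10) & 1023) / 1023 * 2.0 - 1.0
    y = (bits & 1023) / 1023 * 2.0 - 1.0
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)
```

The blendable part follows from the mapping being affine: alpha-blending two encoded XY pairs lands on the lerped XY, and the decode recovers a sensible in-between normal. Any other blend mode scales XY in a way the decode can't undo, which is the limitation noted above.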

A really good thread on F+ vs deferred. As a guy who's never worked with deferred rendering professionally (I've only used it in a pet project), I'm kind of tempted to try it out myself, even though I might get disappointed. Deferred (tiled deferred) means "free" buffers for SSAO and SSR, plus deferred decals. F+ seems to me like the easier path to implement, even though Angelo Pesce's opinion differs.

We're getting more and more decent game and rendering engines which can be used for free.

Google released a rendering engine, Filament, which is reportedly very well documented:

Xenko has become a community project under the permissive MIT license (and it contains an implementation of the Bowyer-Watson tetrahedralization algorithm, which I'd been looking for to toy with probe-based lighting).

I'm dumb. I have a view-projection matrix and points that are behind the camera. I transform them to clip space and get positive z values. I stare dumbly at this, trying to figure out what's wrong. And then I realize that w is basically view-space depth, and w is negative for those points. Divide the negative z by that negative w, and voila, we get positive clip-space z!
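A quick sanity check of the above, using a standard right-handed, OpenGL-style projection where the camera looks down -Z and clip w = -z_view (the convention is my assumption; a D3D-style left-handed matrix behaves analogously with w = +z_view):

```python
import math

def perspective(fov_y_deg, aspect, zn, zf):
    """Right-handed, OpenGL-style perspective matrix (camera looks down -Z)."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (zf + zn) / (zn - zf), 2 * zf * zn / (zn - zf)],
        [0, 0, -1, 0],                  # clip w = -z_view
    ]

def transform(m, v):
    """Row-major 4x4 matrix times column vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(4))

proj = perspective(60.0, 16 / 9, 0.1, 100.0)
behind = (0.0, 0.0, 5.0, 1.0)           # +Z in view space = behind the camera
x, y, z, w = transform(proj, behind)
# Both clip z and clip w come out negative for this point,
# so the perspective divide z / w flips the sign back to positive.
```

Negative divided by negative is positive, so a point behind the camera can land at a perfectly innocent-looking post-divide depth, which is exactly the trap described above.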