
I wouldn't even try emulating this properly with the standard GPU rasterizer, except as a fallback mode for underpowered systems. It would be fun to try using Image Load/Store and Shader Storage Buffer Objects, though, in OpenGL 4.6. Just bind a framebuffer object with no color buffer and do all your writes using atomic operations to image objects and SSBOs in the fragment shader. The fragment shader interlock extension might be helpful if it's available (note: it's unavailable in Vulkan!) This is similar to how order-independent transparency or voxel rasterization works.
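A minimal sketch of that setup, with all binding points and variable names assumed for illustration: a fragment shader that produces no color output and instead writes everything through atomic image and SSBO operations.

```glsl
#version 460
// Sketch only: FBO has no color attachment; all writes go through
// image load/store and an SSBO. Names and packing are illustrative.
layout(early_fragment_tests) in;

layout(r32ui, binding = 0) uniform coherent uimage2D framebufferImage;

layout(std430, binding = 1) buffer FragmentStats {
    uint fragmentCount;
};

in flat uint polygonId;

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);
    // Atomically record a value at this pixel; a real renderer would pack
    // depth into the high bits so the atomic itself resolves visibility.
    imageAtomicMax(framebufferImage, coord, polygonId);
    atomicAdd(fragmentCount, 1u);
}
```

Since fragments from different triangles race on the same pixel, anything order-dependent has to be encoded into the atomic's comparison key (or deferred to a resolve pass).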

One possibility might be to do two passes: one to build up linked lists of per-fragment data (polygon ID, color, depth, etc.) and a second pass to sort all the linked lists into the proper order and determine a final color. This is the standard order-independent transparency trick.
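The first pass might look something like this (node layout and buffer names are assumptions, not a fixed format): each fragment appends a node and pushes it onto a per-pixel list head with an atomic exchange.

```glsl
#version 460
// Pass 1 of the linked-list OIT trick: append one node per fragment.
layout(early_fragment_tests) in;

struct FragmentNode {
    uint  polygonId;
    uint  packedColor;
    float depth;
    uint  next;        // index of the previous head, forming the list
};

layout(r32ui, binding = 0) uniform coherent uimage2D headPointers;

layout(std430, binding = 1) buffer NodeBuffer {
    uint nodeCounter;
    FragmentNode nodes[];
};

in flat uint polygonId;
in vec4 fragColor;

void main() {
    uint idx = atomicAdd(nodeCounter, 1u);
    // Swap in our node index as the new list head for this pixel.
    uint prev = imageAtomicExchange(headPointers,
                                    ivec2(gl_FragCoord.xy), idx);
    nodes[idx].polygonId   = polygonId;
    nodes[idx].packedColor = packUnorm4x8(fragColor);
    nodes[idx].depth       = gl_FragCoord.z;
    nodes[idx].next        = prev;
}
```

The second pass then walks each pixel's list, sorts the handful of nodes (insertion sort is typical since lists are short), and composites the final color.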

You could build up tables as well--for instance, you could emulate the "one span per scanline/polygon" behavior by allocating a table of scanlines for each polygon that you fill with the lowest X coordinate for that scanline and discard fragments that don't belong to the triangle contributing the lowest such X coordinate.
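As a sketch of that table-building pass (buffer layout assumed; the table must be cleared to 0xFFFFFFFF before the pass), each fragment races with atomicMin on its (polygon, scanline) slot:

```glsl
#version 460
// Record the lowest X seen per (polygon, scanline). A later pass keeps
// only fragments whose X matches the recorded minimum for their polygon.
layout(std430, binding = 2) buffer SpanTable {
    uint minX[];   // indexed as polygonId * screenHeight + scanline
};

uniform uint screenHeight;
in flat uint polygonId;

void main() {
    uint scanline = uint(gl_FragCoord.y);
    atomicMin(minX[polygonId * screenHeight + scanline],
              uint(gl_FragCoord.x));
}
```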

I have no idea if this will actually work--if I had to guess I'd put a 50% probability on it not working out at all. The fallback would be a SIMD scanline renderer. The Image Load/Store GPU implementation would be really fun though :)



I was thinking about going straight to compute shaders and forgoing the normal GPU rendering pipeline completely. The quirks outlined in the article are such a big deviation from normal pipeline behavior that I don't know if it is worth trying to use it at all. Performance should be a no-brainer on any halfway modern system anyway.


Well, compute shaders have the drawback that you have to know how many work items (in this case, fragments) you need to dispatch in advance. You can figure this out for triangles and quads, but it's a pain, and it basically involves redoing in software the same thing the rasterization hardware already does. It's much simpler and faster to just use the rasterization hardware built into GPUs to dispatch fragment work groups dynamically via a triangle draw call and only override the sample processing step.


You know that GPUs can dispatch work dynamically themselves to overcome this problem, right? So you totally can have one computation step determine how many instances it requires for the next one.
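For example (binding points assumed), one compute pass can write the work-group counts for the next dispatch into a buffer that the host binds as GL_DISPATCH_INDIRECT_BUFFER and consumes with glDispatchComputeIndirect, so the CPU never needs to know the count:

```glsl
#version 460
// Writes a DispatchIndirectCommand (three uints, per the GL spec) based
// on a fragment count produced by an earlier pass (assumed here).
layout(local_size_x = 1) in;

layout(std430, binding = 0) buffer IndirectCommand {
    uint numGroupsX;
    uint numGroupsY;
    uint numGroupsZ;
};

layout(std430, binding = 1) readonly buffer FragmentCounter {
    uint fragmentCount;
};

void main() {
    // One 64-wide work group per 64 fragments, rounded up.
    numGroupsX = (fragmentCount + 63u) / 64u;
    numGroupsY = 1u;
    numGroupsZ = 1u;
}
```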


Yes, it's possible, but why bother when the hardware can rasterize triangles in silicon? :)


Because it can't handle them in the peculiar fashion that was described? And it's even worse for quads.



