
I wouldn't even try emulating this properly with the standard GPU rasterizer, except as a fallback mode for underpowered systems. It would be fun to try using Image Load/Store and Shader Storage Buffer Objects, though, in OpenGL 4.6. Just bind a framebuffer object with no color buffer and do all your writes using atomic operations to image objects and SSBOs in the fragment shader. The fragment shader interlock extension might be helpful if it's available (note: it's unavailable in Vulkan!) This is similar to how order-independent transparency or voxel rasterization works.
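A minimal sketch of that setup, with all binding points and variable names assumed for illustration: a fragment shader that produces no color output and instead writes everything through atomic image and SSBO operations.

```glsl
#version 460
// Sketch only: FBO has no color attachment; all writes go through
// image load/store and an SSBO. Names and packing are illustrative.
layout(early_fragment_tests) in;

layout(r32ui, binding = 0) uniform coherent uimage2D framebufferImage;

layout(std430, binding = 1) buffer FragmentStats {
    uint fragmentCount;
};

in flat uint polygonId;

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);
    // Atomically record a value at this pixel; a real renderer would pack
    // depth into the high bits so the atomic itself resolves visibility.
    imageAtomicMax(framebufferImage, coord, polygonId);
    atomicAdd(fragmentCount, 1u);
}
```

Since fragments from different triangles race on the same pixel, anything order-dependent has to be encoded into the atomic's comparison key (or deferred to a resolve pass).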

One possibility might be to do two passes: one to build up linked lists of per-fragment data (polygon ID, color, depth, etc.) and a second pass to sort all the linked lists into the proper order and determine a final color. This is the standard order-independent transparency trick.
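The first pass might look something like this (node layout and buffer names are assumptions, not a fixed format): each fragment appends a node and pushes it onto a per-pixel list head with an atomic exchange.

```glsl
#version 460
// Pass 1 of the linked-list OIT trick: append one node per fragment.
layout(early_fragment_tests) in;

struct FragmentNode {
    uint  polygonId;
    uint  packedColor;
    float depth;
    uint  next;        // index of the previous head, forming the list
};

layout(r32ui, binding = 0) uniform coherent uimage2D headPointers;

layout(std430, binding = 1) buffer NodeBuffer {
    uint nodeCounter;
    FragmentNode nodes[];
};

in flat uint polygonId;
in vec4 fragColor;

void main() {
    uint idx = atomicAdd(nodeCounter, 1u);
    // Swap in our node index as the new list head for this pixel.
    uint prev = imageAtomicExchange(headPointers,
                                    ivec2(gl_FragCoord.xy), idx);
    nodes[idx].polygonId   = polygonId;
    nodes[idx].packedColor = packUnorm4x8(fragColor);
    nodes[idx].depth       = gl_FragCoord.z;
    nodes[idx].next        = prev;
}
```

The second pass then walks each pixel's list, sorts the handful of nodes (insertion sort is typical since lists are short), and composites the final color.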

You could build up tables as well--for instance, you could emulate the "one span per scanline/polygon" behavior by allocating a table of scanlines for each polygon that you fill with the lowest X coordinate for that scanline and discard fragments that don't belong to the triangle contributing the lowest such X coordinate.
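As a sketch of that table-building pass (buffer layout assumed; the table must be cleared to 0xFFFFFFFF before the pass), each fragment races with atomicMin on its (polygon, scanline) slot:

```glsl
#version 460
// Record the lowest X seen per (polygon, scanline). A later pass keeps
// only fragments whose X matches the recorded minimum for their polygon.
layout(std430, binding = 2) buffer SpanTable {
    uint minX[];   // indexed as polygonId * screenHeight + scanline
};

uniform uint screenHeight;
in flat uint polygonId;

void main() {
    uint scanline = uint(gl_FragCoord.y);
    atomicMin(minX[polygonId * screenHeight + scanline],
              uint(gl_FragCoord.x));
}
```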

I have no idea if this will actually work--if I had to guess I'd put a 50% probability on it not working out at all. The fallback would be a SIMD scanline renderer. The Image Load/Store GPU implementation would be really fun though :)



I was thinking about going straight to compute shaders and forgoing the normal GPU rendering pipeline completely. The quirks outlined in the article are such a big deviation from normal pipeline behavior that I don't know if it is worth trying to use it at all. Performance should be a no-brainer on any halfway modern system anyway.


Well, compute shaders have the drawback that you have to know how many work items (in this case, fragments) you need to dispatch in advance. You can figure this out for triangles and quads, but it's a pain, and it basically involves redoing in software the same thing the rasterization hardware already does. It's much simpler and faster to just use the rasterization hardware built into GPUs to dispatch fragment work groups dynamically via a triangle draw call and only override the sample processing step.


You know that GPUs can dispatch work dynamically themselves to overcome this problem, right? So you totally can have one computation step determine how many instances it requires for the next one.
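For example (binding points assumed), one compute pass can write the work-group counts for the next dispatch into a buffer that the host binds as GL_DISPATCH_INDIRECT_BUFFER and consumes with glDispatchComputeIndirect, so the CPU never needs to know the count:

```glsl
#version 460
// Writes a DispatchIndirectCommand (three uints, per the GL spec) based
// on a fragment count produced by an earlier pass (assumed here).
layout(local_size_x = 1) in;

layout(std430, binding = 0) buffer IndirectCommand {
    uint numGroupsX;
    uint numGroupsY;
    uint numGroupsZ;
};

layout(std430, binding = 1) readonly buffer FragmentCounter {
    uint fragmentCount;
};

void main() {
    // One 64-wide work group per 64 fragments, rounded up.
    numGroupsX = (fragmentCount + 63u) / 64u;
    numGroupsY = 1u;
    numGroupsZ = 1u;
}
```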


Yes, it's possible, but why bother when the hardware can rasterize triangles in silicon? :)


Because it can't handle them in the peculiar fashion that was described? And it's even worse for quads.



