
Two differences:

First, stackful coroutines use the coroutine stack for everything they do. Stackless coroutines can use the normal thread stack for synchronous calls, and that stack can be shared across any number of coroutines. Per-coroutine allocation is only needed for asynchronous calls.

Second, for stackful coroutines you need to allocate the entire stack up front, and usually you have no way of knowing how much stack might be needed, so you need a conservative upper bound. Normal thread stacks have sizes in megabytes. (That doesn't necessarily correspond to actual memory consumption, since the OS will only reserve physical memory as needed, but the physical reservation for a given stack can only grow, not shrink. And even just allocating the virtual space has a cost.) Most of the time you can get away with stacks that are much smaller, only a few kilobytes, but at the cost of potentially crashing when you've consumed too much stack; it's hard to statically analyze maximum stack usage.
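To make the "allocate the whole stack up front" point concrete, here is a rough POSIX-only sketch of how a stackful coroutine library might reserve a stack. The names `CoroutineStack`, `allocate_stack`, and `free_stack` are made up for illustration and are not from any particular library. It reserves the full size as virtual memory and puts an inaccessible guard page at the low end, so an overflow faults instead of silently scribbling over a neighbour.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical handle for one coroutine stack (illustration only).
struct CoroutineStack {
    void* base = nullptr;   // lowest address of the mapping (the guard page)
    std::size_t size = 0;   // total mapping size, guard page included
};

// Reserve `usable_bytes` (rounded up to whole pages) plus one guard page.
// The size has to be chosen here, before the coroutine has run at all.
inline CoroutineStack allocate_stack(std::size_t usable_bytes) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    const std::size_t usable = (usable_bytes + page - 1) / page * page;
    const std::size_t total = usable + page;

    // Anonymous private mapping: address space is reserved now, physical
    // pages are only supplied when they are first touched.
    void* p = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return CoroutineStack{};

    // Assumes a downward-growing stack, so the guard page goes at the
    // lowest address. Touching it raises a fault.
    if (mprotect(p, page, PROT_NONE) != 0) {
        munmap(p, total);
        return CoroutineStack{};
    }

    CoroutineStack s;
    s.base = p;
    s.size = total;
    return s;
}

inline void free_stack(CoroutineStack& s) {
    if (s.base != nullptr) munmap(s.base, s.size);
    s = CoroutineStack{};
}
```

Whatever number you pass to `allocate_stack` is the arbitrary limit being discussed: too big and you burn address space per coroutine, too small and deep call chains hit the guard page.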

Stackless coroutines will, in general, only allocate memory as needed for each coroutine invocation, so not only are you wasting less memory, you don't have to worry about hitting an arbitrary limit. Allocation elision makes things more complicated since, as the blog post notes, you can end up wasting some memory, but compared to stackful coroutines it's peanuts. The downside of stackless coroutines is that heap allocations and deallocations are expensive; plus, splitting a "stack" of nested calls into separate heap allocations, usually far away from each other in memory, is worse for cache locality.
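For comparison, here is a hand-written stand-in for what a stackless coroutine roughly boils down to. This is only a sketch: `SumOfSquares`, `square_sync`, and `start_sum_of_squares` are invented names, and real compiler-generated frames look different. The point is that the per-coroutine allocation holds only the state that survives a suspension, while the synchronous helper borrows the ordinary thread stack.

```cpp
#include <memory>

// Ordinary synchronous helper. Its locals live on the normal thread stack
// while it runs and are gone again before the coroutine suspends.
inline int square_sync(int x) {
    int scratch[64] = {0};  // thread-stack scratch space, not part of any frame
    scratch[0] = x * x;
    return scratch[0];
}

// Hypothetical coroutine frame: just the variables live across suspensions.
struct SumOfSquares {
    int i = 0;      // loop counter
    int limit;      // number of steps to run
    int total = 0;  // running result

    explicit SumOfSquares(int n) : limit(n) {}

    // Run one step, then "suspend" by returning to the caller.
    // Returns true once the coroutine has finished.
    bool resume() {
        if (i >= limit) return true;
        total += square_sync(i);  // synchronous call on the caller's stack
        ++i;
        return i >= limit;
    }
};

// One heap allocation per coroutine invocation, sized to the frame.
inline std::unique_ptr<SumOfSquares> start_sum_of_squares(int n) {
    return std::unique_ptr<SumOfSquares>(new SumOfSquares(n));
}
```

Any number of these can be resumed in turn from the same thread, and they all share that thread's stack for `square_sync`. Each one still costs a separate heap allocation, which is the expense and locality downside mentioned above.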



Technically you could manually grow coroutine stacks the same way the kernel does for thread stacks, by mapping on fault and periodically unmapping everything beyond the red zone. But the complexity would be significant, and it would be hard to make efficient without kernel support.
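A rough sketch of the "give pages back" half, assuming Linux or a similar POSIX system and a downward-growing stack. With an anonymous mapping, demand paging already covers the mapping-on-fault half. `release_below` and `kRedZoneBytes` are hypothetical names, and the 128-byte figure is the x86-64 System V red zone; other ABIs differ.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// x86-64 System V: leaf functions may use 128 bytes below the stack pointer.
constexpr std::size_t kRedZoneBytes = 128;

// Release whole pages lying entirely below the live part of a suspended
// coroutine's stack. `lo` is the lowest usable address and must be
// page-aligned; `sp` is the coroutine's saved stack pointer.
// Returns the number of bytes handed back, or 0 if nothing was released.
inline std::size_t release_below(void* lo, void* sp) {
    const std::uintptr_t page =
        static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    const std::uintptr_t low = reinterpret_cast<std::uintptr_t>(lo);
    std::uintptr_t live = reinterpret_cast<std::uintptr_t>(sp);

    if (live < low + kRedZoneBytes) return 0;  // nothing below the red zone
    live -= kRedZoneBytes;                     // keep the red zone intact

    const std::uintptr_t end = live / page * page;  // round down to a page
    if (end <= low) return 0;

    // Tell the kernel the contents of [low, end) are no longer needed.
    // The address range itself stays mapped and can be touched again later.
    if (madvise(lo, end - low, MADV_DONTNEED) != 0) return 0;
    return static_cast<std::size_t>(end - low);
}
```

Even this small piece shows where the cost comes from: it is a system call per coroutine per sweep, and you need some policy for when to run it.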


For a while there was an exciting patch for GCC called split stacks that provided a little thunk for every function -- one normally bypassed, but which stackful coroutines could opt in to call -- that would check whether more stack had to be allocated. I think the story was that Go was the primary potential customer for it and they decided to just give up on the dream :(.


You can use segmented stacks in C++ just fine, I think. I believe Boost.Coroutine supports them. The problems are the additional overhead and the impossibility of linking against any non-split-stack code.



