You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly. It's a matter of saving a few registers and switching the stack pointer, minicoro [1] is a pretty good C library that does it. I like this model a lot more than C++20 coroutines:
1. C++20 coros are stackless, in the general case every async "function call" heap allocates.
2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.
3. (opinion) C++20 coros are very tasteless and "C++-design-commitee pilled". They're very hard to understand, implement, require the STL, they're very heavy in debug builds and you'll end up with template hell to do something as simple as Promise.all
> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly
I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.
(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)
Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.
That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.
You can do a lot of horrible things with setjmp and friends. I actually implemented some exception throw/catch macros using them (which did work) for a compiler that didn't support real C++ exceptions. Thank god we never used them in production code.
This would be about 32 years ago - I don't like thinking about that ...
> Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly.
Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackful coroutines.
(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
Simon Tatham, author of Putty, has quite a detailed blog post [0] on using the C++20's coroutine system. And yep, it's a lot to do on your own, C++26 really ought to give us some pre-built templates/patterns/scaffolds.
Not necessarily. A coroutine encapsulates the entire state machine, which might pe a PITA to implement otherwise. Say, if I have a stateful network connection, that requires initialization and periodic encryption secret renewal, a coroutine implementation would be much slimmer than that of a state machine with explicit states.
Not an expert in game development, but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that. From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful, which is very difficult to write performantly and correctly. Hence, most people end up using coroutines with something like boost::asio, but you can do that only if your repo allows a 'kitchen sink' library like Boost in the first place.
Much of the original motivation for async was for single threaded event loops. Node and Python, for example. In C# it was partly motivated by the way Windows handles a "UI thread": if you're using the native Windows controls, you can only do so from one thread. There's quite a bit of machinery in there (ConfigureAwait) to control whether your async routine is run on the UI thread or on a different worker pool thread.
In a Unity context, the engine provides the main loop and the developer is writing behaviors for game entities.
Thankfully they are actively working towards upgrading, Unity 6.8 (they're currently on 6.4) is supposed to move fully towards CoreCLR, and removing Mono. We'll then finally be able to move to C# 14 (from C# 9, which came out in 2020), as well as use newer .NET functionality.
One annoying piece of Unity's CoreCLR plan is there is no plan to upgrade IL2CPP (Unity's AOT compiler) to use a better garbage collector. It will continue to use Boehm GC, which is so much worse for games.
Not that ancient, they just haven't bothered to update their coroutine mechanism to async/await. The Stride engine does it with their own scheduler, for example.
It's ancient. The latest version of Unity only partially supports C# 9. We're up to C# 14 now. But that's just the language version. The Mono runtime is only equivalent to .NET Framework 4.8 so all of the standard library improvements since .NET (Core) are missing. Not directly related to age but it's performance is also significantly worse than .NET. And Unity's garbage collector is worse than the default one in Mono.
Oh I totally missed this, thanks! I was overly confident they wouldn't have bothered, given how long it was taking. The last time I used Unity was 2022.3, which was apparently the last version without Awaitable.
As the author lays out, the thing that made coroutines click for me was the isomorphism with state machine-driven control flow.
That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.
(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)
In all of my years of professional game dev, I can verify that this is not even remotely true. They're used basically everywhere. They're very common when you need something to update for a set period of time but managing the state outside a very local context would just make the code a mess.
Unity's own documentation for changing scenes uses coroutines
Echoing the thoughts of the only current sibling comment: lots of "serious" developers (way to gatekeep here) definitely use coroutines, when they make sense. As mentioned, it's one of the best ways to have something update each frame for a short period of time, then neatly go away when it's not needed anymore. Very often, the tiny performance hit you take is completely outweighed by the maintanability/convenience.
...and then crash when the object it was using gets deleted while it's still running, but there is no way to track down and stop all the co-routines holding on to it.
Unity coroutines are a huge pain in the ass, and a lazy way to sloppily do things that are perfectly fine to do without them, using conventional portable programming techniques that make it possible to prevent edge conditions where things fall through the crack and get forgotten like "fire and forget" coroutine foot-guns.
More broadly the dimension of time is always a problem in gamedev, where you're partially inching everything forward each frame and having to keep it all coherent across them.
It can easily and often does lead to messy rube goldberg machines.
There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
This is more evident in games/simulations but the same problem arises more or less in any software: batch jobs and DAGs, distributed systems and transactions, etc.
This what Rich Hickey (Clojure author) has termed “place oriented programming”, when the focus is mutating memory addresses and having to synchronize everything, but failing to model time as a first class concept.
I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.
> There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
Sounds interesting. If it's not too much of an effort, could you dig up a reference?
You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly. It's a matter of saving a few registers and switching the stack pointer, minicoro [1] is a pretty good C library that does it. I like this model a lot more than C++20 coroutines:
1. C++20 coros are stackless, in the general case every async "function call" heap allocates.
2. If you do your own stackful coroutines, every function can suspend/resume, you don't have to deal with colored functions.
3. (opinion) C++20 coros are very tasteless and "C++-design-commitee pilled". They're very hard to understand, implement, require the STL, they're very heavy in debug builds and you'll end up with template hell to do something as simple as Promise.all
[1] https://github.com/edubart/minicoro
> You can roll stackful coroutines in C++ (or C) with 50-ish lines of Assembly
I'm not normally keen to "well actually" people with the C standard, but .. if you're writing in assembly, you're not writing in C. And the obvious consequence is that it stops being portable. Minicoro only supports three architectures. Granted, those are the three most popular ones, but other architectures exist.
(just double checked and it doesn't do Windows/ARM, for example. Not that I'm expecting Microsoft to ship full conformance for C++23 any time soon, but they have at least some of it)
Boost has stackful coroutines. They also used to be in posix (makecontext).
Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly. But maybe not.
That's the problem with register machines, I guess. Interestingly enough, BCPL, its main implementation being a p-code interpreter of sorts, has pretty trivially supported coroutines in its "standard" library since the late seventies — as you say, all you need to save is the current stack pointer and the code pointer.
You can do a lot of horrible things with setjmp and friends. I actually implemented some exception throw/catch macros using them (which did work) for a compiler that didn't support real C++ exceptions. Thank god we never used them in production code.
This would be about 32 years ago - I don't like thinking about that ...
> Hmm. I'm fairly certain that most of that assembly code for saving/restoring registers can be replaced with setjmp/longjmp, and only control transfer itself would require actual assembly.
Actually you don't even need setjmp/longjmp. I've used a library (embedded environment) called protothreads (plain C) that abused the preprocessor to implement stackful coroutines.
(Defined a macro that used the __LINE__ macro coupled with another macro that used a switch statement to ensure that calling the function again made it resume from where the last YIELD macro was encountered)
Wouldnt that be stackless (shared stack)
setjmp + longjump + sigaltstack is indeed the old trick.
Simon Tatham, author of Putty, has quite a detailed blog post [0] on using the C++20's coroutine system. And yep, it's a lot to do on your own, C++26 really ought to give us some pre-built templates/patterns/scaffolds.
[0] https://web.archive.org/web/20260105235513/https://www.chiar...
Coroutines is just a way to write continuations in an imperative style and with more overhead.
I never understood the value. Just use lambdas/callbacks.
Not necessarily. A coroutine encapsulates the entire state machine, which might pe a PITA to implement otherwise. Say, if I have a stateful network connection, that requires initialization and periodic encryption secret renewal, a coroutine implementation would be much slimmer than that of a state machine with explicit states.
Not an expert in game development, but I'd say the issue with C++ coroutines (and 'colored' async functions in general) is that the whole call stack must be written to support that. From a practical perspective, that must in turn be backed by a multithreaded event loop to be useful, which is very difficult to write performantly and correctly. Hence, most people end up using coroutines with something like boost::asio, but you can do that only if your repo allows a 'kitchen sink' library like Boost in the first place.
Much of the original motivation for async was for single threaded event loops. Node and Python, for example. In C# it was partly motivated by the way Windows handles a "UI thread": if you're using the native Windows controls, you can only do so from one thread. There's quite a bit of machinery in there (ConfigureAwait) to control whether your async routine is run on the UI thread or on a different worker pool thread.
In a Unity context, the engine provides the main loop and the developer is writing behaviors for game entities.
ASIO is also available outside of boost! https://github.com/chriskohlhoff/asio
For anyone wondering; this isn't a hack, that's the same library, just as good, just without boost dependencies.
Always jarring to see how Unity is stuck on an ancient version of C#. The use of IEnumerable as a "generator" mechanic is quite a good hack though.
>The use of IEnumerable as a "generator" mechanic is quite a good hack though.
Is that a hack? Is that not just exactly what IEnumerable and IEnumerator were built to do?
Thankfully they are actively working towards upgrading, Unity 6.8 (they're currently on 6.4) is supposed to move fully towards CoreCLR, and removing Mono. We'll then finally be able to move to C# 14 (from C# 9, which came out in 2020), as well as use newer .NET functionality.
https://discussions.unity.com/t/coreclr-scripting-and-ecs-st...
One annoying piece of Unity's CoreCLR plan is there is no plan to upgrade IL2CPP (Unity's AOT compiler) to use a better garbage collector. It will continue to use Boehm GC, which is so much worse for games.
Not that ancient, they just haven't bothered to update their coroutine mechanism to async/await. The Stride engine does it with their own scheduler, for example.
Edit: Nevermind, they eventually bothered.
It's ancient. The latest version of Unity only partially supports C# 9. We're up to C# 14 now. But that's just the language version. The Mono runtime is only equivalent to .NET Framework 4.8 so all of the standard library improvements since .NET (Core) are missing. Not directly related to age but it's performance is also significantly worse than .NET. And Unity's garbage collector is worse than the default one in Mono.
Unity has async too [1]. It's just that in a rare display of sanity they chose to not deprecate the IEnumerator stuff.
[1] https://docs.unity3d.com/6000.3/Documentation/ScriptReferenc...
Oh I totally missed this, thanks! I was overly confident they wouldn't have bothered, given how long it was taking. The last time I used Unity was 2022.3, which was apparently the last version without Awaitable.
Unity is currently on C# 9 and that IEnumerable trick is no longer needed in new codebases. async is properly supported.
IIRC generators and co-routines are equivalent in a sense that you can implement one with the other.
Not too different from C++'s iterator interface for generators, I guess.
As the author lays out, the thing that made coroutines click for me was the isomorphism with state machine-driven control flow.
That’s similar to most of what makes C++ tick: There’s no deep magic, it’s “just” type-checked syntactic sugar for code patterns you could already implement in C.
(Occurs to me that the exceptions to this … like exceptions, overloads, and context-dependent lookup … are where C++ has struggled to manage its own complexity.)
If you need to implement an async state machine, couldn't that just as easily be done with std::future? How do coroutines make this cleaner/better?
No serious devs even uses Unity coroutines. Terrible control flow and perf. Fine for small projects on PC.
In all of my years of professional game dev, I can verify that this is not even remotely true. They're used basically everywhere. They're very common when you need something to update for a set period of time but managing the state outside a very local context would just make the code a mess.
Unity's own documentation for changing scenes uses coroutines
Echoing the thoughts of the only current sibling comment: lots of "serious" developers (way to gatekeep here) definitely use coroutines, when they make sense. As mentioned, it's one of the best ways to have something update each frame for a short period of time, then neatly go away when it's not needed anymore. Very often, the tiny performance hit you take is completely outweighed by the maintanability/convenience.
...and then crash when the object it was using gets deleted while it's still running, but there is no way to track down and stop all the co-routines holding on to it.
Unity coroutines are a huge pain in the ass, and a lazy way to sloppily do things that are perfectly fine to do without them, using conventional portable programming techniques that make it possible to prevent edge conditions where things fall through the crack and get forgotten like "fire and forget" coroutine foot-guns.
Just out of interest, how many serious unity devs have you talked to?
In Haskell this technique has been called ‘reinversion of control’: http://blog.sigfpe.com/2011/10/quick-and-dirty-reinversion-o...
More broadly the dimension of time is always a problem in gamedev, where you're partially inching everything forward each frame and having to keep it all coherent across them.
It can easily and often does lead to messy rube goldberg machines.
There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
This is more evident in games/simulations but the same problem arises more or less in any software: batch jobs and DAGs, distributed systems and transactions, etc.
This what Rich Hickey (Clojure author) has termed “place oriented programming”, when the focus is mutating memory addresses and having to synchronize everything, but failing to model time as a first class concept.
I’m not aware of any general purpose programming language that successfully models time explicitly, Verilog might be the closest to that.
This timing additions to a language is also at the core of imperative synchronous programming languages like Este rel, Céu or Blech.
> There was a game AI talk a while back, I forget the name unfortunately, but as I recall the guy was pointing out this friction and suggesting additions we could make at the programming language level to better support that kind of time spanning logic.
Sounds interesting. If it's not too much of an effort, could you dig up a reference?