travisgriggs 11 hours ago

That was “quite a few” words. I wish the author had taken more time with Elixir/Erlang.

Languages like Rust/Python that use lots of reserved keywords, especially for control flow, seem to have reached for that arrow to solve the “event loop” problem as described.

In BEAM languages, that very event loop stays front and center, you don’t have this awkward entanglement between reserved keywords and event loop. If you want another chunk of thing to happen later because of an event, you just arrange for an event of that nature to be delivered. No callbacks. No async coloring. Just events. The solution to the event problem is to double down and make your event loop more generally usable.

  • evnc 2 hours ago

    Interesting. Does that mean if you want to say, make an asynchronous http request, you do something like “fire_event(HttpRequestEvent(…))” which returns immediately, and somewhere else define a handler like “on_event(HttpResponseEvent, function (event) { … })” ? So you kind of have to manually break your function up into a state machine composed of event handlers? How do you associate a given HttpResponseEvent with a specific HttpRequestEvent?
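    A rough Python sketch of the pattern being asked about (this is not Erlang; every name here is invented for illustration). The usual BEAM answer to the correlation question is an explicit reference tag, like `make_ref()`, carried in both the request and the response:

```python
# Hypothetical event-loop API: fire an event, register a handler keyed by a
# unique reference, and dispatch the response back to that handler.
import itertools
import queue

_refs = itertools.count()
events = queue.Queue()
pending = {}  # ref -> handler, i.e. "the rest of the function"

def fire_http_request(url):
    ref = next(_refs)  # unique tag, analogous to Erlang's make_ref()
    # A real runtime would perform the I/O elsewhere and enqueue the
    # response event later; here we enqueue it immediately for the demo.
    events.put(("http_response", ref, f"body of {url}"))
    return ref

def on_response(ref, handler):
    pending[ref] = handler

# The function is "manually broken up": the continuation is a handler.
results = []
ref = fire_http_request("http://example.com")
on_response(ref, results.append)

kind, ref, body = events.get()
pending.pop(ref)(body)
print(results)  # ['body of http://example.com']
```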

vlovich123 11 hours ago

> In practice, things are a bit more complicated. In fact, I don’t know of any async/await embedding on top of io_uring in any language yet, because it doesn’t quite match this model. But generally, that’s the idea.

Glommio and monoio are async runtimes in Rust on top of io_uring, and Tokio has an optional io_uring backend. Does that not count? This is such a well-researched article that this kind of statement makes me think I'm missing something - it's surprising the author would get this wrong.

  • koakuma-chan 9 hours ago

    As far as I know those libraries only implement basic things. They don't use registered buffers, registered file descriptors, etc, and don't implement advanced features like chained operations.

    • ozgrakkurt 6 hours ago

      They are async libraries built on io_uring though. Other mainstream async libraries also don't go as deep as possible on epoll or other mechanisms either, afaik.

mont_tag an hour ago

Do you want to take advantage of having multiple cores?

* Processes do this right out of the box.

* Threads only do this on Python's new free-threaded (no-GIL) builds.

* Async, not so much.

valcron1000 9 hours ago

> async/await is also available in a bunch of other languages, including F#, C#8, Haskell[...]

Haskell (GHC) does not provide async/await but uses a green thread model.

  • kaoD 8 hours ago

    Aren't green threads and async-await orthogonal concepts?

    As I understand it async-await is syntax sugar to write a state machine for cooperative multitasking. Green "threads" are threads implemented in user code that might or might not use OS threads. E.g.:

    - You can use Rust tokio::task (green threads) with a manually coded Future with no async-await sugar, which might or might not be parallelized depending on the Tokio runtime it's running on.

    - ...or with a Future returned by an async block, which allows async-await syntax.

    - You can have a Future created by an async function call and poll it manually from an OS thread.

    - Node has async-await syntax to express concurrency but it has no parallelism at all since it is single-threaded. I think it has no green threads either (parallel or otherwise) since Promises are stackless?

    Is this a new usage of the term I don't know about? What does it mean? Or did I misinterpret the "but"?

    As a non-Haskeller I guess it doesn't need explicit async-await syntax because there might be some way to express the same concept with monads?
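    The "syntax sugar for a state machine" point can be sketched in Python (the names and the trivial executor are made up; a real compiler's desugaring is considerably more involved):

```python
# An async function next to a hand-written state machine doing the same
# thing, both driven to completion by a caller.
import asyncio

async def fetch_and_double(x):
    await asyncio.sleep(0)  # a suspension point
    return x * 2

class FetchAndDouble:
    """Roughly what the async function desugars to."""
    def __init__(self, x):
        self.state = 0
        self.x = x  # locals live across the await become fields

    def step(self):
        if self.state == 0:
            self.state = 1  # "suspend" at the sleep
            return None     # not done yet
        return self.x * 2   # resumed: finish

print(asyncio.run(fetch_and_double(21)))  # 42

sm = FetchAndDouble(21)
while (result := sm.step()) is None:
    pass  # a real executor would park until the I/O event arrives
print(result)  # 42
```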

  • LtWorf 9 hours ago

    How are green threads implemented?

    • whatevaa 4 hours ago

      A runtime with its own scheduling. Something Rust doesn't want to require.

unscaled 10 hours ago

This is a pretty in-depth overview of a complex topic, which unfortunately most people tend to dumb down considerably. Commonly cited articles such as "What Color is Your Function?" or "Revisiting Coroutines" by de Moura and Ierusalimschy are insightful, but they tend to pick on a subset of the properties that make up this complex topic of concurrency. Misguided commentators on HN often recommend these articles as reviews, but they are not reviews, and you are guaranteed to learn all the wrong lessons if you approach them this way.

This article looks like a real review. I only have one concern with it: it oversells M:N concurrency with green threads over async/await. If I understand correctly, it claims that async/await (as implemented by Rust, Python, C# and Kotlin - not JavaScript) is less efficient (both in terms of RAM and CPU) than M:N concurrency using green threads. The main advantages it has are that no GC is required, C library calls carry no extra cost, and the cost of using async functions is always explicit. This makes async/await great for a systems language like Rust, but it also pushes a hidden claim that Python, C# and Kotlin all made a mistake by choosing async/await. It's a more nuanced approach than what people take by incorrectly reading the articles I mentioned above, but I think it's still misguided. I might also be reading this incorrectly, but then I think the article is just not being clear enough about the issue of cost.

To put it shortly: Both green threads and async/await are significantly costlier than single-threaded code, but their cost manifests in different ways. With async/await the cost mostly manifests at "suspension points" (wherever you write "await"), which are very explicit. With green threads, the cost is spread everywhere. The CPU cost of green threads includes not only the wrapping of C library calls (which is mentioned), but also the cost of resizing or segmenting the stack (since we cannot just preallocate a 1MiB stack for each coroutine). Go started out with segmented stacks and moved on to allocating a new small stack (2KiB IIRC) for each new goroutine and copying it to a new stack every time it needs to grow[1]. That mechanism alone carries its own overhead.

The other issue that is mentioned with regards to async/await but is portrayed as "resolved" for green threads is memory efficiency, but this couldn't be farther from the truth: when it's implemented as a state machine, async/await is always more efficient than green threads. Async/await allocates memory on every suspension, but it only saves the state that needs to be saved for this suspension (as an oversimplification we can say it only saves the variables already allocated on the stack). Green threads, on the other hand, always allocate extra space on the stack, so there would always be some overhead. Don't get me wrong here: green threads with dynamic stacks are considerably cheaper than real threads and you can comfortably run hundreds of thousands of them on a single machine. But async/await state machines are even cheaper.
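As a rough illustration of the size gap (CPython-specific, and only an analogy: a suspended coroutine object is a stackless state machine holding just its live locals, while an OS or green thread carries a whole stack, often 1 MiB or more by default for OS threads):

```python
# A suspended/unstarted coroutine object is a small heap object, not a stack.
import sys

async def tiny():
    return 1

coro = tiny()
size = sys.getsizeof(coro)  # on CPython, on the order of a couple hundred bytes
coro.close()                # avoid the "never awaited" warning
print(size)
```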

I also have a few other nitpicks (maybe these issues come from the languages this article focuses on, mainly Go, Python, Rust and JavaScript)

- If I understand correctly, the article claims async/await doesn't suffer from "multi-threading risks". This is mostly true in Rust, Python with the GIL and JavaScript, for different reasons that have more to do with each language than async/await: JavaScript is single-threaded, Python (by default) has a GIL, and Rust doesn't let you write non-thread-safe code even if you're using plain old threads. But that's not the case with C# or Kotlin: you still need to be careful with async/await in these languages just as you would be when writing goroutines in Go. On the other hand, if you write Lua coroutines (which are equivalent to goroutines in Go), you can safely ignore synchronization unless you have a shared memory value that needs to be updated across suspension points.

- Most green thread implementations would block the host thread completely if you call a blocking function from a non-blocking coroutine. Go is an outlier even among the languages that employ green threads, since it supports full preemption of long-running goroutines (even if no C library code is called). But even Go only added full support for preemption with Go 1.14. I'm not quite sure since when long-running Cgo function calls have been preemptible, but this still shows that Go is doing its own thing here. If you have to use green threads in another language like Lua or Erlang, you shouldn't expect this behavior.

[1] https://blog.cloudflare.com/how-stacks-are-handled-in-go/
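The blocking hazard above is easy to demonstrate with Python's asyncio, which is cooperative like most non-preemptive runtimes (the timings are illustrative):

```python
# A blocking call inside one coroutine stalls every other task on the loop.
import asyncio
import time

async def ticker(ticks):
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)

async def blocking_task():
    time.sleep(0.1)  # blocks the whole event loop, not just this task

async def main():
    ticks = []
    await asyncio.gather(ticker(ticks), blocking_task())
    return ticks

ticks = asyncio.run(main())
# The gap between the first two ticks reveals the ~0.1s stall.
print([round(b - a, 2) for a, b in zip(ticks, ticks[1:])])
```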

  • zozbot234 6 hours ago

    > But that's not the case with C# or Kotlin: you still need to be careful with async/await in these languages just as you would be when writing goroutines in Go.

    C# and Kotlin are safe from data races; Go is not. If you do not explicitly synchronize in C#/Kotlin you may see torn writes and other anomalies, but these will not directly impact memory safety, unlike in Go.

jiggunjer 12 hours ago

Recently had to familiarize myself with python async because a third party SDK relies on it.

In many cases the lib will rely on threads to handle calls to synchronous functions, which got me wondering if there's a valid use case for running multiple async threads on a single core.
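The threads-wrapping-sync-calls pattern usually looks like `asyncio.to_thread`; a minimal sketch (`slow_sdk_call` is made up to stand in for the third-party SDK):

```python
# Bridge a blocking (synchronous) call into async code by running it in a
# worker thread, so the event loop itself is never stalled.
import asyncio
import time

def slow_sdk_call(x):
    time.sleep(0.05)  # stands in for blocking third-party work
    return x + 1

async def main():
    # Both blocking calls run concurrently in the default thread pool.
    results = await asyncio.gather(
        asyncio.to_thread(slow_sdk_call, 1),
        asyncio.to_thread(slow_sdk_call, 2),
    )
    return results

print(asyncio.run(main()))  # [2, 3]
```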

  • conradludgate 9 hours ago

    I frequently use single threaded async runtimes in Rust. Particularly if it's background processing that doesn't need to be particularly high throughput.

    Eg in a user application you might have performance-sensitive work (eg rendering) which needs to be highly parallel - give it a bunch of threads. However, when drawing the UI, handling user input, etc. you usually don't need high throughput - use only 1 thread to minimise the impact on the rendering threads.

    In my work with server side code, I use multiple async runtimes. One runtime is multithreaded and handles all the real traffic. One runtime is singlethreaded and handles management operations such as dispatching metrics and logs or garbage collecting our caches

  • electroglyph 7 hours ago

    i would say: probably not

    if your async thread is so busy that you need another one, then it's probably not an async workload to begin with.

    i work on a python app which uses threads and async, but only have one async thread because it's more than enough to handle all the async work i throw at it.

sema4hacker 12 hours ago

> I’ll try and write a followup with benchmarks.

That would definitely keep the story from being all-hat-and-no-cattle. I can't recall reading something with so many alternate versions of how to implement something but with zero benchmarks.

anonymoushn 8 hours ago

it is frustrating that the post opens by describing latency and then saying that it is called throughput.

  • littlestymaar 8 hours ago

    In a single-task setting (the situation described in the intro) throughput and latency are just the inverse of one another (in the mathematical sense of “inverse”: throughput = number of tasks per second = 1/(time taken to process a task) = 1/latency).

    They only diverge when you consider multiple tasks.

    • derriz 7 hours ago

      That’s not the way “latency” is commonly used in my experience.

      Latency numbers always include queuing time - so the measures are not related or derivable from each other.

      A process might have a throughput of 1 million jobs per second but if the average size of the queue is 10 million then your job latency is going to be 10 seconds on average and not 1 microsecond.
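      These numbers are just Little's law, W = L / λ (average latency = average queue length / throughput); a one-line check:

```python
# Little's law applied to the example above: 10M queued jobs at 1M jobs/s
# means 10 seconds of latency, regardless of per-job processing time.
throughput = 1_000_000   # jobs per second (lambda)
queue_len = 10_000_000   # average number of jobs in the system (L)
latency = queue_len / throughput
print(latency)  # 10.0 seconds, not 1 microsecond
```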

ballpug 6 hours ago

Raw synchronisation costs: 2-5µs per context switch to swap registers, pointers, interrupt handlers, etc.

for python syntax to enumerate the fibonacci sequence:

#fibonacci(n - 1) + fibonacci(n - 2)

Which computes event.arg

neonsunset 14 hours ago

Thank you for the article. I noticed the statement

> A second drawback is that async/await has a performance cost. CPU-bound code written with async/await will simply never be as fast or as memory-efficient as the equivalent synchronous code.

If you are interested, .NET is actively improving here and .NET 11 will ship with "Runtime Async", which replaces explicitly generated state machines with a runtime suspension mechanism. It's not """zero-cost""" for now (for example, it can block object escape analysis), and the async calling convention is different from the sync one, but the cost is massively reduced: the calls can be inlined, optimized away, devirtualized and more, in the same way standard sync calls can be. There will be few drawbacks to using async at that point, save for the syntax noise and the poor default habit in .NET of appending an Async suffix to such methods. In your own code you can write it tersely, however.

As for Rust, it also can optimize this quite well; the "call-level overhead" is much less of a problem there, although I have not studied compiler output for async Rust in detail, so hopefully someone with more familiarity can weigh in.