A few notes on message passing (erlang.org)
151 points by srijan4 on March 21, 2021 | 82 comments


>Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build.

I do so wish there was a reference or link around "tried in the past".


There are a few papers about it, such as:

- “Heap architectures for concurrent languages using message passing”

- “Exploring Alternative Memory Architectures for Erlang: Implementation and Performance Evaluation”

- “A Case for the Unified Heap Approach to Erlang Memory Management”

You can find those and more discussion by googling "shared heap erlang".


So this isn't entirely true; binaries above a certain size have their own refcounted, process-independent GC going on in the BEAM. I wonder why the VM doesn't use a similar system for highly mobile data structures (copy to a common refcounted "scratchpad" on first send).


binaries never contain references to anything else, and therefore can never be a part of a reference loop. so it's safe to toss them out into a shared space and use reference counting to track them. you don't want to do it with small binaries, as the overhead of shared-allocating/ locking/ incrementing/ locking/ decrementing would probably be worse than just copying the small binary in the first place.

if you put complex reference-bearing data out in the void, now whatever memory arena allocated it may contain data that it references, and now you have to keep track of those references and use all of the shared items as new roots into your data. and if some other memory arena receives it and plucks out a value, do you only then copy it, or do you reference back to the original? if you send a message back and form a loop, the gc process for either now has to jump through both, and you've destroyed your cheap gc cycles. if you shunt all of the data into shared memory to avoid multi-gc reference loops, you've now just created a big shared gc that will have all the problems of big shared gcs.

the per-process small gcs with full copying between them mean you can do things like just dropping the arena when the process dies without even checking liveness for anything in it (you'll need to run any destructors if anything needs cleanup, like references to shared large binaries needing dereferencing, but you can keep track of the items that need that in a bitmap or something, and avoiding the trace is a win on its own)


> shared-allocating/ locking/ incrementing/ locking/ decrementing

Minor nit: shared allocation, counter incrementing, and counter decrementing can all be done lock-free. They'd still need memory fence operations (and retries in case of contention), and the associated performance hits, but not actual locking.


LWN.net just published an article where a comment (https://lwn.net/Articles/849239/) described how there are no truly lock-free data structures on modern CPUs:

A couple of decades of writing concurrent algorithms has taught me that scalability is really defined by the frequency the concurrent algorithm accesses/updates shared data, not whether the software algorithm is considered "lockless" or not.


Nobody is claiming that memory fences are free or that livelock isn't possible with lockfree algorithms. (Except in corner cases, such as carefully constructed RISC-V code where the architectural specification does guarantee progress in short tight ll-sc loops with proper instruction alignment.)

The LWN comment mentions spinlocks to improve worst-case performance of some lockfree algorithms, but that criticism doesn't apply to atomic increments / decrements on multi-issue CPUs. The latency of contended memory operations would completely hide the overhead of the add/subtract and make the atomic add/subtract equivalent to a spinlock. (Also, the LWN comment doesn't apply where the algorithm can be written within guaranteed progress corner-cases of the architecture, where applicable.)


It would seem off heap messages could be a base for this. I've not gone all the way into the weeds on those, but it seems like it could be possible to start with sending a whole message to another process by bumping a refcount and adding a reference to that process's mailbox without needing to make a new copy.

If that works, and is useful, you could extend by bumping the refcount, and adding a new type of subreference to a process's mailbox.

I don't think you'd want to send something referencing multiple off-heap messages, or some terms plus a reference (although maybe, who knows), because that would be a lot of complexity. You would also want to be sensitive to subreference memory leaks, like can happen with binaries where a binary is large but only small subreferences stick around, leading to lots of wasted space (this is extra fun when you put a small subbinary into ets/mnesia and it lives forever (well, until you restart beam, but that can be a long time)).


This is sort of what Pony did with its ORCA based GC. It has some tricky edge cases to optimize in practice but it can be made to work especially well if data is immutable.


The analog for that would be shared literals, which are not copied (super important for reducing small-map overhead if the constructor initializes a fixed set of keys, like Elixir structs, since the key index is shared). The persistent_term module allows any value to be registered as such at runtime and avoids a lot of the caveats that came with mochi_global and the like. While they aren't ref counted, it can definitely help cut down on copying in my experience.
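
For anyone who hasn't used it, a minimal sketch of the persistent_term approach (the key and build_routing_table/0 are made-up examples):

    %% Register a large, rarely-changing term once at startup.
    ok = persistent_term:put({myapp, routing_table}, build_routing_table()).

    %% Any process can later read it without the term being copied onto its
    %% heap the way a message payload would be.
    Table = persistent_term:get({myapp, routing_table}).

    %% Caveat: persistent_term:put/2 and erase/1 are expensive (they can
    %% trigger a global GC), so treat the stored term as effectively immutable.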


Linear types (as an implementation detail) seem helpful here; if escape analysis shows a message is never touched after being sent, you can move it into the queue instead of copying it.


Would this help? My understanding is that the copy exists so that the new object lives in the heap of the receiving process, and therefore the GC can have a strong guarantee that all objects within a process belong to the GC for that process.

Whether moving an object between two GC’d heaps is even possible (let alone simple or complicated) surely depends a lot on the GC implementation. The implementation of concurrent GCs is not easy...


A definitive reference would be great; for me the "tried in the past" quote has always been a pointer to the Java VM, which has great benefits of its own but has always had the problem of global GC-induced freezes. Working with Cassandra instances, this has been a problem from 2013 until today. It just seems to be a very hard problem to solve when object references flow everywhere. And it's really unpleasant when your whole DB instance, with all its parallel requests, locks up for GC.

[0] https://stackoverflow.com/questions/21992943/persistent-gc-i...


Try out Scylla instead. We moved to it from Cassandra and have had no stability issues at all. Our use case is several 100k requests per second with occasional batch writes. Request latency reduced, as did the number of nodes we required.


DataStax had good results testing the new Shenandoah garbage collector with Cassandra 4.0: https://www.datastax.com/blog/apache-cassandra-benchmarking-...


There's also an old discussion on this here: http://erlang.2086793.n4.nabble.com/Why-are-messages-between...


Just an aside, but message passing also makes a huge difference if you want to implement a distributed consensus protocol such as Viewstamped Replication or Raft.

Compared to typical request/response connection-oriented protocols (e.g. TCP or HTTP), message passing maps 1-to-1 to the distributed consensus domain, where the client can send a request message to the leader and then fully expect to get the response message back from another subsequent leader (after a view change or leader election).

I ran into this while implementing Viewstamped Replication in Zig for https://github.com/coilhq/tigerbeetle, where we had a typical TCP request/response protocol to begin with, and then realized that a message bus would make things much easier. It's just really nice to be able to say "send this message to process A or process B (if you can)" and not worry about much else beyond timeouts and retries, which are needed for distributed consensus anyway. It also straight away opens up the door for alternative underlying transport protocols such as QUIC or UDT, multicast, etc. Message passing is just a really good abstraction to have in place above all this in general.


Out of curiosity: What happens with messages that are never received? Do they just pile up until the (not-so) receiving process OOMs? How does one debug such a program?


Yes, messages keep piling up and memory usage keeps increasing until the Erlang VM goes out of memory.

To debug, there's a max_heap_size parameter which can be configured to log an error message when the heap size of any process goes beyond a threshold, or even to kill the process[0]. Although I don't know if this takes the "off-heap" messages into account.
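
A rough sketch of that knob, plus direct mailbox inspection (loop/0 stands in for whatever the process actually does):

    %% Kill the process (and log an error report) if its heap grows past
    %% roughly 6 million words.
    Pid = spawn_opt(fun loop/0,
                    [{max_heap_size, #{size => 6000000,
                                       kill => true,
                                       error_logger => true}}]).

    %% Inspect a suspect mailbox directly:
    {message_queue_len, Len} = erlang:process_info(Pid, message_queue_len).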

Also, there are lots of libraries which can provide a "top" like view of the running system.

[0] - http://erlang.2086793.n4.nabble.com/Max-heap-size-td4716974....


I think Erlang's strength is also its weakness: structuring everything around messages instead of a complex memory and concurrency model.

I always post these quotes until someone comes along who can explain them; so far, 2 years later, nobody has been able to:

"While I'm on the topic of concurrency I should mention my far too brief chat with Doug Lea. He commented that multi-threaded Java these days far outperforms C, due to the memory management and a garbage collector. If I recall correctly he said "only 12 times faster than C means you haven't started optimizing"." - Martin Fowler https://martinfowler.com/bliki/OOPSLA2005.html

"Many lock-free structures offer atomic-free read paths, notably concurrent containers in garbage collected languages, such as ConcurrentHashMap in Java. Languages without garbage collection have fewer straightforward options, mostly because safe memory reclamation is a hard problem..." - Travis Downs https://travisdowns.github.io/blog/2020/07/06/concurrency-co...


And Erlang outperforms multi-threaded Java hands down. I've made a system in Java which caps at about 6k threads; past that, the server starts to slow to a crawl. A similar system in Erlang (a newer version of the server doing essentially the same tasks) is currently also servicing about 6k connections, but CPU utilisation is about 10%. Yeah, we could rewrite that old system to use some async library, but that's not multi-threaded, just scheduled (like in Erlang).


You need non-blocking IO; if you have more threads than cores you are doing it wrong.

You cannot build an MMO server with Erlang.


People were doing it in 2009 and posting it here [1] (the project link is dead though).

The massively multiplayer online part would be clearly in favor of Erlang, although the RPG part maybe not so much. It would depend, in my mind, on how you model the environment, the game objects in it, and their interactions. I think there would be too much communication to model game objects as processes; you'd need to make a simulation zone a process and keep the game object data inside the simulation processes, and immutability probably makes that not a whole lot of fun.

https://news.ycombinator.com/item?id=981597


Game objects can be simple state machines. Also, you have ets for "mutable" storage.


Why not? I have such a server in my plans, but that won't be soon (due to lack of time). If you have only as many threads as cores, that's not very MULTI-threaded; that's avoiding threads for performance reasons.


I forgot to mention an _Action_ MMO server... you can probably make a slow-paced MMO with Erlang...


"Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process. Non-copying implementations have been tried in the past, but they turned out to be a bad fit as low latency is more important than sheer throughput for the kind of soft-realtime systems that Erlang is designed to build."

So the alternative to copying the message would be to use a cross-process reference in memory, so that the GC has to take the extra step of looking at two processes per message... I write a lot of higher-level code than this and would be prone to overlook these kinds of details, but I want to get better at noticing performance gotchas. The architecture of the compiler and target assembly would have to be key to that.


> So the alternative to copying the message would be to use a cross-process reference in memory, so that the GC has to take the extra step of looking at two processes per message

At what point do you decide to make a copy? What if you pass a subset of the data to process C, which stores a subset of that data, then later passes it to process D? If you don't copy the data at any point in this series of steps, you need full blown cross-process GC which is counter to their goal.


> So the alternative to copying the message would be to use a cross-process reference in memory, so that the GC has to take the extra step of looking at two processes per message

To be frank, Erlang already does this with binaries over some size limit (I think 64 bytes).


> Messages are always copied before being inserted into the queue. As wasteful as this may sound it greatly reduces garbage collection (GC) latency as the GC never has to look beyond a single process.

This means there is no structural sharing, which can be troublesome with huge data structures. (Edit: not entirely true, see comments below)


The only sharing is for large binaries (>64 bytes)[0].

As with all messaging mechanisms, this can be handled by passing a reference or id and having the receiver fetch the actual large data from a db/file/source of truth.

[0] - http://erlang.2086793.n4.nabble.com/how-message-size-effects...


> The only sharing is for large binaries

Can you also share small objects by reference? (I.e. prevent serialization of a large tree composed of small objects)

And refer to those objects through pointers (avoiding an expensive db layer)?


> Can you also share small objects by reference?

No.

But, you can use a cheap cache layer instead of expensive db layer - ETS.
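
A minimal sketch of that (table and key names are made up):

    %% A public named ETS table acts as a shared, in-memory key/value cache.
    ets:new(my_cache, [set, public, named_table]).
    true = ets:insert(my_cache, {user_42, #{name => "Ada"}}).
    [{user_42, User}] = ets:lookup(my_cache, user_42).

One caveat: ets:lookup/2 still copies the stored term onto the caller's heap (large binaries excepted), so this saves the trip to an external store rather than the copy itself.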


Ok, thanks!


The big problem nobody talks about with actors and message passing is non-determinism and state.

If two otherwise independent processes A and B are sending messages to a single process C, it's non-deterministic which arrives at C first. This can create a race condition: C may respond differently depending on which message arrives first, A's or B's. This is also effectively an example of shared mutable state: the state of C, which can be mutated whenever a message is received, is shared with A and B because they're communicating with it and their messages work differently based on the current state.

Non-determinism, race-conditions, shared mutable state: the absolute opposite of what we want when dealing with concurrency and parallelism.


Yeah but the way you're framing the problem is a bit crazy.

If such a race condition is critical, then making A and B "otherwise independent" is a poorly designed architecture (and, yes, novices will do this, which is why there is a big admonition in the Elixir docs: https://hexdocs.pm/elixir/master/GenServer.html#module-when-...). Maybe they should all be branching logic or state machine in the same actor and you shouldn't be coordinating with message passing?

If for some reason A and B TRULY need to be asynchronous and independent (let's say they are tracking real-world things), AND order matters, then no amount of language or VM architecture will get you out of building a transactional lock, consensus system, or CRDT to guarantee sane results. That is just how the universe works.


Often, the order of messages doesn't matter that much. When it does matter, there are a couple patterns to consider.

a) be sure to give C whole requests; don't ask C for its current data, modify it in A, and then send it back to C (which is begging for B to send a modification in the meantime). C's mailbox enforces an ordering on requests (all calls will be handled in the order they are received, not in the order they are sent). See the sketch after this list.

b) rather than having A send to both B and C, with B also sending to C, send only from A to B and from B to C; B passes along A's information. The downside is that if B crashes before sending to C, A's information is lost. You could also have A send to C, have C reply to B, which then does its work and sends back to C.

c) when multiple processes are working on a single thing, try to combine them. A single process will find it hard to race with itself, and will have a clear order of events. I'm in the middle of cleaning up some code that had three processes around one thing (a state process, a reader and a writer), and there were so many weird and racy things because of the separation; putting it into one process means things can't change unexpectedly.
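
To make (a) concrete, a sketch of the difference, assuming C is a gen_server holding a counter:

    %% Racy read-modify-write from A's side: B can update C in between.
    Old = gen_server:call(C, get),
    gen_server:call(C, {set, Old + 1}).

    %% Whole request: C applies the change itself, in mailbox order.
    gen_server:call(C, {add, 1}).

    %% The corresponding clause in C:
    handle_call({add, N}, _From, Count) ->
        {reply, Count + N, Count + N}.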


> be sure to give C whole requests; don't ask C for its current data, modify it in A, and then send it back to C (which is begging for B to send a modification in the meantime)

A simple mitigation against these sorts of race conditions would be for C to maintain a version number alongside its data, and then have any update request include that version number. If the message's version matches the current one, do the update and bump the version number; else, reject the message with a reply directing the sender to reobtain the current data and try again. This way, if both A and B grab data from C and then simultaneously try to update that data in C, you won't lose any data.

This is (relevantly to Erlang) exactly how CouchDB works.
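
A sketch of that version check as gen_server clauses (assuming the state is just {Vsn, Data}):

    %% Reads hand out the current version along with the data.
    handle_call(read, _From, {Vsn, Data} = State) ->
        {reply, {Vsn, Data}, State};
    %% Updates must quote the version they were based on...
    handle_call({update, Vsn, NewData}, _From, {Vsn, _Old}) ->
        {reply, ok, {Vsn + 1, NewData}};
    %% ...otherwise the caller is told to re-read and retry.
    handle_call({update, _Stale, _New}, _From, {Vsn, Data} = State) ->
        {reply, {conflict, Vsn, Data}, State}.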


I believe this is called “optimistic locking”


Indeterminacy (using arbiters) is crucial to the performance of digital computation. (Arbiters are used in the interconnect of many-core chips.)

See the following for an explanation:

"Physical Indeterminacy in Digital Computation" https://papers.ssrn.com/abstract=3459566


I'm sorry to go completely off-topic here, but I just wanted to say thank you! Your discussion[0] with Meijer and Szyperski was mind-blowing and taught me a lot about the actor model. This was, hands down, one of the best classes I've ever had. Thanks for sharing this knowledge.

[0] - https://www.youtube.com/watch?v=7erJ1DV_Tlo


Thanks for your interest!

Cheers, Carl


You can program a completely deterministic parallel system using a simple fork-join model, can't you? No physical or timing factors interfere with that, can they?


The guy you just responded to invented the Actor model.

I'd guess he knows...

Besides, such a fork-join model might be theoretically free of timing and coordination in a system with infinite resources (infinite bus widths, no cache coherence, an infinitely fast coordinator, etc.) but that's not what silicon looks like.

The nice thing about the Actor model is exactly that it immediately exposes you to non-determinism, distributed state, and, in Erlang's case, failure.

It's what allows Erlang systems to "effortlessly" (you had to put the effort upfront) span many node clusters with arbitrary reliability in the face of failures.

Somewhere else you write:

> The problem is this corner case opens up all the fundamental issues I mention. When we include it we've got a big non-deterministic, racey, stateful, problem.

That's not caused by the Actor model, but by the fundamental laws governing distributed systems. Distributed systems are inherently non-deterministic, racey, and stateful.

The Two Generals problem dictates that you can't solve these issues unless you're on a single-writer system (which your scatter-gather algorithm is an example of), in which case you will always be limited by the throughput of the process that organises your scatter-gather operations / your control plane. At that point, you might as well ignore all forms of concurrency and do one thing at a time, which will be faster anyway.


If I ask an undergraduate to implement a simple parallel merge sort using fork-join I know they cannot get it wrong - because they cannot receive results in an order they were not expecting. They cannot possibly introduce a race condition under any circumstances. If they try to implement it using Erlang-style actors and message passing, they can get it wrong and expect the half they sub-sorted first to respond with the result first. I know this because I've seen it in undergraduates for real. Worst of all - it can work every time they test it but then fail in production, so it's extremely hard to explain and teach them to do it correctly.


The kind of parallelism relevant to Merge Sort does not really relate to what message passing and particularly Erlang (the topic of this thread) are about.

Erlang is about distributed coordination. It's all about concurrency, and parallelism is a means to the end of increased reliability. Merge Sort is data parallel SIMD with a single coordinator.

It's like the difference between multi factory industrial food production (distributed), and you handling multiple pots and pans on the same stove (parallel).

You'd never use message passing for that, because it has way too much overhead in and of itself. And if you had more data than could fit on a single machine, you'd choose a different algorithm anyway.

It's not that hard to find a simpler implementation for a problem that's not really distributed by nature.

You're also missing the huge learning opportunity for undergrads here. It's totally possible and actually easy to implement merge sort with message passing (although nobody would do it in the real world), and it provides a good lesson.

The first way to implement something is often not the correct one.

The naive but wrong implementation is to send a message for each value to be sorted, so the mailboxes act like a list. However, since there's no ordering guarantee, the issues you mentioned will crop up.

Since messages are our unit of coordination, the naively fixed version simply sends the entire array to its parent node which then merges the arrays it received.

However, there is also the possibility of sending a tuple containing the index within the subarray as well as the value, which allows the parent to buffer the values and wait for the gaps to be filled, which teaches you something about TCP as well.

Seriously, why only ask your undergrads questions that they can't get wrong? It seems a waste of everybody's time. You should rather teach them about the tradeoffs between different concurrency and parallelism models: SIMD for data-parallel high-perf stuff, the Actor model for distributed coordination and reliability, dataflow for FPGA programming. There's no one-size-fits-all model.
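
For the record, the tagged version is short. A sketch (it spawns on every split, so it's illustrative rather than fast):

    -module(psort).
    -export([sort/1]).

    sort([]) -> [];
    sort([X]) -> [X];
    sort(L) ->
        {Left, Right} = lists:split(length(L) div 2, L),
        Parent = self(),
        LRef = make_ref(),
        RRef = make_ref(),
        spawn(fun() -> Parent ! {LRef, sort(Left)} end),
        spawn(fun() -> Parent ! {RRef, sort(Right)} end),
        %% Selective receive on the refs: arrival order no longer matters.
        SL = receive {LRef, S1} -> S1 end,
        SR = receive {RRef, S2} -> S2 end,
        lists:merge(SL, SR).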


"Undergraduates have trouble learning it" does not indicate how viable a technology is in the real world, especially once you have parameters approaching the extremes of what's possible.

You can write something using fork-join that works correctly in theory, but expecting it to always work while running on nondeterministic hardware is exactly the kind of blind optimism that causes so much software to come crashing down if you look at it wrong.


> You can write something using fork-join that works correctly in theory, but expecting it to always work while running on nondeterministic hardware is exactly the kind of blind optimism that causes so much software to come crashing down if you look at it wrong.

This seems backwards to me - a fork-join task has no state and can access no other task's state. You can restart it if it fails as many times as you want and the result is always exactly the same. You can't say the same with running an actor twice or sending a message twice - you may get a different result the second time!


It's very easy to order the responses once you've received them; that's a different problem from the original one. Parallel sorting doesn't depend on the order: every chunk is sorted independently and returned to the caller. Unless you mean sorting the list in place (mutating the list), which is not supported in Erlang.


> You can program a completely deterministic parallel system using a simple fork-join model, can't you? No physical or timing factors interfere with that, can they?

How simple is "simple"? What happens if process 1 crashes? What if it suffers a network disconnection and looks like it crashed, but then tries to reconnect later? What is the correct logic for completing the combined task?


Probably you haven't realized it, but the person you are replying to is Carl Hewitt.

https://en.m.wikipedia.org/wiki/Carl_Hewitt


Thanks!

The Wikipedia article is way out of date :-(

There is an up-to-date blog here with links to current articles:

https://professorhewitt.blogspot.com/


It's me who has to thank you, Professor, for your work!

I was at Code Mesh 2018 in London to attend your "ULTRACONCURRENCY IS THE FUTURE OF PROGRAMMING" presentation.

Meeting you and Joe Armstrong in the same place has been one of the greatest pleasures of my professional life.


> The big problem nobody talks about with actors and message passing is non-determinism and state.

People talk about that all the time with actors and message passing, since the actor model is largely a tool for structuring the management of those things (not usually for eliminating them, because for many problem domains those are inherent to the domain.)


> Luckily, global orders are rarely needed and are easy to impose yourself (outside distributed cases): just let all involved parties synchronize with a common process.

When there are multiple agents/actors in a distributed system, the timestamp resolution is datetime64, clock synchronization and network latency are variable, and non-centralized resilience is necessary to eliminate single points of failure, global ordering ranges from impractical to impossible because there is no natural unique key with which to impose a [partial] preorder [1][2]: there are key collisions when you try to merge the streams.

Just don't cross the streams.

[1] https://en.wikipedia.org/wiki/Preorder_(disambiguation)

[2] https://en.wikipedia.org/wiki/Partially_ordered_set

The C in CAP theorem is for Consistency [3][4]. Sequential consistency is elusive because something probably has to block/lock somewhere, unless you've optimally distributed the components of the control flow graph (CFG).

[3] https://en.wikipedia.org/wiki/Consistency_model

[4] https://en.wikipedia.org/wiki/CAP_theorem

FWIU, TLA+ can help find such issues. [5]

[5] https://en.wikipedia.org/wiki/TLA%2B


Wouldn't a partial order be possible using logical clocks? Vector clocks probably don't satisfy "practical" but a hybrid logical clock is practical if you can assume clock synchronisation is within some modest delta.


> if you can assume clock synchronisation is within some modest delta.

From experience with clocks, you should really be actively testing the delta, and not just assuming it.


The assumption would be the limit you actively test for.


The Lamport timestamp (https://en.wikipedia.org/wiki/Lamport_timestamp):

> The Lamport timestamp algorithm is a simple logical clock algorithm used to determine the order of events in a distributed computer system. As different nodes or processes will typically not be perfectly synchronized, this algorithm is used to provide a partial ordering of events with minimal overhead, and conceptually provide a starting point for the more advanced vector clock method.
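
As a sketch of how little machinery that takes (an illustrative module, not a library API):

    -module(lamport).
    -export([tick/1, send/3, recv/2]).

    %% Local event: advance the clock.
    tick(Clock) -> Clock + 1.

    %% Send: advance the clock and stamp the outgoing message with it.
    send(Pid, Msg, Clock) ->
        New = Clock + 1,
        Pid ! {stamped, New, Msg},
        New.

    %% Receive: merge in the remote clock, then advance past it.
    recv({stamped, Remote, _Msg}, Clock) ->
        max(Clock, Remote) + 1.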


> The big problem nobody talks about with actors and message passing is non-determinism and state.

I have written a lot of actor-based systems, and I’m pretty sure this is heavily talked and thought about.

The race condition situation you described is only a problem for people not used to concurrency and actors. It is bad form to have your actor depend on the order of messages it receives if it is intended to receive messages from multiple actors. So the situation you’ve described is simply bad actor programming.

Often, a good way to program actors is to have them be finite state machines.

Also, your shared mutable state reference does not meet any definition I have ever seen of “mutable”. In actor systems, yes, actors can receive the state of other actors, but that state is a copy. The internal state of an actor is not allowed to be mutated by an external process, typically.


> your shared mutable state reference does not meet any definition I have ever seen of "mutable"

Of course it is!

> The internal state of an actor is not allowed to be mutated by an external process

Having A tell C to mutate its state instead of A just mutating C's state is just like using a 'private' keyword in another language - insofar as mutation is concerned. Race conditions do not care if you ran:

    a() {
       kindlyInformCToIncreaseXWheneverYouGetThisMessage();
    }
versus

    a() {
      c.x++;
    }
It's still a mutation.


But the internal state is still changed by the owner process, in the owner process's context. An external process cannot change its state behind its back.

Effectively, mutations to the internal state are "serialized" by the owner process, so it's easier to handle possible race conditions without using explicit locks/semaphores.


This shows exactly the problem but perhaps not in the context you're thinking of.

One of the biggest strengths of processes in Erlang is that they serialise access, but this serialisation isn't simply that they serialise or are written in a "run this once" way. It's more that processes are (or can be used as) state machines, basically. This pattern is pervasive and you can find it in the OTP library; the most common construct (although there are proper finite state machine OTP abstractions) is the gen_server. The gen_server is a generic server interface that uses message passing. It can be described as the simplest of agents with a baked-in generic interface.

Where your example falls short is that, when I code the interface for a gen_server, I can (and usually will) codify any needed relationship between the state a process is in and what it can do with the "messages" it receives in that state. This allows me to completely forgo any concern about how someone will use it, without having to explicitly define handling for cases I didn't intend.

If I have (pseudo code but completely codifiable):

- when msg and msg == "get_status" and state == "receiving" -> reply, function_that_calculates_response(state), new_state = make_new_state(state)

- when msg and state == "not_available" -> reply unavailable, new_state = state, continue_to: "recheck_assumptions"

- when "recheck_assumptions" and state -> noreply, new_state = check_assumptions(state)

- when msg, state -> reply bad_request, new_state = state
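
(For concreteness, roughly how those clauses look as real gen_server callbacks, keeping the placeholder function names and assuming the state is a map with a phase key; the "continue_to" part maps onto handle_continue:)

    handle_call(get_status, _From, #{phase := receiving} = State) ->
        {reply, function_that_calculates_response(State), make_new_state(State)};
    handle_call(_Msg, _From, #{phase := not_available} = State) ->
        {reply, unavailable, State, {continue, recheck_assumptions}};
    handle_call(_Msg, _From, State) ->
        {reply, bad_request, State}.

    handle_continue(recheck_assumptions, State) ->
        {noreply, check_assumptions(State)}.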

Now I don't have to worry about some nook or escape hatch in the language through which someone will rewrite the state of this process from another thread, or function call, or whatever. No process will be able to change the state of this process from outside the process, meaning I can clearly define the states it can assume. I also don't need to write overly defensive code, or find out that in some situation something changed the state outside of what I explicitly coded.

In the previous example this means that if two processes (running in different schedulers, nodes, or whatever) both send a msg, the mental model of the receiving gen_server is a consistent mailbox from which it picks msgs to process one at a time. If one of the msgs would force the gen_server to change its state, and another asks for the status, it can't happen that both of them have their conditions evaluated at the same time against differing states.

So if msg_1 changes the state, msg_2 will be interpreted against this new state. It can't happen that msg_1 is being processed and changes the state in some intermediate step while msg_2 is interpreted under the exact same conditions as msg_1 (unless msg_1 leaves the state as it was). This is not true for mutable shared-memory code: you can have two functions, from different threads or whatever, call a function on X concurrently, with the evaluations for both calls being done against the same state, while fun1 and fun2 then both operate on that state. Perhaps fun2 assumed it was allowed to act because of the state it saw, but while it was doing the thing that assumed that, the state had already been changed by fun1 running at the same time.

It also means that this process can do whatever it wants, including spawning new processes to do their thing in different contexts, but its state can only be mutated by itself. This forces those things (or me, when writing code) to explicitly define their interfaces, and if I want to rely on the result of those things changing the state, I need to explicitly pass a message back, for which I have the previously defined interfaces.

This, when coupled with supervision, links, or monitors, can make for extremely resilient processes even when they encode complex flows between multiple parts in the presence of possible unknowns.

Can you still bork it? Yes, but the surface area is much smaller, and after a while of working with it, it's a very simple model that, once understood, is always the same. Are there areas where this is not the right way to do it and mutable state is right? Probably yes; people point out performance issues, and while I think those are somewhat overstated and can be solved in most cases, I can believe there are real issues there, so it's not a silver bullet.

Sorry for the long winded post.


> The big problem nobody talks about with actors and message passing is non-determinism and state.

Seems to be mentioned in the article, with an example:

> If more than one process sends signals to a common process, they can arrive in any order even when you “know” that one of the signals was sent first.


Right, that's what triggered my comment, but I think it's a bigger footgun than the article suggests. The problem is that this corner case opens up all the fundamental issues I mention. When we include it we've got a big non-deterministic, racey, stateful problem.


Experience shows that it really isn't. The fundamental issues of non-deterministic ordering are inherent in the domain that Erlang was designed for, and the VM characteristics pretty closely reflect that. It turns out that people have comfortably written complex and scalable commercial systems in Erlang for decades, and many have noted that for people who understand the problem domain, the Erlang model feels intuitive.

The problem of message order non-determinism is real, but for orchestration logic in scenarios with multiple uncoordinated message sources, it's inescapable. What is important is that the programming environment helps you deal with it, and doesn't introduce additional ("accidental") complexity.

I discussed this topic at QCon 2010, and ran through some demos to illustrate where many commercial projects actually DO go wrong, and why the textbook Erlang model keeps you safe.

https://www.infoq.com/presentations/Death-by-Accidental-Comp...

The key guarantees that Erlang gives you here are:

- Messages from A to B will all arrive (if they arrive) in the order in which they were sent, and you can detect if there is an interruption

- A receiving process can decide in which order messages are processed.

It turns out that people intuitively understand that if Alice and Bob each send me a message, independently of each other, I cannot know which message reaches me first. This mirrors a fundamental real-world constraint, and in a distributed processing environment, there really isn't much you can do about it.

There are other concurrency-related problem domains, some for which the Erlang implementation, at least, is not ideal.


In practice, I don't think it is for real Erlang programs. If you need total order, route everything that needs that order through one process, and you have it. For instance, you can use that to impose total order on messages in a chat room. As long as "one chat room" is small enough for that to be practical even at your largest size, it works.

If that doesn't work, don't use Erlang. No sarcasm. The process model is nice but does preclude certain high-performance patterns.
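
A sketch of the chat-room funnel, where the room process's mailbox is the total order:

    %% Every post goes through one room process; the order in which posts
    %% leave its mailbox is the order every member sees.
    room(Members) ->
        receive
            {join, Pid} ->
                room([Pid | Members]);
            {post, From, Text} ->
                [M ! {chat, From, Text} || M <- Members],
                room(Members)
        end.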


Erlang has a bunch of tools to help enforce determinism or at least suss out concurrency issues. For example: you can use multiple receive clauses to process your queued messages in a specific order; there is also a process manager (pman:start()) that allows you to explore concurrency-related events; and there are mutex/semaphore processes as a pattern to mitigate those issues.
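
For example, a priority scheme using selective receive (a sketch; the message shapes are made up):

    %% Drain any urgent messages before touching the rest of the mailbox.
    next_message() ->
        receive
            {urgent, Msg} -> {urgent, Msg}
        after 0 ->
            receive
                Other -> Other
            end
        end.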

There doesn't seem to be a real solution to deadlocks. I've got the O'Reilly book in front of me and it basically says: be careful, but don't worry.

And I don't think Erlang has shared mutable state. In fact, a Google search for "erlang shared mutable state" has you as the first result.


You have ETS and Mnesia that are technically shared mutable state, and people have built Raft / Paxos / DeltaCRDT / etc. which also provide some form of it -- albeit a safe one -- but while it does exist, actually sharing the state between PIDs is a very rare thing ime. I don't have super in-depth experience in the Erlang ecosystem, so take this with a pinch -- or handful! -- of salt ofc.


ETS is isomorphic to a data table process that handles get and set (etc.) messages, but happens to be quite a bit faster.

If ETS is shared mutable state, so is a process with state that accepts changes, as chrisseaton posits. Of course, shared mutable state with clear boundaries between mutations is a lot better than shared memory where everyone can write and read things in bits and pieces.


No, I agree; there are ways of getting global shared mutable state, but they are explicit and not a structure of the language itself IMO.


> And I don't think Erlang has shared mutable state.

Arguably (well, if you ignore the process dictionary, which everyone seems to) Erlang lacks local mutable state, but has shared mutable state through actors.


I feel like this whole thread could stand to have people more clearly define the way they are using the term. I don't know of any "immutable" system that literally never changes anything. If Erlang is "mutable" in that sense, what system isn't?

I've seen such ideas tossed around, but I don't think I've ever seen an implementation.


> And I don't think Erlang has shared mutable state

An actor has state. It's mutable: each message can modify it. It's shared: two processes working with a shared actor share that state. To me, that's shared mutable state. It causes classic race conditions.

(I know this is an unpopular opinion, and somewhat deliberately contrary and provocative, but ultimately truthful.)


While this is technically "shared state" (in the same way that a "database", or "the console/log output" is "shared state"), it's not a problem in the classical data race sense. And because you're forced to think about it in an explicit fashion, it's generally not a problem for the working programmer in the BEAM vm.

Also note that Erlang is not a pure actor system. It is relatively hard to accidentally write a deadlock in Erlang with OTP. In about 3 years of programming Elixir I've only written a deadlock three or four times (and caught them all in tests). Even so, if I had pushed one to prod, it would have timed out certain operations, which would have triggered a dynamic restart of the relevant processes.


  two processes working with a shared actor share that state
I think this is where I disagree. They may share an actor but the state of the actor belongs to the actor. To say they share it implies they can always change it independently but that's not the case. Actors don't trample on each other's internal states. For example a simple mutex semaphore pattern fixes your race condition.


> They may share an actor but the state of the actor belongs to the actor. To say they share it implies they can always change it independently but that's not the case. Actors don't trample on each other's internal states. For example a simple mutex semaphore pattern fixes your race condition.

The actor's state is shared in the sense that both processes can change it in ways that are visible to each other. The actor can control access to it, but unless the actor never changes state in response to messages (in which case why bother with actors at all?) there's still something that changes. Constrained state is still state.


For many practical applications, deterministic parallelism is hundreds of times slower than indeterminate computation.

For a proof, see the following article:

"Epistemology Cyberattacks" https://papers.ssrn.com/abstract=3603021


[flagged]


Current auto generators cannot yet deal with such esoteric content. So it must be real ;-)


Suppose something goes on auction on a website. Two clients on opposite sides of the globe launch bids at the same time. Of course the response depends on which bid arrives at the auction server first. This is exactly what you want when dealing with concurrency and parallelism: one of those two clients will be assigned the bid at that price and the auction will continue.

The same thing happens in real life. You cannot solve this problem by using a shared memory model: it isn't inherent to the actor model, it is inherent to the reality of some situations.

I have seen some people avoid referring to scenarios like this as race conditions, but in general there are situations where you need to assign a winner (you have a serial actor decide, or you use a lock). It could be real (like the auction example), or it could be bad design where a system has been architected to be unnecessarily concurrent and events that are in reality serial are fired off in parallel and the network (or even the thread scheduler or something else) reorders them. Try to avoid creating unnecessary races; learn to identify real races.


it's true, but that's a problem every other implementation suffers from.

synchronization is a basic requirement in concurrency.

in my experience the actor model implemented in Erlang provides simpler solutions to the problem as highlighted in this comment https://news.ycombinator.com/item?id=26535230

for example, if B sends a message to C in response to a message from A, it could instead send a specific response back to A, which then contacts C on its behalf and avoids the non-determinism

the fact that the actor model provides strong guarantees about messaging order between two actors makes it easier to reason about and avoid the pitfalls associated with it

it is also a well known problem

citing the Pony lang FAQ:

> So for example, if actors X and Y send to actor Z, there is no guarantee what order the messages will be processed by Z except that we can say, all messages sent by X will be processed in the order they were sent, as will all messages sent by Y. However, messages from X and Y can arrive at Z interleaved in any order that maintains the causal ordering of X to Z and Y to Z.


This behaviour is intended in Erlang and it's very good. You don't start to develop the application with wrong concepts. Your ordering from different producers will work on a single machine, but the second you want to upgrade to a distributed system everything will fail. In Erlang you start designing everything with a distributed system in mind.



