Friday, January 8, 2010

Thoughts on Actors

This discussion (started today) on the Scala mailing list is about understanding the usefulness of Akka and, more generally, of actors.

Somebody suggested comparing actors to using locks directly. What follows was initially meant as a reply, but I ended up summarizing many of my current concerns and questions about the actor programming model.

This contrast between actors and low-level concurrent programming (e.g. locks) is misleading. It's not as if there are actors, then a huge void, and then locks, and we must pick one extreme. There are plenty of things in between. For example, message passing within a single VM is trivial to implement on top of BlockingQueues (or, soon, TransferQueues). The executor framework and the fork/join framework already provide thread pools and fine-grained parallelism.
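As a rough illustration of how little is needed, here is a minimal Java sketch of message passing between two tasks running on an executor, with a LinkedBlockingQueue as the channel. The class name, the exchange() method and the STOP sentinel are made up for this example; it is a sketch under those assumptions, not how any actor library is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class QueueMessaging {
    // Hypothetical sentinel telling the consumer to stop (compared by identity).
    private static final String STOP = new String("STOP");

    /** Sends messages through a BlockingQueue to a consumer task; returns what it received. */
    public static List<String> exchange(List<String> messages) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Consumer: take() blocks until a message is available.
            Future<List<String>> consumer = pool.submit(() -> {
                List<String> received = new ArrayList<>();
                for (String msg = channel.take(); msg != STOP; msg = channel.take()) {
                    received.add(msg);
                }
                return received;
            });
            // Producer: put() never blocks on an unbounded queue.
            pool.submit(() -> {
                for (String m : messages) channel.put(m);
                channel.put(STOP);
                return null;
            });
            return consumer.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(exchange(List.of("a", "b", "c"))); // prints [a, b, c]
    }
}
```

If the sender should wait until its message has actually been picked up, a LinkedTransferQueue with transfer() in place of put() gives that synchronous hand-off.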

My take is that actors provide a simpler, more elegant programming model than using the underlying tools directly. At their core, typical actors are a Runnable paired with a BlockingQueue (the mailbox), while reactors are event listeners. A strong point of actors, as the Haller/Odersky paper shows, is that they unify the thread-based and event-based models: one can use and combine either under the same framework. This programming model is still young, and it needs exploration to find its best use cases and to be fully appreciated. As much as anything, it too needs an "Effective Actors" kind of book. It is also easy to go wrong, especially for beginners trying to wrap their heads around MPI-like programming. Deadlocks are still possible (actors waiting forever for messages that will never come), and so are race conditions (an actor giving up waiting for a reply right before the actual reply arrives); the usual suspects of concurrent programming have not magically vanished. (Edit: I'm probably wrong to classify the last case as a race condition; it's likely just a data race, following the nomenclature of JCiP.)
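To make the "Runnable plus mailbox" description concrete, and to show the late-reply problem, here is a small Java sketch. MiniActor, Request, ask() and the CountDownLatch gate are hypothetical names for this example; the gate exists only to make the slow reply deterministic. The caller gives up after 50 ms, the reply is sent anyway, and the next request then reads the stale answer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MiniActor implements Runnable {
    /** A message: its payload plus the mailbox that should receive the reply. */
    record Request(String text, BlockingQueue<String> replyTo) {}

    private final BlockingQueue<Request> mailbox = new LinkedBlockingQueue<>();
    // Hypothetical gate, used only to make the "slow reply" deterministic in this demo.
    private final CountDownLatch gate;

    MiniActor(CountDownLatch gate) { this.gate = gate; }

    void send(Request r) { mailbox.add(r); }

    @Override
    public void run() {
        try {
            while (true) {
                Request r = mailbox.take();              // blocking receive
                gate.await();                            // stand-in for slow processing
                r.replyTo().put(r.text().toUpperCase()); // reply to the sender
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();          // interruption means shutdown
        }
    }

    /** Sends text to the actor, then waits up to timeoutMillis; null means "gave up". */
    static String ask(MiniActor actor, BlockingQueue<String> inbox, String text,
                      long timeoutMillis) throws InterruptedException {
        actor.send(new Request(text, inbox));
        return inbox.poll(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Returns what the caller sees for two consecutive requests. */
    public static String[] demo() {
        CountDownLatch gate = new CountDownLatch(1);
        MiniActor actor = new MiniActor(gate);
        Thread worker = new Thread(actor);
        worker.setDaemon(true);
        worker.start();
        BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
        try {
            String first = ask(actor, inbox, "first", 50);      // actor held at the gate: times out
            gate.countDown();                                   // the late reply "FIRST" is now sent
            String second = ask(actor, inbox, "second", 5_000); // gets "FIRST", not "SECOND"
            return new String[] { first, second };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.interrupt();
        }
    }

    public static void main(String[] args) {
        String[] answers = demo();
        System.out.println(answers[0] + " " + answers[1]); // prints "null FIRST"
    }
}
```

A common mitigation is to use a fresh reply queue per request, or to tag each reply with a request id and discard replies nobody is waiting for; the sketch deliberately does neither.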

Moreover, this simplification has its cost too: it's not easy, at least for me, to reason about the performance implications. For example, assuming Scala actors that depend on ForkJoinScheduler (i.e. that use the fork/join framework), this quotation from the javadoc of ForkJoinPool is interesting:

A ForkJoinPool may be constructed with a given parallelism level (target pool size), which it attempts to maintain by dynamically adding, suspending, or resuming threads, even if some tasks are waiting to join others. However, no such adjustments are performed in the face of blocked IO or other unmanaged synchronization.

This leads to some obvious questions which I can't answer easily at all:

  1. What are the (performance) implications of using (blocking) IO in actors? (I haven't seen similar warnings given to actor users.)
  2. Since tasks are never joined, every blocking receive() call falls under "unmanaged synchronization" as per the javadoc. What are the implications of this?
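For contrast, this is roughly what "managed" synchronization looks like in the Java fork/join API: a blocking take() wrapped in a ForkJoinPool.ManagedBlocker and run through ForkJoinPool.managedBlock(), adapted from the QueueTaker example in the ManagedBlocker javadoc. The pool has parallelism 1, yet the sending task can still get a thread while the receiver waits, because the pool may add a spare worker for the duration of the block. Whether a given actor scheduler wraps its receive() this way is exactly the kind of detail the questions above are about; this sketch only shows the JDK API, and ManagedReceive and receiveInTinyPool() are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

public class ManagedReceive {
    /** Wraps a blocking take() so the ForkJoinPool knows this worker is about to block. */
    static final class QueueTaker<E> implements ForkJoinPool.ManagedBlocker {
        private final BlockingQueue<E> queue;
        private volatile E item;

        QueueTaker(BlockingQueue<E> queue) { this.queue = queue; }

        @Override
        public boolean block() throws InterruptedException {
            if (item == null) item = queue.take(); // the actual blocking receive
            return true;                           // no further blocking is needed
        }

        @Override
        public boolean isReleasable() {
            // Skip blocking entirely if a message is already waiting.
            return item != null || (item = queue.poll()) != null;
        }

        E getItem() { return item; }
    }

    /** Receives one message in a pool of parallelism 1 while another task in the same pool sends it. */
    public static String receiveInTinyPool(String message) {
        ForkJoinPool pool = new ForkJoinPool(1);
        BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        try {
            Future<String> receiver = pool.submit(() -> {
                QueueTaker<String> taker = new QueueTaker<>(mailbox);
                // The pool may activate or create a spare worker while this task is
                // blocked, so the sender below can still run despite parallelism 1.
                ForkJoinPool.managedBlock(taker);
                return taker.getItem();
            });
            pool.submit(() -> { mailbox.add(message); });
            return receiver.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(receiveInTinyPool("hello")); // prints hello
    }
}
```

With a plain mailbox.take() instead of managedBlock(), and the receiver scheduled first, the pool gets no signal that its only worker is blocked, so the sender could wait forever.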

So the simplification also seems to come at the cost of hiding potentially important optimizations, such as letting a thread that would otherwise block in join() on its subtasks execute other tasks while it waits (via helpJoin()).
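The join-time helping mentioned above can be seen in a plain RecursiveTask. As far as I know, helpJoin() existed only in the jsr166y previews; in the java.util.concurrent version shipped with Java 7, join() itself may run pending tasks instead of just parking the thread. The sketch below sums an array on a pool of parallelism 1: the single worker forks the left half, computes the right half, and then typically executes the forked task itself inside join(). SumTask, THRESHOLD and parallelSum() are illustrative names.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class HelpingJoin {
    /** Sums array[from, to) by splitting in half until segments are small. */
    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 1_000; // hypothetical cutoff for sequential work
        private final long[] array;
        private final int from, to;

        SumTask(long[] array, int from, int to) {
            this.array = array;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += array[i];
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(array, from, mid);
            left.fork();                                        // push left half onto this worker's deque
            long right = new SumTask(array, mid, to).compute(); // do the right half ourselves
            // join() does not simply park the thread: if `left` was not stolen,
            // this worker typically pops and runs it itself.
            return right + left.join();
        }
    }

    public static long parallelSum(long[] array) {
        // Parallelism 1: even a single worker can complete all the joins,
        // because joining threads execute pending tasks instead of only blocking.
        ForkJoinPool pool = new ForkJoinPool(1);
        try {
            return pool.invoke(new SumTask(array, 0, array.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data)); // prints 5000050000
    }
}
```

A thread-per-task design that simply blocked in join() would need one thread per outstanding subtask; work-stealing plus helping is what lets fork/join get away with a handful of threads.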

I'm not sure what the conclusion should be. Hopefully, in 3-4 years, collective experience will be substantial, and we will better understand how these shiny new tools are best used and when the underlying concurrency utilities should be used instead. Personally, for now, while I am eager to experiment with actors, I feel more at home with lower-level tools, because I can more easily reason about the performance characteristics of my code. Hopefully someone will take on the task of writing a good Scala actors book. Current books are OK, but since Scala is new they are mostly devoted to the language itself, with perhaps a chapter on actors, which is too little to go beyond the very basics.