If you are new to Ruby, you may have clicked on this article wondering, "What is a ractor?" We will go into the details soon, but, basically, ractors are a feature introduced in Ruby 3.0 that enables true parallelism within the language. Now, your next question might be, "What is parallelism?" Before we get into the nitty-gritty of ractors, let's back up a bit and define a few key terms.
One of the criticisms that you might have heard about Ruby is that it is difficult to scale compared to other languages like Golang, Elixir, and Scala. Why is this? Well, it comes down to two terms that you've probably heard of but may not completely understand: concurrency and parallelism.
Let's begin with concurrency. I really like this definition from Wikipedia:
"Concurrency is the coordination and management of independent lines of execution. These executions can be truly parallel or simply be managed by interleaving. They can communicate via shared memory or message passing."
Technically speaking, you can achieve concurrency with Ruby using threads. You can think of a "thread" as a worker or a unit of execution. Every process has at least one thread, and you can create more threads on demand. Threads are used to divide cooperating tasks within a program, whereas processes are used to split up tasks among different programs. Okay, so we have threads that we can utilize to build applications with concurrency in mind. Sounds great, right? Well, there's a bit of a gotcha moment!
The standard Ruby interpreter, known as Matz's Ruby Interpreter (MRI) and built on the YARV (Yet Another Ruby VM) virtual machine, uses a global interpreter lock, which means that only one thread can execute Ruby code in the interpreter at any given time. This is not to say that threads are never useful in Ruby. One commonly cited example is making API calls to an external site. Because a thread that is waiting on I/O releases the lock, you can use multiple threads to have more than one request in flight at once and handle responses as they come back, instead of sitting and waiting for the external server to respond to a single call on a single thread. However, you'd still need to guard against race conditions and deadlocks.
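To make the API-call scenario concrete, here is a minimal sketch. It assumes a hypothetical `fake_api_call` helper that uses `sleep` as a stand-in for a slow network request (no real HTTP client is involved), and a hypothetical `fetch_all` wrapper that starts one thread per request.

```ruby
# Hypothetical stand-in for a slow external API call. Sleeping, like waiting
# on a socket, lets other threads run in the meantime.
def fake_api_call(id, delay = 0.2)
  sleep(delay)
  "response #{id}"
end

# Start one thread per request, then collect each thread's return value.
# Thread#value joins the thread and returns the result of its block.
def fetch_all(ids)
  ids.map { |id| Thread.new { fake_api_call(id) } }.map(&:value)
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
responses = fetch_all([1, 2, 3, 4])
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

p responses
puts format("elapsed: %.2fs", elapsed)
```

Run one after another, the four simulated calls would take about 0.8 seconds in total. With threads, the waits overlap, so the elapsed time should be much closer to that of a single call.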
To read more about threads, I recommend this article from The Pragmatic Programmer's Guide to Ruby.
Let's move on to discuss parallelism, which can be defined as
Truly simultaneous execution -- when two tasks run at the same time.
It is important to note that concurrency is NOT the same as parallelism. While concurrent tasks can begin, run, and finish in overlapping time periods, this doesn't necessarily mean that they'll ever both be running at the same instant (e.g., multiple threads on a single-core machine take turns). This is in contrast to parallelism, where we have two or more cores working simultaneously.
Now that we've reviewed some key terminology, we can finally discuss ractors!
A little bit of history
The desire to improve upon Ruby's concurrency model dates back to 2016, when Koichi Sasada (designer of the Ruby Virtual Machine and garbage collector) first gave a presentation on what was then called "guilds." Guilds enabled multi-threading in which threads in two different guilds could run in parallel, but threads in the same guild could not. Eventually, the concept of a "guild" was renamed "ractor" because the ultimate implementation was very similar to the "Actor" model in other languages such as Erlang and Elixir.
An "Actor" is capable of
- receiving messages,
- responding to the message sender,
- sending messages to other actors,
- creating other actors, and
- performing actions such as mutating data in a database
As is evident, actors communicate via messages and maintain their own private state. The received messages are processed in the first in, first out (FIFO) order, and the message sender is decoupled, which is what provides async communication.
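The actor idea can be sketched in plain Ruby without ractors at all. The following is a toy illustration only, assuming a hypothetical `CounterActor` class built from a `Thread` and a `Queue`; it is not how ractors are implemented. It shows the three properties described above: private state, a FIFO mailbox, and a sender that does not wait for the message to be processed.

```ruby
# A toy actor: private state plus a FIFO mailbox drained by a single thread.
class CounterActor
  def initialize
    @mailbox = Queue.new           # thread-safe FIFO queue
    @count = 0                     # private state, touched only by the worker
    @worker = Thread.new { run }
  end

  # Asynchronous send: enqueue the message and return immediately.
  def tell(message, reply_to = nil)
    @mailbox << [message, reply_to]
    self
  end

  # Ask the actor to finish, then wait for its thread to exit.
  def stop
    tell(:stop)
    @worker.join
  end

  private

  # Process messages one at a time, in the order they arrived.
  def run
    loop do
      message, reply_to = @mailbox.pop
      case message
      when :increment then @count += 1
      when :report    then reply_to&.push(@count)
      when :stop      then break
      end
    end
  end
end

# Hypothetical demo: two reports, taken at different points in the queue.
def demo_counter
  replies = Queue.new
  actor = CounterActor.new
  actor.tell(:increment).tell(:report, replies)
  actor.tell(:increment).tell(:increment).tell(:report, replies)
  actor.stop
  [replies.pop, replies.pop]
end

p demo_counter # => [1, 3]
```

Because the mailbox is drained in order, the first report sees one increment and the second sees three, no matter how quickly the sender enqueues the messages.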
Thus, ractors (Ruby + Actors) were born!
What are ractors?
Ractors provide parallelism without many of the usual thread-safety concerns: unlike threads, ractors do not share everything, and most objects cannot be shared at all. The objects that can be shared are either immutable or protected through a locking mechanism. You may recall that we also discussed the issue of possible race conditions when utilizing threads. Another benefit of ractors is that they are unable to access any objects through variables that are not defined within their own scope, which protects against those sneaky race conditions.
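Here is a small sketch of both rules in action. The helper name `isolation_error_message` is hypothetical; `Ractor.shareable?` and `Ractor.make_shareable` are part of the Ractor API. The official docs describe an `ArgumentError` when a ractor's block refers to an outer local variable, which is what this sketch assumes.

```ruby
# Try to create a ractor whose block reaches for a variable defined outside
# of it. Ruby refuses to create the ractor and raises instead.
def isolation_error_message
  greeting = String.new("hello")    # an ordinary, mutable local variable
  Ractor.new { greeting.upcase }    # the block refers to an outer variable
  nil
rescue ArgumentError => e
  e.message
end

puts isolation_error_message

# Which objects count as shareable?
p Ractor.shareable?(42)                            # integers are immutable
p Ractor.shareable?(String.new("hello"))           # a mutable string is not
p Ractor.shareable?(String.new("hello").freeze)    # a frozen string is

# make_shareable deep-freezes an object graph so it can be shared.
list = Ractor.make_shareable([String.new("a"), String.new("b")])
p Ractor.shareable?(list)
p list.frozen? && list.all?(&:frozen?)
```

The check happens when the ractor is created, so the mistake surfaces immediately instead of turning into a race condition later on.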
If you read the official docs, there are additional caveats regarding thread safety. For example, "there are several blocking operations (waiting send, waiting yield, and waiting take) so you can create a program that has dead-lock and live-lock issues."
An example of a ractor—let's look at some code
Creating a ractor is super simple!
r = Ractor.new name: 'my_ractor' do
  puts "I just made a ractor!"
end
r.name
# => "my_ractor"
You can also pass arguments to the ractor, which it receives as block parameters, like this:
some_numbers = [1,2,3]
Ractor.new some_numbers do |arr|
  puts arr.map(&:to_s)
end
# 1
# 2
# 3
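One detail worth seeing in code: an unshareable argument such as a mutable array is copied when it is handed to a ractor, so the ractor works on its own copy. The sketch below assumes two hypothetical helpers. `mutate_in_ractor` appends to the array inside the ractor, and `ractor_result` fetches the block's return value using `take` (the method used in this article) unless the running Ruby offers `Ractor#value`, which, as far as I know, replaces `take` in newer releases.

```ruby
# Hypothetical helper: fetch a ractor's result with whichever method the
# running Ruby provides.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Pass an array into a ractor, modify it there, and return the modified copy.
def mutate_in_ractor(numbers)
  worker = Ractor.new(numbers) do |arr|
    arr << 4    # modifies the ractor's private copy only
    arr
  end
  ractor_result(worker)
end

original = [1, 2, 3]
returned = mutate_in_ractor(original)

p original   # => [1, 2, 3]
p returned   # => [1, 2, 3, 4]
```

The caller's array is untouched, which is exactly the protection against shared mutable state described earlier.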
You may be thinking, "Okay, this is cool, but what's the point? How can ractors help my code?" To answer this question, let's look at the benchmark example provided in the release notes.
The following code executes the Tak function (more on Wikipedia here) four times sequentially or four times in parallel with ractors.
def tarai(x, y, z) =
  x <= y ? y : tarai(tarai(x-1, y, z),
                     tarai(y-1, z, x),
                     tarai(z-1, x, y))
require 'benchmark'
Benchmark.bm do |x|
  # sequential version
  x.report('seq'){ 4.times{ tarai(14, 7, 0) } }
  # parallel version
  x.report('par'){
    4.times.map do
      Ractor.new { tarai(14, 7, 0) }
    end.each(&:take)
  }
end
Here are the results:
          user     system      total         real
seq  64.560736   0.001101  64.561837 ( 64.562194)
par  66.422010   0.015999  66.438009 ( 16.685797)
Pretty impressive! The parallel version is 3.87× faster than the sequential version in terms of real (wall-clock) time. Note that this result was measured on Ubuntu 20.04 with an Intel(R) Core(TM) i7-6700 (four cores, eight hardware threads).
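The 3.87× figure comes from dividing the two "real" times in the table. As a quick check, here is the arithmetic as a sketch; the `speedup` and `efficiency` helpers are hypothetical names, and "efficiency" is simply the speedup divided by the four ractors used in the benchmark.

```ruby
# Speedup: how many times faster the parallel run finished (wall-clock time).
def speedup(sequential, parallel)
  sequential / parallel
end

# Efficiency: the fraction of the ideal speedup that was actually achieved.
def efficiency(sequential, parallel, workers)
  speedup(sequential, parallel) / workers
end

seq_real = 64.562194   # "real" column, sequential run
par_real = 16.685797   # "real" column, parallel run

puts format("speedup: %.2fx", speedup(seq_real, par_real))
puts format("efficiency: %.1f%%", efficiency(seq_real, par_real, 4) * 100)
```

An ideal result with four workers would be 4×, so the measured run gets most of the way there. Note also that the total CPU time in the `user` column is about the same in both runs; the work is not reduced, only spread across cores.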
How to send messages using ractors
Ractors communicate via messages: each ractor has its own incoming message queue of unbounded size. You can visualize how this works by thinking about how postal mail works. For example, let's say you want to mail a letter to your grandma. Using code, we could do something like this:
my_ractor = Ractor.new do
  msg = Ractor.receive
  puts "I received #{msg}"
end
my_ractor.send("Hello") # the 'mailman' puts the message in the mailbox, but it has not been opened yet.
my_ractor.take # waits for the ractor to finish and takes its outgoing message (the block's return value, which is nil here)
# prints: I received Hello
Here are the takeaways from this example:
- The .send method is like the mailman delivering the message, but it is not opened yet.
- The .receive method enables the ractor to open the door and receive the message.
- The .take method enables the mailman to take the response. Note that this return message is an outgoing message and goes to the outgoing port.
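To see a full round trip with more than one message, here is a minimal sketch. It assumes a hypothetical `sum_via_ractor` method and a `:done` sentinel of my own choosing to tell the ractor when to stop reading its mailbox. As before, the hypothetical `ractor_result` helper uses `take` unless the running Ruby provides `Ractor#value`.

```ruby
# Hypothetical helper: fetch a ractor's result with whichever method the
# running Ruby provides.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

# Send each number to a ractor as a separate message and get back the total.
def sum_via_ractor(numbers)
  adder = Ractor.new do
    total = 0
    # Ractor.receive blocks until a message is waiting in the incoming queue.
    while (msg = Ractor.receive) != :done
      total += msg
    end
    total   # the block's return value becomes the ractor's outgoing message
  end

  numbers.each { |n| adder.send(n) }   # send does not block; messages queue up
  adder.send(:done)                    # hypothetical sentinel that ends the loop
  ractor_result(adder)
end

p sum_via_ractor([1, 2, 3, 4])   # => 10
```

Integers and symbols are shareable, so these messages are delivered as they are; an unshareable message such as a mutable string would be copied on the way in.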
Recap
In this article, we discussed the difference between concurrency and parallelism, as well as Ruby's shortcomings prior to Ractors (available in Ruby v3.0+). Moreover, we discussed the motivation for creating Ractors, examined a benchmarking example to show the power of utilizing Ractors, and then discussed the basic syntax for how to create Ractors and utilize them to send and receive messages.
Further reading
To dive deeper, I recommend the following reading:
- Official docs
- This gist provides another benchmark example
- Ruby 3.0 release notes
- A deep dive into concurrency via parallelism on Toptal.