We Rubyists love our hashes. But hashes have a few well-known flaws. As Richard pointed out in Hashie Considered Harmful, they can be TOO flexible at times: one quick typo and you're assigning and referencing keys that you never intended.
a = { type: "F150" }
a[:typo] # nil
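The read above returns nil quietly, and writes fail just as quietly. Here's a small sketch of both sides of the problem. The `truck` variable and the misspelled `:tpye` key are made up for illustration. It also shows `Hash#fetch`, which raises `KeyError` for a missing key instead of returning nil.

```ruby
truck = { type: "F150" }

# A typo on assignment silently adds a brand new key.
truck[:tpye] = "F250"
truck[:type]   # still "F150"
truck.keys     # [:type, :tpye]

# Hash#fetch refuses to guess: a missing key raises KeyError.
begin
  truck.fetch(:typo)
rescue KeyError => e
  puts "fetch raised #{e.class}"
end
```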
Some Common Hash Alternatives
If you're using hashes to store actual structured data, you may decide that you don't really need that flexibility, and that it's only going to get you into trouble.
There are a few alternatives. Imagine that you need to store a pair of XY coordinates. One approach might be to define a class, whose only job is to hold a pair of XY coordinates.
class PointClass # I don't recommend ending class names with Class :)
  attr_accessor :x, :y
  def initialize(args)
    @x = args.fetch(:x)
    @y = args.fetch(:y)
  end
end
point_class = PointClass.new(x: 1, y: 2)
point_class.x # 1
Since in this case we only need to encapsulate data, a more concise choice might be to use a Struct. Here's what that looks like:
PointStruct = Struct.new(:x, :y)
point_struct = PointStruct.new(1, 2)
point_struct.x # 1
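To see how a Struct guards against the typo problem, here's a rough sketch. `StrictPoint` is a hypothetical name I'm using only so this snippet doesn't clash with PointStruct above.

```ruby
# StrictPoint is a hypothetical name used only for this sketch.
StrictPoint = Struct.new(:x, :y)
point = StrictPoint.new(1, 2)

# A mistyped accessor raises NoMethodError instead of returning nil.
begin
  point.z
rescue NoMethodError => e
  puts "point.z raised #{e.class}"
end

# Hash-style writes to an unknown member are rejected too.
begin
  point[:z] = 3
rescue NameError => e
  puts "point[:z] = 3 raised #{e.class}"
end

# Too many positional values raise ArgumentError.
begin
  StrictPoint.new(1, 2, 3)
rescue ArgumentError => e
  puts "StrictPoint.new(1, 2, 3) raised #{e.class}"
end

# Too few values are allowed; the missing members are simply nil.
StrictPoint.new(1).y # nil
```

One thing to notice: unlike PointClass with its `fetch` calls, a plain Struct won't complain about missing values at initialization.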
A third option might be to use OpenStruct. OpenStruct looks kind of like a struct, but lets you set arbitrary values like a hash. Here's an example:
require 'ostruct'
point_os = OpenStruct.new(x: 1, y: 2)
point_os.x # 1
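The "arbitrary values" part is easy to miss in that example, so here's a slightly longer sketch. The `z` and `typo` attributes are made up for illustration.

```ruby
require 'ostruct'

point_os = OpenStruct.new(x: 1, y: 2)

# New attributes can be added on the fly, just like hash keys.
point_os.z = 3
point_os.z     # 3

# Hash-style access works as well.
point_os[:x]   # 1

# The flip side: reading an attribute that was never set returns nil,
# so a typo goes unnoticed, exactly like with a plain hash.
point_os.typo  # nil

# The data can be pulled back out as a regular hash.
point_os.to_h  # x, y and z as symbol keys
```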
Performance Implications
[UPDATE 7/10/2015: It appears that my benchmarking script was unfair to hashes. As Patrick Helm pointed out, I was using an inefficient method of initializing them, so please disregard the results for hashes. My main point about OpenStruct being super slow is still valid, though. You can see his changes to my benchmark script here]
Looking at these four options, I began to wonder what the performance implications were. It's pretty obvious that any of these options is fast enough if you're only dealing with a little bit of data. But if you have thousands or millions of items to process, then the performance impact of a hash vs OpenStruct vs struct vs class could begin to matter.
At Honeybadger, we have thousands of exceptions being reported to our API each second, so understanding performance implications like this is always on our minds.
So, I wrote a simple benchmark script. I like to use the benchmark-ips gem for experiments like this because it automatically figures out a good sample size, and reports standard deviation.
Initialization
When I benchmarked initialization times for PointClass, PointStruct, Hash, and OpenStruct, I found that PointClass and PointStruct were the clear winners. They were about 10x faster than OpenStruct, and about 2x faster than the hash.
[Figure: PointClass and PointStruct were nearly 10x faster than OpenStruct]
These results make sense. Structs are the simplest, so they're fastest. OpenStruct is the most complex (it's a wrapper for Hash), so it's the slowest. However, the magnitude of the difference in speed is kind of surprising.
After running this experiment, I'd be really hesitant to use OpenStruct in any code where speed is a concern. And I'll be casting a wary eye at any hashes that I see in performance-critical code.
Read / Write
Unlike initialization, all four options are roughly the same when it comes to setting and accessing values.
[Figure: Reading and writing benchmarks show no huge difference between Struct, class, hash and OpenStruct]
The Benchmarking Script
If you'd like to run the benchmark on your own system, you can use the script below. I ran it on MRI 2.1 on OS X. If you're curious about performance on other Ruby interpreters, Michael Cohen has created an awesome gist with results for MRI 2.2, JRuby and others.
require 'benchmark/ips'
require 'ostruct'

data = { x: 100, y: 200 }

PointStruct = Struct.new(:x, :y)

class PointClass
  attr_accessor :x, :y
  def initialize(args)
    @x = args.fetch(:x)
    @y = args.fetch(:y)
  end
end

puts "\n\nINITIALIZATION =========="
Benchmark.ips do |x|
  x.report("PointStruct") { PointStruct.new(100, 200) }
  x.report("PointClass") { PointClass.new(data) }
  # Per the update above, Hash.new.merge(data) is an inefficient way to
  # build the hash, so treat the Hash result here with caution.
  x.report("Hash") { Hash.new.merge(data) }
  x.report("OpenStruct") { OpenStruct.new(data) }
end

puts "\n\nREAD =========="
# Build one instance of each type up front so only the read is measured.
point_struct = PointStruct.new(100, 200)
point_class = PointClass.new(data)
point_hash = Hash.new.merge(data)
point_open_struct = OpenStruct.new(data)
Benchmark.ips do |x|
  x.report("PointStruct") { point_struct.x }
  x.report("PointClass") { point_class.x }
  x.report("Hash") { point_hash.fetch(:x) }
  x.report("OpenStruct") { point_open_struct.x }
end

puts "\n\nWRITE =========="
# Reuses the instances created for the READ benchmark.
Benchmark.ips do |x|
  x.report("PointStruct") { point_struct.x = 1 }
  x.report("PointClass") { point_class.x = 1 }
  x.report("Hash") { point_hash[:x] = 1 }
  x.report("OpenStruct") { point_open_struct.x = 1 }
end
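About that unfair hash initialization: `Hash.new.merge(data)` allocates an empty hash and then a second, merged copy. Here's a rough sketch that compares it against a hash literal and `dup`. These alternatives are my assumption about the kind of change that helps, not necessarily what Patrick Helm's version does. The `time_it` helper is a hypothetical name, and it uses a plain monotonic clock instead of benchmark-ips so it runs without any gems. Expect the numbers to vary by machine and Ruby version.

```ruby
data = { x: 100, y: 200 }

# Hypothetical helper: runs the block `iterations` times and returns
# the elapsed wall-clock time in seconds.
def time_it(iterations)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

n = 200_000
merge_time   = time_it(n) { Hash.new.merge(data) }   # what the script above does
literal_time = time_it(n) { { x: 100, y: 200 } }     # plain hash literal
dup_time     = time_it(n) { data.dup }               # copy of an existing hash

puts format("merge: %.4fs  literal: %.4fs  dup: %.4fs",
            merge_time, literal_time, dup_time)
```

All three produce an equal hash, so any difference you see is purely the cost of how the hash gets built.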