
Ruby shops a lot of its personal inside information in hash tables.
I’ve began engaged on a brand new version of Ruby Beneath a
Microscope that covers Ruby 3.x. I am engaged on this in my spare time, so it
will take some time to complete. Depart a remark or drop
me a line and I will e mail you when it is completed.
Within the meantime, right here’s an excerpt from a rewrite of Chapter 7 about hash
tables. Vladimir Makarov and the Ruby workforce redesigned Ruby’s hash desk
implementation again in 2015 to make use of open addressing, shortly after I printed
the primary version of RUM. This appeared like a very good place to start out.
Chapter 7: The Hash Desk: The Workhorse Of Ruby Internals
Hash Tables in Ruby | 3 |
Saving a Worth in a Hash Desk | 3 |
Retrieving a Worth from a Hash Desk | 5 |
Experiment 7-1: Retrieving a Worth from Hashes of Various Sizes | 7 |
How Hash Tables Increase to Accommodate Extra Values | 9 |
Hash Collisions and Open Addressing | 9 |
Looking out For an Factor Utilizing Open Addressing | 11 |
Increasing a Hash Desk | 14 |
How Does Ruby Resolve How Many Entries And Bins To Use? | 15 |
Experiment 7-2: Inserting One New Factor into Hashes of Various Sizes | 17 |
Optimization for Small Hashes | 20 |
How Does Ruby Convert A Packed Hash Into A Hash Desk? | 22 |
How Ruby Implements Hash Capabilities | 23 |
Experiment 7-3: Utilizing Objects as Keys in a Hash | 25 |
Abstract | 30 |
Hash Tables in Ruby
Hash tables are a generally used, well-known, age-old idea in laptop
science. They set up values into teams, or bins, primarily based on an integer worth
calculated from every worth — a hash. When it is advisable discover a worth, you’ll be able to
determine which bin it’s in by recalculating its hash worth, thus dashing up
the search.

Each time you write a technique, Ruby creates an entry in a hash desk.
Saving a Worth in a Hash Desk
Determine 7-1 reveals a single hash object and its hash desk.
Determine 7-1: A Ruby hash object with an empty hash desk
On the left is the RHash
(brief for Ruby hash) construction. On the fitting, you see
the hash desk utilized by this hash, represented by the st_table
construction. This C
construction incorporates the fundamental details about the hash desk, together with the
variety of entries saved within the desk and tips to the entries and bins. Every
RHash
construction incorporates a pointer to a corresponding st_table
construction. For
many hashes, Ruby initially creates 32 entries and 64 bins. (Hashes with 8 or
fewer entries work considerably otherwise; see “Optimization for Small Hashes” on
web page 20.) One of the best ways to know how a hash desk works is by stepping
by means of an instance. Suppose I add a brand new key/worth to a hash known as my_hash
:
my_hash[:key] = "worth"
Whereas executing this line of code, Ruby saves the important thing and worth into the primary
entry, as proven in Determine 7-2.
Determine 7-2: A Ruby hash object containing a single worth
Right here you’ll be able to see the brand new key/worth pair inside the primary entry slot, known as an
st_table_entry
in Ruby’s C supply code. Ruby saves the keys and values within the
entries array within the order you add them. This makes it simple for Ruby to return
them again to you in the identical order. Additionally see that Ruby saved an entry index of 0
within the third bin, quantity 2. Ruby did this by taking the given key — on this
instance, the image :key
— and passing it to an inside hash operate that
returns a pseudorandom integer:
some_value = internal_hash_function(:key)
Subsequent, Ruby takes the hash worth — on this instance, some_value
— and calculates the modulus by the variety of bins, which is the rest after dividing by the variety of bins.
some_value % 64 = 2
NOTE: In Determine 7-2, I assume that the precise hash worth for :key divided by 64 leaves a the rest of two. Later on this
chapter, I’ll discover in additional element the hash capabilities that Ruby really makes use of.
We’ll see how utilizing 64 bins (an influence of two) hastens this calculation.
Now let’s add a second factor to the hash:
my_hash[:key2] = "value2"
This time let’s think about that the hash worth of :key2
divided by 64 yields a
the rest of 5.
internal_hash_function(:key2) % 64 = 5
Determine 7-3 reveals that Ruby fills in a second st_table_entry
construction within the
entries array, and the entry index 1 in bin quantity 5, the sixth bin.
Determine 7-3: A Ruby hash object containing two values
Retrieving a Worth from a Hash Desk
The good thing about utilizing a hash desk turns into clear whenever you ask Ruby to retrieve
the worth for a given key. For instance:
p my_hash[:key]
=> "worth"
If Ruby had saved the entire keys and values in an array or linked record, it
must iterate over all the weather in that array or record, on the lookout for
:key. This may take a really very long time, relying on the variety of components. However
utilizing a hash desk, Ruby can leap straight to the important thing it wants to search out by
recalculating the hash worth for that key. To recalculate the hash worth for a
explicit key, Ruby merely calls the hash operate once more: some_value = internal_hash_function(:key)
.
Then, it redivides the hash worth by the variety of bins to get the rest, or
the modulus. some_value % 64 = 2
At this level, Ruby is aware of to look in bin
quantity 2 and finds the entry index 0, and in flip finds the entry with the important thing
of :key
in entry quantity 0 or the primary entry slot. Ruby can later discover the
worth for :key2
by repeating the identical hash calculation
internal_hash_function(:key2) % 64 = 5
.
Determine 7-4: Discovering Values in a Hash Desk
Determine 7-4 explains how Ruby would discover these two values within the hash desk: On
the left aspect, the primary arrow begins from the third bin (bin index 2 =
internal_hash_function(:key) % 64
) and factors to the primary key/worth pair, at
index 0. To the fitting, the second arrow begins from the sixth bin (bin index 5 =
internal_hash_function(:key2) % 64
) and factors to the second key/worth pair, at
index 1.
NOTE: The C library utilized by Ruby to implement hash tables was written within the
Eighties by Peter Moore from the College of California, Berkeley. Later in 2015
Vladimir Makarov rewrote the hash desk code, utilizing a knowledge construction which saves
the entry and bin arrays in contiguous reminiscence segments. By saving all of the
entries and bins close by one another in reminiscence, fashionable CPUs are in a position to cache all
of the info within the hash desk extra effectively, dashing up the general course of.
Yow will discover Makarov’s hash desk code within the C code information st.c and
embrace/ruby/st.h. The entire operate and construction names in that code use the
naming conference st_. The definition of the RHash construction that represents
each Ruby Hash object is within the embrace/ruby/ruby.h file. Together with RHash,
this file incorporates the entire different major object buildings used within the Ruby
supply code: RString, RArray, RValue, and so forth.