
From time to time I end up wanting to build a nested hash in Ruby, maybe to assemble a data object for a complex API or a result for a service object.

Let’s set up an example. Say that I’m a distributor that’s selling books, and I want to create a dump of the monthly sales like so:

{
  <book_id>: {
    <date>: <count>,
  },
}

Assuming I had a model that represented the sales:

# represents 1 sale of 1 book
class Sale
  attr_accessor :book_id
  attr_accessor :date
end

I might do something like this:

def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    sales_by_date[sale.book_id] ||= {}
    if sales_by_date[sale.book_id][sale.date]
      sales_by_date[sale.book_id][sale.date] += 1
    else
      sales_by_date[sale.book_id][sale.date] = 1
    end
  end
end

We can clean this up a little bit by providing a default for the inside hash that will initialize a new value to 0. Then we can just always treat it as a number:

def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    sales_by_date[sale.book_id] ||= Hash.new(0)
    sales_by_date[sale.book_id][sale.date] += 1
  end
end

Nice. That feels a little bit better. What about the outer hash? Can we do something there?

There’s a little trick to making a hash that has a default value of a nested hash:

Hash.new { |hash, key| hash[key] = {} }

The block is called to generate the default value when a missing key is accessed, and because it assigns hash[key], the new inner hash is stored rather than thrown away. Using this we can simplify our method again:

def aggregate_by_date(all_sales)
  all_sales.each_with_object(Hash.new { |h,k| h[k] = Hash.new(0) }) do |sale, sales_by_date|
    sales_by_date[sale.book_id][sale.date] += 1
  end
end

That’s a little dense for me, so maybe I’d extract a method:

def aggregate_by_date(all_sales)
  all_sales.each_with_object(nested_hash) do |sale, sales_by_date|
    sales_by_date[sale.book_id][sale.date] += 1
  end
end

private def nested_hash
  Hash.new { |h,k| h[k] = Hash.new(0) }
end

Cool.
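To check that this version behaves the way the target shape suggests, here’s a minimal sketch that runs it on a few made-up sales. The `Sale` Struct and the string dates are assumptions for illustration only (a stand-in for the real model), and the `shared` hash at the end is a hypothetical contrast showing why the block form is needed: a non-block default is one shared object that never gets stored on the outer hash.

```ruby
# Hypothetical stand-in for the Sale model so the sketch runs on its own.
Sale = Struct.new(:book_id, :date, keyword_init: true)

def nested_hash
  Hash.new { |h, k| h[k] = Hash.new(0) }
end

def aggregate_by_date(all_sales)
  all_sales.each_with_object(nested_hash) do |sale, sales_by_date|
    sales_by_date[sale.book_id][sale.date] += 1
  end
end

# Made-up sample data: dates are plain strings here.
sales = [
  Sale.new(book_id: 1, date: "2024-01-05"),
  Sale.new(book_id: 1, date: "2024-01-05"),
  Sale.new(book_id: 1, date: "2024-01-06"),
  Sale.new(book_id: 2, date: "2024-01-05"),
]

result = aggregate_by_date(sales)
expected = {
  1 => { "2024-01-05" => 2, "2024-01-06" => 1 },
  2 => { "2024-01-05" => 1 },
}
p(result == expected)          # => true

# Contrast: without a block, every missing key gets the SAME inner hash,
# and nothing is ever assigned on the outer hash.
shared = Hash.new(Hash.new(0))
shared[1]["2024-01-05"] += 1
shared[2]["2024-01-05"] += 1
p shared.keys                  # => []
p shared[3]["2024-01-05"]      # => 2
```

The counts for books 1 and 2 bleed into each other in the `shared` version, which is the problem the `h[k] = ...` assignment inside the block avoids.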


But what happens if our interface is even more complicated? Let’s say we have multiple stores now and also want to show which store sold the book on each date:

{
  <book_id>: {
    <date>: {
      <store_id>: <count>,
    },
  },
}

This changes our model to something like:

# represents 1 sale of 1 book
class Sale
  attr_accessor :book_id
  attr_accessor :date
  attr_accessor :store_id
end

And it might change our brute-force algorithm to look like:

def aggregate_by_date(all_sales)
  all_sales.each_with_object({}) do |sale, sales_by_date|
    sales_by_date[sale.book_id] ||= {}
    sales_by_date[sale.book_id][sale.date] ||= Hash.new(0)
    sales_by_date[sale.book_id][sale.date][sale.store_id] += 1
  end
end

So. Many. Hashes.

Let’s try something a bit out there. What if we could create a Hash whose default is a Hash whose default is a Hash… (and turtles all the way down)? With a little bit of recursion we can set this up:

def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end

Creating the hash doesn’t recurse forever, because the block only runs when a missing key is accessed, and each run builds just one more level.
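To make the laziness concrete, here’s a small sketch. The `tree` variable and the symbol keys are just for illustration: the hash starts out empty, and each missing key along the path runs the block once to create one more level.

```ruby
def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end

tree = nested_hash
empty_at_start = tree.empty?
p empty_at_start                        # => true (no block has run yet)

# Each missing key on the way down runs the block once.
tree[:a][:b][:c] = 1
p(tree == { a: { b: { c: 1 } } })       # => true
```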

Let’s see what that does to our algorithm:

def aggregate_by_date(all_sales)
  all_sales.each_with_object(nested_hash) do |sale, sales_by_date|
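    # blank? comes from ActiveSupport (Rails); a freshly created nested hash is empty, so it is blank.
    if sales_by_date[sale.book_id][sale.date][sale.store_id].blank?
      sales_by_date[sale.book_id][sale.date][sale.store_id] = 0
    end

    sales_by_date[sale.book_id][sale.date][sale.store_id] += 1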
  end
end

def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end

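We have to bring back the explicit initialization to 0, because the default value at every level is now another hash rather than a number, but we no longer have to set up the nested hashes ourselves.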
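Here’s a rough sketch of why that zero case can’t be dropped, using illustrative symbol keys instead of real sales. It also shows one possible plain-Ruby way to write the check with `key?` for code that runs without ActiveSupport; that variant is an assumption on my part, not something the version above relies on.

```ruby
def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end

counts = nested_hash
error = nil

# Without the initial case, the missing leaf is itself a nested hash,
# and Hash has no + method.
begin
  counts[:book_1][:jan][:store_a] += 1
rescue NoMethodError => e
  error = e
end
p error.class            # => NoMethodError

# Plain-Ruby variant of the check: key? does not run the default block.
by_store = counts[:book_1][:feb]
by_store[:store_a] = 0 unless by_store.key?(:store_a)
by_store[:store_a] += 1
p by_store[:store_a]     # => 1
```

Note that the failed `+=` still leaves an empty hash behind at `counts[:book_1][:jan][:store_a]`, because the block stored it before the addition was attempted.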


Overall, I think this is an interesting solution, but I’m not sure I’d use it in practice. I think there’s a little bit too much hidden complexity versus the straightforward solution of initializing every step explicitly.
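One concrete form that hidden complexity can take, sketched with made-up keys: simply reading a missing path in the recursive hash adds entries to it. This is only an illustration of the trade-off, and the `report` name is hypothetical.

```ruby
def nested_hash
  Hash.new { |h, k| h[k] = nested_hash }
end

report = nested_hash
report[:book_1][:jan][:store_a] = 3

# A check that looks read-only...
p report[:book_2][:jan].empty?   # => true

# ...has quietly added entries to the report.
p report.keys                    # => [:book_1, :book_2]
p report[:book_2].keys           # => [:jan]

# key? and fetch do not run the default block, so they leave it alone.
p report.key?(:book_3)           # => false
p report.fetch(:book_3, nil)     # => nil
p report.keys                    # => [:book_1, :book_2]
```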

Shout out to David Bai for helping me work through this thought exercise!