Set, uniq, Object comparison
Sun, 07/14/2019 - 20:57

This is the first article of my small series of Node.js performance benchmark tinkering.

Have you ever wondered what is the fastest way to get Array with unique items?

What to keep in mind

I wanted to compare 3 ways to make Array with unique items/values. I’ve used Set(), lodash.uniq, and plain js Object. Array was filled by semi random strings Math.floor(Math.random() * 100) + ‘asdf’.

I’ve tested this only on Node.js v12, and everything is just benchmark, so on production data it can look completely different. Only ops per second performance was tested, nothing else, no memory and so on.

How I tested this

I’ve used benchmark module to run test scenarios. The code snippets benchmarked look like this

// Set
const set = d => () => [...new Set(d)]
// lodash.uniq
const lodash = d => () => uniq(d)
// Object
const object = d => () => {
  const n = {}
  for (const a of d) {
    if (!n[a]) {
      n[a] = true
    }
  }
  return Object.keys(n)
}

First I’ve tried to try this just with one length of an array, like array with 1000 items. Then I figured out, it might be interesting to run it on different lengths and see how performance changes. So in the end I’ve run everything in arrays of length 1 to ~200 000 items (see the gist).

And now, what I found

The graph shows performance comparison to Set as a baseline. How to read this? For 1 item uniq is 2x faster than Set, and for ~100 items in array, Set is 5x faster than uniq and more than 2x faster than Object.

Set, uniq, Object comparison

What strikes me as interesting is that drop in uniq and object performance somewhere between ~10 and ~1000 items. Any learnings from this? If you need to uniq array with primitive values, it’s safe to use Set() and you don’t need to care about the length of the array. When you want to optimize for the size of an array, for small ones [1;~10], use uniq, for middle [~10, ~500] size use Set, and for anything larger use Object.

Some Open questions:

  • How will this behave on different version of Node.js, is the performance comparison the same?
  • What is the impact on memory, what’s the heap size of Set compared to Object?
  • Is there a difference if the array is presorted?

You can find full benchmark here on my gist. And raw results values in google spreadsheet (tab set uniq object).