Performance of JavaScript Array Ops

Miguel Albrecht
4 min read · Jan 20, 2019

Medium recently served me this article by Samantha Ming on various ES6 approaches to deduplicating an array. Removing duplicates is a typical test of how the underlying engine handles traversing large data sets, and benchmarking a solution is good advice when the task involves large lists. Some time ago I had to solve a similar problem with a large array, and I was intrigued to learn how my solution compared with the new approaches.

An hour later, the results were in: I was stunned. And you probably will be too.

The benchmarks are in milliseconds for an array of 7 million members. But let's take one thing at a time.

Samantha explained three methods:

Using .filter()

In a nutshell, use array.filter() to iterate through the array and keep only the first occurrence of each element.
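A minimal sketch of the idea (the helper name is mine, for illustration; the actual benchmark code is in the linked repo):

```javascript
// Keep an element only if this is its first occurrence: indexOf()
// returns the index of the first match, so every later duplicate
// fails the comparison and is filtered out.
function useFilter(arr) {
  return arr.filter((item, index) => arr.indexOf(item) === index);
}

useFilter(['a', 'b', 'a', 'c', 'b']); // ['a', 'b', 'c']
```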

This is the least intuitive of all, but it wins the performance prize on all platforms.

Using Set()

This approach uses the Set object newly available in ES6. By definition, a Set has no duplicate members, which makes it a good fit for the task. We just need to convert the Set back into an array, which is accomplished with the spread operator.
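Something along these lines (a sketch, mirroring the useSet() mentioned in the update below):

```javascript
// A Set keeps only distinct values; spreading it back into an
// array literal yields the deduplicated array.
function useSet(arr) {
  return [...new Set(arr)];
}

useSet(['a', 'b', 'a', 'c', 'b']); // ['a', 'b', 'c']
```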

This is definitely the most beautiful code, but only Node.js pays it due respect. Safari must have a very poor implementation of Set() to deliver such a bad result.

Using .reduce()

array.reduce(), an array method available since ES5, allows us to iterate with an accumulator that collects, in this case, the elements not yet present in the accumulator array. A clever solution.
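Roughly like this (a sketch; the function name is illustrative):

```javascript
// The accumulator starts as an empty array; each element is appended
// only if it is not already in the accumulator.
function useReduce(arr) {
  return arr.reduce(
    (unique, item) => (unique.includes(item) ? unique : [...unique, item]),
    []
  );
}

useReduce(['a', 'b', 'a', 'c', 'b']); // ['a', 'b', 'c']
```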

This solution is not really intuitive either, and it brings poor performance across the board.

But I had my own (old) solutions as well, and I had to swallow hard when I saw how badly they perform. Especially my favourite: using associative arrays.

Associative arrays

This is an old-school solution inspired by Java collections. The idea is simple: use the strings in the array as keys in an associative array (a plain object). Duplicate strings simply overwrite previous entries, leaving a clean collection without duplicates. We then use Object.keys() to get back an array of the keys.
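A sketch of the idea, using a plain object as the associative array (the name echoes the useAssociative() mentioned in the update, though the exact implementation may differ):

```javascript
// Use each string as an object key; duplicates overwrite the same
// key, and Object.keys() returns the distinct strings.
function useAssociative(arr) {
  const seen = {};
  for (const item of arr) {
    seen[item] = true;
  }
  return Object.keys(seen);
}

useAssociative(['a', 'b', 'a', 'c', 'b']); // ['a', 'b', 'c']
```

Note that object keys are always strings, which is why this approach suits arrays of strings in particular.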

This solution has the worst performance on all platforms. And I thought it was clever coding. Damn 😰

I also tried out another solution I will call the brute force approach.

Using a loop and .includes()

In this approach we push() into a new array only those elements that are not already there.
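Something like this (a sketch; the function name is mine):

```javascript
// Walk the array and push an element only if the result array
// does not already contain it.
function useIncludes(arr) {
  const result = [];
  for (const item of arr) {
    if (!result.includes(item)) {
      result.push(item);
    }
  }
  return result;
}

useIncludes(['a', 'b', 'a', 'c', 'b']); // ['a', 'b', 'c']
```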

Second worst performance. Chakka!

UPDATE

Some readers commented that the choice of test data could have a big impact on the results. This is true. The test array I put together in the beginning reflected a scenario where the data has many duplicates. I finally had some time to look at this and decided to run the same tests with the complementary scenario: hardly any duplicates, or none at all. This is the worst case for these algorithms, since they have to do the maximum amount of work.

Here are the results

Here the absolute winner is useSet(). My useAssociative() comes in second, ha!

So, there you have it. If you have to deduplicate an array, use the Set method!

The code is available at GitHub. Claps and comments are welcome.



Miguel Albrecht

Scientist by training, creative spirit by choice. web: zapalote.com instagram: @zapalotee