Faster Test Execution Using MongoDB in Our Backend Services

by Georg Gadinger, Backend Developer

We backend developers at Runtastic rely heavily on automated testing with frameworks like RSpec to make sure that a (simple) change does not break anything else.
Sometimes tests take forever to run, for various reasons. This post explains how I cut the test run time of one of our MongoDB-backed services from about 23 minutes to just 2 minutes.

On my Mid-2015 MacBook Pro, running all the tests for one of our backend services took quite a long time:

% time bundle exec rspec
(rspec output omitted)
Finished in 22 minutes 42 seconds (files took 10.57 seconds to load)
4818 examples, 0 failures
bundle exec rspec  124.18s user 7.20s system 9% cpu 22:54.03 total

That didn’t bother me too much, because most of the time I would just push the code to the git server and let our Jenkins run the tests for me. At some point, however, I got tired of waiting for Jenkins and of waiting for all the tests to pass locally. I knew that some other developers could run the tests much faster on machines with almost the same specs as mine, so I started to investigate why the tests were taking so long.

I noticed that the rspec process only used 9% of my precious CPU, which made me suspect that it was waiting for I/O most of the time. To investigate further, I used Instruments, a tool that comes with every Xcode installation. Under the hood, it uses DTrace, a very powerful dynamic tracing framework originally created by Sun Microsystems for Solaris and since ported to many other Unices such as FreeBSD and macOS.
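The 9% figure is simply CPU time (user plus system) divided by wall-clock time. As a small worked example, the following Ruby sketch recomputes it from the summary line that zsh's time builtin printed above. The regular expression is an assumption that only matches that exact output format; other shells print this line differently.

```ruby
# Matches zsh's default `time` summary, e.g.
# "bundle exec rspec  124.18s user 7.20s system 9% cpu 22:54.03 total"
TIME_LINE = /(?<user>[\d.]+)s user (?<sys>[\d.]+)s system \d+% cpu (?<total>[\d:.]+) total/

# Converts "22:54.03" (or "1:02:03.5") into seconds.
def elapsed_seconds(str)
  str.split(':').reduce(0.0) { |acc, part| acc * 60 + part.to_f }
end

# CPU utilization = (user + system CPU time) / wall-clock time.
def cpu_utilization(line)
  m = TIME_LINE.match(line) or raise ArgumentError, "unrecognized time output: #{line}"
  busy = m[:user].to_f + m[:sys].to_f
  wall = elapsed_seconds(m[:total])
  { busy: busy, wall: wall, percent: 100.0 * busy / wall }
end

r = cpu_utilization('bundle exec rspec  124.18s user 7.20s system 9% cpu 22:54.03 total')
printf("%.2fs CPU in %.2fs wall clock = %.1f%% utilization\n", r[:busy], r[:wall], r[:percent])
```

About 131 seconds of CPU time spread over almost 23 minutes means the process spent roughly 90% of the run waiting on something other than the CPU.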
Using Instruments, I found out that mongod makes a lot of unlink(2) calls (the Unix syscall for deleting a file) and a lot of writes inside the MongoDB data directory.

In several of our backend services, we use MongoDB for storing data. When we run rspec, a test helper is loaded that cleans the databases before each test to ensure a clean state. It does so by dropping and recreating each collection, including its indexes.
Since version 3.2, MongoDB uses the wiredTiger storage engine by default. It promises to be much faster and more reliable than the previous default, and in production that holds true. It is, however, not the case when you are constantly dropping and recreating indexes.
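For illustration, the two ways of resetting a collection that this post compares can be written as a tiny helper. This is a hypothetical sketch, not our actual test helper: the module name MongoCleaner is made up, and collection is assumed to behave like a Mongo::Collection from the mongo gem (only #drop, #delete_many and #indexes.create_many are used).

```ruby
# Hypothetical per-test cleanup helper (not our actual spec helper).
# `collection` is expected to behave like Mongo::Collection from the mongo gem.
module MongoCleaner
  module_function

  # Drop the collection (documents and indexes), then rebuild the indexes.
  # This is the kind of cleanup that turned out to be slow under wiredTiger.
  def drop_and_recreate(collection, index_specs)
    collection.drop
    collection.indexes.create_many(index_specs) unless index_specs.empty?
  end

  # Alternative: delete every document but keep the collection and its indexes.
  def delete_all(collection)
    collection.delete_many({})
  end
end

# Possible RSpec wiring (not executed here; `client` and SCORE_INDEXES are
# placeholders for your Mongo::Client and index definitions):
#
#   RSpec.configure do |config|
#     config.before(:each) do
#       MongoCleaner.drop_and_recreate(client[:scores], SCORE_INDEXES)
#     end
#   end
```

drop_and_recreate mirrors what our helper does; delete_all corresponds to the delete_many variant in the benchmark.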

MongoDB also offers other storage engines, such as mmapv1 and ephemeralForTest. mmapv1 was the previous default engine, and ephemeralForTest is an in-memory storage engine that should be used only for testing.
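Because the engine choice matters for this kind of test setup, a test suite could check which engine it is talking to. This is a hypothetical sketch: it reads storageEngine.name from a serverStatus reply (the same field the benchmark script prints) and warns unless the engine is mmapv1 or ephemeralForTest. The constant and method names are our own invention.

```ruby
# Engines that, per the benchmark in this post, handle drop+recreate cheaply.
CHEAP_REBUILD_ENGINES = %w[mmapv1 ephemeralForTest].freeze

# Extracts the engine name from a serverStatus reply (a Hash-like document).
def storage_engine(server_status)
  server_status.dig('storageEngine', 'name') or
    raise KeyError, 'serverStatus reply has no storageEngine.name'
end

# Warns on `io` unless the engine is one of CHEAP_REBUILD_ENGINES; returns the name.
def check_storage_engine(server_status, io: $stderr)
  engine = storage_engine(server_status)
  unless CHEAP_REBUILD_ENGINES.include?(engine)
    io.puts "warning: mongod uses #{engine}; dropping and recreating indexes before every test may be slow"
  end
  engine
end

# With the mongo gem this would be called as (not executed here):
#   check_storage_engine(client.command(serverStatus: 1).first)
```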

Comparing the storage engines

The following Ruby snippet allows us to compare two different methods of cleaning the database:

1. Deleting all documents at once (delete_many)
2. Dropping the collection and recreating its indexes

Before each run, we will start mongod with a different storage engine.

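require 'bundler/inline'

# Declare the gems inline; `false` means they must already be installed
# (pass `true` to let Bundler install them).
gemfile false do
  source 'https://rubygems.org'
  gem 'mongo'
  gem 'benchmark-ips'
end

# Keep the driver's logging at INFO so it does not clutter the benchmark output.
::Mongo::Logger.logger.level = Logger::INFO
client = ::Mongo::Client.new(['127.0.0.1:27017'], database: 'bench')
# Print which storage engine the running mongod uses.
puts format('storageEngine: %s', client.command(serverStatus: 1).first.dig('storageEngine', 'name'))

# Indexes that get recreated after every drop.
indexes = [
  { key: { user_id: 1, year: 1, month: 1 }, unique: true },
  { key: { 'science.id'  => 1 } },
  { key: { 'science.lid' => 1 } },
  { key: { user_id: 1, u: 1 } }
]

Benchmark.ips do |x|
  # 1. Delete all documents; the collection and its indexes stay in place.
  x.report('delete_many') { client[:scores].delete_many }
  # 2. Drop the collection (documents and indexes), then rebuild the indexes.
  x.report('drop+recreate index') do
    client[:scores].drop
    client[:scores].indexes.create_many(indexes)
  end
  # Print the relative speed of the two reports.
  x.compare!
end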

On different machines, I get the following results (iterations per second, higher is better):

OS, filesystem           Storage engine     delete_many   drop+recreate index
macOS 10.12, HFS+        wiredTiger         3674.0 i/s       3.8 i/s
                         mmapv1             3507.6 i/s     621.6 i/s
                         ephemeralForTest   3656.8 i/s     767.8 i/s
FreeBSD 11.0, ZFS        wiredTiger         4624.6 i/s       4.4 i/s
                         mmapv1             4650.5 i/s     797.9 i/s
                         ephemeralForTest   4663.9 i/s     914.8 i/s
Linux 4.11, btrfs + LZO  wiredTiger         3972.8 i/s      12.1 i/s
                         mmapv1             4175.1 i/s     527.8 i/s
                         ephemeralForTest   4154.1 i/s     740.6 i/s
Linux 4.11, ext4         wiredTiger         4938.8 i/s      36.6 i/s
                         mmapv1             4852.1 i/s     866.4 i/s
                         ephemeralForTest   4814.0 i/s    1076.8 i/s

Here we can clearly see that constantly dropping and recreating the indexes (fourth column) is much more expensive with the wiredTiger storage engine than with the other two: depending on the system, it is roughly 24 to 180 times slower than mmapv1, with the biggest gap on the non-Linux® systems. The performance for just deleting all documents (third column) is roughly the same on all three storage engines.
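Because i/s is a rate, dividing two entries of the drop+recreate index column gives a slowdown factor. The following Ruby snippet does this for wiredTiger against mmapv1, using only the numbers from the table above; the hash layout and the method name wiredtiger_slowdown are just for illustration.

```ruby
# Iterations per second for "drop+recreate index", copied from the table above.
DROP_RECREATE = {
  'macOS 10.12, HFS+'       => { 'wiredTiger' => 3.8,  'mmapv1' => 621.6, 'ephemeralForTest' => 767.8 },
  'FreeBSD 11.0, ZFS'       => { 'wiredTiger' => 4.4,  'mmapv1' => 797.9, 'ephemeralForTest' => 914.8 },
  'Linux 4.11, btrfs + LZO' => { 'wiredTiger' => 12.1, 'mmapv1' => 527.8, 'ephemeralForTest' => 740.6 },
  'Linux 4.11, ext4'        => { 'wiredTiger' => 36.6, 'mmapv1' => 866.4, 'ephemeralForTest' => 1076.8 }
}.freeze

# How many times slower wiredTiger is than `engine` on each platform.
def wiredtiger_slowdown(results, engine)
  results.transform_values { |ips| ips[engine] / ips['wiredTiger'] }
end

wiredtiger_slowdown(DROP_RECREATE, 'mmapv1').each do |platform, factor|
  printf("%-24s wiredTiger is %6.1fx slower than mmapv1\n", platform, factor)
end
```

This yields factors of about 164 (macOS), 181 (FreeBSD), 44 (btrfs) and 24 (ext4).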

Changing the storage engine

Time to change the storage engine. For running tests, you usually want an in-memory engine such as ephemeralForTest. I am switching to mmapv1 instead, because I also use my local MongoDB as a persistent database for development and don’t want to run a second MongoDB instance just for the in-memory engine.

In order to change the engine, stop the MongoDB service first. On macOS using Homebrew with homebrew-services, you can do that by running brew services stop mongodb. Then open the MongoDB configuration file, /usr/local/etc/mongod.conf, in your favorite text editor and set the storage engine to mmapv1 like this:

# /usr/local/etc/mongod.conf
systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb
  engine: mmapv1
net:
  bindIp: 127.0.0.1

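Now comes the interesting part: removing the old data directory. This is necessary because mongod cannot convert existing data files to a different storage engine. Before removing the directory, though, we dump the contents of the database. Note that mongodump connects to a running mongod, so the dump has to be taken before you stop the service: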

mongodump -o ./dump
# replace /usr/local/var/mongodb with the value of `storage.dbPath`
rm -rf /usr/local/var/mongodb
# mongod does not create its dbPath itself, so recreate it empty
mkdir -p /usr/local/var/mongodb
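Put together, the migration boils down to a fixed sequence of commands, and the order matters: the dump has to be taken while mongod is still running. The Ruby sketch below only prints that sequence unless dry_run: false is passed. It assumes the Homebrew service name mongodb and the paths used in this post, a mongod.conf that already names the new engine, and the legacy mongo shell for the readiness check; engine_switch_commands and run_commands are hypothetical helpers.

```ruby
require 'shellwords'

# Hypothetical helper: ordered commands for moving a local Homebrew mongod to a
# new storage engine. Assumes mongod.conf already names the new engine.
def engine_switch_commands(db_path:, dump_dir: './dump', service: 'mongodb')
  wait_until_up = 'until mongo --quiet --eval "db.runCommand({ ping: 1 })" >/dev/null 2>&1; do sleep 1; done'
  [
    ['mongodump', '-o', dump_dir],          # 1. dump while mongod still runs
    ['brew', 'services', 'stop', service],  # 2. stop the service
    ['rm', '-rf', db_path],                 # 3. delete the old engine's data files
    ['mkdir', '-p', db_path],               # 4. mongod expects dbPath to exist
    ['brew', 'services', 'start', service], # 5. start with the new engine
    ['sh', '-c', wait_until_up],            # 6. wait until mongod accepts connections
    ['mongorestore', dump_dir]              # 7. restore the dumped data
  ]
end

# Prints each command; executes it only when dry_run is false.
def run_commands(commands, dry_run: true)
  commands.each do |cmd|
    puts cmd.shelljoin
    next if dry_run

    system(*cmd) or abort("command failed: #{cmd.shelljoin}")
  end
end

run_commands(engine_switch_commands(db_path: '/usr/local/var/mongodb'))
```

The dry run is the default on purpose: step 3 deletes the data directory, so it is worth reading the printed sequence once before executing it.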

Last but not least, we start the MongoDB service again and restore the old data from the dump we made before. With homebrew-services on macOS, starting the service is as easy as brew services start mongodb. After MongoDB has finished starting, we can restore the database by running mongorestore ./dump.

Once all of that is done, we run our tests again and see if it got any better:

% time bundle exec rspec
(rspec output omitted)
Finished in 2 minutes 6 seconds (files took 10.45 seconds to load)
4818 examples, 0 failures
bundle exec rspec  108.41s user 6.93s system 83% cpu 2:17.64 total

Well, 23 minutes down to two minutes. Not bad!
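For the record, the speedup can be computed from the wall-clock totals of the two time runs above. The to_seconds helper is just for this illustration.

```ruby
# Converts "mm:ss.ss" (or "h:mm:ss.ss") into seconds.
def to_seconds(elapsed)
  elapsed.split(':').reduce(0.0) { |acc, part| acc * 60 + part.to_f }
end

before = to_seconds('22:54.03') # wiredTiger run
after  = to_seconds('2:17.64')  # mmapv1 run
printf("%.2fs -> %.2fs: %.1fx faster\n", before, after, before / after)
```

1374.03 s versus 137.64 s is a factor of about 9.98, so very nearly a tenfold speedup.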

I also did a small comparison of the file activity of the two storage engines. On top, you see the wiredTiger engine; on the bottom, mmapv1.

Conclusion

We learned two things. First, constantly dropping and recreating indexes in MongoDB is, depending on the filesystem, far more expensive with the wiredTiger engine than with mmapv1.
Second, even though most developers do not need to know how to fine-tune a (database) server for production, it sometimes pays off to know how it works internally.

***

Tech Team: We are made up of all the tech departments at Runtastic, like iOS, Android, Backend, Infrastructure, DataEngineering, etc. We’re eager to tell you how we work and what we have learned along the way.