Benchmark databases in Docker: MySQL, PostgreSQL, SQL Server
Why should you do database benchmark?
I was a long-term SQL Server user. But it was not my decision to use SQL Server. I was young and had to listen older colleagues. If they said, SQL Server is the best, then it was.
Then I was a long-term MySQL user. But it was not my decision to use MySQL. I was older, but my other colleagues were much more experienced. If they said, MySQL is the best, then it was.
I never searched for any other opinions, because at that time, I was learning. How can I argue with my much more experienced colleagues?
Then I found MariaDB. Then I found Percona. And I asked my much more experienced colleagues something like: “Hey, what are they? Why are they?”. The answer was almost always something like: “MySQL is the best, I know MySQL. Do you suppose to use something else? Do you want to break things?”
And of course, I didn’t want to break things.
But curiosity is my second name, so I decided to find out, which database should suits the best for our application. And I hit the perfect time, because Docker was arriving to the scene, so it looked like, it would be easy to start my work.
So, to answer the question “Why should you do database benchmark?” is to say: “Because it turns YOU into that much more experienced colleague”.
How I did my database benchmark?
To benchmark a database can be a tricky thing. Just look around the internet. There are different benchmarks, measuring different things, sometimes parameters, that do not tell you anything meaningful.
From my perspective, I see database as something, that exactly matches its name. A base for a data. Nothing more. So no application logic in database. Database should hold the data and make two main operations as quick as possible: read and write.
I see read as something not changing the database and write as something changing the database. For me, delete and update are both subset of write.
Also, when I read from database, I tend to make my
selects as simple as possible. I do not use joins on joins on joins on joins. I prefer reading multiple different tables as quick as possible, and then process the data outside the database. But I use
sum a lot. I use simple things a lot and complicated things as little as possible.
From my perspective, my ideal database should be calmly humming all the time, serving reads and writes as quickly, as possible. Nothing more.
When I decided to make my benchmark I was hunting three things:
- Which database has the fastest writes
- Which database has the fastest reads
- Which database has the smallest memory and CPU usage
As I did my benchmark two years ago in 2019 (resulted into switching to PostgreSQL), it will be wise to repeat it now, in 2021.
At first, we need to run all databases, we want to benchmark. We can run all the major databases in Docker with commands (and a little bit more information) from this article, but you find all those commands below.
I added some popular forks from all three major engine families.
Postgres engine family: PostgreSQL, TimescaleDB
MySQL engine family: MySQL, MariaDB, Percona
SQL Server engine family: SQL Server
Here are the Docker commands to run every database we need to test.
docker run --name postgres -e POSTGRES_PASSWORD=password -p 5433:5432 -v postgres_data:/var/lib/postgresql/data -d postgres:alpine
docker run --name timescale -e POSTGRES_PASSWORD=password -p 5434:5432 -v timescale_data:/var/lib/postgresql/data -d timescale/timescaledb:latest-pg12
docker run --name mysql -e MYSQL_ROOT_PASSWORD=password -p 3306:3306 -v mysql_data:/var/lib/mysql -d mysql:latest
docker run --name mariadb -e MYSQL_ROOT_PASSWORD=password -p 3307:3306 -v mariadb_data:/var/lib/mysql -d mariadb:latest
docker run --name percona -e MYSQL_ROOT_PASSWORD=password -p 3308:3306 -v percona_data:/var/lib/mysql -d percona:ps-8
docker run --name sqlserver -e 'ACCEPT_EULA=Y' -e 'MSSQL_SA_PASSWORD=passw0rd.' -p 1433:1433 -v sqlserver_data:/var/opt/mssql -d mcr.microsoft.com/mssql/server:2019-latest
Unfortunately, thank to Microsoft’s EULA for SQL Server 2019, it is not possible to present benchmark for SQL Server, but I can tell you, it is a shame. You have to do your benchmarks yourself.
By now, you should have all six databases running. All six in their default states. No fine tuning.
We can do our first comparison. We can compare database image size, initial memory usage in Docker and initial CPU usage.
By these results, it looks like PostgreSQL is the winner, and SQL Server is the loser. But we didn’t make any read and/or write benchmarks yet.
To do write benchmarks, I developed a simple Go program (github repository at the end of article). This program creates one table called
benchmark_data with 6 columns. You need to create database
benchmark manually using
create database benchmark.
This program inserts 10 000 rows into this table, one-by-one. No batch inserts, just simple inserts.
Benchmark will be done on Macbook Pro 2019, running Docker desktop, all unnecessary applications are shut down.
This first part of benchmark will measure time, it will take those inserts to complete. From my perspective, time versus some operation is the only reasonable measurement one can do. In the end you always want to know, how long does it take, or how much was done in specified time.
In this benchmark, I did 5 consecutive insert batches, each of 10 000 rows. Clearly, PostgreSQL is the winner and Percona is the loser.
Let’s add comparison in terms of total duration and average duration.
PostgreSQL is about twice as fast as Percona, in case of inserts. What strikes me here, is the difference between MariaDB and MySQL, because they are both from one engine family. Looks like folks behind MariaDB did some magic.
A simple conclusion: Postgres engine family is about twice as fast as MySQL engine family, except MariaDB.
This read benchmark was done this way: 2 000 loops with two reads in every loop:
data column. Again, I did everything 5 times.
Below are the results. PostgreSQL is the winner, Percona is the loser. MariaDB second worse.
Comparing total duration and average duration, just to have a better picture. Postgres engine family is about twice as fast as MySQL engine family.
Clearly the winner here is PostgreSQL. Smallest size, smallest CPU and memory usage, fastest write speed and fastest read speed.
Second place goes to TimescaleDB. TimescaleDB is a member of the same engine family. Thanks to information from this reddit post, TimescaleDB should perform better on bigger time-series based data, which was not purpose of this benchmark.
Third place goes to MariaDB, because of that fast reads.
Fourth is MySQL and last is Percona, to my surprise (explained below).
And something about SQL Server, whose results are not presented here (see explanation above): I have to say, they did some good work in Microsoft, because SQL Server would have been placed 3rd.
Differences between 2019 and 2021
When I did those tests two years ago, PostgreSQL was the winner, Percona second, MariaDB third, MySQL fourth and SQL Server last. I tested their latest versions in 2019 and today I tested, of course, different latest versions. I did not test TimescaleDB in 2019.
Looks like they did some very bad work at Percona, running in Docker. And they are still doing the best work at PostgreSQL, running in Docker.
Those two years ago, we did a switch from MySQL to PostgreSQL and looks like we did the right thing — for our application of use.
Those two years ago, we were thinking (and testing) about MariaDB and Percona. Good decision, we did not switch to Percona.
Some of you may argue, that one can fine-tune database of his/her choice to make it perform better. And you will be absolutely right. But bear in mind, you can fine-tune every database. So, if you fine-tune MySQL to catch PostgreSQL, you can surely fine-tune PostgreSQL.
Speaking about this, PostgreSQL has a very nice tool for fine-tuning the database. I don’t know of any similar tool for MySQL family or for SQL Server. If you know, please let me know in comments.
PostgreSQL only benchmark
Reading comments, I made an additional benchmark, this time comparing running PostgreSQL in Docker, versus running PostgreSQL natively on all three major systems. On MacOS, using different PGTune settings.