Our oldest project, the Empfehlungsbund user dashboard, was running on MySQL until recently. For various reasons that I will go into below, we decided to migrate to PostgreSQL. This is a short write-up of the steps we took to migrate the database.
Why PostgreSQL for us?
- Almost all of our other projects are using PostgreSQL, so having a different SQL dialect in the mix is a bit annoying and confusing when diving into specific vendor-specific SQL features.
- Transactional DDL, meaning even the migrations are running in a transaction. This is a huge benefit because it means that if a migration fails, the database is rolled back to the state before the migration. Especially useful if you have more than one change in a migration, and one of them fails. With MySQL that is usually a pain, because you have to manually revert the changes that were already applied or comment them out. With PostgreSQL, we even pair our DDL migrations (e.g. adding a column) with a data migration (e.g. filling the column with data) frequently, so that we can be sure that the data is in a consistent state after the migration.
- We like to use specific SQL-Features, such as JSONB with indexed columns, Text-Search with TS-Vector or Reverse-Tagging-Search with TS-Query.
Migration vs pgloader
Fortunately, there is already an awesome tool: pgloader that can migrate from various Databases to PG. It will create tables, indices etc. and copy over the data.
PGloader is available in most package managers, BUT we needed to compile from source because there had been issues with the old version.
# compile from source apt remove pgloader -y apt-get install sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev cd /root git clone https://github.com/dimitri/pgloader.git cd pgloader make pgloader ./build/bin/pgloader --help cp build/bin/pgloader /usr/local/bin/pgloader
Now, also make sure that your database users have access to the two databases:
mysql -u root mysql> CREATE USER 'admin'@'localhost'; mysql> GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' WITH GRANT OPTION;
sudo su - postgres createuser root psql GRANT ALL ON SCHEMA public TO root; GRANT ALL PRIVILEGES ON DATABASE "YOUR_DB" to root ;
PG-Loader can be run directly from the command-line but also supports a config file in SQL-like format. In our case:
LOAD DATABASE FROM mysql://admin:email@example.com/your_db INTO postgresql:///target_db WITH preserve index names,prefetch rows = 100 alter schema 'dbname' rename to 'public'
Run it with:
/usr/local/bin/pgloader -v pgloader.load
The options that we used:
preserve index names- this will try to keep the names of indices the same, otherwise, you will get indices in PG with a number suffix like
prefetch rows = 100- this will limit the row length to 100 which fixed an import heap error for us
alter schema: it seems that
pgloaderlikes to put the DB into a PG schema, so you might want to rename it to the public schema. That might be only possible if your access rights are right.
Differences in Dialect
The effort to change SQL access patterns is totally dependent on your codebase. In our case, we had a few things that we had to change, especially around date-time handling or JSON access:
|Rand vs Random||
|group by year of a datetime col||
|Format Month in DB||
|extract JSON keys||
|Truncate to date to group by date||