Python HOW: migrate Prodigy database to MySql or Postgres

Learn how to save Prodigy annotations in a remote database for collaborative annotating

Image for post
Image for post
Photo edited by the author from prodi.gy

rodigy is developed by Explosion AI, the folks behind spaCy, so it integrates with it organically 🍻. This article describes in details how to migrate the default local SQLite database schema into a remote MySql or Postgres database

TL;DR: full code

Default Database 📍

The first time you run Prodigy, it will create a folder .prodigy in your home directory with 2 files:

structure of .prodigy folder

By default, Prodigy looks for its configuration in prodigy.json which has the default database "db": "sqlite”. So, by default, annotations are saved in the SQLite database prodigy.db. This is great to start labelling your data straight away without having to worry about were to save them

Remote Server 🌍

If the task requires multiple annotators, or you want to save your annotations on a remote MySQL or PostgreSQL server, you can change the db name and settings in prodigy.json:

For PostgreSQL, replace mysql with postgresql, change port to 5432, and database to dbname. Replace certificate in line 12 according to your server (if you don’t have SSL enabled, remove lines 10–14)

Don’t forget to pip install the database driver! Use pymysql for MySQL or psycopg2 for PostgreSQL

his, however, assumes that you have already created a remote database called prodigydb and migrated the SQLite schema with all the tables that prodigy expects 😶 keep reading to learn how to do that 👇

Create remote database 🌍🔨

To create the prodigydb database in the remote server, we can use SqlAlchemy to create an engine then pass it the query to create the database (check 👉 article to learn more about executing queries using SqlAlchemy):

For PostgreSQL, change port to 5432, and replace mysql+pymysql with postgresql+psycopg2. Replace certificate in line 10 according to your server (if you don’t have SSL enabled, remove line 10)

Migrating SQLite schema 🚀

Now we have the database ready, we need to migrate the schema with all the tables that prodigy expects to it. We can do this using one of the following 2 options 👇

Option 1: Using peewee

This is my favorite option and is the simplest/quickest

Prodigy uses peewee to manage database integration. Peewee is “a simple and small ORM (Object-Relational Mapping)” package that supports SQLite, MySQL and Postgres and is installed with Prodigy

Our goal is to use peewee to generate a Model class for each table found in the source SQLite database ▶️ use these models to create the same tables in the target MySql or Postgres

  • Step 1

Create a source SQLite database object from prodigy.db , Introspect the database object, then generate the models:

models is a dictionary that maps each table found in the source database to a model class

Useful to know: there’re 2 helper functions to print a user-friendly description of the generated models and their SQL tables schema:

Uncomment line 5 to print the SQL CREATE TABLE for each model
  • Step 2

Create a target database from your remote MySql or Postgres server, and use it to create the tables using their model classes:

For a Postgres server use PostgresqlDatabase. Uncomment line 13 to check that the tables were created. Replace certificate in line 9 according to your server (if you don’t have SSL enabled, remove line 9)

And that is it! pretty simple 😆

Option 2: Using pwiz

Pwiz is “a little script that ships with peewee and is capable of introspecting an existing database and generating model classes code”

Our goal is still the same, use pwiz to generate a Model class as a code for the source SQLite database ▶️ modify the code to create the same tables in the target MySql or Postgres

  • Step 1

Specify the database back-end as SQLite using the -e flag, connect to prodigy.db and print all model classes as a python script:

Run in your Python REPL and make sure you’re in the same directory as prodigy.db

sqlite_models.py contains a Model class for each of the found tables. This’s how it looks like:

sqlite_models.py
  • Step 2

In sqlite_models.py , modify the database object to connect to the target MySql or Postgres server, and use it to create the tables using their model classes:

For a Postgres server use PostgresqlDatabase. Uncomment line 28 to check that the tables were created. Replace certificate in line 9 according to your server (if you don’t have SSL enabled, remove line 9)

And that is it! not as simple as first option but simple enough 😅

What is next?

Deploy Prodigy as a Docker Web App using:

  • Azure Web App and MySQL services (coming soon)
  • AWS Elastic Beanstalk and RDS MySQL services (coming soon)

In the meantime, start using Docker 🐋 for Python and learn how to take full control over your development environment 🎯

Happy coding!

I’m an End-to-End data scientist and a Python educator. Most of my articles start after saying “I wish someone has written about this!”, maybe I should?

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store