Python HOW: migrate Prodigy database to MySql or Postgres
Learn how to save Prodigy annotations in a remote database for collaborative annotating

Prodigy is developed by Explosion AI, the folks behind spaCy, so it integrates with it organically 🍻. This article describes in details how to migrate the default local SQLite database schema into a remote MySql or Postgres database
Default Database 📍
The first time you run Prodigy, it will create a folder .prodigy
in your home directory with 2 files:
By default, Prodigy looks for its configuration in prodigy.json
which has the default database "db": "sqlite”
. So, by default, annotations are saved in the SQLite database prodigy.db
. This is great to start labelling your data straight away without having to worry about were to save them
Remote Server 🌍
If the task requires multiple annotators, or you want to save your annotations on a remote MySQL or PostgreSQL server, you can change the db
name and settings
in prodigy.json
:
Don’t forget to pip install the database driver! Use pymysql for MySQL or psycopg2 for PostgreSQL
This, however, assumes that you have already created a remote database called prodigydb
and migrated the SQLite schema with all the tables that prodigy expects 😶 keep reading to learn how to do that 👇
Create remote database 🌍🔨
To create the prodigydb
database in the remote server, we can use SqlAlchemy to create an engine then pass it the query to create the database (check 👉 article to learn more about executing queries using SqlAlchemy):
Migrating SQLite schema 🚀
Now we have the database ready, we need to migrate the schema with all the tables that prodigy expects to it. We can do this using one of the following 2 options 👇
Option 1: Using peewee
This is my favorite option and is the simplest/quickest
Prodigy uses peewee to manage database integration. Peewee is “a simple and small ORM (Object-Relational Mapping)” package that supports SQLite, MySQL and Postgres and is installed with Prodigy
Our goal is to use peewee to generate a Model
class for each table found in the source SQLite database ▶️ use these models to create the same tables in the target MySql or Postgres
- Step 1
Create a source SQLite database object from prodigy.db
, Introspect
the database object, then generate the models:
Useful to know: there’re 2 helper functions to print a user-friendly description of the generated models and their SQL tables schema:
CREATE TABLE for each model
- Step 2
Create a target database from your remote MySql or Postgres server, and use it to create the tables using their model classes:
And that is it! pretty simple 😆
Option 2: Using pwiz
Pwiz is “a little script that ships with peewee and is capable of introspecting an existing database and generating model classes code”
Our goal is still the same, use pwiz to generate a Model
class as a code for the source SQLite database ▶️ modify the code to create the same tables in the target MySql or Postgres
- Step 1
Specify the database back-end as SQLite
using the -e
flag, connect to prodigy.db
and print all model classes as a python script:
sqlite_models.py
contains a Model
class for each of the found tables. This’s how it looks like:
sqlite_models.py
- Step 2
In sqlite_models.py
, modify the database object to connect to the target MySql or Postgres server, and use it to create the tables using their model classes:
And that is it! not as simple as first option but simple enough 😅
What is next?
Deploy Prodigy as a Docker Web App using:
- Azure Web App and MySQL services (coming soon)
- AWS Elastic Beanstalk and RDS MySQL services (coming soon)
In the meantime, start using Docker 🐋 for Python and learn how to take full control over your development environment 🎯
Happy coding!