Learn how to save Prodigy annotations in a remote database for collaborative annotating

Photo edited by the author from prodi.gy

Prodigy is developed by Explosion AI, the folks behind spaCy, so it integrates with it organically 🍻. This article describes in detail how to migrate the default local SQLite database schema into a remote MySQL or Postgres database

TL;DR: full code

Default Database 📍

The first time you run Prodigy, it will create a folder .prodigy in your home directory with 2 files:

structure of .prodigy folder

By default, Prodigy looks for its configuration in prodigy.json, which has the default database "db": "sqlite". So, by default, annotations are saved in the SQLite database prodigy.db. …
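As a rough illustration of the default described above, here is a minimal sketch that reads the "db" entry from prodigy.json and falls back to "sqlite" when the file or the key is missing. The helper name read_db_setting is hypothetical and not part of Prodigy; it only inspects the JSON file.

```python
import json
from pathlib import Path


def read_db_setting(prodigy_home: Path) -> str:
    """Return the "db" value from prodigy.json, or "sqlite" if it is absent.

    read_db_setting is a hypothetical helper, not a Prodigy function.
    """
    config_path = prodigy_home / "prodigy.json"
    if not config_path.exists():
        return "sqlite"  # the default named in the article
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("db", "sqlite")


if __name__ == "__main__":
    # ~/.prodigy is the folder Prodigy creates on first run
    print(read_db_setting(Path.home() / ".prodigy"))
```

Running it before and after editing prodigy.json is a quick way to confirm which database the configuration file names.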


Photo by panumas nikhomkhai from Pexels

This is an extension of a previous article 👀 that covers the low-level methods of establishing a connection to a SQL database and executing queries. Here we cover the equivalent high-level methods in SQLAlchemy and Pandas, which do the same in fewer lines of code

TL;DR: full code

Prerequisites

Install pyodbc, sqlalchemy and pandas using your preferred package manager

You might want to create a virtual environment first

Then, install the required driver for the DBMS you want to connect to. …


Photo by cottonbro from Pexels

A requirements.txt file lists all the Python dependencies required for a project. It’s a snapshot of all the packages you’ve used. You will need this for building a Docker image for example, or for creating Serverless Functions or Web Apps
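To make the "snapshot" idea concrete, here is a minimal sketch that lists every distribution installed in the current environment as name==version lines, the format a requirements.txt file uses. The function name freeze_lines is made up for this example; in practice the usual route is pip freeze > requirements.txt, which this only approximates.

```python
from importlib import metadata
from typing import List


def freeze_lines() -> List[str]:
    """Return sorted "name==version" lines for the installed distributions.

    freeze_lines is an illustrative helper, not a pip or pipenv API.
    """
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Print the snapshot; redirect it to a file to get a requirements.txt
    print("\n".join(freeze_lines()))
```

Run inside a virtual environment, the output covers only the packages installed for that project, which is why isolating the project first matters.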

What’s pipenv?

pipenv is currently the dependency manager recommended by Python for collaborative projects. It uses pip and virtualenv under the hood. However, unlike pip, it attempts to install sub-dependencies that satisfy all the requirements of the core dependencies

Installing pipenv

Run the following in your terminal for a user installation:

It’s recommended to install using the user scheme by specifying the --user option

After installation, open a new terminal and run…


Photo by Kaique Rocha from Pexels

Docker helps you to package up your project with all of the dependencies needed to run it from anywhere

“Build, share and run any application, anywhere!”

Is it a Docker image or a container? 😕

Let’s clear this up straight away. You first build a Docker image from a set of instructions in a Dockerfile. Once you run this image, the running instance is called a container

Docker Engine 🚒

To do anything with Docker, you first need to install the Docker Engine. Docker Engine is available on a variety of Linux platforms, Mac and Windows through Docker Desktop, Windows Server, and as a static binary installation. …


Photo by Pixabay from Pexels

At the end of this article you’ll be a master Sklearn plumber. You’ll know how to pipe in numerical and categorical attributes without having to use Pandas get_dummies or Sklearn FeatureUnion

TL;DR: full code

1. Install/Update

At the time of writing this post, 0.21.2 was the latest release of sklearn. Check the docs for dependencies and either install or update

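# fresh install, pinned to the release used in this post
conda install scikit-learn==0.21.2
# existing install (conda update takes a package name only, not a version pin)
conda update scikit-learn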

2. Toy dataset

We’ll use a sample dataset of audience churn with 1000 instances, and 19 attributes, 10 numerical and 9 categorical. You can download AudienceChurn.dataSample.csv


Photo by Pixabay from Pexels

By the end of this article you will know how to get valuable data and insights from a page you administer. It assumes you’ve already obtained a permanent page token. If you haven’t, check my 👉 article first

TL;DR: full code

The Graph API 🍇

Data in FB is represented using the idea of a “social graph”. To interact with a graph we use an HTTPS-based API called the Graph API. To return a graph object using the obtained page token:

Check the latest version of the official facebook-sdk release
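Because the Graph API is HTTPS-based, every call boils down to a URL on graph.facebook.com. The sketch below builds such a URL with the standard library only; the helper graph_url, its parameter names, the version "8.0" and the token "TOKEN" are placeholders chosen for illustration, not values from this article. The facebook-sdk package wraps this kind of request in its GraphAPI class.

```python
from urllib.parse import urlencode

GRAPH_HOST = "https://graph.facebook.com"


def graph_url(version, node_id, edge=None, **params):
    """Build a Graph API URL such as /v8.0/<node_id>/<edge>?<params>.

    graph_url is an illustrative helper, not part of facebook-sdk.
    """
    path = f"/v{version}/{node_id}"
    if edge:
        path += f"/{edge}"
    query = urlencode(params)  # percent-encodes values, keeps argument order
    return f"{GRAPH_HOST}{path}?{query}" if query else f"{GRAPH_HOST}{path}"


if __name__ == "__main__":
    # "TOKEN" stands in for the permanent page token
    print(graph_url("8.0", "me", fields="id,name", access_token="TOKEN"))
```

Nothing is sent over the network here; the function only shows how the version, the object being requested, and the token fit together in one request.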

A graph is made up of 3 hierarchical components:

  1. A node which is an individual object with a unique…

Learn how to obtain a permanent Page Access Token for your pages using the official facebook-sdk package. Updated Oct 2020

Photo captured by the author from developers.facebook.com

Our aim in this article is to get a Permanent Token. We will get a Short-Lived Token manually only once, then use Python to exchange that Token for a Permanent one. FB states that:

You should not depend on these tokens lifetimes remaining the same as Access Tokens can be invalidated or revoked anytime 💀

This might be the case if we make frequent requests using the same Token. …


Photo by panumas nikhomkhai from Pexels

I’ve done my best to provide you with a clear, concise, and detailed description of how to connect to and manage a SQL/SQLite database using Python

TL;DR: full code

Introduction

A database model determines the logical structure of a database. This in turn determines how data can be stored, organized and manipulated. The Relational Model (RM) has been the most popular database model since the 1980s. RM uses a table-based format, where tables are related by common columns
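To see what "tables related by common columns" means in practice, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The authors and articles tables and their rows are invented for the example; the shared author_id column is what links them.

```python
import sqlite3

# An in-memory database keeps the example self-contained
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE articles (
    article_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(author_id)
);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO articles VALUES (1, 'Joins', 1), (2, 'Indexes', 1), (3, 'Views', 2);
""")

# The common column author_id relates each article to its author
rows = conn.execute("""
SELECT authors.name, articles.title
FROM articles
JOIN authors ON articles.author_id = authors.author_id
ORDER BY articles.article_id
""").fetchall()
conn.close()

print(rows)  # [('Ada', 'Joins'), ('Ada', 'Indexes'), ('Grace', 'Views')]
```

Each author is stored once, and any number of articles can point to that row through author_id, which is the core idea behind the relational model.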

Database management system (DBMS)

A DBMS is the software that you, or your applications, use to interact with the database to create, read, update and manage data. A Relational DBMS (RDBMS) is a DBMS based on RM. According to DB-Engines, the most widely used RDBMS are: Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM DB2, Microsoft Access, and…

About

Gabriel Harris Ph.D.

I’m a full-stack data scientist and a Python educator. Most of my articles start after saying “I wish someone had written about this!”, maybe I should?
