Using Django with an existing legacy database

The Django web framework is well suited for creating medium sized research databases. It allows rapid development of a convenient data administration backend (using the Django Admin Site) as well as appealing frontends for published data (as done in the LATINNO project at the WZB). This works well when you build a database from ground up by defining model classes at first and then let Django generate the database schema itself (Django models → Database schema). Often enough however, it is necessary to revise an existing database or at least the data administration interface. In this scenario, the database schema is already defined and hence it is necessary to create Django models from the schema (Database schema → Django models). Django can handle this situation pretty well but some advises have to be followed which I’ll explain here.

Prerequisites

I’ll be using the current LTS release of Django, version 1.8. All other recent versions of Django should work the same however.

Our legacy database

First of all, let’s set up a database with some existing data in order to illustrate the necessary steps with a real world example. We’ll be using MySQL here and some example databases can be found in the documentation. We can use the “world” example database because it features a fairly simple schema to give an overview but still has some interconnected tables so that it makes a good “real world” example. It contains some general information about many countries, their cities and the languages spoken in these countries and is provided by Statistics Finland. The SQL file should be imported into a database called world. This is our “legacy” database. Furthermore, we need to create a database world_django which will later store all kinds of information that is required by the Django framework (e.g. the user account data for the admin site).

Lets have a look at the legacy database. It consists of three tables:

mysql> describe city;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   | MUL |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+
mysql> describe country;
+----------------+----------------------------+------+-----+---------+-------+
| Field          | Type                       | Null | Key | Default | Extra |
+----------------+----------------------------+------+-----+---------+-------+
| Code           | char(3)                    | NO   | PRI |         |       |
| Name           | char(52)                   | NO   |     |         |       |
| Continent      | enum('Asia','Europe', ...) | NO   |     | Asia    |       |
| Region         | char(26)                   | NO   |     |         |       |
| SurfaceArea    | float(10,2)                | NO   |     | 0.00    |       |
| IndepYear      | smallint(6)                | YES  |     | NULL    |       |
| Population     | int(11)                    | NO   |     | 0       |       |
| LifeExpectancy | float(3,1)                 | YES  |     | NULL    |       |
| GNP            | float(10,2)                | YES  |     | NULL    |       |
| GNPOld         | float(10,2)                | YES  |     | NULL    |       |
| LocalName      | char(45)                   | NO   |     |         |       |
| GovernmentForm | char(45)                   | NO   |     |         |       |
| HeadOfState    | char(60)                   | YES  |     | NULL    |       |
| Capital        | int(11)                    | YES  |     | NULL    |       |
| Code2          | char(2)                    | NO   |     |         |       |
+----------------+----------------------------+------+-----+---------+-------+
mysql> describe countrylanguage;
+-------------+---------------+------+-----+---------+-------+
| Field       | Type          | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+-------+
| CountryCode | char(3)       | NO   | PRI |         |       |
| Language    | char(30)      | NO   | PRI |         |       |
| IsOfficial  | enum('T','F') | NO   |     | F       |       |
| Percentage  | float(4,1)    | NO   |     | 0.0     |       |
+-------------+---------------+------+-----+---------+-------+

As already said, it’s a fairly simple database schema. A country is identified by its code primary key. A city has a unique ID and a foreign key to country via its CountryCode field. The countrylanguage table also has a CountryCode foreign key definition in order to save several languages per country. However, we can see that this table uses a composite primary key consisting of CountryCode and Language (so that a language is unique per country). This doesn’t look like a big deal, however, it will cause headaches later because Django does not support composite primary keys (yet). I’ve tried several workarounds using custom Django model migrations or Django extensions. In the end, I found that the best is to tackle this problem directly by modifying the database schema slightly in order to avoid composite primary keys. In our case that means that an “ID” column should be generated and be used as only primary key. A small script like this implements the necessary steps:

import MySQLdb

db = MySQLdb.connect('<DBHOST>', '<DBUSER>', '<DBPASS>', 'world')

cur = db.cursor()

# add ID column
cur.execute('ALTER TABLE countrylanguage ADD COLUMN ID INTEGER')

# generate IDs
cur.execute('SELECT CountryCode, Language FROM countrylanguage')
rows = cur.fetchall()
for i, (cc, lang) in enumerate(rows):
    cur.execute('UPDATE countrylanguage SET ID = %s WHERE CountryCode = %s AND Language = %s', (i+1, cc, lang))

db.commit()

# drop composite primary keys, add new primary key and set as "AUTO_INCREMENT" field
cur.execute('ALTER TABLE countrylanguage DROP PRIMARY KEY')
cur.execute('ALTER TABLE countrylanguage ADD PRIMARY KEY(ID)')
cur.execute("ALTER TABLE countrylanguage CHANGE COLUMN ID ID INTEGER AUTO_INCREMENT")

db.close()

Setting up a Django project (to use two databases with a DB router)

The database is now prepared for Django so that we can create a project legacydb and a Django app worlddata:

django-admin startproject legacydb
cd legacydb
./manage.py startapp worlddata

Right at the beginning, we created two databases: Our “legacy” database world and a still empty database world_django. While it is perfectly possible to use only one database, it is advisable to modify the legacy database as little as possible. Since Django needs to create several database tables for itself (for authentication, contenttypes, etc.), these tables would normally be written into the existing legacy database. If we do not want that, we need to set up two databases in settings.py:

DATABASES = {
    'default': {  # a default database must exist. this one will contain the django related data
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'world_django',
        'USER': '...',
        'PASSWORD': '...',
        'HOST': '...'
    },
    'world_data': {   # this is the legacy database
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'world',
        'USER': '...',
        'PASSWORD': '...',
        'HOST': '...'
    },
}

We defined two database aliases: defaultworld_django MySQL DB and world_dataworld MySQL DB.

Now we need to set up a database router. This router implements which database alias to use for which model. In general, we want to use the world_data (legacy) database alias for all models in the worlddata app. All other queries should use the default database (which leads to world_django). We implemented this in the decide_on_model() helper function:

def decide_on_model(model):
    """Small helper function to pipe all DB operations of a worlddata model to the world_data DB"""
    return 'world_data' if model._meta.app_label == 'worlddata' else None


class WorldDbRouter:
    """
    Implements a database router so that:

    * Django related data - DB alias `default` - MySQL DB `world_django`
    * Legacy "world" database data (everything "non-Django") - DB alias `world_data` - MySQL DB `world`
    """
    def db_for_read(self, model, **hints):
        return decide_on_model(model)

    def db_for_write(self, model, **hints):
        return decide_on_model(model)

    def allow_relation(self, obj1, obj2, **hints):
        # Allow any relation if both models are part of the worlddata app
        if obj1._meta.app_label == 'worlddata' and obj2._meta.app_label == 'worlddata':
            return True
        # Allow if neither is part of worlddata app
        elif 'worlddata' not in [obj1._meta.app_label, obj2._meta.app_label]:
            return True
        # by default return None - "undecided"

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # allow migrations on the "default" (django related data) DB
        if db == 'default' and app_label != 'worlddata':
            return True

        # allow migrations on the legacy database too:
        # this will enable to actually alter the database schema of the legacy DB!
        if db == 'world_data' and app_label == "worlddata":
           return True

        return False

Additionally we defined that relations can only be allowed between models of the same app and database migrations are allowed with the legacy database too, at least when the migrations come from the worlddata app. The router needs to be activated with DATABASE_ROUTERS = ['legacydb.dbrouters.WorldDbRouter'] in settings.py.

Since the database connection is set up, we can start the initial database migration which will now only create the necessary Django tables using the default database alias (world_django DB). Additionally we set up an initial admin site account:

./manage.py migrate
./manage.py createsuperuser

When we start the server, we could already log in to the admin site now — however we could only change users and authentication groups of the admin backend.

Generating and adapting model classes from the legacy database schema

Model classes can be generated automatically from an existing database schema by using the inspectdb command. We need the --database parameter because otherwise Django would generate classes for the default database alias. The output will be sent to stdout so we can preview it at first:

./manage.py inspectdb --database=world_data

The output will look like this:

# This is an auto-generated Django model module.
# You'll have to do the following manually to clean this up:
#   * Rearrange models' order
#   * Make sure each model has one field with primary_key=True
#   * Remove `managed = False` lines if you wish to allow Django to create, modify, and delete the table
# Feel free to rename the models, but don't rename db_table values or field names.
#
# Also note: You'll have to insert the output of 'django-admin sqlcustom [app_label]'
# into your database.
from __future__ import unicode_literals

from django.db import models


class City(models.Model):
    id = models.AutoField(db_column='ID', primary_key=True)  # Field name made lowercase.
    name = models.CharField(db_column='Name', max_length=35)  # Field name made lowercase.
    countrycode = models.ForeignKey('Country', db_column='CountryCode')  # Field name made lowercase.
    district = models.CharField(db_column='District', max_length=20)  # Field name made lowercase.
    population = models.IntegerField(db_column='Population')  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'city'


class Country(models.Model):
    code = models.CharField(db_column='Code', primary_key=True, max_length=3)  # Field name made lowercase.
    name = models.CharField(db_column='Name', max_length=52)  # Field name made lowercase.
    continent = models.CharField(db_column='Continent', max_length=13)  # Field name made lowercase.
    region = models.CharField(db_column='Region', max_length=26)  # Field name made lowercase.
    surfacearea = models.FloatField(db_column='SurfaceArea')  # Field name made lowercase.
    indepyear = models.SmallIntegerField(db_column='IndepYear', blank=True, null=True)  # Field name made lowercase.
    population = models.IntegerField(db_column='Population')  # Field name made lowercase.
    lifeexpectancy = models.FloatField(db_column='LifeExpectancy', blank=True, null=True)  # Field name made lowercase.
    gnp = models.FloatField(db_column='GNP', blank=True, null=True)  # Field name made lowercase.
    gnpold = models.FloatField(db_column='GNPOld', blank=True, null=True)  # Field name made lowercase.
    localname = models.CharField(db_column='LocalName', max_length=45)  # Field name made lowercase.
    governmentform = models.CharField(db_column='GovernmentForm', max_length=45)  # Field name made lowercase.
    headofstate = models.CharField(db_column='HeadOfState', max_length=60, blank=True, null=True)  # Field name made lowercase.
    capital = models.IntegerField(db_column='Capital', blank=True, null=True)  # Field name made lowercase.
    code2 = models.CharField(db_column='Code2', max_length=2)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'country'


class Countrylanguage(models.Model):
    countrycode = models.ForeignKey(Country, db_column='CountryCode')  # Field name made lowercase.
    language = models.CharField(db_column='Language', max_length=30)  # Field name made lowercase.
    isofficial = models.CharField(db_column='IsOfficial', max_length=1)  # Field name made lowercase.
    percentage = models.FloatField(db_column='Percentage')  # Field name made lowercase.
    id = models.AutoField(db_column='ID', primary_key=True)  # Field name made lowercase.

    class Meta:
        managed = False
        db_table = 'countrylanguage'

We can see that Django correctly generates three model classes from the existing three tables so lets write it to a models file:

./manage.py inspectdb --database=world_data &gt; worlddata/models.py

The generated classes definitely need some tweaking as also indicated by the hints in the auto-generated source code. This includes:

  • reorder City class (below Country)
  • set default=... values according to table definitions
  • set choices for all Enum fields
  • modify Meta subclasses:
    • set managed = True
    • set a default ordering
    • optionally set verbose_name / verbose_name_plural
  • implement __str__() methods for each model

In the end, the models.py file should look something like this:

from django.db import models


def make_choices(values):
    return [(v, v) for v in values]


class Country(models.Model):
    CONTINENTS = ('Asia', 'Europe', 'North America', 'Africa', 'Oceania', 'Antarctica', 'South America')

    code = models.CharField(db_column='Code', primary_key=True, max_length=3)
    name = models.CharField(db_column='Name', max_length=52)
    continent = models.CharField(db_column='Continent', max_length=13, choices=make_choices(CONTINENTS),
                                 default=CONTINENTS[0])
    region = models.CharField(db_column='Region', max_length=26)
    surfacearea = models.FloatField(db_column='SurfaceArea', default=0)
    indepyear = models.SmallIntegerField(db_column='IndepYear', blank=True, null=True)
    population = models.IntegerField(db_column='Population', default=0)
    lifeexpectancy = models.FloatField(db_column='LifeExpectancy', blank=True, null=True)
    gnp = models.FloatField(db_column='GNP', blank=True, null=True)
    gnpold = models.FloatField(db_column='GNPOld', blank=True, null=True)
    localname = models.CharField(db_column='LocalName', max_length=45)
    governmentform = models.CharField(db_column='GovernmentForm', max_length=45)
    headofstate = models.CharField(db_column='HeadOfState', max_length=60, blank=True, null=True)
    capital = models.IntegerField(db_column='Capital', blank=True, null=True)
    code2 = models.CharField(db_column='Code2', max_length=2)

    def __str__(self):
        return self.code + ' / ' + self.name

    class Meta:
        managed = True
        db_table = 'country'
        verbose_name_plural = 'Countries'
        ordering = ('code', )


class CountryLanguage(models.Model):
    ISOFFICIAL_CHOICES = (
        ('F', 'no'),
        ('T', 'yes'),
    )

    id = models.AutoField(db_column='ID', primary_key=True)
    countrycode = models.ForeignKey(Country, db_column='CountryCode')
    language = models.CharField(db_column='Language', max_length=30)
    isofficial = models.CharField(db_column='IsOfficial', max_length=1,
                                  choices=ISOFFICIAL_CHOICES, default=ISOFFICIAL_CHOICES[0][0])
    percentage = models.FloatField(db_column='Percentage', default=0.0)

    def __str__(self):
        return self.language + ' / ' + self.countrycode.pk

    class Meta:
        managed = True
        db_table = 'countrylanguage'
        ordering = ('language', )


class City(models.Model):
    id = models.AutoField(db_column='ID', primary_key=True)
    name = models.CharField(db_column='Name', max_length=35)
    countrycode = models.ForeignKey(Country, db_column='CountryCode')
    district = models.CharField(db_column='District', max_length=20)
    population = models.IntegerField(db_column='Population', default=0)

    def __str__(self):
        return self.name

    class Meta:
        managed = True
        db_table = 'city'
        verbose_name_plural = 'Cities'
        ordering = ('name', )

Although we don’t make any modifications to the database schema, we nevertheless have to generate an initial database migration file for the worlddata app so that Django knows the initial state of the models for optional later migrations:

./manage.py makemigrations worlddata

We need to apply this migration now to the legacy database (alias world_data) so that Django knows the migration was run. However, no actual SQL commands should be executed because the tables already exist! We take this into account by using the --fake-initial switch:

./manage.py migrate --database world_data --fake-initial worlddata

This will also generate a django_migrations table in the legacy database — the only Django “system table” in this database.

Generating a minimalist admin site

We’re now ready to set up a very minimalist admin site to administer the database. In admin.py of the “worlddata” app we register all models that should appear in the admin backend:

from . import models

admin.site.register(models.Country)
admin.site.register(models.CountryLanguage)
admin.site.register(models.City)

Although it’s minimalist, it’s already fully functional: Each element in each table can be listed, modified, deleted and new elements can be added. Foreign key relations also work without any problems:

autogenerated country list

autogenerated country form

This is already a good starting point to tweak the administration interface by defining custom ModelAdmin classes. At the end, you will have an easy to use, multi-user data administration interface for your legacy database with additional abilities like Django migrations, a frontend for published data, a possible REST interface, etc.

Comments are closed.

Post Navigation