How to Recover From Composite Primary Key With Django and MySQL

As part of my Graveyard work, I run into Django’s inability to handle composite primary keys. Recovery was way harder than I thought. Here’s the cookbook.

The Problem

Back when I was young, relational databases were designed, not generated 1 . The golden standard for such design was entity-relationship model created for the application and then beaten to its appropriate normal form 2.

Under this paradigm, entities are assigned an identifier (ID). Relationships, on the other hand, only do have foreign key columns (in the form of entity_id). For m:n relationships, those column pairs (entity_one_id, entity_two_id) form the composite primary key.

So far, fine and dandy; this use-case is supported via Django’s ManyToManyField. However, it’s fairly standard for the relationships to have additional attributes.

This can still be solved the using the through attribute. That would work—except Django doesn’t support marking those two ForeignKeys as a PrimaryKey. Relevant ticket 373 is open for over a decade and a half by now, so better not to rely on this changing.

If you are not a database purist, the easy solution is to add a standalone ID field. This will be useless except for making Django happy, but it will solve our problem.

For existing tables with existing data, that’s not as easy as it sounds.

The Solution

I assume models generated using inspectdb and you wanting to have them managed by Django, hence removing the managed=False.

For composite keys, this will leave you with a definition for unique column for the composite keys and one of them retaining the primary_key property:

class Relation(models.Model):
    entity1_id = models.IntegerField(primary_key=True)
    entity2_id = models.IntegerField()
    attribute = models.IntegerField()
    class Meta:
        db_table = "relation_table"
        unique_together = (("entity1_id", "entity2_id"),)

Add a New Integer Field

First, let’s add a new field for the identifier. I’ve opted to make it explicit it’s there for Django only.

class Relation(models.Model):
    entity1_id = models.IntegerField(primary_key=True)
    entity2_id = models.IntegerField()
    attribute = models.IntegerField()
    django_id = models.IntegerField(null=True, blank=True)

    class Meta:
        db_table = "relation_table"
        unique_together = (("entity1_id", "entity2_id"),)

Because of existing data, it needs to be nullable.

Create a migration, deploy it & run it using manage.py makemigrations and manage.py migrate.

I had some issues fiddling with primary_key attribute depending on Django version; it may or may not be safe to remove it at this point. You don’t want any changes to primary key for this migration; experiment, or manually remove it from the generated code.

Fill In Data

Now we need to fill in missing IDs. I’ve opted to do this using a data migration. Create one using python manage.py makemigrations --empty app name

In there, iterate through all fields and add a number. It looks like this:

from django.db import migrations

def fill_relation_table_ids(apps, schema_editor):
    Relation = apps.get_model("appname", "Relation")
    i = 1

    for relation in Relation.objects.all():
        # can\'t do get() or save on an instance as that would check for
        integrity of the wrongly-assumed primary key on entity1_id

        Relation.objects.filter(
            entity1_id = relation.entity1_id, entity2_id = relation.entity2_id
        ).update(django_id=i)
        i += 1

class Migration(migrations.Migration):
    dependencies = [
        ("app", "0050_something"),
    ]

    operations = [
        migrations.RunPython(fill_relation_table_ids),
    ]

Note that I can afford this strategy as I can afford a downtime between running this migration and the following schema updates. If you can’t and it’s a heavily-written table, you may need to do more magic stored procedures. This is easier for databases that handle auto increment generators independently from tables, which is unfortunately not the case for MySQL.

Apply Custom SQL Magic

Now comes the fun part. Just modifying django_id to AutoField does not work because the generated SQL looks like this (as of Django 2.0):

BEGIN;

ALTER TABLE `relation_table ` MODIFY `django_id` integer
AUTO_INCREMENT NULL;

ALTER TABLE `relation_table ` MODIFY `django_id` integer
AUTO_INCREMENT NOT NULL;

ALTER TABLE `relation_table ` ADD CONSTRAINT
relation_table_django_id_5653b681_uniq UNIQUE (`django_id`);

ALTER TABLE `relation_table ` DROP PRIMARY KEY;

ALTER TABLE `relation_table ` ADD CONSTRAINT
`relation_table_django_id_5653b681_pk` PRIMARY KEY
(`django_id`);

COMMIT;

Running it will cause the migration to fail:

django.db.utils.DataError: (1171, \'All parts of a PRIMARY KEY must be
NOT NULL; if you need NULL in a key, use UNIQUE instead\')

Let’s fix the database manually:

  • Drop old composite primary key
  • Make django_id our new sole primary key column and make it an automatically incremented integer
  • Set the increment generator to higher value than the highest integer in the table. Because there is no native support to do that in MySQL, this is done by creating a stored procedure that’s also dropped as part of the migration

Again, start with an empty migration python manage.py makemigrations --empty app name

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ("app", "0051_datafill"),
    ]

    operations = [
        migrations.RunSQL("ALTER TABLE `relation_table` DROP PRIMARY
        KEY;"),

        migrations.RunSQL(
            "ALTER TABLE `relation_table` MODIFY django_id INTEGER
            AUTO_INCREMENT NOT NULL PRIMARY KEY;"
        ),

        migrations.RunSQL(

            """
                SET @auto_incr = (SELECT IFNULL(MAX(django_id) + 1, 1) FROM
                relation_table);

                SET @statement_string = CONCAT('ALTER TABLE relation_table
                AUTO_INCREMENT=', @auto_incr);

                PREPARE statement_command FROM @statement_string;

                EXECUTE statement_command;

                DEALLOCATE PREPARE statement_command;
            """
        ),
    ]

Wohoo, our table works!

Fix Django Migrations

One little detail: we have done all of this behind Django’s back. If you would run the migrations above and then run python manage.py migrate again, you’d see following message

Running migrations:

No migrations to apply.

Your models have changes that are not yet reflected in a migration, and
so won't be applied.

Run 'manage.py makemigrations' to make new migrations, and then re-run
'manage.py migrate' to apply them.

Altering django_id to AutoField and calling it a day will still result in the issue above.

The trick? Keep it IntegerField first, but NOT NULL. And with PRIMARY KEY.

class Relation(models.Model):
    entity1_id = models.IntegerField()
    entity2_id = models.IntegerField()
    attribute = models.IntegerField()
    django_id = models.IntegerField(primary_key=True)

    class Meta:
        db_table = "relation_table"
        unique_together = (("entity1_id", "entity2_id"),)

Generate migrations:

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ("app", "0052_customsql"),
    ]

operations = [

    migrations.AlterField(
        model_name="relation",
        name="django_id",
        field=models.IntegerField(default=1, primary_key=True,
        serialize=False),
        preserve_default=False,

    ),

    migrations.AlterField(
        model_name="relation",
        name="entity1_id",
        field=models.IntegerField(),

    ),

]

If you’d have primary key problems, you may need to add obligatory migrations.RunSQL("ALTER TABLE relation_table DROP PRIMARY KEY;") at the beginning.

Now is the time for final touches! We can finally have an AutoField.

class Relation(models.Model):
    entity1_id = models.IntegerField()
    entity2_id = models.IntegerField()
    attribute = models.IntegerField()
    django_id = models.AutoField(primary_key=True)

    class Meta:
        db_table = "relation_table"
        unique_together = (("entity1_id", "entity2_id"),)

Generate a simple migration and you should be good to go. Push all of this to production and enjoy your migrate command, you’ve earned it.

Next step is left as an exercise for the reader: migrate the entity1_id and entity2_id to a ForeignKey.

Conclusion

Composite primary keys are a common occurrence in databases that are designed independently from the application. Django doesn’t handle them well and it is advised to add an additional numerical primary key.

I hope the cookbook above gives you sufficient blueprint to survive that migration well.

The above can be done as part of a single migration, but be cautious. In case if problems, you’ll end up with half-applied migration and that’s never fun. If you have a site that you can take down for maintenance, however, it helps to keep the code tidy: here is the example migration.

Django’s migration framework is one of the nicest parts of Django for me. Keeping track of schema changes is a hard problem. Django’s mini-DSL which is simple and database backend independent hits a sweet spot between simplicity and long-term sustainability and maintenance.

While I really wish #373 would be done, this is a survivable workaround.


  1. I am not saying it was a virtue. It was a different use-case: multiple applications integrated through a database. Today, storage is more abundant and more importantly, networks are both faster and do have lower latency, at least within a datacenter. This allows for more decoupling and integration through APIs instead of shared data layers. This allows data design to happen at the system layer, not a storage layer ↩︎

  2. Not exactly how it happened for this particular application, but resulting problems are same ↩︎

Published in Notes and tagged


All texts written by . I'd love to hear your feedback. If you've liked this, you may want to subscribe for my monthly newsletter, RSS , or Mastodon. You can always return to home page or read about the site and its privacy handling.