How to Recover From Composite Primary Key With Django and MySQL
As part of my Graveyard work, I run into Django’s inability to handle composite primary keys. Recovery was way harder than I thought. Here’s the cookbook.
The Problem
Back when I was young, relational databases were designed, not generated 1 . The golden standard for such design was entity-relationship model created for the application and then beaten to its appropriate normal form 2.
Under this paradigm, entities are assigned an identifier (ID).
Relationships, on the other hand, only do have foreign key columns (in
the form of entity_id
). For m:n relationships, those column pairs
(entity_one_id
, entity_two_id
) form the composite primary key.
So far, fine and dandy; this use-case is supported via Django’s ManyToManyField. However, it’s fairly standard for the relationships to have additional attributes.
This can still be solved the using the through attribute.
That would work—except Django doesn’t support marking
those two ForeignKey
s as a PrimaryKey. Relevant ticket 373 is open for
over a decade and a half by
now, so better not to rely
on this changing.
If you are not a database purist, the easy solution is to add a standalone ID field. This will be useless except for making Django happy, but it will solve our problem.
For existing tables with existing data, that’s not as easy as it sounds.
The Solution
I assume models generated using
inspectdb
and you wanting to have them managed by Django, hence removing the
managed=False
.
For composite keys, this will leave you with a definition for unique column for the composite keys and one of them retaining the primary_key property:
class Relation(models.Model):
entity1_id = models.IntegerField(primary_key=True)
entity2_id = models.IntegerField()
attribute = models.IntegerField()
class Meta:
db_table = "relation_table"
unique_together = (("entity1_id", "entity2_id"),)
Add a New Integer Field
First, let’s add a new field for the identifier. I’ve opted to make it explicit it’s there for Django only.
class Relation(models.Model):
entity1_id = models.IntegerField(primary_key=True)
entity2_id = models.IntegerField()
attribute = models.IntegerField()
django_id = models.IntegerField(null=True, blank=True)
class Meta:
db_table = "relation_table"
unique_together = (("entity1_id", "entity2_id"),)
Because of existing data, it needs to be nullable.
Create a migration, deploy it & run it using manage.py makemigrations
and manage.py migrate
.
I had some issues fiddling with primary_key
attribute depending on
Django version; it may or may not be safe to remove it at this point.
You don’t want any changes to primary key for this
migration; experiment, or manually remove it from the generated code.
Fill In Data
Now we need to fill in missing IDs. I’ve opted to do this using a data
migration. Create one using python manage.py makemigrations --empty app name
In there, iterate through all fields and add a number. It looks like this:
from django.db import migrations
def fill_relation_table_ids(apps, schema_editor):
Relation = apps.get_model("appname", "Relation")
i = 1
for relation in Relation.objects.all():
# can\'t do get() or save on an instance as that would check for
integrity of the wrongly-assumed primary key on entity1_id
Relation.objects.filter(
entity1_id = relation.entity1_id, entity2_id = relation.entity2_id
).update(django_id=i)
i += 1
class Migration(migrations.Migration):
dependencies = [
("app", "0050_something"),
]
operations = [
migrations.RunPython(fill_relation_table_ids),
]
Note that I can afford this strategy as I can afford a downtime between running this migration and the following schema updates. If you can’t and it’s a heavily-written table, you may need to do more magic stored procedures. This is easier for databases that handle auto increment generators independently from tables, which is unfortunately not the case for MySQL.
Apply Custom SQL Magic
Now comes the fun part. Just modifying django_id to AutoField does not work because the generated SQL looks like this (as of Django 2.0):
BEGIN;
ALTER TABLE `relation_table ` MODIFY `django_id` integer
AUTO_INCREMENT NULL;
ALTER TABLE `relation_table ` MODIFY `django_id` integer
AUTO_INCREMENT NOT NULL;
ALTER TABLE `relation_table ` ADD CONSTRAINT
relation_table_django_id_5653b681_uniq UNIQUE (`django_id`);
ALTER TABLE `relation_table ` DROP PRIMARY KEY;
ALTER TABLE `relation_table ` ADD CONSTRAINT
`relation_table_django_id_5653b681_pk` PRIMARY KEY
(`django_id`);
COMMIT;
Running it will cause the migration to fail:
django.db.utils.DataError: (1171, \'All parts of a PRIMARY KEY must be
NOT NULL; if you need NULL in a key, use UNIQUE instead\')
Let’s fix the database manually:
- Drop old composite primary key
- Make django_id our new sole primary key column and make it an automatically incremented integer
- Set the increment generator to higher value than the highest integer in the table. Because there is no native support to do that in MySQL, this is done by creating a stored procedure that’s also dropped as part of the migration
Again, start with an empty migration python manage.py makemigrations --empty app name
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("app", "0051_datafill"),
]
operations = [
migrations.RunSQL("ALTER TABLE `relation_table` DROP PRIMARY
KEY;"),
migrations.RunSQL(
"ALTER TABLE `relation_table` MODIFY django_id INTEGER
AUTO_INCREMENT NOT NULL PRIMARY KEY;"
),
migrations.RunSQL(
"""
SET @auto_incr = (SELECT IFNULL(MAX(django_id) + 1, 1) FROM
relation_table);
SET @statement_string = CONCAT('ALTER TABLE relation_table
AUTO_INCREMENT=', @auto_incr);
PREPARE statement_command FROM @statement_string;
EXECUTE statement_command;
DEALLOCATE PREPARE statement_command;
"""
),
]
Wohoo, our table works!
Fix Django Migrations
One little detail: we have done all of this behind Django’s back. If you
would run the migrations above and then run python manage.py migrate
again, you’d see following message
Running migrations:
No migrations to apply.
Your models have changes that are not yet reflected in a migration, and
so won't be applied.
Run 'manage.py makemigrations' to make new migrations, and then re-run
'manage.py migrate' to apply them.
Altering django_id
to AutoField
and calling it a day will still result
in the issue above.
The trick? Keep it IntegerField
first, but NOT NULL
. And with PRIMARY KEY
.
class Relation(models.Model):
entity1_id = models.IntegerField()
entity2_id = models.IntegerField()
attribute = models.IntegerField()
django_id = models.IntegerField(primary_key=True)
class Meta:
db_table = "relation_table"
unique_together = (("entity1_id", "entity2_id"),)
Generate migrations:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("app", "0052_customsql"),
]
operations = [
migrations.AlterField(
model_name="relation",
name="django_id",
field=models.IntegerField(default=1, primary_key=True,
serialize=False),
preserve_default=False,
),
migrations.AlterField(
model_name="relation",
name="entity1_id",
field=models.IntegerField(),
),
]
If you’d have primary key problems, you may need to add obligatory
migrations.RunSQL("ALTER TABLE relation_table DROP PRIMARY KEY;")
at
the beginning.
Now is the time for final touches! We can finally have an AutoField
.
class Relation(models.Model):
entity1_id = models.IntegerField()
entity2_id = models.IntegerField()
attribute = models.IntegerField()
django_id = models.AutoField(primary_key=True)
class Meta:
db_table = "relation_table"
unique_together = (("entity1_id", "entity2_id"),)
Generate a simple migration and you should be good to go. Push all of this to production and enjoy your migrate command, you’ve earned it.
Next step is left as an exercise for the reader: migrate the entity1_id
and entity2_id
to a ForeignKey
.
Conclusion
Composite primary keys are a common occurrence in databases that are designed independently from the application. Django doesn’t handle them well and it is advised to add an additional numerical primary key.
I hope the cookbook above gives you sufficient blueprint to survive that migration well.
The above can be done as part of a single migration, but be cautious. In case if problems, you’ll end up with half-applied migration and that’s never fun. If you have a site that you can take down for maintenance, however, it helps to keep the code tidy: here is the example migration.
Django’s migration framework is one of the nicest parts of Django for me. Keeping track of schema changes is a hard problem. Django’s mini-DSL which is simple and database backend independent hits a sweet spot between simplicity and long-term sustainability and maintenance.
While I really wish #373 would be done, this is a survivable workaround.
-
I am not saying it was a virtue. It was a different use-case: multiple applications integrated through a database. Today, storage is more abundant and more importantly, networks are both faster and do have lower latency, at least within a datacenter. This allows for more decoupling and integration through APIs instead of shared data layers. This allows data design to happen at the system layer, not a storage layer ↩︎
-
Not exactly how it happened for this particular application, but resulting problems are same ↩︎