How to Recover From Composite Primary Key With Django and MySQL
As part of my Graveyard work, I run into Django’s inability to handle composite primary keys. Recovery was way harder than I thought. Here’s the cookbook.
Back when I was young, relational databases were designed, not generated 1 . The golden standard for such design was entity-relationship model created for the application and then beaten to its appropriate normal form 2.
Under this paradigm, entities are assigned an identifier (ID).
Relationships, on the other hand, only do have foreign key columns (in
the form of
entity_id). For m:n relationships, those column pairs
entity_two_id) form the composite primary key.
So far, fine and dandy; this use-case is supported via Django’s ManyToManyField. However, it’s fairly standard for the relationships to have additional attributes.
This can still be solved the using the through attribute.
That would work—except Django doesn’t support marking
ForeignKeys as a PrimaryKey. Relevant ticket 373 is open for
over a decade and a half by
now, so better not to rely
on this changing.
If you are not a database purist, the easy solution is to add a standalone ID field. This will be useless except for making Django happy, but it will solve our problem.
For existing tables with existing data, that’s not as easy as it sounds.
I assume models generated using
and you wanting to have them managed by Django, hence removing the
For composite keys, this will leave you with a definition for unique column for the composite keys and one of them retaining the primary_key property:
class Relation(models.Model): entity1_id = models.IntegerField(primary_key=True) entity2_id = models.IntegerField() attribute = models.IntegerField() class Meta: db_table = "relation_table" unique_together = (("entity1_id", "entity2_id"),)
Add a New Integer Field
First, let’s add a new field for the identifier. I’ve opted to make it explicit it’s there for Django only.
class Relation(models.Model): entity1_id = models.IntegerField(primary_key=True) entity2_id = models.IntegerField() attribute = models.IntegerField() django_id = models.IntegerField(null=True, blank=True) class Meta: db_table = "relation_table" unique_together = (("entity1_id", "entity2_id"),)
Because of existing data, it needs to be nullable.
Create a migration, deploy it & run it using
I had some issues fiddling with
primary_key attribute depending on
Django version; it may or may not be safe to remove it at this point.
You don’t want any changes to primary key for this
migration; experiment, or manually remove it from the generated code.
Fill In Data
Now we need to fill in missing IDs. I’ve opted to do this using a data
migration. Create one using
python manage.py makemigrations --empty app name
In there, iterate through all fields and add a number. It looks like this:
from django.db import migrations def fill_relation_table_ids(apps, schema_editor): Relation = apps.get_model("appname", "Relation") i = 1 for relation in Relation.objects.all(): # can\'t do get() or save on an instance as that would check for integrity of the wrongly-assumed primary key on entity1_id Relation.objects.filter( entity1_id = relation.entity1_id, entity2_id = relation.entity2_id ).update(django_id=i) i += 1 class Migration(migrations.Migration): dependencies = [ ("app", "0050_something"), ] operations = [ migrations.RunPython(fill_relation_table_ids), ]
Note that I can afford this strategy as I can afford a downtime between running this migration and the following schema updates. If you can’t and it’s a heavily-written table, you may need to do more magic stored procedures. This is easier for databases that handle auto increment generators independently from tables, which is unfortunately not the case for MySQL.
Apply Custom SQL Magic
Now comes the fun part. Just modifying django_id to AutoField does not work because the generated SQL looks like this (as of Django 2.0):
BEGIN; ALTER TABLE `relation_table ` MODIFY `django_id` integer AUTO_INCREMENT NULL; ALTER TABLE `relation_table ` MODIFY `django_id` integer AUTO_INCREMENT NOT NULL; ALTER TABLE `relation_table ` ADD CONSTRAINT relation_table_django_id_5653b681_uniq UNIQUE (`django_id`); ALTER TABLE `relation_table ` DROP PRIMARY KEY; ALTER TABLE `relation_table ` ADD CONSTRAINT `relation_table_django_id_5653b681_pk` PRIMARY KEY (`django_id`); COMMIT;
Running it will cause the migration to fail:
django.db.utils.DataError: (1171, \'All parts of a PRIMARY KEY must be NOT NULL; if you need NULL in a key, use UNIQUE instead\')
Let’s fix the database manually:
- Drop old composite primary key
- Make django_id our new sole primary key column and make it an automatically incremented integer
- Set the increment generator to higher value than the highest integer in the table. Because there is no native support to do that in MySQL, this is done by creating a stored procedure that’s also dropped as part of the migration
Again, start with an empty migration
python manage.py makemigrations --empty app name
from django.db import migrations class Migration(migrations.Migration): dependencies = [ ("app", "0051_datafill"), ] operations = [ migrations.RunSQL("ALTER TABLE `relation_table` DROP PRIMARY KEY;"), migrations.RunSQL( "ALTER TABLE `relation_table` MODIFY django_id INTEGER AUTO_INCREMENT NOT NULL PRIMARY KEY;" ), migrations.RunSQL( """ SET @auto_incr = (SELECT IFNULL(MAX(django_id) + 1, 1) FROM relation_table); SET @statement_string = CONCAT('ALTER TABLE relation_table AUTO_INCREMENT=', @auto_incr); PREPARE statement_command FROM @statement_string; EXECUTE statement_command; DEALLOCATE PREPARE statement_command; """ ), ]
Wohoo, our table works!
Fix Django Migrations
One little detail: we have done all of this behind Django’s back. If you
would run the migrations above and then run
python manage.py migrate
again, you’d see following message
Running migrations: No migrations to apply. Your models have changes that are not yet reflected in a migration, and so won't be applied. Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.
AutoField and calling it a day will still result
in the issue above.
The trick? Keep it
IntegerField first, but
NOT NULL. And with
class Relation(models.Model): entity1_id = models.IntegerField() entity2_id = models.IntegerField() attribute = models.IntegerField() django_id = models.IntegerField(primary_key=True) class Meta: db_table = "relation_table" unique_together = (("entity1_id", "entity2_id"),)
from django.db import migrations class Migration(migrations.Migration): dependencies = [ ("app", "0052_customsql"), ] operations = [ migrations.AlterField( model_name="relation", name="django_id", field=models.IntegerField(default=1, primary_key=True, serialize=False), preserve_default=False, ), migrations.AlterField( model_name="relation", name="entity1_id", field=models.IntegerField(), ), ]
If you’d have primary key problems, you may need to add obligatory
migrations.RunSQL("ALTER TABLE relation_table DROP PRIMARY KEY;") at
Now is the time for final touches! We can finally have an
class Relation(models.Model): entity1_id = models.IntegerField() entity2_id = models.IntegerField() attribute = models.IntegerField() django_id = models.AutoField(primary_key=True) class Meta: db_table = "relation_table" unique_together = (("entity1_id", "entity2_id"),)
Generate a simple migration and you should be good to go. Push all of this to production and enjoy your migrate command, you’ve earned it.
Next step is left as an exercise for the reader: migrate the
entity2_id to a
Composite primary keys are a common occurrence in databases that are designed independently from the application. Django doesn’t handle them well and it is advised to add an additional numerical primary key.
I hope the cookbook above gives you sufficient blueprint to survive that migration well.
The above can be done as part of a single migration, but be cautious. In case if problems, you’ll end up with half-applied migration and that’s never fun. If you have a site that you can take down for maintenance, however, it helps to keep the code tidy: here is the example migration.
Django’s migration framework is one of the nicest parts of Django for me. Keeping track of schema changes is a hard problem. Django’s mini-DSL which is simple and database backend independent hits a sweet spot between simplicity and long-term sustainability and maintenance.
While I really wish #373 would be done, this is a survivable workaround.
I am not saying it was a virtue. It was a different use-case: multiple applications integrated through a database. Today, storage is more abundant and more importantly, networks are both faster and do have lower latency, at least within a datacenter. This allows for more decoupling and integration through APIs instead of shared data layers. This allows data design to happen at the system layer, not a storage layer ↩︎
Not exactly how it happened for this particular application, but resulting problems are same ↩︎
Published in Notes and tagged APIs • Python