Django: Data Migrations

Data migrations in Django are migrations that change the data stored in the database rather than its structure, often alongside a schema change or data restructuring. Schema migrations alter the database structure (e.g., adding fields or tables); data migrations transform or initialize the rows held in that structure. Because they run inside Django’s migration framework and use the Object-Relational Mapping (ORM), they are versioned and applied in order with the schema changes they accompany. This tutorial covers creating and running data migrations, best practices, and practical applications.


01. Why Use Data Migrations?

Data migrations are essential when introducing new fields, changing data formats, or migrating legacy data to align with updated models. They prevent data loss or inconsistencies during schema changes, such as setting default values for new fields or transforming existing records. Data migrations are critical for applications like e-commerce platforms, content management systems, or analytics dashboards where data integrity is paramount.

Example: Basic Data Migration

# myapp/models.py (Initial)
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)

    def __str__(self):
        return self.name
# myapp/models.py (Updated)
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    is_active = models.BooleanField(default=True)  # New field

    def __str__(self):
        return self.name
# Generate migration
python manage.py makemigrations

Output:

Migrations for 'myapp':
  myapp/migrations/0002_product_is_active.py
    - Add field is_active to product
# myapp/migrations/0002_product_is_active.py (Edited)
from django.db import migrations, models

def set_is_active(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        product.is_active = product.price > 0  # Set based on price
        product.save()

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0001_initial'),
    ]
    operations = [
        migrations.AddField(
            model_name='Product',
            name='is_active',
            field=models.BooleanField(default=True),
        ),
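        # noop reverse: rolling back drops the column, so no data needs restoring
        migrations.RunPython(set_is_active, migrations.RunPython.noop),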
    ]
# Apply migration
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0002_product_is_active... OK

Explanation:

  • RunPython - Executes a function to populate the new is_active field based on existing data.
  • apps.get_model - Safely accesses the model’s historical version during migration.

02. Key Data Migration Concepts and Tools

Django’s migration framework provides tools to create and manage data migrations, ensuring data transformations are applied consistently. The table below summarizes key concepts and their applications:

Concept/Tool          | Description                                   | Use Case
----------------------|-----------------------------------------------|---------------------------------
RunPython             | Execute custom Python code in migrations      | Transform or populate data
RunSQL                | Execute raw SQL for data changes              | Complex data operations
Empty Migrations      | Create blank migrations for data-only changes | Standalone data transformations
Reversible Migrations | Define forward and backward operations        | Support rollback scenarios


2.1 Using RunPython for Data Migration

Example: Migrating Data for a New Relationship

# myapp/models.py
from django.db import models

class Category(models.Model):
    name = models.CharField(max_length=50)

    def __str__(self):
        return self.name

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    category = models.ForeignKey(Category, on_delete=models.SET_NULL, null=True)  # New field

    def __str__(self):
        return self.name
python manage.py makemigrations
# myapp/migrations/0002_category_product_category.py (Edited)
from django.db import migrations, models
import django.db.models.deletion

def assign_default_category(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    Category = apps.get_model('myapp', 'Category')
    default_category, created = Category.objects.get_or_create(name='General')
    for product in Product.objects.all():
        product.category = default_category
        product.save()

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0001_initial'),
    ]
    operations = [
        migrations.CreateModel(
            name='Category',
            fields=[
                ('id', models.AutoField(auto_created=True, primary_key=True, serialize=False, verbose_name='ID')),
                ('name', models.CharField(max_length=50)),
            ],
        ),
        migrations.AddField(
            model_name='Product',
            name='category',
            field=models.ForeignKey(null=True, on_delete=django.db.models.deletion.SET_NULL, to='myapp.category'),
        ),
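        # noop reverse: rolling back removes the field and the Category table anyway
        migrations.RunPython(assign_default_category, migrations.RunPython.noop),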
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0002_category_product_category... OK

Explanation:

  • assign_default_category - Assigns a default category to existing products.
  • get_or_create - Ensures the 'General' category exists.

2.2 Using RunSQL for Data Migration

Example: Raw SQL Data Migration

# myapp/models.py
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    status = models.CharField(max_length=20, default='active')  # New field
python manage.py makemigrations
# myapp/migrations/0003_product_status.py (Edited)
from django.db import migrations, models

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0002_category_product_category'),
    ]
    operations = [
        migrations.AddField(
            model_name='Product',
            name='status',
            field=models.CharField(max_length=20, default='active'),
        ),
        migrations.RunSQL(
            """
            UPDATE myapp_product
            SET status = CASE
                WHEN price > 500 THEN 'premium'
                ELSE 'standard'
            END;
            """,
            reverse_sql="UPDATE myapp_product SET status = 'active';"
        ),
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0003_product_status... OK

Explanation:

  • RunSQL - Executes raw SQL to set status based on price.
  • reverse_sql - Defines the rollback operation to reset status.
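
To see what the forward and reverse statements do to actual rows, the sketch below runs the same SQL against an in-memory SQLite database using Python's standard sqlite3 module. It runs outside Django, and the table definition and the three sample products are assumptions made for illustration only.

```python
import sqlite3

# In-memory database with a hypothetical slice of the myapp_product table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_product ("
    "id INTEGER PRIMARY KEY, name TEXT, price NUMERIC, "
    "status TEXT DEFAULT 'active')"
)
conn.executemany(
    "INSERT INTO myapp_product (name, price) VALUES (?, ?)",
    [("Laptop", 1200), ("Mouse", 25), ("Monitor", 500)],  # sample data (assumed)
)

# Same statements as in the RunSQL operation above.
FORWARD_SQL = """
UPDATE myapp_product
SET status = CASE
    WHEN price > 500 THEN 'premium'
    ELSE 'standard'
END;
"""
REVERSE_SQL = "UPDATE myapp_product SET status = 'active';"


def statuses(connection):
    """Return a {name: status} mapping for every product."""
    rows = connection.execute(
        "SELECT name, status FROM myapp_product ORDER BY id"
    )
    return dict(rows.fetchall())


conn.execute(FORWARD_SQL)
after_forward = statuses(conn)

conn.execute(REVERSE_SQL)
after_reverse = statuses(conn)

print(after_forward)
print(after_reverse)
```

Note that a product priced at exactly 500 falls into the 'standard' branch, because the comparison is strictly greater than 500.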

2.3 Creating Empty Migrations

Example: Standalone Data Migration

python manage.py makemigrations --empty myapp --name normalize_names
# myapp/migrations/0004_normalize_names.py
from django.db import migrations

def normalize_product_names(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        product.name = product.name.strip().title()
        product.save()

def reverse_normalize_names(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        product.name = product.name.lower()
        product.save()

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_product_status'),
    ]
    operations = [
        migrations.RunPython(normalize_product_names, reverse_normalize_names),
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0004_normalize_names... OK

Explanation:

  • --empty - Creates a blank migration for data-only changes.
  • reverse_normalize_names - Allows rollback, but only lowercases the names; the original casing and surrounding whitespace are not restored.
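
The string handling in this migration can be checked without a database. The sketch below applies the same forward and reverse transformations to a few sample names (the names are made up for illustration) and shows two things worth knowing before running it on real data: str.title() capitalizes the letter after an apostrophe, and the reverse step does not recover the original values.

```python
def normalize(name: str) -> str:
    """Forward step from the migration: trim whitespace, then title-case."""
    return name.strip().title()


def reverse(name: str) -> str:
    """Reverse step from the migration: lowercase only."""
    return name.lower()


# Hypothetical product names, chosen to expose edge cases.
originals = ["  wireless MOUSE ", "iPhone 13 pro", "men's shirt"]

normalized = [normalize(n) for n in originals]
rolled_back = [reverse(n) for n in normalized]

for before, after, back in zip(originals, normalized, rolled_back):
    print(repr(before), "->", repr(after), "->", repr(back))
```

If the original names must be recoverable, one option is to copy them into a separate column or table before normalizing; that is a design choice outside the scope of this example.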

2.4 Incorrect Data Migration

Example: Non-Reversible Data Migration

# myapp/migrations/0004_bad_migration.py (Incorrect)
from django.db import migrations

def delete_old_products(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    Product.objects.filter(price__lt=10).delete()

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_product_status'),
    ]
    operations = [
        migrations.RunPython(delete_old_products),
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0004_bad_migration... OK
# Attempt to rollback
python manage.py migrate myapp 0003

Output:

Traceback (most recent call last):
  ...
django.db.migrations.exceptions.IrreversibleError: Operation <RunPython> in myapp.0004_bad_migration is not reversible

Explanation:

  • Deleting data without a reverse operation prevents rollback.
  • Solution: Provide a reverse function or avoid destructive operations in migrations.
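
One way to make a destructive step reversible is to archive the rows before deleting them, so that the reverse step has something to restore. The sketch below illustrates the idea with Python's standard sqlite3 module, outside Django. The archive table name myapp_product_archive, the sample rows, and the forward/backward function names are all assumptions for illustration, not part of the migration above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_product (id INTEGER PRIMARY KEY, name TEXT, price NUMERIC)"
)
conn.executemany(
    "INSERT INTO myapp_product (name, price) VALUES (?, ?)",
    [("Sticker", 2), ("Mug", 9), ("Lamp", 40)],  # sample data (assumed)
)


def forward(connection):
    """Copy cheap products to an archive table, then delete them."""
    connection.execute(
        "CREATE TABLE myapp_product_archive AS "
        "SELECT * FROM myapp_product WHERE price < 10"
    )
    connection.execute("DELETE FROM myapp_product WHERE price < 10")


def backward(connection):
    """Restore the archived products and drop the archive table."""
    connection.execute(
        "INSERT INTO myapp_product SELECT * FROM myapp_product_archive"
    )
    connection.execute("DROP TABLE myapp_product_archive")


def names(connection):
    rows = connection.execute("SELECT name FROM myapp_product ORDER BY id")
    return [row[0] for row in rows]


before = names(conn)
forward(conn)
after_delete = names(conn)
backward(conn)
restored = names(conn)

print(before, after_delete, restored)
```

In a Django project the same idea would normally be expressed through a model for the archive and a RunPython operation with both a forward and a reverse function.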

03. Effective Usage

3.1 Recommended Practices

  • Always define reversible operations for data migrations.

Example: Comprehensive Data Migration

# myapp/models.py
from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)
    category = models.ForeignKey('Category', on_delete=models.SET_NULL, null=True)
    status = models.CharField(max_length=20, default='active')

    def __str__(self):
        return self.name

class Category(models.Model):
    name = models.CharField(max_length=50)

    def __str__(self):
        return self.name
python manage.py makemigrations --empty myapp --name update_product_status
# myapp/migrations/0004_update_product_status.py
from django.db import migrations

def update_product_status(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        if product.price > 500:
            product.status = 'premium'
        elif product.price < 50:
            product.status = 'budget'
        else:
            product.status = 'standard'
        product.save()

def reverse_product_status(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    Product.objects.all().update(status='active')

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_product_status'),
    ]
    operations = [
        migrations.RunPython(update_product_status, reverse_product_status),
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0004_update_product_status... OK

Explanation:

  • update_product_status - Sets status based on price ranges.
  • reverse_product_status - Resets status to default for rollback.
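
The price thresholds are easy to get wrong at the boundaries, so it can help to pull the rule into a plain function and check it before running the migration. The helper name classify_status below is hypothetical; the thresholds mirror the ones in update_product_status.

```python
from decimal import Decimal


def classify_status(price: Decimal) -> str:
    """Map a price to a status using the same thresholds as the migration."""
    if price > 500:
        return "premium"
    elif price < 50:
        return "budget"
    return "standard"


# Boundary values: 500 and 50 themselves both count as 'standard'.
for value in ("500.01", "500.00", "50.00", "49.99"):
    print(value, classify_status(Decimal(value)))
```

A function like this could be called from the RunPython function, provided it lives inside the migration file itself so that later changes elsewhere in the codebase cannot alter what the migration does.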

3.2 Practices to Avoid

  • Avoid importing models directly in migrations; use apps.get_model.

Example: Direct Model Import in Migration

# myapp/migrations/0004_bad_migration.py (Incorrect)
from django.db import migrations
from myapp.models import Product  # Direct import

def update_names(apps, schema_editor):
    for product in Product.objects.all():
        product.name = product.name.upper()
        product.save()

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_product_status'),
    ]
    operations = [
        migrations.RunPython(update_names),
    ]

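Output (Potential Error, shown for SQLite):

django.db.utils.OperationalError: no such column: myapp_product.discount

Explanation:

  • The imported class reflects the current models.py, not the schema at this point in the migration history. If a field is later added to Product (discount here is a hypothetical example), running this migration on a fresh database queries a column that does not exist yet. The exact exception depends on the database backend.
  • Solution: Use apps.get_model('myapp', 'Product') for historical model access.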
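
The failure mode can be reproduced outside Django with Python's standard sqlite3 module. In the sketch below, the table has the columns that exist when the migration runs, while the "current" column list includes a field added later. The discount column and the sample row are hypothetical, chosen only to illustrate the mismatch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema as it exists when the migration runs: no `discount` column yet.
conn.execute(
    "CREATE TABLE myapp_product ("
    "id INTEGER PRIMARY KEY, name TEXT, price NUMERIC, status TEXT)"
)
conn.execute(
    "INSERT INTO myapp_product (name, price, status) VALUES ('mouse', 25, 'active')"
)

# Columns known to the historical model (what apps.get_model would use).
HISTORICAL_COLUMNS = ["id", "name", "price", "status"]
# Columns of the current models.py after a hypothetical later field was added.
CURRENT_COLUMNS = HISTORICAL_COLUMNS + ["discount"]


def select_all(connection, columns):
    """Select the given columns, much as an ORM query lists model fields."""
    sql = f"SELECT {', '.join(columns)} FROM myapp_product"
    return connection.execute(sql).fetchall()


historical_rows = select_all(conn, HISTORICAL_COLUMNS)

try:
    select_all(conn, CURRENT_COLUMNS)
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)

print(historical_rows)
print(error_message)
```

The query built from the historical column list succeeds, while the one built from the current list fails because the database has not reached that schema yet.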

04. Common Use Cases

4.1 E-Commerce Data Normalization

Normalize product names and statuses.

Example: Normalizing Product Data

python manage.py makemigrations --empty myapp --name normalize_product_data
# myapp/migrations/0004_normalize_product_data.py
from django.db import migrations

def normalize_products(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        product.name = product.name.strip().title()
        product.status = 'premium' if product.price > 500 else 'standard'
        product.save()

def reverse_normalize_products(apps, schema_editor):
    Product = apps.get_model('myapp', 'Product')
    Product.objects.all().update(status='active')

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_product_status'),
    ]
    operations = [
        migrations.RunPython(normalize_products, reverse_normalize_products),
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0004_normalize_product_data... OK

Explanation:

  • Normalizes name and updates status in a single migration.
  • The reverse function resets status to 'active' so the migration can be rolled back; the normalized names are left as they are.

4.2 Legacy Data Migration

Migrate legacy data into a new model structure.

Example: Migrating Legacy Data

# myapp/models.py
from django.db import models

class OldProduct(models.Model):
    title = models.CharField(max_length=100)
    cost = models.IntegerField()

    def __str__(self):
        return self.title

class Product(models.Model):
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=8, decimal_places=2)

    def __str__(self):
        return self.name
python manage.py makemigrations --empty myapp --name migrate_old_products
# myapp/migrations/0004_migrate_old_products.py
from decimal import Decimal

from django.db import migrations

def migrate_old_products(apps, schema_editor):
    OldProduct = apps.get_model('myapp', 'OldProduct')
    Product = apps.get_model('myapp', 'Product')
    for old_product in OldProduct.objects.all():
        Product.objects.create(
            name=old_product.title,
            # Decimal division avoids float rounding (cost / 100 would yield a float)
            price=Decimal(old_product.cost) / 100,
        )

def reverse_migrate_products(apps, schema_editor):
    OldProduct = apps.get_model('myapp', 'OldProduct')
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        OldProduct.objects.create(
            title=product.name,
            cost=int(product.price * 100)
        )

class Migration(migrations.Migration):
    dependencies = [
        ('myapp', '0003_product_status'),
    ]
    operations = [
        migrations.RunPython(migrate_old_products, reverse_migrate_products),
    ]
python manage.py migrate

Output:

Operations to perform:
  Apply all migrations: myapp
Running migrations:
  Applying myapp.0004_migrate_old_products... OK

Explanation:

  • Migrates data from OldProduct to Product, converting field types.
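  • Includes a reverse operation that copies Product rows back into OldProduct. Because the forward step does not delete the OldProduct rows, rolling back can create duplicates unless OldProduct is emptied first.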
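
The type conversion at the heart of this migration can be verified on its own. The sketch below assumes, as the division by 100 suggests, that cost holds an integer number of cents; the helper names cents_to_price and price_to_cents are hypothetical.

```python
from decimal import Decimal


def cents_to_price(cost: int) -> Decimal:
    """Convert integer cents to a two-decimal-place price."""
    return (Decimal(cost) / 100).quantize(Decimal("0.01"))


def price_to_cents(price: Decimal) -> int:
    """Convert a two-decimal-place price back to integer cents."""
    return int(price * 100)


print(cents_to_price(1999))
print(cents_to_price(100))
print(price_to_cents(Decimal("19.99")))

# The conversion round-trips for every whole-cent amount checked here.
round_trip_ok = all(
    price_to_cents(cents_to_price(cents)) == cents for cents in range(10000)
)
print(round_trip_ok)
```

Working in Decimal throughout keeps the values exact, which matters because the target field is a DecimalField with two decimal places.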

Conclusion

Django’s data migrations provide a robust framework for transforming and populating data during application evolution. Key takeaways:

  • Use RunPython or RunSQL to perform data transformations.
  • Create empty migrations with --empty for standalone data changes.
  • Ensure reversibility with reverse operations to support rollbacks.
  • Avoid direct model imports; use apps.get_model for compatibility.

With data migrations, you can maintain data integrity and adapt to changing requirements in scalable Django applications!
