Lesson 2.2: Advanced Model Fields - ArrayField, HStoreField, and Field Options

**Series Navigation:**
- [Lesson 2.1: JSONField and Advanced Field Types](/django-advance/advanced-model-fields-phan-1)
- **Lesson 2.2 (this lesson): ArrayField, HStoreField, and Field Options** 👈
- [Lesson 2.3: Custom Model Fields Creation](/django-advance/advanced-model-fields-phan-3)
- [Lesson 2.4: Computed Fields and Model Mixins](/django-advance/advanced-model-fields-phan-4)

Lesson 2.2 Objectives

After completing Lesson 2.2, you will:

  • ✅ Work confidently with ArrayField (PostgreSQL)
  • ✅ Use HStoreField for key-value storage
  • ✅ Understand the field options: null, blank, default, choices
  • ✅ Create and use custom validators
  • ✅ Implement field-level and model-level validation
  • ✅ Know best practices for field configuration

2. ArrayField - Store Arrays Efficiently

Introduction to ArrayField

ArrayField is a PostgreSQL-specific field that stores an array of values in a single column. It is useful for simple lists that do not need a separate table.

Why Use ArrayField?

  • ✅ No need for separate ManyToMany table
  • ✅ Fast queries với GIN indexes
  • ✅ Simple data structure
  • ✅ Perfect for tags, categories, lists

When NOT to Use It:

  • ❌ Need complex queries on array items
  • ❌ Array items need foreign keys
  • ❌ Not using PostgreSQL
  • ❌ Need referential integrity

Basic Usage

Defining an ArrayField

    from django.contrib.postgres.fields import ArrayField
    from django.db import models


    class Article(models.Model):
        title = models.CharField(max_length=200)
        author = models.ForeignKey('auth.User', on_delete=models.CASCADE)

        # Array of strings
        tags = ArrayField(
            models.CharField(max_length=50),
            size=10,  # Optional: max array length
            default=list,
            blank=True,
        )

        # Array of integers
        related_article_ids = ArrayField(
            models.IntegerField(),
            default=list,
            blank=True,
        )

        # Array of choices
        STATUS_CHOICES = [
            ('draft', 'Draft'),
            ('review', 'In Review'),
            ('published', 'Published'),
        ]

        status_history = ArrayField(
            models.CharField(max_length=20, choices=STATUS_CHOICES),
            default=list,
            blank=True,
        )

Important - Base Field:

    # ✅ GOOD: Specify the base field
    tags = ArrayField(models.CharField(max_length=50))

    # ❌ BAD: Missing base field
    tags = ArrayField()  # TypeError: base_field is a required argument

    # ✅ GOOD: Use a callable for the default
    tags = ArrayField(models.CharField(max_length=50), default=list)

    # ❌ BAD: Mutable default
    tags = ArrayField(models.CharField(max_length=50), default=[])  # One list shared by all instances!
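To see why a callable default matters, here is a minimal plain-Python sketch with no Django involved. `FakeField` and `get_default` are hypothetical stand-ins that only mimic how a field hands out its default value; they are not Django's implementation.

```python
class FakeField:
    """Hypothetical stand-in for a model field; not Django's real class."""

    def __init__(self, default):
        self.default = default

    def get_default(self):
        # Call the default when it is callable, otherwise return the same object.
        return self.default() if callable(self.default) else self.default


shared = FakeField(default=[])   # one list object reused for every instance
fresh = FakeField(default=list)  # list() is called once per instance

a, b = shared.get_default(), shared.get_default()
a.append("django")
shared_leaks = b == ["django"]   # b sees the change made through a

c, d = fresh.get_default(), fresh.get_default()
c.append("django")
fresh_is_isolated = d == []      # d is a separate, still-empty list
```

With `default=[]` both values are the same list, so a change to one shows up in the other; with `default=list` each value starts from its own empty list.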

Storing Data

    # Create with array data
    article = Article.objects.create(
        title="Django Advanced",
        author=user,
        tags=["django", "python", "web", "api"],
    )

    # Update the array in place
    article.tags.append("rest")
    article.save()

    # Replace the entire array
    article.tags = ["django", "drf", "advanced"]
    article.save()

    # Clear the array
    article.tags = []
    article.save()

    # Using update()
    Article.objects.filter(pk=article.pk).update(tags=["new", "tags"])

Array Operations

    # Add an item if it does not exist
    if "python" not in article.tags:
        article.tags.append("python")
        article.save()

    # Remove an item
    if "draft" in article.tags:
        article.tags.remove("draft")
        article.save()

    # Extend the array
    article.tags.extend(["machine-learning", "ai"])
    article.save()

    # Sort the array
    article.tags.sort()
    article.save()

    # Unique values only (set() does not preserve the original order)
    article.tags = list(set(article.tags))
    article.save()
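Deduplicating with `list(set(...))` gives no guarantee about the resulting order. If tag order matters, one possible alternative is sketched below; `dedupe_keep_order` is a hypothetical helper name, not part of Django.

```python
def dedupe_keep_order(tags):
    """Remove duplicates while keeping the first occurrence of each tag."""
    # dict preserves insertion order (Python 3.7+), so keys come back first-seen first.
    return list(dict.fromkeys(tags))


tags = ["django", "python", "django", "api", "python"]
unique_tags = dedupe_keep_order(tags)
```

The result can then be assigned back to `article.tags` before calling `save()`.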

Querying ArrayField

Contains Queries

    # Contains a specific value
    Article.objects.filter(tags__contains=["django"])

    # Contains any of the values
    Article.objects.filter(tags__overlap=["django", "python"])

    # Contains all of the values
    Article.objects.filter(tags__contains=["django", "python", "api"])

Array Operations

    # Array length: use the len transform (casting an array to an integer does not work)
    Article.objects.filter(tags__len__gte=3)

    # Array index access (0-based in Django lookups)
    Article.objects.filter(tags__0="django")  # First element
    Article.objects.filter(tags__1="python")  # Second element

    # Array slice
    Article.objects.filter(tags__0_2=["django", "python"])  # First 2 elements

Advanced Queries

    from django.db.models import CharField, F, Func, Q

    # Contained by (is a subset of)
    Article.objects.filter(
        tags__contained_by=["django", "python", "web", "api", "extra"]
    )

    # Not empty
    Article.objects.exclude(tags=[])
    # Or
    Article.objects.filter(~Q(tags=[]))

    # Empty array
    Article.objects.filter(tags=[])

    # Distinct tags across all articles: unnest each array into rows.
    # (ArrayAgg over an array column fails when rows hold arrays of different lengths.)
    all_tags = (
        Article.objects
        .annotate(tag=Func(F('tags'), function='unnest', output_field=CharField()))
        .values_list('tag', flat=True)
        .distinct()
    )
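The three containment lookups are easy to mix up. The sketch below models their semantics with plain Python sets; the function names are illustrative only, and the real lookups run as array operators inside PostgreSQL.

```python
def contains(stored, wanted):
    """Model of tags__contains: every wanted value is in the stored array."""
    return set(wanted) <= set(stored)


def overlap(stored, wanted):
    """Model of tags__overlap: at least one wanted value is in the stored array."""
    return bool(set(wanted) & set(stored))


def contained_by(stored, allowed):
    """Model of tags__contained_by: the stored array is a subset of allowed."""
    return set(stored) <= set(allowed)


stored = ["django", "python", "web"]
matches_all = contains(stored, ["django", "python"])
matches_any = overlap(stored, ["api", "web"])
is_subset = contained_by(stored, ["django", "python", "web", "api"])
empty_is_subset = contained_by([], ["django"])
```

Note the last case: an empty array is a subset of anything, so `contained_by` also matches rows with no tags.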

ArrayField Best Practices

1. Indexing for Performance

    from django.contrib.postgres.indexes import GinIndex


    class Article(models.Model):
        tags = ArrayField(models.CharField(max_length=50))

        class Meta:
            indexes = [
                GinIndex(fields=['tags']),
            ]

Performance Tips:

    # ✅ GOOD: Use a GIN index for containment lookups
    class Meta:
        indexes = [
            GinIndex(fields=['tags']),
        ]

    # ✅ GOOD: Limit the array size
    tags = ArrayField(
        models.CharField(max_length=50),
        size=20,  # Max 20 items
    )

    # ⚠️ WARNING: Large arrays slow down queries
    # As a rule of thumb, keep arrays under about 100 items

2. Validation

    from django.core.exceptions import ValidationError


    class Article(models.Model):
        tags = ArrayField(
            models.CharField(max_length=50),
            size=10,
        )

        def clean(self):
            # Validate array length
            if len(self.tags) > 10:
                raise ValidationError("Maximum 10 tags allowed")

            # Validate uniqueness
            if len(self.tags) != len(set(self.tags)):
                raise ValidationError("Duplicate tags not allowed")

            # Validate each item
            for tag in self.tags:
                if not tag.isalnum():
                    raise ValidationError(f"Tag '{tag}' contains invalid characters")
                if len(tag) < 2:
                    raise ValidationError(f"Tag '{tag}' is too short")
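The `clean()` method above stops at the first problem it finds. As a minimal sketch, the same rules can be written as a plain function that collects every error, which is easier to unit test; `validate_tags` and its parameters are hypothetical names. Django's `ValidationError` also accepts a list of messages, so the result could be raised in one go.

```python
def validate_tags(tags, max_tags=10, min_length=2):
    """Return a list of error messages; an empty list means the tags are valid."""
    errors = []
    if len(tags) > max_tags:
        errors.append(f"Maximum {max_tags} tags allowed")
    if len(tags) != len(set(tags)):
        errors.append("Duplicate tags not allowed")
    for tag in tags:
        if not tag.isalnum():
            errors.append(f"Tag '{tag}' contains invalid characters")
        elif len(tag) < min_length:
            errors.append(f"Tag '{tag}' is too short")
    return errors


ok = validate_tags(["django", "python"])
bad = validate_tags(["django", "django", "c", "rest-api"])
```

Here `ok` is empty, while `bad` reports the duplicate, the too-short tag, and the tag with a hyphen.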

3. Helper Methods

    from collections import Counter


    class Article(models.Model):
        tags = ArrayField(models.CharField(max_length=50), default=list)

        def add_tag(self, tag):
            """Add the tag if it does not exist"""
            tag = tag.lower().strip()
            if tag not in self.tags:
                self.tags.append(tag)
                self.save(update_fields=['tags'])

        def remove_tag(self, tag):
            """Remove the tag if it exists"""
            tag = tag.lower().strip()
            if tag in self.tags:
                self.tags.remove(tag)
                self.save(update_fields=['tags'])

        def has_tag(self, tag):
            """Check whether the tag exists"""
            return tag.lower().strip() in self.tags

        def get_tag_count(self):
            """Get the number of tags"""
            return len(self.tags)

        @classmethod
        def get_popular_tags(cls, limit=10):
            """Get the most used tags"""
            # Flatten all tag arrays, then count occurrences in Python
            tag_arrays = cls.objects.values_list('tags', flat=True)
            tag_list = [tag for tags in tag_arrays for tag in tags]
            return Counter(tag_list).most_common(limit)
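The counting step in `get_popular_tags` can be tried without a database. In this sketch, `rows` stands in for what `values_list('tags', flat=True)` would return (one list per article), and `popular_tags` is a hypothetical standalone version that also normalizes case and whitespace.

```python
from collections import Counter
from itertools import chain


def popular_tags(tag_arrays, limit=10):
    """Count tags across many arrays and return the most common ones."""
    normalized = (tag.lower().strip() for tag in chain.from_iterable(tag_arrays))
    return Counter(normalized).most_common(limit)


rows = [["Django", "python"], ["python ", "api"], ["python", "django"]]
top = popular_tags(rows, limit=2)
```

Counting in Python loads every row into memory, so this approach suits small and medium tables best.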

Real-World Use Cases

Use Case 1: Product Features

    class Product(models.Model):
        name = models.CharField(max_length=200)

        # Multiple features
        features = ArrayField(
            models.CharField(max_length=100),
            default=list,
            help_text="Product features",
        )
        # ["Waterproof", "Wireless", "Fast Charging", "Touch Screen"]

        # Multiple colors available
        available_colors = ArrayField(
            models.CharField(max_length=30),
            default=list,
        )
        # ["Red", "Blue", "Black", "White"]

        # Compatible models
        compatible_with = ArrayField(
            models.CharField(max_length=50),
            default=list,
            blank=True,
        )


    # Query products with a specific feature
    Product.objects.filter(features__contains=["Waterproof"])

    # Query products available in any of the given colors
    Product.objects.filter(available_colors__overlap=["Red", "Blue"])

Use Case 2: User Skills & Interests

    class UserProfile(models.Model):
        user = models.OneToOneField('auth.User', on_delete=models.CASCADE)

        # Skills
        skills = ArrayField(
            models.CharField(max_length=50),
            default=list,
            help_text="User skills",
        )

        # Interests
        interests = ArrayField(
            models.CharField(max_length=50),
            default=list,
        )

        # Languages spoken
        languages = ArrayField(
            models.CharField(max_length=30),
            default=list,
        )


    # Find users with the Python skill
    UserProfile.objects.filter(skills__contains=["Python"])

    # Find users interested in Django AND Python
    UserProfile.objects.filter(interests__contains=["Django", "Python"])

    # Find multilingual users (len transform gives the array length)
    UserProfile.objects.filter(languages__len__gte=3)

Use Case 3: Multi-Category System

    from django.core.exceptions import ValidationError


    class BlogPost(models.Model):
        title = models.CharField(max_length=200)
        content = models.TextField()

        # Multiple categories
        categories = ArrayField(
            models.CharField(max_length=50),
            default=list,
            size=5,  # Max 5 categories
        )
        # ["Technology", "Programming", "Django", "Python"]

        def add_category(self, category):
            if len(self.categories) >= 5:
                raise ValidationError("Maximum 5 categories allowed")
            if category not in self.categories:
                self.categories.append(category)
                self.save()


    # Get posts in any of several categories
    BlogPost.objects.filter(categories__overlap=["Technology", "Programming"])

    # Get all unique categories by flattening the arrays in Python
    # (ArrayAgg over an array column fails when the arrays differ in length)
    unique_categories = {
        cat
        for cats in BlogPost.objects.values_list('categories', flat=True)
        for cat in cats
    }

3. HStoreField - Key-Value Storage

Introduction to HStoreField

HStoreField (PostgreSQL) stores key-value pairs in a single field. It is simpler than JSON for flat key-value data.

HStoreField vs JSONField:

| Feature | HStoreField | JSONField |
|---|---|---|
| Data Type | Key-value (flat) | Any JSON (nested) |
| Value Types | Strings only | Any JSON type |
| Performance | Fast for simple KV | Good for complex |
| Use Case | Settings, metadata | Complex structures |
| Nesting | No | Yes |
| PostgreSQL Only | Yes | No (Django 3.1+) |

When to Use HStoreField:

  • ✅ Simple key-value pairs
  • ✅ All values are strings
  • ✅ No nesting needed
  • ✅ PostgreSQL only is OK

When to Use JSONField Instead:

  • ✅ Need nested data
  • ✅ Need different value types
  • ✅ Database portability
  • ✅ Complex queries

Basic Usage

Setup HStoreField

    from django.contrib.postgres.fields import HStoreField
    from django.db import models


    class Product(models.Model):
        name = models.CharField(max_length=200)

        # Simple key-value pairs
        specifications = HStoreField(default=dict, blank=True)
        # {"brand": "Sony", "warranty": "2 years", "color": "Black"}

        # Metadata
        metadata = HStoreField(default=dict, blank=True)

Enable the HStore Extension:

    # In a migration that runs before the first HStoreField is created
    from django.contrib.postgres.operations import HStoreExtension
    from django.db import migrations


    class Migration(migrations.Migration):
        operations = [
            HStoreExtension(),
            # ... other operations
        ]

Storing Data

    # Create with hstore data
    product = Product.objects.create(
        name="Smartphone",
        specifications={
            "brand": "Samsung",
            "model": "Galaxy S21",
            "color": "Black",
            "storage": "128GB",
            "warranty": "1 year",
        },
    )

    # Update values
    product.specifications["warranty"] = "2 years"
    product.specifications["condition"] = "New"
    product.save()

    # Delete a key
    del product.specifications["condition"]
    product.save()

    # Get a value with a default
    warranty = product.specifications.get("warranty", "No warranty")

Important - All Values Are Strings:

    # ⚠️ WARNING: Values are always strings!
    product.specifications = {
        "price": "999",      # String, not int
        "in_stock": "true",  # String, not boolean
        "weight": "0.5",     # String, not float
    }

    # Convert when reading
    price = int(product.specifications.get("price", "0"))
    in_stock = product.specifications.get("in_stock") == "true"
    weight = float(product.specifications.get("weight", "0"))
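Since reading requires conversion, it helps to make the writing side explicit as well. This is a minimal sketch of one possible convention; `to_hstore_value` and `to_hstore` are hypothetical helpers, and the choice of lowercase `"true"`/`"false"` for booleans is an assumption that matches the example above.

```python
def to_hstore_value(value):
    """Convert a Python value to the string form stored in hstore."""
    if value is None:
        return None  # hstore values may also be NULL
    if isinstance(value, bool):  # check bool before anything else: bool is an int subclass
        return "true" if value else "false"
    return str(value)


def to_hstore(data):
    """Stringify every value in a flat dict."""
    return {key: to_hstore_value(value) for key, value in data.items()}


specs = to_hstore({"price": 999, "in_stock": True, "weight": 0.5})
```

Without the explicit boolean branch, `str(True)` would store `"True"`, which the `== "true"` comparison shown earlier would not match.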

Querying HStoreField

Key Existence

    # Has a key
    Product.objects.filter(specifications__has_key="warranty")

    # Has all of the keys
    Product.objects.filter(specifications__has_keys=["brand", "model"])

    # Has any of the keys
    Product.objects.filter(specifications__has_any_keys=["warranty", "guarantee"])

Value Queries

    # Exact value match
    Product.objects.filter(specifications__brand="Samsung")

    # Multiple conditions
    Product.objects.filter(
        specifications__brand="Samsung",
        specifications__color="Black",
    )

    # Contains (the given pairs are a subset of the stored pairs)
    Product.objects.filter(specifications__contains={"brand": "Samsung"})

    # Contained by (the stored pairs are a subset of the given pairs)
    Product.objects.filter(
        specifications__contained_by={
            "brand": "Samsung",
            "color": "Black",
            "extra": "value",
        }
    )

Advanced Queries

    from django.db.models import Q

    # Keys query: key order is not guaranteed, so combine keys with an array lookup
    Product.objects.filter(specifications__keys__contains=["brand", "model", "color"])

    # Values query
    Product.objects.filter(specifications__values__contains=["Samsung"])

    # Combined queries
    Product.objects.filter(
        Q(specifications__brand="Samsung") | Q(specifications__brand="Apple")
    )
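As a rough mental model, the hstore lookups behave like the following operations on a plain Python dict. The function names mirror the lookup names for readability but are otherwise hypothetical; the real lookups are evaluated by PostgreSQL.

```python
def has_keys(stored, keys):
    """Model of has_keys: every key is present."""
    return all(key in stored for key in keys)


def has_any_keys(stored, keys):
    """Model of has_any_keys: at least one key is present."""
    return any(key in stored for key in keys)


def contains(stored, wanted):
    """Model of contains: every wanted key-value pair appears in stored."""
    return wanted.items() <= stored.items()


def contained_by(stored, allowed):
    """Model of contained_by: every stored key-value pair appears in allowed."""
    return stored.items() <= allowed.items()


stored = {"brand": "Samsung", "color": "Black"}
both_keys = has_keys(stored, ["brand", "color"])
any_key = has_any_keys(stored, ["warranty", "color"])
is_samsung = contains(stored, {"brand": "Samsung"})
fits_inside = contained_by(
    stored, {"brand": "Samsung", "color": "Black", "extra": "value"}
)
```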

HStoreField Best Practices

1. Type Conversion Helpers

    class Product(models.Model):
        specifications = HStoreField(default=dict)

        def get_spec(self, key, default=None, convert_to=None):
            """Get a specification with optional type conversion"""
            value = self.specifications.get(key)
            if value is None:
                # Missing key (or NULL value): return the default unconverted
                return default

            if convert_to in (int, float):
                try:
                    return convert_to(value)
                except (ValueError, TypeError):
                    return default
            if convert_to is bool:
                return value.lower() in ('true', '1', 'yes')

            return value

        def set_spec(self, key, value):
            """Set a specification (converts the value to a string)"""
            self.specifications[key] = str(value)
            self.save(update_fields=['specifications'])

        @property
        def price(self):
            return self.get_spec('price', 0, convert_to=int)

        @property
        def in_stock(self):
            return self.get_spec('in_stock', False, convert_to=bool)


    # Usage
    product = Product.objects.first()
    print(product.price)     # Returns int
    print(product.in_stock)  # Returns bool

2. Schema Documentation

    class Product(models.Model):
        # Document the expected keys in a docstring or comment
        specifications = HStoreField(default=dict, blank=True)
        """
        Expected keys:
        - brand: str
        - model: str
        - color: str
        - storage: str (e.g., "128GB")
        - warranty: str (e.g., "2 years")
        - price: str (numeric string)
        - in_stock: str ("true" or "false")
        """

3. Validation

    from django.core.exceptions import ValidationError


    class Product(models.Model):
        specifications = HStoreField(default=dict)

        REQUIRED_SPECS = ['brand', 'model']
        VALID_COLORS = ['Black', 'White', 'Blue', 'Red']

        def clean(self):
            # Check required keys
            for key in self.REQUIRED_SPECS:
                if key not in self.specifications:
                    raise ValidationError(f"Missing required specification: {key}")

            # Validate specific values
            if 'color' in self.specifications:
                if self.specifications['color'] not in self.VALID_COLORS:
                    raise ValidationError("Invalid color")

            # Validate numeric strings
            if 'price' in self.specifications:
                try:
                    price = float(self.specifications['price'])
                except (TypeError, ValueError):
                    raise ValidationError("Price must be a number")
                if price < 0:
                    raise ValidationError("Price cannot be negative")
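The price check above parses with `float`, which is fine for validation but can introduce rounding when the value is later used as money. As a hedged alternative, the sketch below parses the stored string with `Decimal`; `parse_price` is a hypothetical helper and raises a plain `ValueError` so it can run outside Django.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw):
    """Parse a price string into a non-negative Decimal, or raise ValueError."""
    try:
        price = Decimal(raw)
    except (InvalidOperation, TypeError):
        raise ValueError("Price must be a number") from None
    if not price.is_finite():  # rejects "NaN" and "Infinity"
        raise ValueError("Price must be a number")
    if price < 0:
        raise ValueError("Price cannot be negative")
    return price


price = parse_price("999.99")
```

Inside `clean()`, the `ValueError` could be caught and re-raised as a `ValidationError`.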

Real-World Use Cases

Use Case 1: Multi-Language Translations

    class Article(models.Model):
        slug = models.SlugField(unique=True)

        # Translations as key-value pairs
        titles = HStoreField(default=dict)
        # {"en": "Hello World", "vi": "Xin chào", "ja": "こんにちは"}

        descriptions = HStoreField(default=dict)

        def get_title(self, language='en'):
            return self.titles.get(language, self.titles.get('en', ''))

        def set_title(self, language, title):
            self.titles[language] = title
            self.save(update_fields=['titles'])


    # Query articles with a Vietnamese translation
    Article.objects.filter(titles__has_key='vi')

    # Query by a language-specific title
    Article.objects.filter(titles__vi__icontains='django')
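The fallback in `get_title` can be extended to regional codes such as `vi-VN`. This is a minimal sketch under the assumption that language codes use a hyphen between language and region; `resolve_translation` is a hypothetical helper that works on any dict of translations.

```python
def resolve_translation(translations, language, default_language="en"):
    """Return the best available text for a language code such as 'vi-VN'."""
    # Try the exact code, then the base language, then the default language.
    candidates = [language, language.split("-")[0], default_language]
    for code in candidates:
        if code in translations:
            return translations[code]
    return ""


titles = {"en": "Hello World", "vi": "Xin chào"}
regional = resolve_translation(titles, "vi-VN")
missing = resolve_translation(titles, "ja")
```

Here `regional` falls back from `vi-VN` to `vi`, and `missing` falls back to the English title.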

Use Case 2: User Settings

    class UserSettings(models.Model):
        user = models.OneToOneField('auth.User', on_delete=models.CASCADE)

        # All settings as key-value pairs
        settings = HStoreField(default=dict)
        # {
        #     "theme": "dark",
        #     "font_size": "14",
        #     "email_notifications": "true",
        #     "language": "en"
        # }

        DEFAULT_SETTINGS = {
            "theme": "light",
            "font_size": "12",
            "email_notifications": "true",
            "language": "en",
        }

        def get_setting(self, key, default=None):
            return self.settings.get(key, self.DEFAULT_SETTINGS.get(key, default))

        def update_setting(self, key, value):
            self.settings[key] = str(value)
            self.save(update_fields=['settings'])

        @property
        def theme(self):
            return self.get_setting('theme')

        @property
        def font_size(self):
            return int(self.get_setting('font_size', '12'))
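When a view or template needs every setting at once, the per-key lookup in `get_setting` can be replaced by a single merged dict. The sketch below uses `collections.ChainMap` from the standard library; `effective_settings` is a hypothetical helper, and the defaults are a shortened copy of the ones above.

```python
from collections import ChainMap

DEFAULT_SETTINGS = {"theme": "light", "font_size": "12", "language": "en"}


def effective_settings(stored):
    """Merge stored overrides over the defaults; stored values win."""
    return dict(ChainMap(stored, DEFAULT_SETTINGS))


settings = effective_settings({"theme": "dark"})
```

Only the overridden keys need to be stored in the hstore column; everything else comes from the defaults.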

Lesson 2.2 Summary

In Lesson 2.2, you learned:

ArrayField:

  • Storing arrays efficiently in PostgreSQL
  • Querying: contains, overlap, array operations
  • Indexing with GIN indexes
  • Best practices and validation
  • Real-world use cases: tags, features, skills

HStoreField:

  • Key-value storage for flat data
  • All values are strings (need conversion)
  • Querying keys and values
  • HStore vs JSON comparison
  • Use cases: translations, settings

Performance:

  • Proper indexing strategies
  • When to use each field type
  • Size limitations
  • Query optimization

Best Practices:

  • Helper methods for type conversion
  • Validation patterns
  • Schema documentation
  • Error handling

Practice Exercises

Exercise 1: Tag System with ArrayField

Build a complete tagging system:

Requirements:

  1. Articles have multiple tags
  2. Tags are unique per article
  3. Maximum 10 tags per article
  4. Tags must be lowercase, alphanumeric
  5. Get popular tags across all articles
  6. Search articles by tags

Tasks:

  • Define an Article model with a tags ArrayField
  • Add validation for tags
  • Implement helper methods: add_tag, remove_tag, has_tag
  • Create method to get popular tags
  • Add GIN index
  • Write queries to search by tags

Exercise 2: Product Specifications with HStoreField

Create a flexible product specification system:

Requirements:

  1. Products have dynamic specifications
  2. Required specs: brand, model
  3. Optional specs: color, size, weight, warranty
  4. Price must be numeric
  5. In_stock must be boolean string
  6. Get products by specific specifications

Tasks:

  • Define a Product model with a specifications HStoreField
  • Add validation for required specs
  • Implement helper methods: get_spec, set_spec with type conversion
  • Add properties: price, in_stock
  • Query products by brand
  • Find products with warranty

Exercise 3: Multi-Language Content

Build a translation system:

Requirements:

  1. Articles support multiple languages
  2. Each language: title, description, content
  3. Default language: English
  4. Fallback to English if translation missing
  5. Get available languages for article
  6. Search by language-specific title

Tasks:

  • Choose between JSONField or HStoreField (justify choice)
  • Define Article model
  • Implement get_translation method
  • Add set_translation method
  • Query articles with a Vietnamese translation
  • Count articles per language

Next Steps

In Lesson 2.3, you will learn:

  • Creating custom model fields from scratch
  • Field validation hooks
  • Custom field types for business logic
  • Reusable field components
  • Field serialization and deserialization
