Lesson 2.2: Advanced Model Fields - ArrayField, HStoreField, and Field Options
**Series Navigation:**
- [Lesson 2.1: JSONField and Advanced Field Types](/django-advance/advanced-model-fields-phan-1)
- **Lesson 2.2 (this lesson): ArrayField, HStoreField, and Field Options** 👈
- [Lesson 2.3: Custom Model Fields Creation](/django-advance/advanced-model-fields-phan-3)
- [Lesson 2.4: Computed Fields and Model Mixins](/django-advance/advanced-model-fields-phan-4)
Lesson 2.2 Objectives
After completing this lesson, you will be able to:
- ✅ Work fluently with ArrayField (PostgreSQL)
- ✅ Use HStoreField for key-value storage
- ✅ Understand the field options null, blank, default, and choices
- ✅ Create and use custom validators
- ✅ Implement field-level and model-level validation
- ✅ Apply best practices for field configuration
2. ArrayField - Store Arrays Efficiently
Introducing ArrayField
ArrayField is a PostgreSQL-specific field that stores an array of values in a single column. It is useful for simple lists that do not need a separate table.
Why use ArrayField?
- ✅ No separate ManyToMany table needed
- ✅ Fast containment queries with GIN indexes
- ✅ Simple data structure
- ✅ A good fit for tags, categories, and short lists
When NOT to use it:
- ❌ You need complex queries on individual array items
- ❌ Array items need foreign keys
- ❌ You are not using PostgreSQL
- ❌ You need referential integrity
Basic Usage
Defining an ArrayField
```python
from django.contrib.postgres.fields import ArrayField
from django.db import models


class Article(models.Model):
    title = models.CharField(max_length=200)
    author = models.ForeignKey('auth.User', on_delete=models.CASCADE)

    # Array of strings
    tags = ArrayField(
        models.CharField(max_length=50),
        size=10,  # Optional: max array length
        default=list,
        blank=True,
    )

    # Array of integers
    related_article_ids = ArrayField(
        models.IntegerField(),
        default=list,
        blank=True,
    )

    # Array of choices
    STATUS_CHOICES = [
        ('draft', 'Draft'),
        ('review', 'In Review'),
        ('published', 'Published'),
    ]
    status_history = ArrayField(
        models.CharField(max_length=20, choices=STATUS_CHOICES),
        default=list,
        blank=True,
    )
```
Important - Base Field:
```python
# ✅ GOOD: Specify the base field properly
tags = ArrayField(models.CharField(max_length=50))

# ❌ BAD: Don't forget the base field
tags = ArrayField()  # Error! base_field is a required argument

# ✅ GOOD: Use a callable for the default
tags = ArrayField(models.CharField(max_length=50), default=list)

# ❌ BAD: Mutable default
tags = ArrayField(models.CharField(max_length=50), default=[])  # Shared between instances!
```
Storing Data
```python
# Create with array data (user is an existing User instance)
article = Article.objects.create(
    title="Django Advanced",
    author=user,
    tags=["django", "python", "web", "api"],
)

# Update the array
article.tags.append("rest")
article.save()

# Replace the entire array
article.tags = ["django", "drf", "advanced"]
article.save()

# Clear the array
article.tags = []
article.save()

# Using update()
Article.objects.filter(pk=article.pk).update(tags=["new", "tags"])
```
Array Operations
```python
# Add an item if it does not exist yet
if "python" not in article.tags:
    article.tags.append("python")
    article.save()

# Remove an item
if "draft" in article.tags:
    article.tags.remove("draft")
    article.save()

# Extend the array
article.tags.extend(["machine-learning", "ai"])
article.save()

# Sort the array
article.tags.sort()
article.save()

# Keep unique values only (dict.fromkeys preserves the original order,
# unlike set())
article.tags = list(dict.fromkeys(article.tags))
article.save()
```
Querying ArrayField
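Before looking at the individual lookups, it may help to see the set relations they correspond to. The sketch below is a plain-Python model, not Django code: the function names are hypothetical and only mirror lookups such as `contains`, `contained_by`, `overlap`, and `len`. It ignores duplicates and SQL NULL handling.

```python
# Plain-Python model of ArrayField lookup semantics (illustrative only).
# These function names are hypothetical; Django exposes the behaviour as
# lookups such as tags__contains=[...], not as Python functions.

def contains(stored, wanted):
    """tags__contains: every wanted value appears in the stored array."""
    return set(wanted) <= set(stored)


def contained_by(stored, allowed):
    """tags__contained_by: the stored array is a subset of the allowed values."""
    return set(stored) <= set(allowed)


def overlap(stored, candidates):
    """tags__overlap: at least one value is shared."""
    return bool(set(stored) & set(candidates))


def length_at_least(stored, n):
    """tags__len__gte: the array has at least n elements."""
    return len(stored) >= n


tags = ["django", "python", "web"]
print(contains(tags, ["django", "python"]))      # True
print(contains(tags, ["django", "api"]))         # False
print(overlap(tags, ["api", "web"]))             # True
print(contained_by(tags, ["django", "python"]))  # False
print(length_at_least(tags, 3))                  # True
```

Reading `contains` as "superset of" and `contained_by` as "subset of" is usually enough to pick the right lookup.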
Contains Queries
```python
# Contains a specific value
Article.objects.filter(tags__contains=["django"])

# Contains any of the values (the arrays overlap)
Article.objects.filter(tags__overlap=["django", "python"])

# Contains all of the values
Article.objects.filter(tags__contains=["django", "python", "api"])
```
Array Operations
```python
# Array length: use the len transform (casting the array to an integer
# does not return its length)
Article.objects.filter(tags__len__gte=3)

# Array index access (0-based in Django)
Article.objects.filter(tags__0="django")  # First element
Article.objects.filter(tags__1="python")  # Second element

# Array slice
Article.objects.filter(tags__0_2=["django", "python"])  # First 2 elements
```
Advanced Queries
```python
from django.db.models import CharField, F, Func, Q

# Contained by (the stored array is a subset of the given list)
Article.objects.filter(
    tags__contained_by=["django", "python", "web", "api", "extra"]
)

# Not empty
Article.objects.exclude(tags=[])
# Or
Article.objects.filter(~Q(tags=[]))

# Empty array
Article.objects.filter(tags=[])

# Distinct tags across all articles: unnest each array into rows, then
# de-duplicate. ArrayAgg over an array column does not flatten it and
# fails when the arrays differ in length.
all_tags = (
    Article.objects
    .annotate(tag=Func(F("tags"), function="unnest", output_field=CharField()))
    .values_list("tag", flat=True)
    .distinct()
)
```
ArrayField Best Practices
1. Indexing for Performance
```python
from django.contrib.postgres.fields import ArrayField
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class Article(models.Model):
    tags = ArrayField(models.CharField(max_length=50))

    class Meta:
        indexes = [
            GinIndex(fields=['tags']),
        ]
```
Performance Tips:
```python
# ✅ GOOD: Use a GIN index for containment lookups
class Meta:
    indexes = [
        GinIndex(fields=['tags']),
    ]

# ✅ GOOD: Limit the array size
tags = ArrayField(
    models.CharField(max_length=50),
    size=20,  # Max 20 items
)

# ⚠️ WARNING: Large arrays slow down queries
# As a rule of thumb, keep arrays under about 100 items
```
2. Validation
```python
from django.contrib.postgres.fields import ArrayField
from django.core.exceptions import ValidationError
from django.db import models


class Article(models.Model):
    tags = ArrayField(
        models.CharField(max_length=50),
        size=10,
    )

    def clean(self):
        # Validate array length
        if len(self.tags) > 10:
            raise ValidationError("Maximum 10 tags allowed")

        # Validate uniqueness
        if len(self.tags) != len(set(self.tags)):
            raise ValidationError("Duplicate tags not allowed")

        # Validate each item
        for tag in self.tags:
            if not tag.isalnum():
                raise ValidationError(f"Tag '{tag}' contains invalid characters")
            if len(tag) < 2:
                raise ValidationError(f"Tag '{tag}' is too short")
```
3. Helper Methods
```python
from collections import Counter

from django.contrib.postgres.fields import ArrayField
from django.db import models


class Article(models.Model):
    tags = ArrayField(models.CharField(max_length=50), default=list)

    def add_tag(self, tag):
        """Add the tag if it is not already present."""
        tag = tag.lower().strip()
        if tag not in self.tags:
            self.tags.append(tag)
            self.save(update_fields=['tags'])

    def remove_tag(self, tag):
        """Remove the tag if it is present."""
        tag = tag.lower().strip()
        if tag in self.tags:
            self.tags.remove(tag)
            self.save(update_fields=['tags'])

    def has_tag(self, tag):
        """Check whether the tag is present."""
        return tag.lower().strip() in self.tags

    def get_tag_count(self):
        """Return the number of tags."""
        return len(self.tags)

    @classmethod
    def get_popular_tags(cls, limit=10):
        """Return the most used tags as (tag, count) pairs."""
        # Flatten all tag arrays in Python, then count occurrences
        tag_arrays = cls.objects.values_list('tags', flat=True)
        tag_list = [tag for tags in tag_arrays for tag in tags]
        return Counter(tag_list).most_common(limit)
```
Real-World Use Cases
Use Case 1: Product Features
```python
from django.contrib.postgres.fields import ArrayField
from django.db import models


class Product(models.Model):
    name = models.CharField(max_length=200)

    # Multiple features
    features = ArrayField(
        models.CharField(max_length=100),
        default=list,
        help_text="Product features",
    )
    # ["Waterproof", "Wireless", "Fast Charging", "Touch Screen"]

    # Multiple colors available
    available_colors = ArrayField(
        models.CharField(max_length=30),
        default=list,
    )
    # ["Red", "Blue", "Black", "White"]

    # Compatible models
    compatible_with = ArrayField(
        models.CharField(max_length=50),
        default=list,
        blank=True,
    )


# Query products with a specific feature
Product.objects.filter(features__contains=["Waterproof"])

# Query products available in any of the given colors
Product.objects.filter(available_colors__overlap=["Red", "Blue"])
```
Use Case 2: User Skills & Interests
```python
from django.contrib.postgres.fields import ArrayField
from django.db import models


class UserProfile(models.Model):
    user = models.OneToOneField('auth.User', on_delete=models.CASCADE)

    # Skills
    skills = ArrayField(
        models.CharField(max_length=50),
        default=list,
        help_text="User skills",
    )

    # Interests
    interests = ArrayField(
        models.CharField(max_length=50),
        default=list,
    )

    # Languages spoken
    languages = ArrayField(
        models.CharField(max_length=30),
        default=list,
    )


# Find users with the Python skill
UserProfile.objects.filter(skills__contains=["Python"])

# Find users interested in Django AND Python
UserProfile.objects.filter(interests__contains=["Django", "Python"])

# Find multilingual users (len transform; a Cast to integer would not
# return the array length)
UserProfile.objects.filter(languages__len__gte=3)
```
Use Case 3: Multi-Category System
```python
from django.contrib.postgres.fields import ArrayField
from django.core.exceptions import ValidationError
from django.db import models


class BlogPost(models.Model):
    title = models.CharField(max_length=200)
    content = models.TextField()

    # Multiple categories
    categories = ArrayField(
        models.CharField(max_length=50),
        default=list,
        size=5,  # Max 5 categories
    )
    # ["Technology", "Programming", "Django", "Python"]

    def add_category(self, category):
        if len(self.categories) >= 5:
            raise ValidationError("Maximum 5 categories allowed")
        if category not in self.categories:
            self.categories.append(category)
            self.save()


# Get posts in any of several categories
BlogPost.objects.filter(categories__overlap=["Technology", "Programming"])

# Get all unique categories. The arrays are flattened in Python because
# ArrayAgg over an array column fails when the arrays differ in length.
unique_categories = {
    category
    for categories in BlogPost.objects.values_list('categories', flat=True)
    for category in categories or []
}
```
3. HStoreField - Key-Value Storage
Introducing HStoreField
HStoreField (PostgreSQL) stores key-value pairs in a single field. For flat key-value data it is simpler than JSON.
HStoreField vs JSONField:
| Feature | HStoreField | JSONField |
|---|---|---|
| Data Type | Key-value (flat) | Any JSON (nested) |
| Value Types | Strings (or NULL) only | Any JSON type |
| Performance | Fast for simple KV | Good for complex |
| Use Case | Settings, metadata | Complex structures |
| Nesting | No | Yes |
| PostgreSQL Only | Yes | No (Django 3.1+) |
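The "Value Types" and "Nesting" rows can be made concrete with a small plain-Python sketch. The helper `to_hstore_dict` is hypothetical and not part of Django. It assumes you want to coerce flat values to strings before assigning them to an HStoreField, and to reject nested values, which would need a JSONField instead.

```python
def to_hstore_dict(data):
    """Return a copy of data with string keys and string (or None) values.

    Hypothetical helper, not a Django API. Raises TypeError for nested
    values, since hstore can only hold flat key-value pairs.
    """
    result = {}
    for key, value in data.items():
        if isinstance(value, (dict, list, tuple, set)):
            raise TypeError(f"hstore cannot store nested value for key {key!r}")
        if value is None:
            result[str(key)] = None
        elif isinstance(value, bool):
            # Store booleans in a lowercase form that is easy to parse back
            result[str(key)] = "true" if value else "false"
        else:
            result[str(key)] = str(value)
    return result


print(to_hstore_dict({"brand": "Sony", "price": 999, "in_stock": True}))
# {'brand': 'Sony', 'price': '999', 'in_stock': 'true'}
```

Converting explicitly on write keeps the stored format predictable, so the code that reads the values back knows which string forms to expect.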
When to use HStoreField:
- ✅ Simple key-value pairs
- ✅ All values are strings
- ✅ No nesting needed
- ✅ PostgreSQL only is OK
When to use JSONField instead:
- ✅ Need nested data
- ✅ Need different value types
- ✅ Database portability
- ✅ Complex queries
Basic Usage
Setup HStoreField
```python
from django.contrib.postgres.fields import HStoreField
from django.db import models


class Product(models.Model):
    name = models.CharField(max_length=200)

    # Simple key-value pairs
    specifications = HStoreField(default=dict, blank=True)
    # {"brand": "Sony", "warranty": "2 years", "color": "Black"}

    # Metadata
    metadata = HStoreField(default=dict, blank=True)
```
Enable HStore Extension:
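```python
# In a migration that runs before the first HStoreField is created
from django.contrib.postgres.operations import HStoreExtension
from django.db import migrations


class Migration(migrations.Migration):
    operations = [
        HStoreExtension(),
        # ... other operations
    ]
```
Storing Data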
```python
# Create with hstore data
product = Product.objects.create(
    name="Smartphone",
    specifications={
        "brand": "Samsung",
        "model": "Galaxy S21",
        "color": "Black",
        "storage": "128GB",
        "warranty": "1 year",
    },
)

# Update values
product.specifications["warranty"] = "2 years"
product.specifications["condition"] = "New"
product.save()

# Delete a key
del product.specifications["condition"]
product.save()

# Get a value with a default
warranty = product.specifications.get("warranty", "No warranty")
```
Important - All Values Are Strings:
```python
# ⚠️ WARNING: Values are always strings!
product.specifications = {
    "price": "999",      # String, not int
    "in_stock": "true",  # String, not boolean
    "weight": "0.5",     # String, not float
}

# Convert when reading
price = int(product.specifications.get("price", "0"))
in_stock = product.specifications.get("in_stock") == "true"
weight = float(product.specifications.get("weight", "0"))
```
Querying HStoreField
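The HStoreField lookups can be read as simple dictionary checks. The sketch below is a plain-Python model with hypothetical function names that mirror `has_key`, `has_keys`, `has_any_keys`, `contains`, and `contained_by`. It illustrates the semantics only and is not how Django or PostgreSQL implement them.

```python
# Plain-Python model of HStoreField lookup semantics (illustrative only).
# The function names are hypothetical stand-ins for Django lookups.

def has_key(stored, key):
    """specifications__has_key: the key is present."""
    return key in stored


def has_keys(stored, keys):
    """specifications__has_keys: all of the keys are present."""
    return all(key in stored for key in keys)


def has_any_keys(stored, keys):
    """specifications__has_any_keys: at least one key is present."""
    return any(key in stored for key in keys)


def contains(stored, pairs):
    """specifications__contains: every given pair is stored."""
    return all(k in stored and stored[k] == v for k, v in pairs.items())


def contained_by(stored, pairs):
    """specifications__contained_by: every stored pair is in the given dict."""
    return all(k in pairs and pairs[k] == v for k, v in stored.items())


specs = {"brand": "Samsung", "color": "Black"}
print(has_keys(specs, ["brand", "model"]))         # False
print(has_any_keys(specs, ["warranty", "color"]))  # True
print(contains(specs, {"brand": "Samsung"}))       # True
print(contained_by(specs, {"brand": "Samsung"}))   # False
```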
Key Existence
```python
# Has a key
Product.objects.filter(specifications__has_key="warranty")

# Has all of the keys
Product.objects.filter(specifications__has_keys=["brand", "model"])

# Has any of the keys
Product.objects.filter(specifications__has_any_keys=["warranty", "guarantee"])
```
Value Queries
```python
# Exact value match
Product.objects.filter(specifications__brand="Samsung")

# Multiple conditions
Product.objects.filter(
    specifications__brand="Samsung",
    specifications__color="Black",
)

# Contains: the stored pairs include all of the given pairs
Product.objects.filter(specifications__contains={"brand": "Samsung"})

# Contained by: the stored pairs are a subset of the given pairs
Product.objects.filter(
    specifications__contained_by={
        "brand": "Samsung",
        "color": "Black",
        "extra": "value",
    }
)
```
Advanced Queries
```python
from django.db.models import Q

# Keys query: the order of the keys array is not reliable, so combine the
# keys transform with an array lookup rather than an exact match
Product.objects.filter(specifications__keys__overlap=["brand", "model", "color"])

# Values query
Product.objects.filter(specifications__values__contains=["Samsung"])

# Combined queries
Product.objects.filter(
    Q(specifications__brand="Samsung") | Q(specifications__brand="Apple")
)
```
HStoreField Best Practices
1. Type Conversion Helpers
```python
from django.contrib.postgres.fields import HStoreField
from django.db import models


class Product(models.Model):
    specifications = HStoreField(default=dict)

    def get_spec(self, key, default=None, convert_to=None):
        """Get a specification with optional type conversion."""
        value = self.specifications.get(key)
        if value is None:  # Missing key or NULL value
            return default
        if convert_to in (int, float):
            try:
                return convert_to(value)
            except (ValueError, TypeError):
                return default
        if convert_to is bool:
            return value.lower() in ('true', '1', 'yes')
        return value

    def set_spec(self, key, value):
        """Set a specification (converts the value to a string)."""
        self.specifications[key] = str(value)
        self.save(update_fields=['specifications'])

    @property
    def price(self):
        return self.get_spec('price', 0, convert_to=int)

    @property
    def in_stock(self):
        return self.get_spec('in_stock', False, convert_to=bool)


# Usage
product = Product.objects.first()
print(product.price)     # Returns int
print(product.in_stock)  # Returns bool
```
2. Schema Documentation
```python
class Product(models.Model):
    # Document the expected keys in a docstring or comment
    specifications = HStoreField(default=dict, blank=True)
    """
    Expected keys:
    - brand: str
    - model: str
    - color: str
    - storage: str (e.g., "128GB")
    - warranty: str (e.g., "2 years")
    - price: str (numeric string)
    - in_stock: str ("true" or "false")
    """
```
3. Validation
```python
from django.contrib.postgres.fields import HStoreField
from django.core.exceptions import ValidationError
from django.db import models


class Product(models.Model):
    specifications = HStoreField(default=dict)

    REQUIRED_SPECS = ['brand', 'model']
    VALID_COLORS = ['Black', 'White', 'Blue', 'Red']

    def clean(self):
        # Check required keys
        for key in self.REQUIRED_SPECS:
            if key not in self.specifications:
                raise ValidationError(f"Missing required specification: {key}")

        # Validate specific values
        if 'color' in self.specifications:
            color = self.specifications['color']
            if color not in self.VALID_COLORS:
                raise ValidationError(f"Invalid color: {color}")

        # Validate numeric strings
        if 'price' in self.specifications:
            try:
                price = float(self.specifications['price'])
            except (TypeError, ValueError):
                raise ValidationError("Price must be a number")
            if price < 0:
                raise ValidationError("Price cannot be negative")
```
Real-World Use Cases
Use Case 1: Multi-Language Translations
```python
class Article(models.Model):
    slug = models.SlugField(unique=True)

    # Translations as key-value pairs
    titles = HStoreField(default=dict)
    # {"en": "Hello World", "vi": "Xin chào", "ja": "こんにちは"}

    descriptions = HStoreField(default=dict)

    def get_title(self, language='en'):
        return self.titles.get(language, self.titles.get('en', ''))

    def set_title(self, language, title):
        self.titles[language] = title
        self.save(update_fields=['titles'])


# Query articles that have a Vietnamese translation
Article.objects.filter(titles__has_key='vi')

# Query by a language-specific title
Article.objects.filter(titles__vi__icontains='django')
```
Use Case 2: User Settings
```python
class UserSettings(models.Model):
    user = models.OneToOneField('auth.User', on_delete=models.CASCADE)

    # All settings as key-value pairs
    settings = HStoreField(default=dict)
    # {
    #     "theme": "dark",
    #     "font_size": "14",
    #     "email_notifications": "true",
    #     "language": "en"
    # }

    DEFAULT_SETTINGS = {
        "theme": "light",
        "font_size": "12",
        "email_notifications": "true",
        "language": "en",
    }

    def get_setting(self, key, default=None):
        return self.settings.get(key, self.DEFAULT_SETTINGS.get(key, default))

    def update_setting(self, key, value):
        self.settings[key] = str(value)
        self.save(update_fields=['settings'])

    @property
    def theme(self):
        return self.get_setting('theme')

    @property
    def font_size(self):
        return int(self.get_setting('font_size', '12'))
```
Lesson 2.2 Summary
In this lesson, you learned:
✅ ArrayField:
- Storing arrays efficiently in PostgreSQL
- Querying: contains, overlap, and other array operations
- Indexing with GIN indexes
- Best practices and validation
- Real-world use cases: tags, features, skills
✅ HStoreField:
- Key-value storage for flat data
- All values are strings (conversion needed)
- Querying keys and values
- HStore vs JSON comparison
- Use cases: translations, settings
✅ Performance:
- Proper indexing strategies
- When to use each field type
- Size limitations
- Query optimization
✅ Best Practices:
- Helper methods for type conversion
- Validation patterns
- Schema documentation
- Error handling
Practice Exercises
Exercise 1: Tag System with ArrayField
Build a complete tagging system:
Requirements:
- Articles have multiple tags
- Tags are unique per article
- Maximum 10 tags per article
- Tags must be lowercase, alphanumeric
- Get popular tags across all articles
- Search articles by tags
Tasks:
- Define `Article` model with a tags ArrayField
- Add validation for tags
- Implement helper methods: add_tag, remove_tag, has_tag
- Create method to get popular tags
- Add GIN index
- Write queries to search by tags
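As a possible starting point for the validation task, the rules in the requirements can be prototyped without Django. `normalize_tags` is a hypothetical helper. It relies on `str.isalnum()`, which also accepts non-ASCII letters and digits, and it raises `ValueError` where a model's `clean()` would raise `ValidationError`.

```python
MAX_TAGS = 10  # Limit taken from the exercise requirements


def normalize_tags(raw_tags, max_tags=MAX_TAGS):
    """Lowercase, strip, de-duplicate (keeping order), and validate tags."""
    seen = []
    for raw in raw_tags:
        tag = raw.strip().lower()
        if not tag.isalnum():
            raise ValueError(f"Tag {raw!r} must be alphanumeric")
        if tag not in seen:
            seen.append(tag)
    if len(seen) > max_tags:
        raise ValueError(f"Maximum {max_tags} tags allowed")
    return seen


print(normalize_tags([" Django", "python", "DJANGO"]))  # ['django', 'python']
```

Here duplicates are silently merged rather than rejected. Whether that or a validation error fits better is a design choice left to the exercise.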
Exercise 2: Product Specifications with HStoreField
Create a flexible product specification system:
Requirements:
- Products have dynamic specifications
- Required specs: brand, model
- Optional specs: color, size, weight, warranty
- Price must be numeric
- In_stock must be boolean string
- Get products by specific specifications
Tasks:
- Define `Product` model with a specifications HStoreField
- Add validation for required specs
- Implement helper methods: get_spec, set_spec with type conversion
- Add properties: price, in_stock
- Query products by brand
- Find products with warranty
Exercise 3: Multi-Language Content
Build a translation system:
Requirements:
- Articles support multiple languages
- Each language: title, description, content
- Default language: English
- Fallback to English if translation missing
- Get available languages for article
- Search by language-specific title
Tasks:
- Choose between JSONField or HStoreField (justify choice)
- Define `Article` model
- Implement get_translation method
- Add set_translation method
- Query articles with a Vietnamese translation
- Count articles per language
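The fallback requirement can be sketched in plain Python before deciding on the field type. Both `get_translation` and the nested `translations` layout are assumptions made for illustration. A nested layout like this one points toward JSONField, whereas HStoreField would need one flat field per attribute, as in the `titles` and `descriptions` example earlier.

```python
DEFAULT_LANGUAGE = "en"  # Default language from the exercise requirements


def get_translation(translations, field, language, default_language=DEFAULT_LANGUAGE):
    """Return translations[language][field], falling back to the default language."""
    for lang in (language, default_language):
        value = translations.get(lang, {}).get(field)
        if value:
            return value
    return ""


def available_languages(translations):
    """Return the language codes that have at least one entry, sorted."""
    return sorted(lang for lang, fields in translations.items() if fields)


translations = {
    "en": {"title": "Hello World", "description": "A greeting"},
    "vi": {"title": "Xin chào"},
}
print(get_translation(translations, "title", "vi"))        # Xin chào
print(get_translation(translations, "description", "vi"))  # A greeting
print(get_translation(translations, "title", "ja"))        # Hello World
print(available_languages(translations))                   # ['en', 'vi']
```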
Next Steps
In Lesson 2.3, you will learn:
- Creating custom model fields from scratch
- Field validation hooks
- Custom field types for business logic
- Reusable field components
- Field serialization and deserialization