Lesson 7.2: Dictionaries (Part 2)
Learning Objectives
After completing this lesson, you will be able to:
- ✅ Use dictionary comprehensions effectively
- ✅ Work with defaultdict and Counter
- ✅ Merge and combine dictionaries
- ✅ Apply advanced dictionary techniques
- ✅ Handle complex nested dictionaries
Dictionary Comprehension
A dictionary comprehension is a concise way to build a new dictionary from an iterable, optionally filtering and transforming items along the way.
Basic Syntax
```python
# Syntax: {key_expr: value_expr for item in iterable}

# Build a dict from range()
squares = {x: x**2 for x in range(5)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# From a list
fruits = ["apple", "banana", "orange"]
lengths = {fruit: len(fruit) for fruit in fruits}
print(lengths)  # {'apple': 5, 'banana': 6, 'orange': 6}

# Uppercase keys
names = ["alice", "bob", "charlie"]
upper_names = {name.upper(): name for name in names}
print(upper_names)  # {'ALICE': 'alice', 'BOB': 'bob', 'CHARLIE': 'charlie'}
```
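To make the syntax concrete, the sketch below compares a comprehension with the equivalent explicit loop, and shows one behavior worth knowing: when `key_expr` produces the same key more than once, the later item silently overwrites the earlier one. The `words` list is an illustrative assumption.

```python
# A dict comprehension builds the same dict as an explicit loop.
fruits = ["apple", "banana", "orange"]

lengths_comp = {fruit: len(fruit) for fruit in fruits}

lengths_loop = {}
for fruit in fruits:
    lengths_loop[fruit] = len(fruit)

print(lengths_comp == lengths_loop)  # True

# If key_expr yields the same key twice, the later item wins.
words = ["hi", "yo", "cat", "dog"]
by_length = {len(w): w for w in words}
print(by_length)  # {2: 'yo', 3: 'dog'}
```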
With a Condition (if)
```python
# Keep only even numbers
numbers = range(10)
evens = {x: x**2 for x in numbers if x % 2 == 0}
print(evens)  # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Filter by value
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}
passed = {name: score for name, score in scores.items() if score >= 80}
print(passed)  # {'Alice': 85, 'Charlie': 92}

# Strings longer than 5 characters
words = ["python", "is", "awesome", "and", "powerful"]
long_words = {w: len(w) for w in words if len(w) > 5}
print(long_words)  # {'python': 6, 'awesome': 7, 'powerful': 8}
```
With if-else
Here `if ... else` is part of the value expression, so it comes before `for`; a filtering `if` (previous section) comes after the `for` clause.
```python
# Pass/Fail
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}
results = {name: "Pass" if score >= 80 else "Fail" for name, score in scores.items()}
print(results)
# {'Alice': 'Pass', 'Bob': 'Fail', 'Charlie': 'Pass', 'Diana': 'Fail'}

# Grade classification (chained conditional expressions)
grades = {
    name: "A" if score >= 90 else "B" if score >= 80 else "C"
    for name, score in scores.items()
}
print(grades)
# {'Alice': 'B', 'Bob': 'C', 'Charlie': 'A', 'Diana': 'C'}
```
Transforming a Dictionary
```python
# Swap keys and values (assumes the values are unique)
original = {"a": 1, "b": 2, "c": 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

# Uppercase all keys
person = {"name": "alice", "city": "hanoi"}
upper_keys = {k.upper(): v for k, v in person.items()}
print(upper_keys)  # {'NAME': 'alice', 'CITY': 'hanoi'}

# Double all values
numbers = {"a": 1, "b": 2, "c": 3}
doubled = {k: v * 2 for k, v in numbers.items()}
print(doubled)  # {'a': 2, 'b': 4, 'c': 6}

# Filter and transform: 10% off items priced above 80
prices = {"apple": 100, "banana": 50, "orange": 150, "grape": 80}
discounted = {item: price * 0.9 for item, price in prices.items() if price > 80}
print(discounted)  # {'apple': 90.0, 'orange': 135.0}
```
Nested Dict Comprehension
```python
# 2D dictionary (multiplication table)
matrix = {i: {j: i * j for j in range(1, 4)} for i in range(1, 4)}
print(matrix)
# {1: {1: 1, 2: 2, 3: 3},
#  2: {1: 2, 2: 4, 3: 6},
#  3: {1: 3, 2: 6, 3: 9}}

# Flatten the nested dict
flat = {f"{k1}_{k2}": v2 for k1, v1 in matrix.items() for k2, v2 in v1.items()}
print(flat)
# {'1_1': 1, '1_2': 2, '1_3': 3, '2_1': 2, ...}
```
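Going the other way, from flat keys back to a nested dict, is a useful check that the flattening lost nothing. The sketch below uses a hypothetical helper `unflatten` and assumes every key has the form `"outer_inner"` with exactly one underscore and integer parts, as produced by the flattening above.

```python
# Hypothetical helper: rebuild a nested dict from "outer_inner" keys.
# Assumes each key contains exactly one "_" and both parts are integers.
def unflatten(flat):
    nested = {}
    for key, value in flat.items():
        outer, inner = key.split("_")
        # setdefault creates the inner dict the first time an outer key appears
        nested.setdefault(int(outer), {})[int(inner)] = value
    return nested

matrix = {i: {j: i * j for j in range(1, 4)} for i in range(1, 4)}
flat = {f"{k1}_{k2}": v2 for k1, v1 in matrix.items() for k2, v2 in v1.items()}

print(unflatten(flat) == matrix)  # True
```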
defaultdict: Dictionaries with Default Values
defaultdict automatically creates a default value when you access a key that does not exist yet. Its argument is a factory callable (such as int, list, or set) that is called with no arguments to produce that value.
```python
from collections import defaultdict

# defaultdict with int (default = 0)
word_count = defaultdict(int)

text = "hello world hello python"
for word in text.split():
    word_count[word] += 1  # No need to check whether the key exists!

print(dict(word_count))  # {'hello': 2, 'world': 1, 'python': 1}

# Compare with a regular dict
normal_dict = {}
# normal_dict["hello"] += 1  # KeyError!
normal_dict["hello"] = normal_dict.get("hello", 0) + 1  # Must supply a default via get()
```
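One detail that often surprises people: only square-bracket access on a missing key triggers the default factory. Membership tests with `in` and calls to `.get()` leave the defaultdict unchanged, as this short sketch shows.

```python
from collections import defaultdict

word_count = defaultdict(int)

# Membership tests and .get() do NOT call the default factory.
print("missing" in word_count)    # False
print(word_count.get("missing"))  # None
print(len(word_count))            # 0

# Square-bracket access on a missing key calls int() and stores the result.
print(word_count["missing"])      # 0
print(len(word_count))            # 1
print(word_count.default_factory) # <class 'int'>
```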
defaultdict with list
```python
from collections import defaultdict

# Group items
students = [
    ("Alice", "Math"),
    ("Bob", "Physics"),
    ("Alice", "Chemistry"),
    ("Charlie", "Math"),
    ("Bob", "Math"),
]

# Group by student
by_student = defaultdict(list)
for name, subject in students:
    by_student[name].append(subject)

print(dict(by_student))
# {'Alice': ['Math', 'Chemistry'],
#  'Bob': ['Physics', 'Math'],
#  'Charlie': ['Math']}

# Group by subject
by_subject = defaultdict(list)
for name, subject in students:
    by_subject[subject].append(name)

print(dict(by_subject))
# {'Math': ['Alice', 'Charlie', 'Bob'],
#  'Physics': ['Bob'],
#  'Chemistry': ['Alice']}
```
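For comparison, the same grouping can be done with a plain dict and `dict.setdefault()`, which returns the existing value for a key or inserts the given default and returns it. This sketch reuses the student data above; the trade-off is that `setdefault` builds a new empty list on every call, even when the key already exists.

```python
# The same grouping with a plain dict and dict.setdefault().
students = [
    ("Alice", "Math"),
    ("Bob", "Physics"),
    ("Alice", "Chemistry"),
    ("Charlie", "Math"),
    ("Bob", "Math"),
]

by_student = {}
for name, subject in students:
    # setdefault returns the existing list, or inserts [] and returns it
    by_student.setdefault(name, []).append(subject)

print(by_student)
# {'Alice': ['Math', 'Chemistry'], 'Bob': ['Physics', 'Math'], 'Charlie': ['Math']}
```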
defaultdict with set
```python
from collections import defaultdict

# Unique items per category
items = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("fruit", "apple"),       # Duplicate
    ("vegetable", "carrot"),
    ("vegetable", "carrot"),  # Duplicate
]

categories = defaultdict(set)
for category, item in items:
    categories[category].add(item)

print(dict(categories))
# {'fruit': {'apple', 'banana'},
#  'vegetable': {'carrot'}}
# (element order inside a set may vary)
```
defaultdict with lambda
```python
from collections import defaultdict

# Custom default value via a lambda
counter = defaultdict(lambda: "Unknown")

counter["apple"] = 5
counter["banana"] = 3

print(counter["apple"])  # 5
print(counter["grape"])  # Unknown (default; "grape" is now stored in counter)

# Nested defaultdict
nested = defaultdict(lambda: defaultdict(int))
nested["row1"]["col1"] = 10
nested["row1"]["col2"] = 20
nested["row2"]["col1"] = 30

# dict() converts only the outer level, so convert the inner dicts too
print({row: dict(cols) for row, cols in nested.items()})
# {'row1': {'col1': 10, 'col2': 20}, 'row2': {'col1': 30}}
```
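Taking nesting one step further, a defaultdict whose factory returns another defaultdict of the same kind supports arbitrary depth. The sketch below uses two hypothetical helpers: `tree` (the self-referencing factory) and `to_dict` (a recursive converter back to plain dicts, since `dict()` only converts the outer level).

```python
from collections import defaultdict

# Hypothetical "autovivifying" tree: every missing key creates another tree.
def tree():
    return defaultdict(tree)

# Hypothetical helper: recursively convert defaultdicts back to plain dicts.
def to_dict(d):
    if isinstance(d, defaultdict):
        return {k: to_dict(v) for k, v in d.items()}
    return d

config = tree()
config["db"]["primary"]["host"] = "localhost"
config["db"]["primary"]["port"] = 5432
config["app"]["debug"] = True

print(to_dict(config))
# {'db': {'primary': {'host': 'localhost', 'port': 5432}}, 'app': {'debug': True}}
```

The downside of this design is that a typo in a key silently creates a new empty branch instead of raising an error.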
Counter: Counting Elements
Counter is a dict subclass for counting hashable objects: each element becomes a key and its count becomes the value.
```python
from collections import Counter

# Count items from an iterable
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
counter = Counter(words)
print(counter)  # Counter({'apple': 3, 'banana': 2, 'orange': 1})

# Count characters
text = "hello world"
char_count = Counter(text)
print(char_count)  # Counter({'l': 3, 'o': 2, 'h': 1, ...})

# Count numbers
numbers = [1, 2, 1, 3, 2, 1, 4, 3, 1]
num_count = Counter(numbers)
print(num_count)  # Counter({1: 4, 2: 2, 3: 2, 4: 1})
```
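Unlike a regular dict, a Counter returns 0 for a missing key instead of raising KeyError, and that lookup does not add the key. A Counter can also be built from a mapping or from keyword arguments. The fruit and color data below are only illustrative.

```python
from collections import Counter

counter = Counter(["apple", "banana", "apple"])

# Missing keys return 0 instead of raising KeyError...
print(counter["grape"])    # 0
# ...and the lookup does not add the key.
print("grape" in counter)  # False

# A Counter can also be built from a mapping or keyword arguments.
print(Counter({"red": 2, "blue": 1}) == Counter(red=2, blue=1))  # True
```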
Counter Methods
```python
from collections import Counter

counter = Counter(["a", "b", "a", "c", "b", "a"])

# most_common(n): the n most common elements
print(counter.most_common(2))  # [('a', 3), ('b', 2)]

# elements(): iterate over every element, repeated by its count
print(list(counter.elements()))  # ['a', 'a', 'a', 'b', 'b', 'c']

# total(): sum of all counts (Python 3.10+)
# print(counter.total())  # 6

# update(): add counts
counter.update(["a", "b", "d"])
print(counter)  # Counter({'a': 4, 'b': 3, 'c': 1, 'd': 1})

# subtract(): subtract counts
counter.subtract(["a", "b"])
print(counter)  # Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})
```
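Note that `subtract()` keeps zero and negative counts, while the `-` operator (next section) drops them. To clean up afterwards, unary `+` returns a new Counter with only the positive counts. The `stock` data here is a made-up example.

```python
from collections import Counter

stock = Counter(apples=3, pears=1)
stock.subtract(apples=1, pears=2)

# subtract() keeps zero and negative counts...
print(dict(stock))   # {'apples': 2, 'pears': -1}
# ...while unary + keeps only the positive ones.
print(dict(+stock))  # {'apples': 2}
```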
Counter Operations
```python
from collections import Counter

c1 = Counter(["a", "b", "a", "c"])
c2 = Counter(["a", "b", "b", "d"])

# Addition
print(c1 + c2)  # Counter({'a': 3, 'b': 3, 'c': 1, 'd': 1})

# Subtraction (keeps only positive counts)
print(c1 - c2)  # Counter({'a': 1, 'c': 1})

# Intersection (minimum of each count)
print(c1 & c2)  # Counter({'a': 1, 'b': 1})

# Union (maximum of each count)
print(c1 | c2)  # Counter({'a': 2, 'b': 2, 'c': 1, 'd': 1})
```
Merge Dictionaries
Using update()
```python
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# update() modifies dict1 in place; for shared keys, dict2's value wins
dict1.update(dict2)
print(dict1)  # {'a': 1, 'b': 3, 'c': 4}
```
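Besides another dict, `update()` also accepts an iterable of key/value pairs and keyword arguments. Because it works in place, it returns `None`, a common source of bugs when its result is assigned. The `settings` dict is illustrative.

```python
settings = {"theme": "light"}

# update() also accepts an iterable of key/value pairs and keyword arguments.
settings.update([("language", "en")], autosave=True)
print(settings)  # {'theme': 'light', 'language': 'en', 'autosave': True}

# update() modifies in place and returns None, so don't assign its result.
result = settings.update(theme="dark")
print(result)             # None
print(settings["theme"])  # dark
```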
Using ** Unpacking (Python 3.5+)
```python
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# Merge into a new dict (later dicts win on shared keys)
merged = {**dict1, **dict2}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Multiple dicts
dict3 = {"d": 5}
merged = {**dict1, **dict2, **dict3}
print(merged)  # {'a': 1, 'b': 3, 'c': 4, 'd': 5}
```
Using the | Operator (Python 3.9+)
```python
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# Merge with | (returns a new dict)
merged = dict1 | dict2
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Update in place with |=
dict1 |= dict2
print(dict1)  # {'a': 1, 'b': 3, 'c': 4}
```
Merging with Custom Logic
```python
from collections import Counter

# Merge and add values for shared keys
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 3, "c": 4, "d": 5}

merged = {}
for key in {**dict1, **dict2}:  # every key, in first-seen order
    merged[key] = dict1.get(key, 0) + dict2.get(key, 0)

print(merged)  # {'a': 1, 'b': 5, 'c': 7, 'd': 5}

# Or use Counter (note: Counter addition drops zero and negative totals)
merged = dict(Counter(dict1) + Counter(dict2))
print(merged)  # {'a': 1, 'b': 5, 'c': 7, 'd': 5}
```
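The loop above can be generalized so the combining rule is a parameter. The sketch below uses a hypothetical helper `merge_with` (not part of the standard library) that merges any number of dicts; unlike Counter addition, it keeps zero and negative results and works with any combining function.

```python
# Hypothetical helper: merge any number of dicts, combining values of
# shared keys with a caller-supplied two-argument function.
def merge_with(combine, *dicts):
    merged = {}
    for d in dicts:
        for key, value in d.items():
            merged[key] = combine(merged[key], value) if key in merged else value
    return merged

dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 3, "c": 4, "d": 5}

print(merge_with(lambda x, y: x + y, dict1, dict2))  # {'a': 1, 'b': 5, 'c': 7, 'd': 5}
print(merge_with(max, dict1, dict2))                 # {'a': 1, 'b': 3, 'c': 4, 'd': 5}
```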
Advanced Techniques
Sorting Dictionaries
```python
# dict(sorted(...)) works because dicts preserve insertion order (Python 3.7+)

# Sort by key
data = {"c": 3, "a": 1, "b": 2}
sorted_by_key = dict(sorted(data.items()))
print(sorted_by_key)  # {'a': 1, 'b': 2, 'c': 3}

# Sort by value
sorted_by_value = dict(sorted(data.items(), key=lambda x: x[1]))
print(sorted_by_value)  # {'a': 1, 'b': 2, 'c': 3}

# Sort by value (descending)
sorted_desc = dict(sorted(data.items(), key=lambda x: x[1], reverse=True))
print(sorted_desc)  # {'c': 3, 'b': 2, 'a': 1}

# Sort by a field of a nested dict
students = {
    "Alice": {"age": 25, "grade": 85},
    "Bob": {"age": 22, "grade": 90},
    "Charlie": {"age": 23, "grade": 88},
}

sorted_by_grade = dict(sorted(students.items(), key=lambda x: x[1]["grade"], reverse=True))
for name, info in sorted_by_grade.items():
    print(f"{name}: {info['grade']}")
# Bob: 90
# Charlie: 88
# Alice: 85
```
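When two entries tie on the sort key, a tuple key adds a tie-breaker. This sketch (with grades changed so Alice and Charlie tie) sorts by grade descending, then by name ascending; negating the grade only works because grades are numbers.

```python
students = {
    "Alice": {"age": 25, "grade": 88},
    "Bob": {"age": 22, "grade": 90},
    "Charlie": {"age": 23, "grade": 88},
}

# Sort by grade (descending), then by name (ascending) to break ties.
ranked = dict(sorted(students.items(), key=lambda item: (-item[1]["grade"], item[0])))
print(list(ranked))  # ['Bob', 'Alice', 'Charlie']
```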
Filter Dictionaries
```python
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}

# Filter by value
passed = {k: v for k, v in scores.items() if v >= 80}
print(passed)  # {'Alice': 85, 'Charlie': 92}

# Filter by key
names_with_a = {k: v for k, v in scores.items() if k.startswith("A")}
print(names_with_a)  # {'Alice': 85}

# Filter with a function and filter()
def is_passing(item):
    name, score = item
    return score >= 80

passed2 = dict(filter(is_passing, scores.items()))
print(passed2)  # {'Alice': 85, 'Charlie': 92}
```
Invert Dictionary
```python
from collections import defaultdict

# Simple invert (only safe when values are unique; duplicates overwrite each other)
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}

# Invert with duplicate values: collect keys into lists
original = {"a": 1, "b": 2, "c": 1, "d": 3}

inverted = defaultdict(list)
for key, value in original.items():
    inverted[value].append(key)

print(dict(inverted))  # {1: ['a', 'c'], 2: ['b'], 3: ['d']}
```
Nested Dictionary Access
```python
# Safe nested access
data = {
    "user": {
        "profile": {
            "name": "Alice",
            "age": 25,
        }
    }
}

# Manual checking (verbose)
if "user" in data:
    if "profile" in data["user"]:
        if "name" in data["user"]["profile"]:
            name = data["user"]["profile"]["name"]

# Using get() (better)
name = data.get("user", {}).get("profile", {}).get("name", "Unknown")
print(name)  # Alice

# Helper function
def get_nested(dictionary, *keys, default=None):
    """Safely get nested dictionary value"""
    for key in keys:
        if isinstance(dictionary, dict):
            dictionary = dictionary.get(key, default)
        else:
            return default
    return dictionary

name = get_nested(data, "user", "profile", "name")
print(name)  # Alice

email = get_nested(data, "user", "profile", "email", default="N/A")
print(email)  # N/A
```
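Writing deep into a nested dict has the mirror-image problem: intermediate levels may not exist yet. A minimal sketch, using a hypothetical helper `set_nested` as the write-side counterpart of `get_nested`, creates missing levels with `setdefault` and assumes any existing intermediate values are dicts.

```python
# Hypothetical helper: set a value deep inside a dict, creating
# intermediate dicts as needed. Assumes existing intermediate values are dicts.
def set_nested(dictionary, *keys, value):
    current = dictionary
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value

data = {"user": {"profile": {"name": "Alice"}}}
set_nested(data, "user", "profile", "email", value="alice@example.com")
set_nested(data, "user", "settings", "theme", value="dark")

print(data)
# {'user': {'profile': {'name': 'Alice', 'email': 'alice@example.com'}, 'settings': {'theme': 'dark'}}}
```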
Practical Examples
1. Data Aggregation
```python
from collections import defaultdict

# Sales data
sales = [
    {"product": "Laptop", "category": "Electronics", "amount": 1000},
    {"product": "Phone", "category": "Electronics", "amount": 500},
    {"product": "Desk", "category": "Furniture", "amount": 300},
    {"product": "Chair", "category": "Furniture", "amount": 150},
    {"product": "Tablet", "category": "Electronics", "amount": 400},
]

# Aggregate by category
category_sales = defaultdict(int)
for sale in sales:
    category_sales[sale["category"]] += sale["amount"]

print("Sales by Category:")
for category, total in sorted(category_sales.items(), key=lambda x: x[1], reverse=True):
    print(f"  {category:<15} ${total:>6}")

# Count products per category
category_count = defaultdict(int)
for sale in sales:
    category_count[sale["category"]] += 1

print("\nProducts per Category:")
for category, count in category_count.items():
    print(f"  {category}: {count} products")
```
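The two loops above can be merged into a single pass by storing a `[total, count]` pair per category, which also makes averages easy to compute. This is one possible sketch using the same sales data; the list-valued default is an illustrative choice, not the only option.

```python
from collections import defaultdict

sales = [
    {"product": "Laptop", "category": "Electronics", "amount": 1000},
    {"product": "Phone", "category": "Electronics", "amount": 500},
    {"product": "Desk", "category": "Furniture", "amount": 300},
    {"product": "Chair", "category": "Furniture", "amount": 150},
    {"product": "Tablet", "category": "Electronics", "amount": 400},
]

# One pass: keep [total, count] per category.
stats = defaultdict(lambda: [0, 0])
for sale in sales:
    entry = stats[sale["category"]]
    entry[0] += sale["amount"]
    entry[1] += 1

averages = {cat: round(total / count, 2) for cat, (total, count) in stats.items()}
print(averages)  # {'Electronics': 633.33, 'Furniture': 225.0}
```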
2. Text Analysis
```python
from collections import Counter

text = """Python is a high-level programming language.
Python is easy to learn and Python is very popular.
Python is used in web development, data science, and AI."""

# Clean text: lowercase and strip punctuation first,
# then keep only words longer than 3 characters
words = [w.strip(".,!?") for w in text.lower().split()]
words = [w for w in words if len(w) > 3]

# Word frequency
word_freq = Counter(words)

print("Top 10 words:")
for word, count in word_freq.most_common(10):
    print(f"  {word:<15} {count:>3} {'█' * count}")

# Statistics
total_words = sum(word_freq.values())
unique_words = len(word_freq)

print(f"\nTotal words: {total_words}")
print(f"Unique words: {unique_words}")
print(f"Vocabulary richness: {unique_words/total_words:.2%}")
```
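Stripping a fixed set of characters misses other punctuation such as quotes or semicolons. A regular expression that extracts only runs of letters is a more robust alternative; this sketch assumes plain English (ASCII) text, and note that `[a-z]+` also splits hyphenated words like "high-level" into two words.

```python
import re
from collections import Counter

text = "Python is easy. Python is popular, and AI is popular too!"

# re.findall with [a-z]+ keeps only runs of letters, so punctuation
# never sticks to words (unlike split() followed by strip()).
words = re.findall(r"[a-z]+", text.lower())
word_freq = Counter(w for w in words if len(w) > 3)

print(word_freq.most_common(2))  # [('python', 2), ('popular', 2)]
```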
3. JSON Data Processing
```python
from collections import defaultdict

# Simulated JSON data
users = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "user", "active": True},
    {"id": 3, "name": "Charlie", "role": "admin", "active": False},
    {"id": 4, "name": "Diana", "role": "user", "active": True},
]

# Index by id
users_by_id = {user["id"]: user for user in users}
print("User #2:", users_by_id[2])

# Group by role
users_by_role = defaultdict(list)
for user in users:
    users_by_role[user["role"]].append(user["name"])

print("\nUsers by Role:")
for role, names in users_by_role.items():
    print(f"  {role}: {', '.join(names)}")

# Filter active admins
active_admins = [u for u in users if u["role"] == "admin" and u["active"]]
print(f"\nActive admins: {[u['name'] for u in active_admins]}")

# Transform data
user_names = {u["id"]: u["name"] for u in users}
print("\nID to Name mapping:", user_names)
```
4. Cache Implementation
```python
from collections import OrderedDict

class LRUCache:
    """Simple LRU Cache using OrderedDict"""

    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        """Get value and move to end (most recent)"""
        if key not in self.cache:
            return None
        # Move to end
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        """Put key-value and maintain capacity"""
        if key in self.cache:
            # Update and move to end
            self.cache.move_to_end(key)
        self.cache[key] = value

        # Remove oldest if over capacity
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def __str__(self):
        return str(dict(self.cache))

# Test cache
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
print("Cache:", cache)  # {'a': 1, 'b': 2, 'c': 3}

cache.put("d", 4)  # Removes 'a' (oldest)
print("After adding 'd':", cache)  # {'b': 2, 'c': 3, 'd': 4}

cache.get("b")     # Access 'b' (moves it to the end)
cache.put("e", 5)  # Removes 'c' (oldest)
print("After adding 'e':", cache)  # {'d': 4, 'b': 2, 'e': 5}
```
5. Configuration Manager
```python
from collections import ChainMap

# Multiple config layers
defaults = {
    "theme": "light",
    "language": "en",
    "notifications": True,
    "auto_save": True,
}

user_config = {
    "theme": "dark",
    "language": "vi",
}

session_config = {
    "theme": "blue",
}

# ChainMap: lookups cascade through the maps in order
config = ChainMap(session_config, user_config, defaults)

print("Theme:", config["theme"])                  # blue (from session)
print("Language:", config["language"])            # vi (from user)
print("Notifications:", config["notifications"])  # True (from defaults)
print("Auto-save:", config["auto_save"])          # True (from defaults)

# Update a specific layer
config.maps[1]["language"] = "en"  # Updates user_config
print("\nAfter update:")
print("Language:", config["language"])  # en

# Writes always go to the first map
config["new_setting"] = "value"
print("\nSession config:", dict(session_config))
# {'theme': 'blue', 'new_setting': 'value'}
```
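For temporary overrides, `ChainMap.new_child(m)` returns a new chain with `m` (or a fresh empty dict if no argument is given) in front of the existing maps, leaving the original chain untouched; the `parents` property gives the chain without its first map. The layer contents here are illustrative.

```python
from collections import ChainMap

defaults = {"theme": "light", "language": "en"}
user_config = {"theme": "dark"}
config = ChainMap(user_config, defaults)

# new_child() puts a new layer in front without touching existing layers.
temp = config.new_child({"theme": "high-contrast"})
print(temp["theme"])          # high-contrast
print(config["theme"])        # dark (original chain unchanged)

# parents skips the first map.
print(temp.parents["theme"])  # dark
print(temp["language"])       # en (still falls through to defaults)
```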
Practice Exercises
Exercise 1: Dict Comprehension
Use dict comprehensions to:
- Create a dict mapping each number from 1 to 20 to its square
- Keep only the odd numbers
- Swap the keys and values of a given dict
Exercise 2: Word Frequency
Write a program that analyzes a text:
- Count word frequencies
- Find the 5 most common words
- Compute the vocabulary richness (unique words / total words)
- Find the longest word
Exercise 3: Group Students
Given a list of students as (name, grade) pairs:
- Group them by grade (A, B, C)
- Count the students in each grade
- Find the grade with the most students
Exercise 4: Merge Configs
Merge several config dicts:
- Database config
- App config
- User config
- Priority: User > App > Database
Exercise 5: Inventory Aggregation
Given a list of transactions:
- Aggregate the total quantity per product
- Calculate the total value per category
- Find the top 3 products by value
- List products that are low in stock (< 10)
Summary
✅ Dict comprehension: {k: v for item in iterable if condition}
✅ defaultdict: automatic default values for missing keys
✅ Counter: counting elements
✅ Merging: update(), ** unpacking, | operator
✅ Advanced: sorting, filtering, inverting, nested access
✅ The collections module (defaultdict, Counter, OrderedDict, ChainMap) is very powerful!
Next Lesson
Lesson 8: Sets: unique collections, set operations, and when to use sets.
Remember:
- Dict comprehensions: concise and readable
- defaultdict avoids KeyError when grouping
- Counter is the best fit for frequency counting
- Python 3.9+: use | to merge dicts
- The collections module has many useful tools!
- Practice with real data to master dictionaries!