Lesson 7.2: Dictionaries (Part 2)

Lesson Objectives

After completing this lesson, you will be able to:

  • ✅ Use dictionary comprehensions effectively
  • ✅ Work with defaultdict and Counter
  • ✅ Merge and combine dictionaries
  • ✅ Apply advanced dictionary techniques
  • ✅ Handle complex nested dictionaries

Dictionary Comprehension

A dictionary comprehension is a concise way to build a new dictionary from any iterable.

Basic Syntax

# Syntax: {key_expr: value_expr for item in iterable}

# Build a dict from a range
squares = {x: x**2 for x in range(5)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

# From a list
fruits = ["apple", "banana", "orange"]
lengths = {fruit: len(fruit) for fruit in fruits}
print(lengths)  # {'apple': 5, 'banana': 6, 'orange': 6}

# Uppercase keys
names = ["alice", "bob", "charlie"]
upper_names = {name.upper(): name for name in names}
print(upper_names)  # {'ALICE': 'alice', 'BOB': 'bob', 'CHARLIE': 'charlie'}
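To make the syntax concrete, here is a minimal sketch showing that a dict comprehension is equivalent to filling a dict in a for loop. The sample data and the names keys, values, profile_loop, and profile_comp are made up for illustration.

```python
# Hypothetical sample data, used only for illustration.
keys = ["name", "city", "language"]
values = ["Alice", "Hanoi", "Python"]

# Loop version: build the dict one pair at a time.
profile_loop = {}
for key, value in zip(keys, values):
    profile_loop[key] = value

# Comprehension version: the same result in a single expression.
profile_comp = {key: value for key, value in zip(keys, values)}

print(profile_comp)  # {'name': 'Alice', 'city': 'Hanoi', 'language': 'Python'}
print(profile_loop == profile_comp)  # True
```

When the pairs need no transformation, dict(zip(keys, values)) gives the same result; the comprehension is most useful once the key or value expression does real work.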

With a Condition (if)

# Keep only even numbers
numbers = range(10)
evens = {x: x**2 for x in numbers if x % 2 == 0}
print(evens)  # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Filter by value
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}
passed = {name: score for name, score in scores.items() if score >= 80}
print(passed)  # {'Alice': 85, 'Charlie': 92}

# Strings longer than 5 characters
words = ["python", "is", "awesome", "and", "powerful"]
long_words = {w: len(w) for w in words if len(w) > 5}
print(long_words)  # {'python': 6, 'awesome': 7, 'powerful': 8}

With if-else

# Pass/Fail
scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}
results = {name: "Pass" if score >= 80 else "Fail"
           for name, score in scores.items()}
print(results)
# {'Alice': 'Pass', 'Bob': 'Fail', 'Charlie': 'Pass', 'Diana': 'Fail'}

# Grade classification
grades = {name: "A" if score >= 90 else "B" if score >= 80 else "C"
          for name, score in scores.items()}
print(grades)
# {'Alice': 'B', 'Bob': 'C', 'Charlie': 'A', 'Diana': 'C'}
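Chained conditional expressions become hard to read once there are more than two branches. One possible alternative, sketched here, moves the thresholds into a small helper and keeps the comprehension short. The name to_grade is hypothetical; the thresholds are the ones used in the grade example.

```python
def to_grade(score):
    """Map a numeric score to a letter grade (same thresholds as the example)."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    return "C"

scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}

# The comprehension now only describes the mapping, not the grading rules.
grades = {name: to_grade(score) for name, score in scores.items()}
print(grades)  # {'Alice': 'B', 'Bob': 'C', 'Charlie': 'A', 'Diana': 'C'}
```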

Transform Dictionary

# Swap keys and values (assumes the values are unique and hashable)
original = {"a": 1, "b": 2, "c": 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

# Uppercase all keys
person = {"name": "alice", "city": "hanoi"}
upper_keys = {k.upper(): v for k, v in person.items()}
print(upper_keys)  # {'NAME': 'alice', 'CITY': 'hanoi'}

# Double all values
numbers = {"a": 1, "b": 2, "c": 3}
doubled = {k: v * 2 for k, v in numbers.items()}
print(doubled)  # {'a': 2, 'b': 4, 'c': 6}

# Filter and transform
prices = {"apple": 100, "banana": 50, "orange": 150, "grape": 80}
discounted = {item: price * 0.9 for item, price in prices.items() if price > 80}
print(discounted)  # {'apple': 90.0, 'orange': 135.0}

Nested Dict Comprehension

# 2D dictionary (matrix)
matrix = {i: {j: i*j for j in range(1, 4)} for i in range(1, 4)}
print(matrix)
# {1: {1: 1, 2: 2, 3: 3},
#  2: {1: 2, 2: 4, 3: 6},
#  3: {1: 3, 2: 6, 3: 9}}

# Flatten nested dict
flat = {f"{k1}_{k2}": v2
        for k1, v1 in matrix.items()
        for k2, v2 in v1.items()}
print(flat)
# {'1_1': 1, '1_2': 2, '1_3': 3, '2_1': 2, ...}

defaultdict - Dictionary with a Default Value

defaultdict automatically creates a default value the first time you index a key that does not exist.

from collections import defaultdict

# defaultdict with int (default = 0)
word_count = defaultdict(int)

text = "hello world hello python"
for word in text.split():
    word_count[word] += 1  # No need to check whether the key exists!

print(dict(word_count))  # {'hello': 2, 'world': 1, 'python': 1}

# Compare with a regular dict
normal_dict = {}
# normal_dict["hello"] += 1  # KeyError!
normal_dict["hello"] = normal_dict.get("hello", 0) + 1  # Must supply the default yourself
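One detail worth knowing: the default is created only when a missing key is indexed with square brackets, and that lookup also stores the key. Membership tests and get() leave the dict untouched. The sketch below illustrates this with a made-up stock dict.

```python
from collections import defaultdict

stock = defaultdict(int)  # hypothetical example data
stock["apple"] += 2

# Membership tests and get() do not create keys.
print("pear" in stock)    # False
print(stock.get("pear"))  # None
print(len(stock))         # 1

# Indexing a missing key calls the factory and stores the result.
print(stock["pear"])      # 0
print(len(stock))         # 2
```

If you only want to read a value without growing the dict, prefer get() or an explicit in check.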

defaultdict with list

from collections import defaultdict

# Group items
students = [
    ("Alice", "Math"),
    ("Bob", "Physics"),
    ("Alice", "Chemistry"),
    ("Charlie", "Math"),
    ("Bob", "Math")
]

# Group by student
by_student = defaultdict(list)
for name, subject in students:
    by_student[name].append(subject)

print(dict(by_student))
# {'Alice': ['Math', 'Chemistry'],
#  'Bob': ['Physics', 'Math'],
#  'Charlie': ['Math']}

# Group by subject
by_subject = defaultdict(list)
for name, subject in students:
    by_subject[subject].append(name)

print(dict(by_subject))
# {'Math': ['Alice', 'Charlie', 'Bob'],
#  'Physics': ['Bob'],
#  'Chemistry': ['Alice']}

defaultdict with set

from collections import defaultdict

# Unique items per category
items = [
    ("fruit", "apple"),
    ("fruit", "banana"),
    ("fruit", "apple"),  # Duplicate
    ("vegetable", "carrot"),
    ("vegetable", "carrot")  # Duplicate
]

categories = defaultdict(set)
for category, item in items:
    categories[category].add(item)

print(dict(categories))
# {'fruit': {'apple', 'banana'},
#  'vegetable': {'carrot'}}
# (the order of elements inside each set may vary)

defaultdict with lambda

from collections import defaultdict

# Custom default value
counter = defaultdict(lambda: "Unknown")

counter["apple"] = 5
counter["banana"] = 3

print(counter["apple"])    # 5
print(counter["grape"])    # Unknown (default)

# Nested defaultdict
nested = defaultdict(lambda: defaultdict(int))
nested["row1"]["col1"] = 10
nested["row1"]["col2"] = 20
nested["row2"]["col1"] = 30

# dict(nested) converts only the outer level, so convert the inner dicts too
print({row: dict(cols) for row, cols in nested.items()})
# {'row1': {'col1': 10, 'col2': 20}, 'row2': {'col1': 30}}
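Calling dict() on a nested defaultdict converts only the outer level; the inner values remain defaultdict objects and print as such. For deeper nesting, a small recursive helper can convert every level. This is a minimal sketch, and to_plain_dict is a hypothetical name, not part of the standard library.

```python
from collections import defaultdict

def to_plain_dict(value):
    """Recursively convert defaultdicts (and any nested dicts) to plain dicts."""
    if isinstance(value, dict):
        return {key: to_plain_dict(item) for key, item in value.items()}
    return value

nested = defaultdict(lambda: defaultdict(int))
nested["row1"]["col1"] = 10
nested["row2"]["col1"] += 30

plain = to_plain_dict(nested)
print(plain)                # {'row1': {'col1': 10}, 'row2': {'col1': 30}}
print(type(plain["row1"]))  # <class 'dict'>
```

Converting before printing or serializing also means that later lookups on the result no longer create keys by accident.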

Counter - Counting Elements

Counter is a dict subclass for counting hashable objects.

from collections import Counter

# Count from an iterable
words = ["apple", "banana", "apple", "orange", "banana", "apple"]
counter = Counter(words)
print(counter)  # Counter({'apple': 3, 'banana': 2, 'orange': 1})

# Count characters
text = "hello world"
char_count = Counter(text)
print(char_count)  # Counter({'l': 3, 'o': 2, 'h': 1, ...})

# Count numbers
numbers = [1, 2, 1, 3, 2, 1, 4, 3, 1]
num_count = Counter(numbers)
print(num_count)  # Counter({1: 4, 2: 2, 3: 2, 4: 1})
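Unlike a regular dict, a Counter returns 0 for a key it has never seen instead of raising KeyError, and the lookup does not add the key. A short sketch with made-up vote data:

```python
from collections import Counter

votes = Counter(["yes", "no", "yes"])  # hypothetical example data

print(votes["yes"])      # 2
print(votes["maybe"])    # 0, no KeyError
print("maybe" in votes)  # False, the lookup did not insert the key
print(len(votes))        # 2
```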

Counter Methods

from collections import Counter

counter = Counter(["a", "b", "a", "c", "b", "a"])

# most_common(n) - the n most common elements
print(counter.most_common(2))  # [('a', 3), ('b', 2)]

# elements() - iterate over every element, repeated by its count
print(list(counter.elements()))  # ['a', 'a', 'a', 'b', 'b', 'c']

# total() - sum of all counts (Python 3.10+)
# print(counter.total())  # 6

# update() - add counts
counter.update(["a", "b", "d"])
print(counter)  # Counter({'a': 4, 'b': 3, 'c': 1, 'd': 1})

# subtract() - subtract counts
counter.subtract(["a", "b"])
print(counter)  # Counter({'a': 3, 'b': 2, 'c': 1, 'd': 1})

Counter Operations

from collections import Counter

c1 = Counter(["a", "b", "a", "c"])
c2 = Counter(["a", "b", "b", "d"])

# Addition
print(c1 + c2)  # Counter({'a': 3, 'b': 3, 'c': 1, 'd': 1})

# Subtraction (keeps only positive counts)
print(c1 - c2)  # Counter({'a': 1, 'c': 1})

# Intersection (min)
print(c1 & c2)  # Counter({'a': 1, 'b': 1})

# Union (max)
print(c1 | c2)  # Counter({'a': 2, 'b': 2, 'c': 1, 'd': 1})
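The - operator and the subtract() method behave differently: the operator returns a new Counter and drops counts that fall to zero or below, while subtract() changes the Counter in place and keeps negative counts. A small sketch with made-up stock data:

```python
from collections import Counter

stock = Counter(apple=2, pear=1)  # hypothetical example data
sold = Counter(apple=3)

# subtract() works in place and keeps negative counts.
remaining = stock.copy()
remaining.subtract(sold)
print(remaining)     # Counter({'pear': 1, 'apple': -1})

# The - operator returns a new Counter with positive counts only.
print(stock - sold)  # Counter({'pear': 1})
```

Which one fits depends on whether a negative count is meaningful in your data, for example as an oversold quantity.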

Merge Dictionaries

Using update()

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# update() modifies dict1 in place; values from dict2 win on duplicate keys
dict1.update(dict2)
print(dict1)  # {'a': 1, 'b': 3, 'c': 4}

Using ** (Unpacking) - Python 3.5+

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# Merge into new dict
merged = {**dict1, **dict2}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Multiple dicts
dict3 = {"d": 5}
merged = {**dict1, **dict2, **dict3}
print(merged)  # {'a': 1, 'b': 3, 'c': 4, 'd': 5}

Using the | Operator - Python 3.9+

dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}

# Merge with | (returns a new dict)
merged = dict1 | dict2
print(merged)  # {'a': 1, 'b': 3, 'c': 4}

# Update in place with |=
dict1 |= dict2
print(dict1)  # {'a': 1, 'b': 3, 'c': 4}

Merge with Custom Logic

from collections import Counter

# Merge and add values
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 3, "c": 4, "d": 5}

merged = {}
for key in sorted(set(dict1) | set(dict2)):  # sorted() keeps the key order predictable
    merged[key] = dict1.get(key, 0) + dict2.get(key, 0)

print(merged)  # {'a': 1, 'b': 5, 'c': 7, 'd': 5}

# Or use Counter (note: + drops keys whose total is zero or negative)
merged = dict(Counter(dict1) + Counter(dict2))
print(merged)  # {'a': 1, 'b': 5, 'c': 7, 'd': 5}
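All of the merge techniques in this section are shallow: when both dicts hold a nested dict under the same key, the second one replaces the first entirely. If nested values should be combined key by key, one option is a recursive merge. The sketch below is one possible approach; deep_merge and the sample config are hypothetical, not a standard library function.

```python
def deep_merge(base, override):
    """Return a new dict in which nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {"db": {"host": "localhost", "port": 5432}, "debug": False}
overrides = {"db": {"port": 6543}, "debug": True}

# Shallow merge: the whole "db" dict is replaced and "host" is lost.
print({**defaults, **overrides})
# {'db': {'port': 6543}, 'debug': True}

# Deep merge: only the overridden keys change.
print(deep_merge(defaults, overrides))
# {'db': {'host': 'localhost', 'port': 6543}, 'debug': True}
```

The helper builds new dicts along the merged path, so neither input is modified.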

Advanced Techniques

Sorting Dictionaries

# Sort by key
data = {"c": 3, "a": 1, "b": 2}
sorted_by_key = dict(sorted(data.items()))
print(sorted_by_key)  # {'a': 1, 'b': 2, 'c': 3}

# Sort by value
sorted_by_value = dict(sorted(data.items(), key=lambda x: x[1]))
print(sorted_by_value)  # {'a': 1, 'b': 2, 'c': 3}

# Sort by value (descending)
sorted_desc = dict(sorted(data.items(), key=lambda x: x[1], reverse=True))
print(sorted_desc)  # {'c': 3, 'b': 2, 'a': 1}

# Sort with custom key
students = {
    "Alice": {"age": 25, "grade": 85},
    "Bob": {"age": 22, "grade": 90},
    "Charlie": {"age": 23, "grade": 88}
}

sorted_by_grade = dict(sorted(students.items(), key=lambda x: x[1]["grade"], reverse=True))
for name, info in sorted_by_grade.items():
    print(f"{name}: {info['grade']}")
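When several entries share the same value, a tuple key can break the tie. The sketch below sorts by score in descending order and then by name in ascending order; the scores dict is made-up sample data.

```python
scores = {"Diana": 85, "Bob": 92, "Alice": 85, "Charlie": 78}

# Negating the score sorts it descending while names stay ascending.
ranked = dict(sorted(scores.items(), key=lambda item: (-item[1], item[0])))
print(ranked)  # {'Bob': 92, 'Alice': 85, 'Diana': 85, 'Charlie': 78}
```

Negation only works for numeric values. For other types, a common alternative is to sort twice, relying on the fact that sorted() is stable: first by the secondary key, then by the primary key.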

Filter Dictionaries

scores = {"Alice": 85, "Bob": 65, "Charlie": 92, "Diana": 78}

# Filter by value
passed = {k: v for k, v in scores.items() if v >= 80}
print(passed)  # {'Alice': 85, 'Charlie': 92}

# Filter by key
names_with_a = {k: v for k, v in scores.items() if k.startswith("A")}
print(names_with_a)  # {'Alice': 85}

# Filter with a function
def is_passing(item):
    name, score = item
    return score >= 80

passed2 = dict(filter(is_passing, scores.items()))
print(passed2)  # {'Alice': 85, 'Charlie': 92}

Invert Dictionary

from collections import defaultdict

# Simple invert
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}

# Invert with duplicate values: collect every key that shares a value
original = {"a": 1, "b": 2, "c": 1, "d": 3}

inverted = defaultdict(list)
for key, value in original.items():
    inverted[value].append(key)

print(dict(inverted))  # {1: ['a', 'c'], 2: ['b'], 3: ['d']}

Nested Dictionary Access

# Safe nested access
data = {
    "user": {
        "profile": {
            "name": "Alice",
            "age": 25
        }
    }
}

# Manual checking (verbose)
if "user" in data:
    if "profile" in data["user"]:
        if "name" in data["user"]["profile"]:
            name = data["user"]["profile"]["name"]

# Using get() (better)
name = data.get("user", {}).get("profile", {}).get("name", "Unknown")
print(name)  # Alice

# Helper function
def get_nested(dictionary, *keys, default=None):
    """Safely get a nested dictionary value."""
    current = dictionary
    for key in keys:
        # Stop as soon as the path breaks, so `default` itself is never searched
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

name = get_nested(data, "user", "profile", "name")
print(name)  # Alice

email = get_nested(data, "user", "profile", "email", default="N/A")
print(email)  # N/A
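Writing to a nested path has the mirror-image problem: intermediate dicts may not exist yet. A minimal sketch using setdefault() to create them on the way down follows. The name set_nested is hypothetical, and the helper assumes every existing intermediate value is a dict.

```python
def set_nested(dictionary, *keys, value):
    """Set a value at a nested path, creating intermediate dicts as needed."""
    current = dictionary
    for key in keys[:-1]:
        # setdefault returns the existing inner dict or stores a new empty one
        current = current.setdefault(key, {})
    current[keys[-1]] = value

data = {"user": {"profile": {"name": "Alice"}}}

set_nested(data, "user", "profile", "email", value="alice@example.com")
set_nested(data, "user", "settings", "theme", value="dark")

print(data)
# {'user': {'profile': {'name': 'Alice', 'email': 'alice@example.com'},
#           'settings': {'theme': 'dark'}}}
```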

Real-World Examples

1. Data Aggregation

from collections import defaultdict

# Sales data
sales = [
    {"product": "Laptop", "category": "Electronics", "amount": 1000},
    {"product": "Phone", "category": "Electronics", "amount": 500},
    {"product": "Desk", "category": "Furniture", "amount": 300},
    {"product": "Chair", "category": "Furniture", "amount": 150},
    {"product": "Tablet", "category": "Electronics", "amount": 400}
]

# Aggregate by category
category_sales = defaultdict(int)
for sale in sales:
    category_sales[sale["category"]] += sale["amount"]

print("Sales by Category:")
for category, total in sorted(category_sales.items(), key=lambda x: x[1], reverse=True):
    print(f"  {category:<15} ${total:>6}")

# Count products per category
category_count = defaultdict(int)
for sale in sales:
    category_count[sale["category"]] += 1

print("\nProducts per Category:")
for category, count in category_count.items():
    print(f"  {category}: {count} products")

2. Text Analysis

from collections import Counter

text = """
Python is a high-level programming language.
Python is easy to learn and Python is very popular.
Python is used in web development, data science, and AI.
"""

# Clean text: strip punctuation first, then drop short words
words = [w.strip('.,!?') for w in text.lower().split()]
words = [w for w in words if len(w) > 3]

# Word frequency
word_freq = Counter(words)

print("Top 10 words:")
for word, count in word_freq.most_common(10):
    print(f"  {word:<15} {count:>3} {'█' * count}")

# Statistics
total_words = sum(word_freq.values())
unique_words = len(word_freq)

print(f"\nTotal words: {total_words}")
print(f"Unique words: {unique_words}")
print(f"Vocabulary richness: {unique_words/total_words:.2%}")
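Splitting on whitespace and stripping a fixed set of punctuation works for simple text but misses cases such as quotes or parentheses. A regular expression is one possible alternative for tokenizing. The sketch below uses made-up sample text and a pattern that is only an assumption about what should count as a word: lowercase ASCII letters, with internal hyphens allowed.

```python
import re
from collections import Counter

text = "Python is a high-level language. Python is easy to learn, and Python stays popular!"

# One or more letters, optionally followed by hyphenated parts (e.g. "high-level").
words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
word_freq = Counter(words)

print(word_freq.most_common(2))  # [('python', 3), ('is', 2)]
print(max(word_freq, key=len))   # high-level
```

For text with accented or non-Latin characters, such as Vietnamese, the pattern would need to change, for example to one based on \w.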

3. JSON Data Processing

from collections import defaultdict

# Simulate JSON data
users = [
    {"id": 1, "name": "Alice", "role": "admin", "active": True},
    {"id": 2, "name": "Bob", "role": "user", "active": True},
    {"id": 3, "name": "Charlie", "role": "admin", "active": False},
    {"id": 4, "name": "Diana", "role": "user", "active": True}
]

# Index by id
users_by_id = {user["id"]: user for user in users}
print("User #2:", users_by_id[2])

# Group by role
users_by_role = defaultdict(list)
for user in users:
    users_by_role[user["role"]].append(user["name"])

print("\nUsers by Role:")
for role, names in users_by_role.items():
    print(f"  {role}: {', '.join(names)}")

# Filter active admins
active_admins = [u for u in users if u["role"] == "admin" and u["active"]]
print(f"\nActive admins: {[u['name'] for u in active_admins]}")

# Transform data
user_names = {u["id"]: u["name"] for u in users}
print("\nID to Name mapping:", user_names)

4. Cache Implementation

from collections import OrderedDict

class LRUCache:
    """Simple LRU Cache using OrderedDict"""

    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        """Get value and move to end (most recent)"""
        if key not in self.cache:
            return None

        # Move to end
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        """Put key-value and maintain capacity"""
        if key in self.cache:
            # Update and move to end
            self.cache.move_to_end(key)

        self.cache[key] = value

        # Remove oldest if over capacity
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def __str__(self):
        return str(dict(self.cache))

# Test cache
cache = LRUCache(3)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)
print("Cache:", cache)  # {'a': 1, 'b': 2, 'c': 3}

cache.put("d", 4)  # Remove 'a' (oldest)
print("After adding 'd':", cache)  # {'b': 2, 'c': 3, 'd': 4}

cache.get("b")  # Access 'b' (moves to end)
cache.put("e", 5)  # Remove 'c' (oldest)
print("After adding 'e':", cache)  # {'d': 4, 'b': 2, 'e': 5}
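When the goal is to cache the results of a function rather than arbitrary key-value pairs, the standard library already offers functools.lru_cache. The sketch below is a usage example rather than a replacement for the class above; slow_square and the calls list are made up to show when the function body actually runs.

```python
from functools import lru_cache

calls = []  # records every real (uncached) computation

@lru_cache(maxsize=3)
def slow_square(n):
    calls.append(n)
    return n * n

print(slow_square(2))  # 4 (computed)
print(slow_square(2))  # 4 (served from the cache)
print(slow_square(3))  # 9 (computed)
print(calls)           # [2, 3]

info = slow_square.cache_info()
print(info.hits, info.misses)  # 1 2
```

Arguments to a cached function must be hashable, for the same reason dictionary keys must be.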

5. Configuration Manager

from collections import ChainMap

# Multiple config layers
defaults = {
    "theme": "light",
    "language": "en",
    "notifications": True,
    "auto_save": True
}

user_config = {
    "theme": "dark",
    "language": "vi"
}

session_config = {
    "theme": "blue"
}

# ChainMap - lookup cascade
config = ChainMap(session_config, user_config, defaults)

print("Theme:", config["theme"])           # blue (from session)
print("Language:", config["language"])     # vi (from user)
print("Notifications:", config["notifications"])  # True (from defaults)
print("Auto-save:", config["auto_save"])   # True (from defaults)

# Update specific layer
config.maps[1]["language"] = "en"  # Update user_config
print("\nAfter update:")
print("Language:", config["language"])  # en

# New values go to first map
config["new_setting"] = "value"
print("\nSession config:", dict(session_config))
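ChainMap can also add a temporary layer on top of an existing chain with new_child(), and parents gives a view of the chain without its first layer. Writes and deletions only ever affect the first map. A minimal sketch with made-up settings:

```python
from collections import ChainMap

defaults = {"theme": "light", "language": "en"}
config = ChainMap({}, defaults)

# Add a temporary layer in front of the existing maps.
scoped = config.new_child({"theme": "dark"})
print(scoped["theme"])          # dark (from the new layer)
print(scoped["language"])       # en (falls through to defaults)
print(scoped.parents["theme"])  # light (chain without the first layer)

# Deleting only touches the first map, so the lower value shows through again.
del scoped["theme"]
print(scoped["theme"])          # light
print(defaults)                 # {'theme': 'light', 'language': 'en'}
```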

Practice Exercises

Exercise 1: Dict Comprehension

Using dict comprehensions:

  • Create a dict mapping the numbers 1-20 to their squares
  • Keep only the odd numbers
  • Swap the keys and values of a given dict

Exercise 2: Word Frequency

Write a program that analyzes a text:

  • Count word frequency
  • Find the 5 most common words
  • Compute the vocabulary richness
  • Find the longest word

Exercise 3: Group Students

Given a list of students as (name, grade) pairs:

  • Group by grade (A, B, C)
  • Count the students in each grade
  • Find the grade with the most students

Exercise 4: Merge Configs

Merge several config dicts:

  • Database config
  • App config
  • User config
  • Priority: User > App > Database

Exercise 5: Inventory Aggregation

Given a list of transactions:

  • Aggregate the total quantity per product
  • Calculate the total value per category
  • Find the top 3 products by value
  • List the products that are low in stock (<10)

Summary

✅ Dict comprehension: {k: v for item in iterable if condition}
✅ defaultdict: automatic default values for missing keys
✅ Counter: counts elements
✅ Merge: update(), ** unpacking, | operator
✅ Advanced: sorting, filtering, inverting, nested access
✅ The collections module is very powerful!

Next Lesson

Lesson 8: Sets - unique collections, set operations, and when to use sets.


Remember:

  • Dict comprehensions are concise and readable
  • defaultdict avoids KeyError when grouping
  • Counter is the best fit for frequency counting
  • Python 3.9+: use | to merge dicts
  • The collections module has many useful tools!
  • Practice with real data to master dictionaries!