Lesson 10: Collections Module

Lesson Objectives

After completing this lesson, you will be able to:

  • ✅ Use namedtuple
  • ✅ Work with defaultdict
  • ✅ Use Counter
  • ✅ Understand OrderedDict
  • ✅ Work with deque and ChainMap
  • ✅ Apply them to real-world scenarios

collections Module

The collections module provides specialized container datatypes as alternatives to the built-in dict, list, set, and tuple.

import collections

# Available types
print(dir(collections))
# Includes: 'ChainMap', 'Counter', 'OrderedDict', 'defaultdict', 'deque', 'namedtuple', ...

namedtuple - Named Tuples

namedtuple creates a tuple subclass with named fields, similar to a struct or a simple class.

Creating namedtuple

from collections import namedtuple

# Define a Point type
Point = namedtuple('Point', ['x', 'y'])

# Create instances
p1 = Point(10, 20)
p2 = Point(x=15, y=25)

# Access by name
print(p1.x)  # 10
print(p1.y)  # 20

# Access by index (still a tuple)
print(p1[0])  # 10
print(p1[1])  # 20

# Unpack
x, y = p1
print(x, y)  # 10 20

# Immutable
# p1.x = 30  # AttributeError

# String format for fields
Person = namedtuple('Person', 'name age email')
user = Person('Alice', 25, '[email protected]')
print(user.name)  # Alice

namedtuple Methods

from collections import namedtuple

Person = namedtuple('Person', ['name', 'age', 'city'])
p = Person('Bob', 30, 'Hanoi')

# _asdict() - convert to dict
print(p._asdict())
# {'name': 'Bob', 'age': 30, 'city': 'Hanoi'}

# _replace() - create new with changes
p2 = p._replace(age=31)
print(p2)  # Person(name='Bob', age=31, city='Hanoi')

# _fields - get field names
print(Person._fields)  # ('name', 'age', 'city')

# _make() - create from iterable
data = ['Charlie', 28, 'HCMC']
p3 = Person._make(data)
print(p3)  # Person(name='Charlie', age=28, city='HCMC')
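Fields can also carry default values. The following is a minimal sketch, assuming Python 3.7 or later (where the defaults argument is available); the Employee type and its values are hypothetical and only serve as an illustration.

```python
from collections import namedtuple

# defaults apply to the rightmost fields: here 'city' and 'active'
Employee = namedtuple(
    'Employee', ['name', 'age', 'city', 'active'],
    defaults=['Hanoi', True],
)

e1 = Employee('Dana', 41)
e2 = Employee('Eli', 35, city='HCMC')

print(e1)  # Employee(name='Dana', age=41, city='Hanoi', active=True)
print(e2)  # Employee(name='Eli', age=35, city='HCMC', active=True)

# _field_defaults maps field names to their default values
print(Employee._field_defaults)  # {'city': 'Hanoi', 'active': True}
```

Fields without a default (name and age here) remain required positional arguments.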

Real-world Usage

from collections import namedtuple

# Database records
User = namedtuple('User', ['id', 'username', 'email', 'active'])

users = [
    User(1, 'alice', '[email protected]', True),
    User(2, 'bob', '[email protected]', False),
    User(3, 'charlie', '[email protected]', True)
]

# Easy to work with
for user in users:
    if user.active:
        print(f"{user.username}: {user.email}")

# Function return multiple values
# Define the type once at module level instead of on every call
Coordinate = namedtuple('Coordinate', ['lat', 'lon'])

def get_coordinates():
    return Coordinate(21.0285, 105.8542)  # Hanoi

coord = get_coordinates()
print(f"Latitude: {coord.lat}, Longitude: {coord.lon}")

defaultdict - Dict with Default Values

defaultdict automatically creates a value for a missing key the first time that key is accessed.

Basic Usage

from collections import defaultdict

# Regular dict - KeyError
regular_dict = {}
# print(regular_dict['key'])  # KeyError

# defaultdict with list
dd = defaultdict(list)
dd['fruits'].append('apple')
dd['fruits'].append('banana')
dd['vegetables'].append('carrot')

print(dd)
# defaultdict(<class 'list'>, {'fruits': ['apple', 'banana'], 'vegetables': ['carrot']})

# defaultdict with int (default 0)
counts = defaultdict(int)
counts['apples'] += 1
counts['oranges'] += 2
counts['apples'] += 1

print(counts)
# defaultdict(<class 'int'>, {'apples': 2, 'oranges': 2})

# defaultdict with set
tags = defaultdict(set)
tags['python'].add('programming')
tags['python'].add('scripting')
tags['django'].add('web')

print(tags)
# defaultdict(<class 'set'>, {'python': {'programming', 'scripting'}, 'django': {'web'}})
# (the order of elements inside a set may vary)
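One consequence of this behavior is easy to overlook: merely reading a missing key with square brackets stores it in the dict. The sketch below illustrates the difference between indexing and .get(); the scores variable and its keys are hypothetical.

```python
from collections import defaultdict

scores = defaultdict(int)

# Indexing a missing key calls the factory AND stores the result
print(scores['alice'])    # 0
print('alice' in scores)  # True
print(len(scores))        # 1

# .get() and 'in' do not trigger the factory
print(scores.get('bob'))  # None
print('bob' in scores)    # False
print(len(scores))        # 1
```

When a lookup should not modify the dict, .get() or an explicit membership test is the safer choice.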

Custom Default Factory

from collections import defaultdict

# Custom default value
def default_value():
    return 'N/A'

dd = defaultdict(default_value)
print(dd['missing'])  # N/A

# Lambda for custom defaults
dd2 = defaultdict(lambda: [])
dd2['items'].append(1)
print(dd2)  # defaultdict(<function>, {'items': [1]})

# Nested defaultdict
nested = defaultdict(lambda: defaultdict(int))
nested['user1']['posts'] = 10
nested['user1']['likes'] = 50
nested['user2']['posts'] = 5

print(nested)
# defaultdict(<function>, {'user1': defaultdict(<class 'int'>, {'posts': 10, 'likes': 50}), 'user2': defaultdict(<class 'int'>, {'posts': 5})})

Real-world Usage

from collections import defaultdict

# Group items by category
products = [
    ('laptop', 'electronics'),
    ('phone', 'electronics'),
    ('desk', 'furniture'),
    ('chair', 'furniture'),
    ('tablet', 'electronics')
]

grouped = defaultdict(list)
for product, category in products:
    grouped[category].append(product)

print(grouped)
# defaultdict(<class 'list'>, {'electronics': ['laptop', 'phone', 'tablet'], 'furniture': ['desk', 'chair']})

# Group words by length
text = "the quick brown fox jumps over the lazy dog"
word_lengths = defaultdict(list)

for word in text.split():
    word_lengths[len(word)].append(word)

print(word_lengths)
# defaultdict(<class 'list'>, {3: ['the', 'fox', 'the', 'dog'], 5: ['quick', 'brown', 'jumps'], 4: ['over', 'lazy']})

Counter - Count Occurrences

Counter is a dict subclass for counting hashable objects.

Basic Usage

from collections import Counter

# Count from list
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
counter = Counter(fruits)

print(counter)
# Counter({'apple': 3, 'banana': 2, 'orange': 1})

# Count from string
text = "hello world"
char_counter = Counter(text)
print(char_counter)
# Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

# Count from dict
counter2 = Counter({'apples': 3, 'oranges': 2})
print(counter2)

# Access counts
print(counter['apple'])   # 3
print(counter['grape'])   # 0 (no KeyError!)

Counter Methods

from collections import Counter

counter = Counter(['a', 'b', 'c', 'a', 'b', 'a'])

# most_common(n) - get n most common
print(counter.most_common(2))
# [('a', 3), ('b', 2)]

# elements() - iterator over elements
print(list(counter.elements()))
# ['a', 'a', 'a', 'b', 'b', 'c']

# update() - add counts
counter.update(['a', 'd', 'd'])
print(counter)
# Counter({'a': 4, 'b': 2, 'd': 2, 'c': 1})

# subtract() - subtract counts
counter.subtract(['a', 'b'])
print(counter)
# Counter({'a': 3, 'd': 2, 'b': 1, 'c': 1})

# Arithmetic operations
c1 = Counter(['a', 'b', 'c'])
c2 = Counter(['a', 'b', 'd'])

print(c1 + c2)  # Counter({'a': 2, 'b': 2, 'c': 1, 'd': 1})
print(c1 - c2)  # Counter({'c': 1})
print(c1 & c2)  # Intersection: Counter({'a': 1, 'b': 1})
print(c1 | c2)  # Union: Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1})
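The subtract() method and the - operator look similar but treat negative results differently. The sketch below, using hypothetical stock and order counters, shows that the operator discards non-positive counts while subtract() keeps them.

```python
from collections import Counter

stock = Counter(apple=2, pear=1)
order = Counter(apple=3, kiwi=1)

# The - operator keeps only positive results
print(stock - order)  # Counter({'pear': 1})

# subtract() works in place and keeps zero and negative counts
remaining = Counter(stock)
remaining.subtract(order)
print(remaining)  # Counter({'pear': 1, 'apple': -1, 'kiwi': -1})

# Unary + drops zero and negative counts
print(+remaining)  # Counter({'pear': 1})
```

Keeping negative counts can be useful, for example to see which items were oversold.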

Real-world Usage

from collections import Counter

# Word frequency
text = """Python is a programming language. Python is popular.
Many developers use Python for web development."""

words = text.lower().split()
word_freq = Counter(words)

print("Most common words:")
for word, count in word_freq.most_common(5):
    print(f"{word}: {count}")

# Vote counting
votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Alice', 'Bob', 'Alice']
vote_counter = Counter(votes)

winner = vote_counter.most_common(1)[0]
print(f"Winner: {winner[0]} with {winner[1]} votes")

# Inventory management
inventory = Counter(laptop=5, phone=10, tablet=3)
sold = Counter(laptop=2, phone=3)

remaining = inventory - sold
print(f"Remaining inventory: {remaining}")
# Remaining inventory: Counter({'phone': 7, 'laptop': 3, 'tablet': 3})

deque - Double-Ended Queue

deque (pronounced "deck") is a list-like container with fast O(1) appends and pops at both ends.

Basic Operations

from collections import deque

# Create deque
dq = deque([1, 2, 3])
print(dq)  # deque([1, 2, 3])

# Append to right
dq.append(4)
print(dq)  # deque([1, 2, 3, 4])

# Append to left
dq.appendleft(0)
print(dq)  # deque([0, 1, 2, 3, 4])

# Pop from right
print(dq.pop())  # 4
print(dq)  # deque([0, 1, 2, 3])

# Pop from left
print(dq.popleft())  # 0
print(dq)  # deque([1, 2, 3])

# Rotate
dq.rotate(1)  # Rotate right
print(dq)  # deque([3, 1, 2])

dq.rotate(-1)  # Rotate left
print(dq)  # deque([1, 2, 3])

# Extend
dq.extend([4, 5])
print(dq)  # deque([1, 2, 3, 4, 5])

dq.extendleft([0, -1])  # Extends in reverse
print(dq)  # deque([-1, 0, 1, 2, 3, 4, 5])

maxlen - Bounded Deque

from collections import deque

# Fixed size deque
dq = deque(maxlen=3)

dq.append(1)
dq.append(2)
dq.append(3)
print(dq)  # deque([1, 2, 3], maxlen=3)

# Adding more removes from left
dq.append(4)
print(dq)  # deque([2, 3, 4], maxlen=3)

dq.append(5)
print(dq)  # deque([3, 4, 5], maxlen=3)
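A bounded deque discards items from the end opposite to the one being appended to. The short sketch below illustrates the mirror case with appendleft(), and what happens when a bounded deque is built from a longer iterable.

```python
from collections import deque

dq = deque([1, 2, 3], maxlen=3)

# appendleft on a full deque discards from the right
dq.appendleft(0)
print(dq)         # deque([0, 1, 2], maxlen=3)
print(dq.maxlen)  # 3

# Building from a longer iterable keeps only the last maxlen items
tail = deque(range(10), maxlen=3)
print(tail)  # deque([7, 8, 9], maxlen=3)
```

The second pattern is a compact way to keep only the last few items of any iterable.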

Real-world Usage

from collections import deque

# Recent history (last N items)
class BrowsingHistory:
    def __init__(self, max_size=5):
        self.history = deque(maxlen=max_size)

    def visit(self, url):
        self.history.append(url)

    def get_recent(self):
        return list(self.history)

browser = BrowsingHistory(max_size=3)
browser.visit('google.com')
browser.visit('github.com')
browser.visit('stackoverflow.com')
browser.visit('python.org')

print(browser.get_recent())
# ['github.com', 'stackoverflow.com', 'python.org']

# Moving average
def moving_average(data, window_size):
    dq = deque(maxlen=window_size)
    averages = []

    for value in data:
        dq.append(value)
        if len(dq) == window_size:
            avg = sum(dq) / window_size
            averages.append(avg)

    return averages

data = [10, 20, 30, 40, 50, 60]
print(moving_average(data, 3))
# [20.0, 30.0, 40.0, 50.0]

# Task queue
task_queue = deque()

# Add tasks
task_queue.append('task1')
task_queue.append('task2')
task_queue.append('task3')

# Process tasks (FIFO)
while task_queue:
    task = task_queue.popleft()
    print(f"Processing: {task}")

OrderedDict - Ordered Dictionary

OrderedDict maintains insertion order. Since Python 3.7 the built-in dict also preserves insertion order, but OrderedDict adds extra features such as move_to_end() and popitem(last=False).

Basic Usage

from collections import OrderedDict

# Regular dict (Python 3.7+ is ordered)
regular = {'b': 2, 'a': 1, 'c': 3}
print(regular)  # {'b': 2, 'a': 1, 'c': 3}

# OrderedDict
ordered = OrderedDict()
ordered['b'] = 2
ordered['a'] = 1
ordered['c'] = 3
print(ordered)
# OrderedDict([('b', 2), ('a', 1), ('c', 3)])
# Note: Python 3.12+ prints this as OrderedDict({'b': 2, 'a': 1, 'c': 3})

# move_to_end()
ordered.move_to_end('b')
print(ordered)
# OrderedDict([('a', 1), ('c', 3), ('b', 2)])

ordered.move_to_end('c', last=False)  # Move to beginning
print(ordered)
# OrderedDict([('c', 3), ('a', 1), ('b', 2)])

# popitem() - LIFO by default
last_item = ordered.popitem()
print(last_item)  # ('b', 2)

ordered.popitem(last=False)  # FIFO
print(ordered)  # OrderedDict([('a', 1)])
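Another difference from a regular dict is that equality between two OrderedDict objects is order-sensitive. The sketch below illustrates this with two small hypothetical dictionaries.

```python
from collections import OrderedDict

# Regular dicts compare equal regardless of insertion order
print({'a': 1, 'b': 2} == {'b': 2, 'a': 1})  # True

# Two OrderedDicts compare equal only if the order also matches
od1 = OrderedDict([('a', 1), ('b', 2)])
od2 = OrderedDict([('b', 2), ('a', 1)])
print(od1 == od2)  # False

# Comparing an OrderedDict with a regular dict ignores order
print(od1 == {'b': 2, 'a': 1})  # True
```

This makes OrderedDict a reasonable choice when the order of entries is part of what the data means.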

Real-world Usage

from collections import OrderedDict

# LRU Cache implementation
class LRUCache:
    def __init__(self, capacity):
        self.cache = OrderedDict()
        self.capacity = capacity

    def get(self, key):
        if key not in self.cache:
            return None

        # Move to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key, value):
        if key in self.cache:
            # Update and move to end
            self.cache.move_to_end(key)

        self.cache[key] = value

        # Remove least recently used if over capacity
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

cache = LRUCache(3)
cache.put('a', 1)
cache.put('b', 2)
cache.put('c', 3)
print(cache.cache)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])

cache.get('a')  # Access 'a'
cache.put('d', 4)  # Add 'd', removes 'b' (least recently used)
print(cache.cache)  # OrderedDict([('c', 3), ('a', 1), ('d', 4)])

ChainMap - Combine Multiple Dicts

ChainMap groups multiple dicts into a single, updatable view. Lookups search the maps in order and return the first match.

Basic Usage

from collections import ChainMap

# Multiple dicts
defaults = {'color': 'blue', 'size': 'medium'}
user_prefs = {'color': 'red'}

# Combine (user_prefs has priority)
config = ChainMap(user_prefs, defaults)

print(config['color'])  # red (from user_prefs)
print(config['size'])   # medium (from defaults)

# View all maps
print(config.maps)
# [{'color': 'red'}, {'color': 'blue', 'size': 'medium'}]

# Add new map
admin_prefs = {'admin': True}
config = config.new_child(admin_prefs)
print(config['admin'])  # True
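Lookups search every map, but writes and deletes only affect the first one. The sketch below reuses the defaults and user_prefs names from the example above; the 'theme' key is a hypothetical addition.

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'medium'}
user_prefs = {'color': 'red'}
config = ChainMap(user_prefs, defaults)

# Writes and deletes only touch the first map
config['size'] = 'large'
print(user_prefs)  # {'color': 'red', 'size': 'large'}
print(defaults)    # {'color': 'blue', 'size': 'medium'}

# The ChainMap is a live view: changes to the underlying dicts show through
defaults['theme'] = 'dark'
print(config['theme'])  # dark

# Deleting the override reveals the value from the next map
del config['color']
print(config['color'])  # blue

# dict() flattens the chain into a single snapshot
snapshot = dict(config)
print(snapshot)
```

Because the underlying dicts are never copied, the defaults stay untouched no matter how many overrides are written through the ChainMap.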

Real-world Usage

from collections import ChainMap
import os

# Configuration hierarchy
class Config:
    def __init__(self):
        # Priority: command line > env vars > defaults
        self.defaults = {
            'host': 'localhost',
            'port': 8000,
            'debug': False
        }

        # Strip the APP_ prefix so that APP_HOST overrides 'host'
        # (environment values are always strings)
        prefix = 'APP_'
        self.env_vars = {
            k[len(prefix):].lower(): v
            for k, v in os.environ.items()
            if k.startswith(prefix)
        }

        self.cli_args = {}  # Would be populated from argparse

        self.config = ChainMap(
            self.cli_args,
            self.env_vars,
            self.defaults
        )

    def get(self, key):
        return self.config.get(key)

    def set_cli_arg(self, key, value):
        self.cli_args[key] = value

config = Config()
print(config.get('host'))  # localhost (from defaults, unless APP_HOST is set)

config.set_cli_arg('host', '0.0.0.0')
print(config.get('host'))  # 0.0.0.0 (from CLI, highest priority)

Real-world Examples

1. Text Analysis Tool

from collections import Counter, defaultdict
import string

class TextAnalyzer:
    def __init__(self, text):
        self.text = text.lower()
        # Strip surrounding punctuation so "python." counts as "python"
        stripped = (w.strip(string.punctuation) for w in self.text.split())
        self.words = [w for w in stripped if w]

    def word_frequency(self, top_n=10):
        """Get most common words."""
        counter = Counter(self.words)
        return counter.most_common(top_n)

    def words_by_length(self):
        """Group words by length."""
        grouped = defaultdict(list)
        for word in set(self.words):
            grouped[len(word)].append(word)
        return dict(grouped)

    def char_frequency(self):
        """Get character frequency."""
        return Counter(self.text)

text = "Python is great. Python is powerful. Many people use Python."
analyzer = TextAnalyzer(text)

print("Top words:", analyzer.word_frequency(3))
print("By length:", analyzer.words_by_length())

2. Request Rate Limiter

from collections import deque
from datetime import datetime, timedelta

class RateLimiter:
    def __init__(self, max_requests, time_window_seconds):
        self.max_requests = max_requests
        self.time_window = timedelta(seconds=time_window_seconds)
        self.requests = deque()

    def allow_request(self):
        """Check if request is allowed."""
        now = datetime.now()

        # Remove old requests outside time window
        while self.requests and now - self.requests[0] > self.time_window:
            self.requests.popleft()

        # Check if under limit
        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True

        return False

# Allow 3 requests per 10 seconds
limiter = RateLimiter(max_requests=3, time_window_seconds=10)

for i in range(5):
    if limiter.allow_request():
        print(f"Request {i+1}: Allowed")
    else:
        print(f"Request {i+1}: Rate limit exceeded")

3. Command History

from collections import deque

class CommandHistory:
    def __init__(self, max_size=10):
        self.history = deque(maxlen=max_size)
        self.current_pos = -1

    def add(self, command):
        """Add command to history."""
        self.history.append(command)
        self.current_pos = len(self.history)

    def previous(self):
        """Get previous command."""
        if self.current_pos > 0:
            self.current_pos -= 1
            return self.history[self.current_pos]
        return None

    def next(self):
        """Get next command."""
        if self.current_pos < len(self.history) - 1:
            self.current_pos += 1
            return self.history[self.current_pos]
        return None

    def show(self):
        """Show all history."""
        for i, cmd in enumerate(self.history):
            print(f"{i+1}. {cmd}")

history = CommandHistory(max_size=5)
history.add("ls -la")
history.add("cd /home")
history.add("python main.py")

history.show()

Best Practices

from collections import namedtuple, defaultdict, Counter, deque

# 1. Use namedtuple for simple data structures
Point = namedtuple('Point', ['x', 'y'])
p = Point(10, 20)

# 2. Use defaultdict to avoid KeyError
counts = defaultdict(int)
counts['key'] += 1  # No need to check if exists

# 3. Use Counter for counting
words = ['a', 'b', 'a', 'c']
counter = Counter(words)

# 4. Use deque for queues and stacks
queue = deque()
queue.append(1)
queue.popleft()  # O(1) - efficient

# 5. Performance comparison
# list.pop(0) - O(n)
# deque.popleft() - O(1)

# Use appropriate collection for task
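The performance comparison in point 5 can be checked with a rough timing sketch such as the one below. The size N and the function names are hypothetical, and the absolute numbers depend on the machine, so no expected output is shown.

```python
from collections import deque
from timeit import timeit

N = 10_000  # hypothetical size; adjust for your machine

def drain_list():
    items = list(range(N))
    while items:
        items.pop(0)      # shifts every remaining element: O(n) per pop

def drain_deque():
    items = deque(range(N))
    while items:
        items.popleft()   # O(1) per pop

list_time = timeit(drain_list, number=3)
deque_time = timeit(drain_deque, number=3)

print(f"list.pop(0):     {list_time:.4f}s")
print(f"deque.popleft(): {deque_time:.4f}s")
```

The gap typically widens as N grows, since draining the list is quadratic overall while draining the deque is linear.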

Practice Exercises

Exercise 1: Log Analyzer

Create a log analyzer with Counter:

  • Count log levels
  • Find most common errors
  • Group by timestamp
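As one possible starting point for this exercise (not a full solution), the sketch below assumes a hypothetical log format of "DATE TIME LEVEL MESSAGE"; the sample lines are invented for illustration, and real logs will need their own parsing.

```python
from collections import Counter, defaultdict

# Hypothetical log lines in "DATE TIME LEVEL MESSAGE" format
LOG_LINES = [
    "2024-01-15 10:00:01 INFO Server started",
    "2024-01-15 10:00:05 ERROR Database timeout",
    "2024-01-15 10:01:12 WARNING Disk usage high",
    "2024-01-15 10:01:30 ERROR Database timeout",
    "2024-01-15 10:02:45 ERROR Invalid token",
]

level_counts = Counter()
error_counts = Counter()
by_minute = defaultdict(list)

for line in LOG_LINES:
    # maxsplit=3 keeps the message (which contains spaces) in one piece
    date, clock, level, message = line.split(' ', 3)
    level_counts[level] += 1
    if level == 'ERROR':
        error_counts[message] += 1
    # Group by timestamp truncated to the minute (HH:MM)
    by_minute[f"{date} {clock[:5]}"].append(message)

print(level_counts.most_common())
# [('ERROR', 3), ('INFO', 1), ('WARNING', 1)]
print(error_counts.most_common(1))
# [('Database timeout', 2)]
print(dict(by_minute))
```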

Exercise 2: Task Scheduler

Implement a task queue with deque:

  • Add tasks with priority
  • Process FIFO/LIFO
  • Limit queue size

Exercise 3: Cache System

Build an LRU cache with OrderedDict:

  • Fixed capacity
  • Evict least recently used
  • Track hits/misses

Exercise 4: Data Grouper

Group data with defaultdict:

  • Group by multiple keys
  • Nested grouping
  • Aggregate functions
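A hedged starting sketch for this exercise follows. The sales records are hypothetical sample data; tuple keys are one way to group by several fields at once, and a nested defaultdict is one way to keep running totals.

```python
from collections import defaultdict

# Hypothetical sales records: (region, category, amount)
sales = [
    ('north', 'electronics', 1200),
    ('north', 'furniture', 300),
    ('south', 'electronics', 800),
    ('north', 'electronics', 400),
    ('south', 'furniture', 150),
]

# Group by multiple keys: use a tuple as the dict key
by_pair = defaultdict(list)
for region, category, amount in sales:
    by_pair[(region, category)].append(amount)

# Nested grouping with a running total as the aggregate
totals = defaultdict(lambda: defaultdict(int))
for region, category, amount in sales:
    totals[region][category] += amount

print(by_pair[('north', 'electronics')])  # [1200, 400]
print(totals['north']['electronics'])     # 1600
print(totals['south']['furniture'])       # 150

# Other aggregates can be computed from the grouped lists
averages = {key: sum(vals) / len(vals) for key, vals in by_pair.items()}
print(averages[('north', 'electronics')])  # 800.0
```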

Exercise 5: Config Manager

Manage configs with ChainMap:

  • Multiple config sources
  • Priority handling
  • Override mechanism

Summary

namedtuple: Immutable, named fields, tuple alternative
defaultdict: Auto-create missing keys with default
Counter: Count hashable objects, most_common()
deque: Fast appends/pops from both ends
OrderedDict: Insertion order, move_to_end()
ChainMap: Combine multiple dicts with priority

Next Lesson

Lesson 11: Async Programming Basics - async/await, asyncio, and concurrent programming! 🚀


Remember:

  • Choose right collection for task
  • namedtuple for data structures
  • Counter for frequency analysis
  • deque for queues
  • defaultdict to avoid KeyError! 🎯