Lesson 10: The collections Module
Lesson Objectives
After completing this lesson, you will be able to:
- ✅ Use namedtuple
- ✅ Work with defaultdict
- ✅ Use Counter
- ✅ Understand OrderedDict
- ✅ Work with deque and ChainMap
- ✅ Apply them to real-world scenarios
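The collections Module
The collections module provides specialized container datatypes that complement the built-in dict, list, set, and tuple.

    import collections

    # Public names exported by the module
    print(collections.__all__)
    # ['ChainMap', 'Counter', 'OrderedDict', 'UserDict', 'UserList',
    #  'UserString', 'defaultdict', 'deque', 'namedtuple']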
namedtuple - Named Tuples
namedtuple creates a tuple subclass with named fields, similar to a struct or a simple class.
Creating a namedtuple

    from collections import namedtuple

    # Define a Point type
    Point = namedtuple('Point', ['x', 'y'])

    # Create instances
    p1 = Point(10, 20)
    p2 = Point(x=15, y=25)

    # Access by name
    print(p1.x)  # 10
    print(p1.y)  # 20

    # Access by index (it is still a tuple)
    print(p1[0])  # 10
    print(p1[1])  # 20

    # Unpack
    x, y = p1
    print(x, y)  # 10 20

    # Immutable
    # p1.x = 30  # AttributeError

    # Field names can also be given as a space-separated string
    Person = namedtuple('Person', 'name age email')
    user = Person('Alice', 25, '[email protected]')
    print(user.name)  # Alice
namedtuple Methods

    from collections import namedtuple

    Person = namedtuple('Person', ['name', 'age', 'city'])
    p = Person('Bob', 30, 'Hanoi')

    # _asdict() - convert to dict
    print(p._asdict())
    # {'name': 'Bob', 'age': 30, 'city': 'Hanoi'}

    # _replace() - create a new instance with changes
    p2 = p._replace(age=31)
    print(p2)  # Person(name='Bob', age=31, city='Hanoi')

    # _fields - get field names
    print(Person._fields)  # ('name', 'age', 'city')

    # _make() - create from an iterable
    data = ['Charlie', 28, 'HCMC']
    p3 = Person._make(data)
    print(p3)  # Person(name='Charlie', age=28, city='HCMC')
Real-world Usage

    from collections import namedtuple

    # Database records
    User = namedtuple('User', ['id', 'username', 'email', 'active'])

    users = [
        User(1, 'alice', '[email protected]', True),
        User(2, 'bob', '[email protected]', False),
        User(3, 'charlie', '[email protected]', True)
    ]

    # Easy to work with
    for user in users:
        if user.active:
            print(f"{user.username}: {user.email}")

    # Returning multiple values from a function.
    # Define the type once at module level rather than on every call.
    Coordinate = namedtuple('Coordinate', ['lat', 'lon'])

    def get_coordinates():
        return Coordinate(21.0285, 105.8542)  # Hanoi

    coord = get_coordinates()
    print(f"Latitude: {coord.lat}, Longitude: {coord.lon}")
defaultdict - Dict with Default Values
defaultdict automatically creates a value for a missing key the first time that key is looked up.
Basic Usage

    from collections import defaultdict

    # Regular dict - KeyError
    regular_dict = {}
    # print(regular_dict['key'])  # KeyError

    # defaultdict with list
    dd = defaultdict(list)
    dd['fruits'].append('apple')
    dd['fruits'].append('banana')
    dd['vegetables'].append('carrot')

    print(dd)
    # defaultdict(<class 'list'>, {'fruits': ['apple', 'banana'], 'vegetables': ['carrot']})

    # defaultdict with int (default 0)
    counts = defaultdict(int)
    counts['apples'] += 1
    counts['oranges'] += 2
    counts['apples'] += 1

    print(counts)
    # defaultdict(<class 'int'>, {'apples': 2, 'oranges': 2})

    # defaultdict with set
    tags = defaultdict(set)
    tags['python'].add('programming')
    tags['python'].add('scripting')
    tags['django'].add('web')

    print(tags)
    # defaultdict(<class 'set'>, {'python': {'programming', 'scripting'}, 'django': {'web'}})
    # (the order of elements inside a set may vary)
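One detail worth seeing in isolation: only indexing with `[]` triggers the default factory. The short sketch below (reusing the `counts` name purely for illustration) shows that membership tests and `.get()` leave the dict untouched, while a plain lookup stores the new key.

```python
from collections import defaultdict

counts = defaultdict(int)

# Membership tests and .get() do not call the default factory
print('apples' in counts)    # False
print(counts.get('apples'))  # None
print(len(counts))           # 0

# Indexing a missing key calls the factory and stores the result
print(counts['apples'])      # 0
print(len(counts))           # 1
print(dict(counts))          # {'apples': 0}
```

This matters when a defaultdict is only read: every lookup of an unknown key with `[]` makes it grow.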
Custom Default Factory

    from collections import defaultdict

    # Custom default value
    def default_value():
        return 'N/A'

    dd = defaultdict(default_value)
    print(dd['missing'])  # N/A

    # Lambda for custom defaults (equivalent to defaultdict(list) here)
    dd2 = defaultdict(lambda: [])
    dd2['items'].append(1)
    print(dd2)  # defaultdict(<function <lambda> at 0x...>, {'items': [1]})

    # Nested defaultdict
    nested = defaultdict(lambda: defaultdict(int))
    nested['user1']['posts'] = 10
    nested['user1']['likes'] = 50
    nested['user2']['posts'] = 5

    print(nested)
    # defaultdict(<function <lambda> at 0x...>, {'user1': defaultdict(<class 'int'>, {'posts': 10, 'likes': 50}), 'user2': defaultdict(<class 'int'>, {'posts': 5})})
Real-world Usage

    from collections import defaultdict

    # Group items by category
    products = [
        ('laptop', 'electronics'),
        ('phone', 'electronics'),
        ('desk', 'furniture'),
        ('chair', 'furniture'),
        ('tablet', 'electronics')
    ]

    grouped = defaultdict(list)
    for product, category in products:
        grouped[category].append(product)

    print(grouped)
    # defaultdict(<class 'list'>, {'electronics': ['laptop', 'phone', 'tablet'], 'furniture': ['desk', 'chair']})

    # Group words by length
    text = "the quick brown fox jumps over the lazy dog"
    word_lengths = defaultdict(list)

    for word in text.split():
        word_lengths[len(word)].append(word)

    print(word_lengths)
    # defaultdict(<class 'list'>, {3: ['the', 'fox', 'the', 'dog'], 5: ['quick', 'brown', 'jumps'], 4: ['over', 'lazy']})
Counter - Count Occurrences
Counter is a dict subclass for counting hashable objects.
Basic Usage

    from collections import Counter

    # Count from a list
    fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
    counter = Counter(fruits)

    print(counter)
    # Counter({'apple': 3, 'banana': 2, 'orange': 1})

    # Count from a string
    text = "hello world"
    char_counter = Counter(text)
    print(char_counter)
    # Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

    # Build from a dict of counts
    counter2 = Counter({'apples': 3, 'oranges': 2})
    print(counter2)
    # Counter({'apples': 3, 'oranges': 2})

    # Access counts
    print(counter['apple'])  # 3
    print(counter['grape'])  # 0 (no KeyError!)
Counter Methods

    from collections import Counter

    counter = Counter(['a', 'b', 'c', 'a', 'b', 'a'])

    # most_common(n) - get the n most common elements
    print(counter.most_common(2))
    # [('a', 3), ('b', 2)]

    # elements() - iterator over elements, each repeated by its count
    print(list(counter.elements()))
    # ['a', 'a', 'a', 'b', 'b', 'c']

    # update() - add counts
    counter.update(['a', 'd', 'd'])
    print(counter)
    # Counter({'a': 4, 'b': 2, 'd': 2, 'c': 1})

    # subtract() - subtract counts
    counter.subtract(['a', 'b'])
    print(counter)
    # Counter({'a': 3, 'd': 2, 'b': 1, 'c': 1})

    # Arithmetic operations
    c1 = Counter(['a', 'b', 'c'])
    c2 = Counter(['a', 'b', 'd'])

    print(c1 + c2)  # Counter({'a': 2, 'b': 2, 'c': 1, 'd': 1})
    print(c1 - c2)  # Counter({'c': 1})
    print(c1 & c2)  # Intersection: Counter({'a': 1, 'b': 1})
    print(c1 | c2)  # Union: Counter({'a': 1, 'b': 1, 'c': 1, 'd': 1})
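The `-` operator and the `subtract()` method look interchangeable but treat non-positive counts differently. A minimal sketch, using made-up `stock` and `order` counters:

```python
from collections import Counter

stock = Counter(apple=1, pear=2)
order = Counter(apple=3, pear=1)

# The - operator returns a new Counter and keeps only positive counts
diff = stock - order
print(diff)  # Counter({'pear': 1})

# subtract() works in place and keeps zero and negative counts
stock.subtract(order)
print(stock)  # Counter({'pear': 1, 'apple': -2})

# Unary + strips zero and negative counts
print(+stock)  # Counter({'pear': 1})
```

Use `subtract()` when a deficit should stay visible, and `-` when only what is left over matters.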
Real-world Usage

    from collections import Counter

    # Word frequency
    text = """Python is a programming language. Python is popular.
    Many developers use Python for web development."""

    words = text.lower().split()
    word_freq = Counter(words)

    print("Most common words:")
    for word, count in word_freq.most_common(5):
        print(f"{word}: {count}")

    # Vote counting
    votes = ['Alice', 'Bob', 'Alice', 'Charlie', 'Alice', 'Bob', 'Alice']
    vote_counter = Counter(votes)

    winner = vote_counter.most_common(1)[0]
    print(f"Winner: {winner[0]} with {winner[1]} votes")

    # Inventory management
    inventory = Counter(laptop=5, phone=10, tablet=3)
    sold = Counter(laptop=2, phone=3)

    remaining = inventory - sold
    print(f"Remaining inventory: {remaining}")
    # Remaining inventory: Counter({'phone': 7, 'laptop': 3, 'tablet': 3})
deque - Double-Ended Queue
deque (pronounced "deck") is a list-like container with fast appends and pops at both ends.
Basic Operations

    from collections import deque

    # Create a deque
    dq = deque([1, 2, 3])
    print(dq)  # deque([1, 2, 3])

    # Append to the right
    dq.append(4)
    print(dq)  # deque([1, 2, 3, 4])

    # Append to the left
    dq.appendleft(0)
    print(dq)  # deque([0, 1, 2, 3, 4])

    # Pop from the right
    print(dq.pop())  # 4
    print(dq)  # deque([0, 1, 2, 3])

    # Pop from the left
    print(dq.popleft())  # 0
    print(dq)  # deque([1, 2, 3])

    # Rotate
    dq.rotate(1)  # Rotate right
    print(dq)  # deque([3, 1, 2])

    dq.rotate(-1)  # Rotate left
    print(dq)  # deque([1, 2, 3])

    # Extend
    dq.extend([4, 5])
    print(dq)  # deque([1, 2, 3, 4, 5])

    dq.extendleft([0, -1])  # Items are added one by one, so they end up reversed
    print(dq)  # deque([-1, 0, 1, 2, 3, 4, 5])
maxlen - Bounded Deque

    from collections import deque

    # Fixed-size deque
    dq = deque(maxlen=3)

    dq.append(1)
    dq.append(2)
    dq.append(3)
    print(dq)  # deque([1, 2, 3], maxlen=3)

    # Adding more removes items from the left
    dq.append(4)
    print(dq)  # deque([2, 3, 4], maxlen=3)

    dq.append(5)
    print(dq)  # deque([3, 4, 5], maxlen=3)
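A bounded deque always discards from the end opposite to the one being written to. The following sketch illustrates this with `appendleft()` and with an oversized `extend()`:

```python
from collections import deque

dq = deque([1, 2, 3], maxlen=3)

# appendleft() on a full deque drops the item at the right end
dq.appendleft(0)
print(dq)  # deque([0, 1, 2], maxlen=3)

# extend() with more items than maxlen keeps only the last ones
dq.extend([7, 8, 9, 10])
print(dq)  # deque([8, 9, 10], maxlen=3)

# maxlen is read-only and is None for an unbounded deque
print(dq.maxlen)       # 3
print(deque().maxlen)  # None
```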
Real-world Usage

    from collections import deque

    # Recent history (last N items)
    class BrowsingHistory:
        def __init__(self, max_size=5):
            self.history = deque(maxlen=max_size)

        def visit(self, url):
            self.history.append(url)

        def get_recent(self):
            return list(self.history)

    browser = BrowsingHistory(max_size=3)
    browser.visit('google.com')
    browser.visit('github.com')
    browser.visit('stackoverflow.com')
    browser.visit('python.org')

    print(browser.get_recent())
    # ['github.com', 'stackoverflow.com', 'python.org']

    # Moving average
    def moving_average(data, window_size):
        dq = deque(maxlen=window_size)
        averages = []

        for value in data:
            dq.append(value)
            if len(dq) == window_size:
                avg = sum(dq) / window_size
                averages.append(avg)

        return averages

    data = [10, 20, 30, 40, 50, 60]
    print(moving_average(data, 3))
    # [20.0, 30.0, 40.0, 50.0]

    # Task queue
    task_queue = deque()

    # Add tasks
    task_queue.append('task1')
    task_queue.append('task2')
    task_queue.append('task3')

    # Process tasks (FIFO)
    while task_queue:
        task = task_queue.popleft()
        print(f"Processing: {task}")
OrderedDict - Ordered Dictionary
OrderedDict maintains insertion order. Regular dicts are also ordered since Python 3.7, but OrderedDict adds extra features such as move_to_end() and popitem(last=False).
Basic Usage

    from collections import OrderedDict

    # Regular dict (ordered since Python 3.7)
    regular = {'b': 2, 'a': 1, 'c': 3}
    print(regular)  # {'b': 2, 'a': 1, 'c': 3}

    # OrderedDict
    ordered = OrderedDict()
    ordered['b'] = 2
    ordered['a'] = 1
    ordered['c'] = 3
    print(ordered)
    # OrderedDict([('b', 2), ('a', 1), ('c', 3)])
    # (Python 3.12+ prints OrderedDict({'b': 2, 'a': 1, 'c': 3}))

    # move_to_end()
    ordered.move_to_end('b')
    print(ordered)
    # OrderedDict([('a', 1), ('c', 3), ('b', 2)])

    # Move to the beginning ('a' is already first, so the order is unchanged)
    ordered.move_to_end('a', last=False)
    print(ordered)
    # OrderedDict([('a', 1), ('c', 3), ('b', 2)])

    # popitem() - LIFO by default
    last_item = ordered.popitem()
    print(last_item)  # ('b', 2)

    ordered.popitem(last=False)  # FIFO: removes ('a', 1)
    print(ordered)  # OrderedDict([('c', 3)])
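Another difference from a regular dict is equality: two OrderedDict instances are equal only when their items are in the same order. A small sketch with made-up keys:

```python
from collections import OrderedDict

a = OrderedDict([('x', 1), ('y', 2)])
b = OrderedDict([('y', 2), ('x', 1)])

# Two OrderedDicts compare equal only if the order matches too
print(a == b)  # False

# Regular dicts ignore order when comparing
print(dict(a) == dict(b))  # True

# Comparing an OrderedDict with a regular dict also ignores order
print(a == dict(b))  # True
```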
Real-world Usage

    from collections import OrderedDict

    # LRU Cache implementation
    class LRUCache:
        def __init__(self, capacity):
            self.cache = OrderedDict()
            self.capacity = capacity

        def get(self, key):
            if key not in self.cache:
                return None

            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]

        def put(self, key, value):
            if key in self.cache:
                # Update and move to end
                self.cache.move_to_end(key)

            self.cache[key] = value

            # Remove least recently used if over capacity
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)

    cache = LRUCache(3)
    cache.put('a', 1)
    cache.put('b', 2)
    cache.put('c', 3)
    print(cache.cache)  # OrderedDict([('a', 1), ('b', 2), ('c', 3)])

    cache.get('a')  # Access 'a'
    cache.put('d', 4)  # Add 'd', removes 'b' (least recently used)
    print(cache.cache)  # OrderedDict([('c', 3), ('a', 1), ('d', 4)])
ChainMap - Combine Multiple Dicts
ChainMap groups multiple dicts into a single view. Lookups search the dicts in order and return the first match.
Basic Usage

    from collections import ChainMap

    # Multiple dicts
    defaults = {'color': 'blue', 'size': 'medium'}
    user_prefs = {'color': 'red'}

    # Combine (user_prefs has priority)
    config = ChainMap(user_prefs, defaults)

    print(config['color'])  # red (from user_prefs)
    print(config['size'])  # medium (from defaults)

    # View all maps
    print(config.maps)
    # [{'color': 'red'}, {'color': 'blue', 'size': 'medium'}]

    # Add a new map in front of the existing ones
    admin_prefs = {'admin': True}
    config = config.new_child(admin_prefs)
    print(config['admin'])  # True
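Reads search every mapping, but writes and deletions only ever touch the first one. The sketch below reuses the `defaults` and `user_prefs` names from the example above to show this, along with the fact that the view is live:

```python
from collections import ChainMap

defaults = {'color': 'blue', 'size': 'medium'}
user_prefs = {'color': 'red'}
config = ChainMap(user_prefs, defaults)

# Writes go to the first mapping only
config['size'] = 'large'
print(user_prefs)  # {'color': 'red', 'size': 'large'}
print(defaults)    # {'color': 'blue', 'size': 'medium'}

# Deleting removes the key from the first mapping, exposing the fallback
del config['color']
print(config['color'])  # blue

# The view is live: changes to an underlying dict show up immediately
defaults['theme'] = 'dark'
print(config['theme'])  # dark

# parents is a ChainMap of everything except the first mapping
print(config.parents['size'])  # medium
```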
Real-world Usage

    from collections import ChainMap
    import os

    # Configuration hierarchy
    class Config:
        def __init__(self):
            # Priority: command line > env vars > defaults
            self.defaults = {
                'host': 'localhost',
                'port': 8000,
                'debug': False
            }

            # Strip the APP_ prefix so that APP_HOST overrides 'host'
            self.env_vars = {
                k[len('APP_'):].lower(): v
                for k, v in os.environ.items()
                if k.startswith('APP_')
            }

            self.cli_args = {}  # Would be populated from argparse

            self.config = ChainMap(
                self.cli_args,
                self.env_vars,
                self.defaults
            )

        def get(self, key):
            return self.config.get(key)

        def set_cli_arg(self, key, value):
            self.cli_args[key] = value

    config = Config()
    print(config.get('host'))  # localhost (from defaults, unless APP_HOST is set)

    config.set_cli_arg('host', '0.0.0.0')
    print(config.get('host'))  # 0.0.0.0 (from CLI, highest priority)
Real-world Examples
1. Text Analysis Tool

    import re
    from collections import Counter, defaultdict

    class TextAnalyzer:
        def __init__(self, text):
            self.text = text.lower()
            # Extract words without punctuation, so "python." counts as "python"
            self.words = re.findall(r"\w+", self.text)

        def word_frequency(self, top_n=10):
            """Get most common words."""
            counter = Counter(self.words)
            return counter.most_common(top_n)

        def words_by_length(self):
            """Group unique words by length."""
            grouped = defaultdict(list)
            for word in set(self.words):
                grouped[len(word)].append(word)
            return dict(grouped)

        def char_frequency(self):
            """Get character frequency."""
            return Counter(self.text)

    text = "Python is great. Python is powerful. Many people use Python."
    analyzer = TextAnalyzer(text)

    print("Top words:", analyzer.word_frequency(3))
    print("By length:", analyzer.words_by_length())
2. Request Rate Limiter

    from collections import deque
    from datetime import datetime, timedelta

    class RateLimiter:
        def __init__(self, max_requests, time_window_seconds):
            self.max_requests = max_requests
            self.time_window = timedelta(seconds=time_window_seconds)
            self.requests = deque()

        def allow_request(self):
            """Check if request is allowed."""
            now = datetime.now()

            # Remove old requests outside the time window
            while self.requests and now - self.requests[0] > self.time_window:
                self.requests.popleft()

            # Check if under limit
            if len(self.requests) < self.max_requests:
                self.requests.append(now)
                return True

            return False

    # Allow 3 requests per 10 seconds
    limiter = RateLimiter(max_requests=3, time_window_seconds=10)

    for i in range(5):
        if limiter.allow_request():
            print(f"Request {i+1}: Allowed")
        else:
            print(f"Request {i+1}: Rate limit exceeded")
3. Command History

    from collections import deque

    class CommandHistory:
        def __init__(self, max_size=10):
            self.history = deque(maxlen=max_size)
            self.current_pos = -1

        def add(self, command):
            """Add command to history."""
            self.history.append(command)
            self.current_pos = len(self.history)

        def previous(self):
            """Get previous command."""
            if self.current_pos > 0:
                self.current_pos -= 1
                return self.history[self.current_pos]
            return None

        def next(self):
            """Get next command."""
            if self.current_pos < len(self.history) - 1:
                self.current_pos += 1
                return self.history[self.current_pos]
            return None

        def show(self):
            """Show all history."""
            for i, cmd in enumerate(self.history):
                print(f"{i+1}. {cmd}")

    history = CommandHistory(max_size=5)
    history.add("ls -la")
    history.add("cd /home")
    history.add("python main.py")

    history.show()
Best Practices

    from collections import namedtuple, defaultdict, Counter, deque

    # 1. Use namedtuple for simple data structures
    Point = namedtuple('Point', ['x', 'y'])
    p = Point(10, 20)

    # 2. Use defaultdict to avoid KeyError
    counts = defaultdict(int)
    counts['key'] += 1  # No need to check whether the key exists

    # 3. Use Counter for counting
    words = ['a', 'b', 'a', 'c']
    counter = Counter(words)

    # 4. Use deque for queues and stacks
    queue = deque()
    queue.append(1)
    queue.popleft()  # O(1) - efficient

    # 5. Performance comparison
    # list.pop(0) - O(n)
    # deque.popleft() - O(1)

    # Use the appropriate collection for the task
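The O(n) versus O(1) difference in point 5 can be observed with a rough timing sketch. The size `N` and the helper names `drain_list` and `drain_deque` are chosen only for illustration, and absolute timings depend on the machine, so no expected output is shown.

```python
from collections import deque
from timeit import timeit

N = 20_000  # hypothetical size, chosen only for illustration

def drain_list():
    items = list(range(N))
    while items:
        items.pop(0)  # shifts every remaining element: O(n) per pop

def drain_deque():
    items = deque(range(N))
    while items:
        items.popleft()  # O(1) per pop

list_time = timeit(drain_list, number=1)
deque_time = timeit(drain_deque, number=1)
print(f"list.pop(0):     {list_time:.4f}s")
print(f"deque.popleft(): {deque_time:.4f}s")
```

The gap widens as `N` grows, because draining the list costs quadratic time overall while draining the deque stays linear.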
Practice Exercises
Exercise 1: Log Analyzer
Build a log analyzer with Counter:
- Count log levels
- Find most common errors
- Group by timestamp
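As a possible starting point for Exercise 1, here is a minimal sketch. The log format ("date time LEVEL message"), the sample lines, and the `analyze` function name are all assumptions made for illustration; real logs will need their own parsing.

```python
from collections import Counter, defaultdict

# Hypothetical sample data in an assumed "date time LEVEL message" format
LOG_LINES = [
    "2024-01-01 10:00:01 INFO Server started",
    "2024-01-01 10:00:05 ERROR Database timeout",
    "2024-01-01 10:01:10 WARNING Slow response",
    "2024-01-01 10:01:30 ERROR Database timeout",
    "2024-01-01 11:15:00 ERROR Disk full",
]

def analyze(lines):
    levels = Counter()           # count per log level
    errors = Counter()           # count per error message
    by_hour = defaultdict(list)  # messages grouped by hour
    for line in lines:
        # Split into date, time, level, and the rest of the message
        date, time, level, message = line.split(' ', 3)
        levels[level] += 1
        if level == 'ERROR':
            errors[message] += 1
        by_hour[f"{date} {time[:2]}:00"].append(message)
    return levels, errors, dict(by_hour)

levels, errors, by_hour = analyze(LOG_LINES)
print(levels)                 # Counter({'ERROR': 3, 'INFO': 1, 'WARNING': 1})
print(errors.most_common(1))  # [('Database timeout', 2)]
print(sorted(by_hour))        # ['2024-01-01 10:00', '2024-01-01 11:00']
```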
Exercise 2: Task Scheduler
Implement a task queue with deque:
- Add tasks with priority
- Process FIFO/LIFO
- Limit queue size
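One way to approach Exercise 2 is sketched below. The `TaskQueue` class and its `urgent` and `lifo` parameters are hypothetical, and priority is reduced to a simple two-level scheme. The size limit is checked by hand rather than with `maxlen`, because a bounded deque would silently drop an existing task instead of rejecting the new one.

```python
from collections import deque

class TaskQueue:
    """Bounded task queue (hypothetical class for this exercise)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.tasks = deque()

    def add(self, task, urgent=False):
        """Add a task; return False if the queue is full."""
        if len(self.tasks) >= self.max_size:
            return False
        if urgent:
            self.tasks.appendleft(task)  # jumps to the front
        else:
            self.tasks.append(task)
        return True

    def next_task(self, lifo=False):
        """Pop the task at the front (FIFO) or at the back (LIFO)."""
        if not self.tasks:
            return None
        return self.tasks.pop() if lifo else self.tasks.popleft()

queue = TaskQueue(max_size=3)
queue.add('send email')
queue.add('build report')
queue.add('fix outage', urgent=True)
print(queue.add('extra task'))     # False (queue is full)
print(queue.next_task())           # fix outage
print(queue.next_task(lifo=True))  # build report
print(queue.next_task())           # send email
```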
Exercise 3: Cache System
Build an LRU cache with OrderedDict:
- Fixed capacity
- Evict least recently used
- Track hits/misses
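A sketch of how hit and miss tracking might be layered onto an OrderedDict-based cache. The class name `CountingLRUCache` and its attributes are illustrative choices, not a prescribed solution.

```python
from collections import OrderedDict

class CountingLRUCache:
    """LRU cache that also counts hits and misses (hypothetical name)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, default=None):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        return default

    def put(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used

cache = CountingLRUCache(2)
cache.put('a', 1)
cache.put('b', 2)
cache.get('a')     # hit
cache.get('zzz')   # miss
cache.put('c', 3)  # evicts 'b'
print(list(cache.cache))         # ['a', 'c']
print(cache.hits, cache.misses)  # 1 1
```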
Exercise 4: Data Grouper
Group data with defaultdict:
- Group by multiple keys
- Nested grouping
- Aggregate functions
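The three bullet points can be sketched with made-up sales records. The `sales` data and its field layout are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sales records: (region, category, amount)
sales = [
    ('north', 'electronics', 1200),
    ('north', 'furniture', 300),
    ('south', 'electronics', 800),
    ('north', 'electronics', 400),
    ('south', 'furniture', 150),
]

# Nested grouping: region -> category -> list of amounts
grouped = defaultdict(lambda: defaultdict(list))
for region, category, amount in sales:
    grouped[region][category].append(amount)

# Grouping by a composite key and aggregating in one pass
totals = defaultdict(int)
for region, category, amount in sales:
    totals[(region, category)] += amount

print(grouped['north']['electronics'])   # [1200, 400]
print(totals[('north', 'electronics')])  # 1600
print(max(totals, key=totals.get))       # ('north', 'electronics')
```

A tuple key keeps the structure flat and easy to sort, while the nested form is more convenient when one level is processed at a time.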
Exercise 5: Config Manager
Manage configs with ChainMap:
- Multiple config sources
- Priority handling
- Override mechanism
Summary
✅ namedtuple: Immutable, named fields, tuple alternative
✅ defaultdict: Auto-create missing keys with default
✅ Counter: Count hashable objects, most_common()
✅ deque: Fast appends/pops from both ends
✅ OrderedDict: Insertion order, move_to_end()
✅ ChainMap: Combine multiple dicts with priority
Next Lesson
Lesson 11: Async Programming Basics - async/await, asyncio, and concurrent programming! 🚀
Remember:
- Choose the right collection for the task
- namedtuple for data structures
- Counter for frequency analysis
- deque for queues
- defaultdict to avoid KeyError! 🎯