Lesson 3: Generators and Iterators - Part 2: Generators
Learning Objectives
After completing this lesson, you will:
- ✅ Understand generators and the yield keyword
- ✅ Create generator functions
- ✅ Use generator expressions
- ✅ Understand yield from
- ✅ Apply lazy evaluation
- ✅ Compare memory efficiency
What Are Generators?
A generator is a function that uses yield instead of return. Generators automatically implement the iteration protocol, so they can be looped over without writing __iter__ or __next__ by hand.
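A quick check makes this concrete: a generator object is its own iterator, so __iter__ and __next__ come for free. A minimal sketch:

def gen():
    yield 1

g = gen()
print(iter(g) is g)            # True - __iter__ returns the generator itself
print(hasattr(g, '__next__'))  # True - __next__ is provided automatically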
Generator vs Regular Function
# Regular function - returns all at once
def regular_numbers(n):
    """Return list of numbers."""
    result = []
    for i in range(n):
        result.append(i)
    return result

# Generator function - yields one at a time
def generator_numbers(n):
    """Yield numbers one by one."""
    for i in range(n):
        yield i

# Usage
print(regular_numbers(5))  # [0, 1, 2, 3, 4] - list

gen = generator_numbers(5)
print(gen)  # <generator object> - not a list!

# Iterate generator
for num in gen:
    print(num, end=' ')  # 0 1 2 3 4
print()

# Convert to list
gen2 = generator_numbers(5)
print(list(gen2))  # [0, 1, 2, 3, 4]
How Generators Work
def simple_generator():
    """Demonstrate generator execution."""
    print("Starting")
    yield 1
    print("Between yields")
    yield 2
    print("Ending")
    yield 3

# Create generator (doesn't execute yet!)
gen = simple_generator()

# First next() - runs until first yield
print(next(gen))
# Output:
# Starting
# 1

# Second next() - resumes and runs until next yield
print(next(gen))
# Output:
# Between yields
# 2

# Third next()
print(next(gen))
# Output:
# Ending
# 3

# Fourth next() - raises StopIteration
# print(next(gen))  # StopIteration
Basic Generators
1. Simple Generator
def countdown(n):
    """Countdown from n to 1."""
    while n > 0:
        yield n
        n -= 1

# Usage
for num in countdown(5):
    print(num, end=' ')  # 5 4 3 2 1
print()

# Manual iteration
gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(next(gen))  # 1
# print(next(gen))  # StopIteration
2. Fibonacci Generator
def fibonacci(n):
    """Generate first n Fibonacci numbers."""
    a, b = 0, 1
    count = 0
    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Usage
fib = list(fibonacci(10))
print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Infinite Fibonacci
def fibonacci_infinite():
    """Generate Fibonacci numbers infinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Use with break or islice
import itertools

fib_inf = fibonacci_infinite()
first_ten = list(itertools.islice(fib_inf, 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
3. Range Generator
def my_range(start, end, step=1):
    """Custom range generator."""
    current = start
    if step > 0:
        while current < end:
            yield current
            current += step
    else:
        while current > end:
            yield current
            current += step

# Usage
print(list(my_range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(my_range(10, 0, -2)))  # [10, 8, 6, 4, 2]

# Compare with built-in
print(list(range(0, 10, 2)))      # [0, 2, 4, 6, 8]
Generator Expressions
A generator expression looks like a list comprehension but uses () instead of [].
Syntax
# List comprehension - creates entire list in memory
squares_list = [x**2 for x in range(10)]
print(squares_list)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expression - lazy evaluation
squares_gen = (x**2 for x in range(10))
print(squares_gen)  # <generator object>

# Iterate
for sq in squares_gen:
    print(sq, end=' ')  # 0 1 4 9 16 25 36 49 64 81
print()

# Convert to list
squares_gen2 = (x**2 for x in range(10))
print(list(squares_gen2))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Generator Expression Examples
# Filter even numbers
evens = (x for x in range(20) if x % 2 == 0)
print(list(evens))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# String processing
words = ['hello', 'world', 'python', 'generator']
uppercase = (word.upper() for word in words)
print(list(uppercase))  # ['HELLO', 'WORLD', 'PYTHON', 'GENERATOR']

# Nested generator expression
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = (num for row in matrix for num in row)
print(list(flattened))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Sum with generator (memory efficient)
total = sum(x**2 for x in range(1000000))
print(f"Sum: {total}")
Memory Efficiency
import sys

# List comprehension - stores everything
list_comp = [x**2 for x in range(10000)]
print(f"List size: {sys.getsizeof(list_comp)} bytes")
# List size: ~85KB (getsizeof measures the list object itself,
# not the integers it references)

# Generator expression - stores state only
gen_exp = (x**2 for x in range(10000))
print(f"Generator size: {sys.getsizeof(gen_exp)} bytes")
# Generator size: ~120 bytes

# The memory difference is huge for large data!
yield from
yield from delegates iteration to another generator or iterable.
Basic Usage
def generator1():
    """First generator."""
    yield 1
    yield 2
    yield 3

def generator2():
    """Second generator."""
    yield 4
    yield 5
    yield 6

# Without yield from
def combine_old():
    """Combine generators the old way."""
    for value in generator1():
        yield value
    for value in generator2():
        yield value

# With yield from
def combine_new():
    """Combine generators with yield from."""
    yield from generator1()
    yield from generator2()

# Usage
print(list(combine_old()))  # [1, 2, 3, 4, 5, 6]
print(list(combine_new()))  # [1, 2, 3, 4, 5, 6]
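yield from does more than flatten loops: it also forwards send() calls to the sub-generator and evaluates to the sub-generator's return value. A minimal sketch (the averager/delegator names are just for illustration):

def averager():
    """Sub-generator: accumulates sent values, returns the mean."""
    total, count = 0, 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    return total / count

def delegator(results):
    # yield from forwards send() to the sub-generator and
    # captures its return value as the value of the expression
    result = yield from averager()
    results.append(result)

results = []
d = delegator(results)
next(d)           # prime: runs to the first yield inside averager
for v in [10, 20, 30]:
    d.send(v)     # sent values pass straight through to averager
try:
    d.send(None)  # averager returns; delegator then finishes
except StopIteration:
    pass
print(results)    # [20.0]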
Flatten Nested Structure
def flatten(nested_list):
    """Flatten nested list recursively."""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # Recursive!
        else:
            yield item

# Usage
nested = [1, [2, 3, [4, 5]], 6, [7, [8, 9]]]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
Tree Traversal
class TreeNode:
    """Simple tree node."""
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def traverse_tree(node):
    """Traverse tree and yield all values."""
    yield node.value
    for child in node.children:
        yield from traverse_tree(child)  # Recursive traversal

# Create tree
tree = TreeNode(1, [
    TreeNode(2, [
        TreeNode(4),
        TreeNode(5)
    ]),
    TreeNode(3, [
        TreeNode(6),
        TreeNode(7)
    ])
])

# Traverse
values = list(traverse_tree(tree))
print(values)  # [1, 2, 4, 5, 3, 6, 7]
Advanced Generator Patterns
1. Pipeline Pattern
def read_lines(filename):
    """Read file lines."""
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

def filter_comments(lines):
    """Filter out comment lines."""
    for line in lines:
        if not line.startswith('#'):
            yield line

def filter_blank(lines):
    """Filter out blank lines."""
    for line in lines:
        if line:
            yield line

def to_uppercase(lines):
    """Convert to uppercase."""
    for line in lines:
        yield line.upper()

# Create test file
with open('config.txt', 'w') as f:
    f.write("# Configuration\n")
    f.write("setting1=value1\n")
    f.write("\n")
    f.write("setting2=value2\n")
    f.write("# Comment\n")
    f.write("setting3=value3\n")

# Build pipeline
lines = read_lines('config.txt')
lines = filter_comments(lines)
lines = filter_blank(lines)
lines = to_uppercase(lines)

# Execute pipeline (lazy!)
for line in lines:
    print(line)
# SETTING1=VALUE1
# SETTING2=VALUE2
# SETTING3=VALUE3
2. Generator State Machine
def state_machine():
    """Simple state machine generator."""
    state = 'START'
    while True:
        if state == 'START':
            print("State: START")
            command = yield "Ready"
            state = 'RUNNING' if command == 'start' else 'START'
        elif state == 'RUNNING':
            print("State: RUNNING")
            command = yield "Processing"
            if command == 'stop':
                state = 'STOPPED'
            elif command == 'pause':
                state = 'PAUSED'
        elif state == 'PAUSED':
            print("State: PAUSED")
            command = yield "Paused"
            state = 'RUNNING' if command == 'resume' else 'PAUSED'
        elif state == 'STOPPED':
            print("State: STOPPED")
            yield "Stopped"
            break

# Usage
sm = state_machine()
print(next(sm))           # State: START, Ready
print(sm.send('start'))   # State: RUNNING, Processing
print(sm.send('pause'))   # State: PAUSED, Paused
print(sm.send('resume'))  # State: RUNNING, Processing
print(sm.send('stop'))    # State: STOPPED, Stopped
3. Sliding Window
from collections import deque

def sliding_window(iterable, n):
    """Generate sliding windows of size n."""
    window = deque(maxlen=n)
    for item in iterable:
        window.append(item)
        if len(window) == n:
            yield tuple(window)

# Usage
data = [1, 2, 3, 4, 5, 6, 7, 8]
for window in sliding_window(data, 3):
    print(window)
# (1, 2, 3)
# (2, 3, 4)
# (3, 4, 5)
# (4, 5, 6)
# (5, 6, 7)
# (6, 7, 8)
4. Pairwise Iterator
def pairwise(iterable):
    """Generate consecutive pairs."""
    iterator = iter(iterable)
    try:
        prev = next(iterator)
    except StopIteration:
        return
    for item in iterator:
        yield (prev, item)
        prev = item

# Usage
numbers = [1, 2, 3, 4, 5]
for pair in pairwise(numbers):
    print(pair)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)

# Calculate differences
differences = [b - a for a, b in pairwise(numbers)]
print(differences)  # [1, 1, 1, 1]
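Note that on Python 3.10 and later, the standard library ships an equivalent, itertools.pairwise:

import itertools

# Available in Python 3.10+
print(list(itertools.pairwise([1, 2, 3, 4, 5])))
# [(1, 2), (2, 3), (3, 4), (4, 5)]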
5. Infinite Sequences
import itertools

def infinite_counter(start=0, step=1):
    """Infinite counter."""
    current = start
    while True:
        yield current
        current += step

def infinite_repeater(value):
    """Repeat value infinitely."""
    while True:
        yield value

def infinite_cycle(iterable):
    """Cycle through iterable infinitely."""
    while True:
        for item in iterable:
            yield item

# Usage with itertools.islice
counter = infinite_counter(10, 5)
first_five = list(itertools.islice(counter, 5))
print(first_five)  # [10, 15, 20, 25, 30]

repeater = infinite_repeater('A')
first_three = list(itertools.islice(repeater, 3))
print(first_three)  # ['A', 'A', 'A']

cycler = infinite_cycle([1, 2, 3])
first_ten = list(itertools.islice(cycler, 10))
print(first_ten)  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
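All three of these have battle-tested equivalents in the standard library: itertools.count, itertools.repeat, and itertools.cycle.

import itertools

print(list(itertools.islice(itertools.count(10, 5), 5)))       # [10, 15, 20, 25, 30]
print(list(itertools.islice(itertools.repeat('A'), 3)))        # ['A', 'A', 'A']
print(list(itertools.islice(itertools.cycle([1, 2, 3]), 10)))  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]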
Real-world Examples
1. Large File Processing
import itertools

def process_large_file(filename):
    """Process large file line by line (memory efficient)."""
    with open(filename, 'r') as f:
        for line in f:
            # Process line
            if line.strip():
                yield line.strip().upper()

# Create large file
with open('large_file.txt', 'w') as f:
    for i in range(100):
        f.write(f"Line {i}\n")

# Process efficiently (doesn't load entire file)
for processed_line in itertools.islice(process_large_file('large_file.txt'), 5):
    print(processed_line)
# LINE 0
# LINE 1
# LINE 2
# LINE 3
# LINE 4
2. Data Stream Processing
import itertools
import random
import time

def simulate_sensor_data():
    """Simulate sensor data stream."""
    while True:
        temperature = random.uniform(20.0, 30.0)
        humidity = random.uniform(40.0, 60.0)
        yield {
            'temperature': round(temperature, 2),
            'humidity': round(humidity, 2),
            'timestamp': time.time()
        }
        time.sleep(0.1)  # Simulate delay

def filter_anomalies(data_stream, temp_threshold=28.0):
    """Filter anomalous readings."""
    for reading in data_stream:
        if reading['temperature'] > temp_threshold:
            yield reading

# Usage
sensor = simulate_sensor_data()
anomalies = filter_anomalies(sensor, temp_threshold=27.0)

# Process first 5 anomalies
for reading in itertools.islice(anomalies, 5):
    print(f"Alert! Temp: {reading['temperature']}°C, "
          f"Humidity: {reading['humidity']}%")
3. Batch Processing
def batch_generator(iterable, batch_size):
    """Generate batches from iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Yield remaining items
    if batch:
        yield batch

def process_records(records):
    """Process database records in batches."""
    for batch in batch_generator(records, batch_size=10):
        # Process batch
        print(f"Processing batch of {len(batch)} records")
        # Simulate batch processing
        yield f"Processed {len(batch)} records"

# Simulate database records
all_records = [{'id': i, 'data': f'record_{i}'} for i in range(100)]

# Usage
for result in process_records(all_records):
    print(result)
# Processing batch of 10 records
# Processed 10 records
# (repeated 10 times)
4. Log Parser
import re

def parse_log_file(filename):
    """Parse log file and yield structured data."""
    # Pattern: [timestamp] level: message
    pattern = r'\[(.*?)\] (\w+): (.*)'
    with open(filename, 'r') as f:
        for line in f:
            match = re.match(pattern, line)
            if match:
                timestamp, level, message = match.groups()
                yield {
                    'timestamp': timestamp,
                    'level': level,
                    'message': message
                }

def filter_errors(log_entries):
    """Filter error-level entries."""
    for entry in log_entries:
        if entry['level'] == 'ERROR':
            yield entry

# Create sample log
with open('app.log', 'w') as f:
    f.write("[2025-10-27 10:00:00] INFO: Application started\n")
    f.write("[2025-10-27 10:01:00] ERROR: Connection failed\n")
    f.write("[2025-10-27 10:02:00] WARNING: Retrying connection\n")
    f.write("[2025-10-27 10:03:00] ERROR: Timeout occurred\n")
    f.write("[2025-10-27 10:04:00] INFO: Connection established\n")

# Parse and filter
logs = parse_log_file('app.log')
errors = filter_errors(logs)

print("Error log entries:")
for error in errors:
    print(f"[{error['timestamp']}] {error['message']}")
# [2025-10-27 10:01:00] Connection failed
# [2025-10-27 10:03:00] Timeout occurred
5. Fibonacci Cache with a Generator
import itertools

def fibonacci_with_cache():
    """Fibonacci generator with caching."""
    cache = {0: 0, 1: 1}

    def fib(n):
        if n not in cache:
            cache[n] = fib(n - 1) + fib(n - 2)
        return cache[n]

    n = 0
    while True:
        yield fib(n)
        n += 1

# Usage
fib_gen = fibonacci_with_cache()

# Get first 20 Fibonacci numbers efficiently
fib_numbers = list(itertools.islice(fib_gen, 20))
print(fib_numbers)
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]
Generator Methods
send()
def echo_generator():
    """Generator that echoes values sent to it."""
    value = None
    while True:
        received = yield value
        if received is not None:
            value = f"Echo: {received}"
        else:
            value = "Waiting..."

# Usage
gen = echo_generator()
print(next(gen))          # None (prime the generator)
print(gen.send("Hello"))  # Echo: Hello
print(gen.send("World"))  # Echo: World
print(next(gen))          # Waiting...
throw()
def resilient_generator():
    """Generator that handles exceptions."""
    try:
        while True:
            value = yield "Running"
    except ValueError as e:
        yield f"Handled ValueError: {e}"
    except Exception as e:
        yield f"Handled Exception: {e}"

# Usage
gen = resilient_generator()
print(next(gen))  # Running
# Pass an exception instance; the separate (type, value) form
# of throw() is deprecated since Python 3.12.
print(gen.throw(ValueError("Bad value")))  # Handled ValueError: Bad value
close()
def generator_with_cleanup():
    """Generator with cleanup."""
    print("Setup")
    try:
        while True:
            yield "Value"
    finally:
        print("Cleanup")

# Usage
gen = generator_with_cleanup()
print(next(gen))  # Setup, Value
gen.close()       # Cleanup
Performance Comparison
import time
import sys

# Test data size
n = 1_000_000

# List approach
def list_approach():
    start = time.time()
    data = [x**2 for x in range(n)]
    result = sum(data)
    end = time.time()
    print("List approach:")
    print(f"  Time: {end - start:.4f}s")
    print(f"  Memory: {sys.getsizeof(data) / 1024 / 1024:.2f} MB")
    print(f"  Result: {result}")

# Generator approach
def generator_approach():
    start = time.time()
    data = (x**2 for x in range(n))
    result = sum(data)
    end = time.time()
    gen = (x**2 for x in range(n))
    print("\nGenerator approach:")
    print(f"  Time: {end - start:.4f}s")
    print(f"  Memory: {sys.getsizeof(gen) / 1024:.2f} KB")
    print(f"  Result: {result}")

# Run comparison
list_approach()
generator_approach()

# Output example:
# List approach:
#   Time: 0.0845s
#   Memory: 8.00 MB
#   Result: 333332833333500000
#
# Generator approach:
#   Time: 0.0821s
#   Memory: 0.11 KB
#   Result: 333332833333500000
Best Practices
# 1. Use generators for large datasets
def process_large_data(large_dataset, process):
    """Process data lazily (memory efficient)."""
    for item in large_dataset:
        yield process(item)

# 2. Generator expressions for simple cases
squares = (x**2 for x in range(1000))  # Better than a list comprehension here

# 3. Use yield from for delegation
def combine_generators(gen1, gen2):
    yield from gen1()
    yield from gen2()

# 4. Close generators when done
gen = my_generator()
try:
    # Use generator
    pass
finally:
    gen.close()  # Clean up

# 5. Pipeline pattern for data processing
def pipeline(data):
    """Chain generator stages (each stage is a generator function)."""
    data = filter_data(data)
    data = transform_data(data)
    data = aggregate_data(data)
    return data
Practice Exercises
Exercise 1: Prime Number Generator
Write a generator that produces prime numbers using the Sieve of Eratosthenes.
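One possible approach, a sketch using the incremental variant of the sieve, which needs no upper bound:

import itertools

def primes():
    """Incremental Sieve of Eratosthenes as an infinite generator."""
    composites = {}  # maps each upcoming composite to the primes that divide it
    n = 2
    while True:
        if n not in composites:
            yield n                      # n is prime
            composites[n * n] = [n]      # first composite this prime will mark
        else:
            for p in composites.pop(n):  # advance each prime to its next multiple
                composites.setdefault(n + p, []).append(p)
        n += 1

print(list(itertools.islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]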
Exercise 2: File Merger
Write a generator that merges multiple sorted files into one sorted stream.
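A sketch of one way to do it, delegating the heavy lifting to heapq.merge, which lazily merges already-sorted iterables (the file names in the usage comment are hypothetical):

import heapq

def merge_sorted_files(*filenames):
    """Lazily merge pre-sorted text files into one sorted stream of lines."""
    files = [open(name) for name in filenames]
    try:
        streams = ((line.rstrip('\n') for line in f) for f in files)
        yield from heapq.merge(*streams)  # heapq.merge is itself a lazy iterator
    finally:
        for f in files:
            f.close()

# Hypothetical usage - 'a.txt' and 'b.txt' must already be sorted:
# for line in merge_sorted_files('a.txt', 'b.txt'):
#     print(line)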
Exercise 3: Moving Average
Write a generator that computes the moving average of a data stream.
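One possible sketch, reusing the deque idea from the sliding-window pattern above (averaging the partial windows at the start is a design choice, not a requirement):

from collections import deque

def moving_average(stream, window_size):
    """Yield the running average over the last window_size values."""
    window = deque(maxlen=window_size)
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)  # partial windows are averaged too

print(list(moving_average([1, 2, 3, 4, 5], 3)))
# [1.0, 1.5, 2.0, 3.0, 4.0]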
Exercise 4: XML Parser
Write a generator that parses a large XML file element by element.
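A minimal sketch using xml.etree.ElementTree.iterparse, which parses incrementally (the file name and tag in the usage comment are hypothetical):

import xml.etree.ElementTree as ET

def iter_elements(filename, tag):
    """Stream elements with the given tag from a large XML file."""
    for event, elem in ET.iterparse(filename, events=('end',)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # free the element's children to keep memory flat

# Hypothetical usage - 'data.xml' containing <record> elements:
# for record in iter_elements('data.xml', 'record'):
#     print(record.attrib)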
Exercise 5: Permutations Generator
Write a generator that produces all permutations of a list.
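One recursive sketch (in practice, itertools.permutations does the same job in C):

def permutations(items):
    """Yield all permutations of a list, one at a time."""
    if len(items) <= 1:
        yield list(items)
        return
    for i, head in enumerate(items):
        rest = items[:i] + items[i + 1:]  # everything except items[i]
        for tail in permutations(rest):
            yield [head] + tail

for p in permutations([1, 2, 3]):
    print(p)
# [1, 2, 3]
# [1, 3, 2]
# ... (6 permutations total)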
Summary
✅ Generator: a function with yield; lazy evaluation
✅ yield: pauses the function and returns a value
✅ Generator expression: (expr for item in iterable)
✅ yield from: delegates to a sub-generator
✅ Memory efficient: stores state, not data
✅ Patterns: pipeline, state machine, sliding window
✅ Methods: send(), throw(), close()
✅ Real-world: file processing, streams, batching
Conclusion for Both Parts
Part 1 - Iterators:
- Iteration protocol (__iter__, __next__)
- Custom iterators and iterable classes
- Iterator patterns and built-in functions
Part 2 - Generators:
- The yield keyword and generator functions
- Generator expressions
- yield from and advanced patterns
- Memory efficiency and performance
Key Takeaways:
- Generators are simpler to write than iterator classes
- Use generators for lazy evaluation
- Memory efficient for large data
- Pipeline pattern for data processing
- Generator expressions for simple cases
Next Lesson
Lesson 4: Context Managers - the with statement, __enter__, __exit__, and contextlib! 🚀
Remember:
- Generators = lazy iterators
- yield pauses the function
- Generator expressions for simple cases
- Pipeline pattern for data flow
- Memory efficiency wins! 🎯