Lesson 3: Generators and Iterators - Generators

Learning Objectives

After completing this lesson, you will be able to:

  • ✅ Understand generators and the yield keyword
  • ✅ Create generator functions
  • ✅ Use generator expressions
  • ✅ Understand yield from
  • ✅ Apply lazy evaluation
  • ✅ Compare memory efficiency

What Are Generators?

A generator is a function that uses yield instead of return. Generators implement the iteration protocol automatically, so you can loop over them without writing __iter__ or __next__ yourself.
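That claim can be verified directly. A minimal sketch (the `count_to` name is just for illustration) showing that a generator object carries the full iteration protocol:

```python
def count_to(n):
    """A trivial generator."""
    for i in range(1, n + 1):
        yield i

gen = count_to(3)

# A generator is its own iterator: iter() returns the same object
print(iter(gen) is gen)  # True

# Both protocol methods exist without us defining them
print(hasattr(gen, '__iter__'), hasattr(gen, '__next__'))  # True True

# So next() and for-loops work out of the box
print(next(gen))  # 1
print(list(gen))  # [2, 3] - picks up where it left off
```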

Generator vs Regular Function

# Regular function - returns all at once
def regular_numbers(n):
    """Return list of numbers."""
    result = []
    for i in range(n):
        result.append(i)
    return result

# Generator function - yields one at a time
def generator_numbers(n):
    """Yield numbers one by one."""
    for i in range(n):
        yield i

# Usage
print(regular_numbers(5))  # [0, 1, 2, 3, 4] - list

gen = generator_numbers(5)
print(gen)  # <generator object> - not a list!

# Iterate generator
for num in gen:
    print(num, end=' ')  # 0 1 2 3 4
print()

# Convert to list
gen2 = generator_numbers(5)
print(list(gen2))  # [0, 1, 2, 3, 4]

How Generators Work

def simple_generator():
    """Demonstrate generator execution."""
    print("Starting")
    yield 1
    print("Between yields")
    yield 2
    print("Ending")
    yield 3

# Create generator (doesn't execute yet!)
gen = simple_generator()

# First next() - runs until first yield
print(next(gen))
# Output:
# Starting
# 1

# Second next() - resumes and runs until next yield
print(next(gen))
# Output:
# Between yields
# 2

# Third next()
print(next(gen))
# Output:
# Ending
# 3

# Fourth next() - raises StopIteration
# print(next(gen))  # StopIteration

Basic Generators

1. Simple Generator

def countdown(n):
    """Countdown from n to 1."""
    while n > 0:
        yield n
        n -= 1

# Usage
for num in countdown(5):
    print(num, end=' ')  # 5 4 3 2 1
print()

# Manual iteration
gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(next(gen))  # 1
# print(next(gen))  # StopIteration

2. Fibonacci Generator

def fibonacci(n):
    """Generate first n Fibonacci numbers."""
    a, b = 0, 1
    count = 0

    while count < n:
        yield a
        a, b = b, a + b
        count += 1

# Usage
fib = list(fibonacci(10))
print(fib)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# Infinite Fibonacci
def fibonacci_infinite():
    """Generate Fibonacci numbers infinitely."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Use with break or islice
import itertools

fib_inf = fibonacci_infinite()
first_ten = list(itertools.islice(fib_inf, 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

3. Range Generator

def my_range(start, end, step=1):
    """Custom range generator."""
    current = start

    if step > 0:
        while current < end:
            yield current
            current += step
    else:
        while current > end:
            yield current
            current += step

# Usage
print(list(my_range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(my_range(10, 0, -2)))  # [10, 8, 6, 4, 2]

# Compare with built-in
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]

Generator Expressions

A generator expression looks like a list comprehension but uses () instead of [] and evaluates lazily.

Syntax

# List comprehension - creates entire list in memory
squares_list = [x**2 for x in range(10)]
print(squares_list)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Generator expression - lazy evaluation
squares_gen = (x**2 for x in range(10))
print(squares_gen)  # <generator object>

# Iterate
for sq in squares_gen:
    print(sq, end=' ')  # 0 1 4 9 16 25 36 49 64 81
print()

# Convert to list
squares_gen2 = (x**2 for x in range(10))
print(list(squares_gen2))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Generator Expression Examples

# Filter even numbers
evens = (x for x in range(20) if x % 2 == 0)
print(list(evens))  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# String processing
words = ['hello', 'world', 'python', 'generator']
uppercase = (word.upper() for word in words)
print(list(uppercase))  # ['HELLO', 'WORLD', 'PYTHON', 'GENERATOR']

# Nested generator expression
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flattened = (num for row in matrix for num in row)
print(list(flattened))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Sum with generator (memory efficient)
total = sum(x**2 for x in range(1000000))
print(f"Sum: {total}")

Memory Efficiency

import sys

# List comprehension - stores everything
list_comp = [x**2 for x in range(10000)]
print(f"List size: {sys.getsizeof(list_comp)} bytes")
# List size: ~85KB

# Generator expression - stores state only
gen_exp = (x**2 for x in range(10000))
print(f"Generator size: {sys.getsizeof(gen_exp)} bytes")
# Generator size: ~120 bytes

# Memory difference is huge for large data!
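The trade-off for that laziness is worth knowing: a generator can be consumed only once. A small sketch of what happens on a second pass:

```python
squares = (x**2 for x in range(5))

print(list(squares))  # [0, 1, 4, 9, 16] - first pass consumes it
print(list(squares))  # [] - exhausted, nothing left

# Recreate the generator (or materialize a list) if you need multiple passes
squares = (x**2 for x in range(5))
print(sum(squares))  # 30
```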

yield from

yield from delegates iteration to another generator or iterable.

Basic Usage

def generator1():
    """First generator."""
    yield 1
    yield 2
    yield 3

def generator2():
    """Second generator."""
    yield 4
    yield 5
    yield 6

# Without yield from
def combine_old():
    """Combine generators old way."""
    for value in generator1():
        yield value
    for value in generator2():
        yield value

# With yield from
def combine_new():
    """Combine generators with yield from."""
    yield from generator1()
    yield from generator2()

# Usage
print(list(combine_old()))  # [1, 2, 3, 4, 5, 6]
print(list(combine_new()))  # [1, 2, 3, 4, 5, 6]

Flatten Nested Structure

def flatten(nested_list):
    """Flatten nested list recursively."""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # Recursive!
        else:
            yield item

# Usage
nested = [1, [2, 3, [4, 5]], 6, [7, [8, 9]]]
flat = list(flatten(nested))
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

Tree Traversal

class TreeNode:
    """Simple tree node."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def traverse_tree(node):
    """Traverse tree and yield all values."""
    yield node.value
    for child in node.children:
        yield from traverse_tree(child)  # Recursive traversal

# Create tree
tree = TreeNode(1, [
    TreeNode(2, [
        TreeNode(4),
        TreeNode(5)
    ]),
    TreeNode(3, [
        TreeNode(6),
        TreeNode(7)
    ])
])

# Traverse
values = list(traverse_tree(tree))
print(values)  # [1, 2, 4, 5, 3, 6, 7]

Advanced Generator Patterns

1. Pipeline Pattern

def read_lines(filename):
    """Read file lines."""
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

def filter_comments(lines):
    """Filter out comment lines."""
    for line in lines:
        if not line.startswith('#'):
            yield line

def filter_blank(lines):
    """Filter out blank lines."""
    for line in lines:
        if line:
            yield line

def to_uppercase(lines):
    """Convert to uppercase."""
    for line in lines:
        yield line.upper()

# Create test file
with open('config.txt', 'w') as f:
    f.write("# Configuration\n")
    f.write("setting1=value1\n")
    f.write("\n")
    f.write("setting2=value2\n")
    f.write("# Comment\n")
    f.write("setting3=value3\n")

# Build pipeline
lines = read_lines('config.txt')
lines = filter_comments(lines)
lines = filter_blank(lines)
lines = to_uppercase(lines)

# Execute pipeline (lazy!)
for line in lines:
    print(line)
# SETTING1=VALUE1
# SETTING2=VALUE2
# SETTING3=VALUE3

2. Generator State Machine

def state_machine():
    """Simple state machine generator."""
    state = 'START'

    while True:
        if state == 'START':
            print("State: START")
            command = yield "Ready"
            state = 'RUNNING' if command == 'start' else 'START'

        elif state == 'RUNNING':
            print("State: RUNNING")
            command = yield "Processing"
            if command == 'stop':
                state = 'STOPPED'
            elif command == 'pause':
                state = 'PAUSED'

        elif state == 'PAUSED':
            print("State: PAUSED")
            command = yield "Paused"
            state = 'RUNNING' if command == 'resume' else 'PAUSED'

        elif state == 'STOPPED':
            print("State: STOPPED")
            yield "Stopped"
            break

# Usage
sm = state_machine()
print(next(sm))           # State: START, Ready
print(sm.send('start'))   # State: RUNNING, Processing
print(sm.send('pause'))   # State: PAUSED, Paused
print(sm.send('resume'))  # State: RUNNING, Processing
print(sm.send('stop'))    # State: STOPPED, Stopped

3. Sliding Window

from collections import deque

def sliding_window(iterable, n):
    """Generate sliding windows of size n."""
    window = deque(maxlen=n)

    for item in iterable:
        window.append(item)
        if len(window) == n:
            yield tuple(window)

# Usage
data = [1, 2, 3, 4, 5, 6, 7, 8]
for window in sliding_window(data, 3):
    print(window)
# (1, 2, 3)
# (2, 3, 4)
# (3, 4, 5)
# (4, 5, 6)
# (5, 6, 7)
# (6, 7, 8)

4. Pairwise Iterator

def pairwise(iterable):
    """Generate consecutive pairs."""
    iterator = iter(iterable)

    try:
        prev = next(iterator)
    except StopIteration:
        return

    for item in iterator:
        yield (prev, item)
        prev = item

# Usage
numbers = [1, 2, 3, 4, 5]
for pair in pairwise(numbers):
    print(pair)
# (1, 2)
# (2, 3)
# (3, 4)
# (4, 5)

# Calculate differences
differences = [b - a for a, b in pairwise(numbers)]
print(differences)  # [1, 1, 1, 1]

5. Infinite Sequences

import itertools

def infinite_counter(start=0, step=1):
    """Infinite counter."""
    current = start
    while True:
        yield current
        current += step

def infinite_repeater(value):
    """Repeat value infinitely."""
    while True:
        yield value

def infinite_cycle(iterable):
    """Cycle through iterable infinitely."""
    while True:
        for item in iterable:
            yield item

# Usage with itertools.islice
counter = infinite_counter(10, 5)
first_five = list(itertools.islice(counter, 5))
print(first_five)  # [10, 15, 20, 25, 30]

repeater = infinite_repeater('A')
first_three = list(itertools.islice(repeater, 3))
print(first_three)  # ['A', 'A', 'A']

cycler = infinite_cycle([1, 2, 3])
first_ten = list(itertools.islice(cycler, 10))
print(first_ten)  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]

Real-world Examples

1. Large File Processing

import itertools

def process_large_file(filename):
    """Process large file line by line (memory efficient)."""
    with open(filename, 'r') as f:
        for line in f:
            # Process line
            if line.strip():
                yield line.strip().upper()

# Create large file
with open('large_file.txt', 'w') as f:
    for i in range(100):
        f.write(f"Line {i}\n")

# Process efficiently (doesn't load entire file)
for processed_line in itertools.islice(process_large_file('large_file.txt'), 5):
    print(processed_line)
# LINE 0
# LINE 1
# LINE 2
# LINE 3
# LINE 4

2. Data Stream Processing

import itertools
import random
import time

def simulate_sensor_data():
    """Simulate sensor data stream."""
    while True:
        temperature = random.uniform(20.0, 30.0)
        humidity = random.uniform(40.0, 60.0)
        yield {
            'temperature': round(temperature, 2),
            'humidity': round(humidity, 2),
            'timestamp': time.time()
        }
        time.sleep(0.1)  # Simulate delay

def filter_anomalies(data_stream, temp_threshold=28.0):
    """Filter anomalous readings."""
    for reading in data_stream:
        if reading['temperature'] > temp_threshold:
            yield reading

# Usage
sensor = simulate_sensor_data()
anomalies = filter_anomalies(sensor, temp_threshold=27.0)

# Process first 5 anomalies
for reading in itertools.islice(anomalies, 5):
    print(f"Alert! Temp: {reading['temperature']}°C, "
          f"Humidity: {reading['humidity']}%")

3. Batch Processing

def batch_generator(iterable, batch_size):
    """Generate batches from iterable."""
    batch = []

    for item in iterable:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []

    # Yield remaining items
    if batch:
        yield batch

def process_records():
    """Process database records in batches."""
    # Simulate database records
    all_records = [{'id': i, 'data': f'record_{i}'} for i in range(100)]

    for batch in batch_generator(all_records, batch_size=10):
        # Process batch
        print(f"Processing batch of {len(batch)} records")
        # Simulate batch processing
        yield f"Processed {len(batch)} records"

# Usage
for result in process_records():
    print(result)
# Processing batch of 10 records
# Processed 10 records
# (repeated 10 times)

4. Log Parser

import re

def parse_log_file(filename):
    """Parse log file and yield structured data."""
    # Pattern: [timestamp] level: message
    pattern = r'\[(.*?)\] (\w+): (.*)'

    with open(filename, 'r') as f:
        for line in f:
            match = re.match(pattern, line)
            if match:
                timestamp, level, message = match.groups()
                yield {
                    'timestamp': timestamp,
                    'level': level,
                    'message': message
                }

def filter_errors(log_entries):
    """Filter error-level entries."""
    for entry in log_entries:
        if entry['level'] == 'ERROR':
            yield entry

# Create sample log
with open('app.log', 'w') as f:
    f.write("[2025-10-27 10:00:00] INFO: Application started\n")
    f.write("[2025-10-27 10:01:00] ERROR: Connection failed\n")
    f.write("[2025-10-27 10:02:00] WARNING: Retrying connection\n")
    f.write("[2025-10-27 10:03:00] ERROR: Timeout occurred\n")
    f.write("[2025-10-27 10:04:00] INFO: Connection established\n")

# Parse and filter
logs = parse_log_file('app.log')
errors = filter_errors(logs)

print("Error log entries:")
for error in errors:
    print(f"[{error['timestamp']}] {error['message']}")
# [2025-10-27 10:01:00] Connection failed
# [2025-10-27 10:03:00] Timeout occurred

5. Fibonacci Cache with a Generator

import itertools

def fibonacci_with_cache():
    """Fibonacci generator with caching."""
    cache = {0: 0, 1: 1}

    def fib(n):
        if n not in cache:
            cache[n] = fib(n-1) + fib(n-2)
        return cache[n]

    n = 0
    while True:
        yield fib(n)
        n += 1

# Usage
fib_gen = fibonacci_with_cache()

# Get first 20 Fibonacci numbers efficiently
fib_numbers = list(itertools.islice(fib_gen, 20))
print(fib_numbers)
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181]

Generator Methods

send()

def echo_generator():
    """Generator that echoes values sent to it."""
    value = None

    while True:
        received = yield value
        if received is not None:
            value = f"Echo: {received}"
        else:
            value = "Waiting..."

# Usage
gen = echo_generator()
print(next(gen))           # None (prime generator)
print(gen.send("Hello"))   # Echo: Hello
print(gen.send("World"))   # Echo: World
print(next(gen))           # Waiting...

throw()

def resilient_generator():
    """Generator that handles exceptions."""
    try:
        while True:
            value = yield "Running"
    except ValueError as e:
        yield f"Handled ValueError: {e}"
    except Exception as e:
        yield f"Handled Exception: {e}"

# Usage
gen = resilient_generator()
print(next(gen))                           # Running
print(gen.throw(ValueError("Bad value")))  # Handled ValueError: Bad value

close()

def generator_with_cleanup():
    """Generator with cleanup."""
    print("Setup")
    try:
        while True:
            yield "Value"
    finally:
        print("Cleanup")

# Usage
gen = generator_with_cleanup()
print(next(gen))  # Setup, Value
gen.close()       # Cleanup

Performance Comparison

import time
import sys

# Test data size
n = 1_000_000

# List approach
def list_approach():
    start = time.time()
    data = [x**2 for x in range(n)]
    result = sum(data)
    end = time.time()

    print("List approach:")
    print(f"  Time: {end - start:.4f}s")
    print(f"  Memory: {sys.getsizeof(data) / 1024 / 1024:.2f} MB")
    print(f"  Result: {result}")

# Generator approach
def generator_approach():
    start = time.time()
    data = (x**2 for x in range(n))
    result = sum(data)
    end = time.time()

    gen = (x**2 for x in range(n))  # fresh generator, just to measure its size
    print("\nGenerator approach:")
    print(f"  Time: {end - start:.4f}s")
    print(f"  Memory: {sys.getsizeof(gen) / 1024:.2f} KB")
    print(f"  Result: {result}")

# Run comparison
list_approach()
generator_approach()

# Output example:
# List approach:
#   Time: 0.0845s
#   Memory: 8.00 MB
#   Result: 333332833333500000
#
# Generator approach:
#   Time: 0.0821s
#   Memory: 0.11 KB
#   Result: 333332833333500000

Best Practices

# 1. Use generators for large datasets
def process_large_data(large_dataset, process):
    """Process data lazily (dataset and process function are placeholders)."""
    for item in large_dataset:
        yield process(item)  # Memory efficient

# 2. Generator expressions for simple cases
squares = (x**2 for x in range(1000))  # Better than a list if you iterate once

# 3. Use yield from for delegation
def combine_generators(gen1, gen2):
    yield from gen1()
    yield from gen2()

# 4. Close generators when done
gen = my_generator()  # placeholder generator
try:
    ...  # use the generator
finally:
    gen.close()  # Clean up

# 5. Pipeline pattern for data processing
def pipeline(data):
    """Chain generators (the stage functions are placeholders)."""
    data = filter_data(data)
    data = transform_data(data)
    data = aggregate_data(data)
    return data

Practice Exercises

Exercise 1: Prime Number Generator

Write a generator that produces prime numbers using the Sieve of Eratosthenes.
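If you get stuck, one possible starting point (an incremental sieve, a variant of the classic fixed-size sieve, so the generator can run unbounded; all names here are illustrative):

```python
import itertools

def primes():
    """Incremental sieve: map each known composite to the primes that divide it."""
    composites = {}
    n = 2
    while True:
        if n not in composites:
            yield n                  # n has no recorded factor, so it is prime
            composites[n * n] = [n]  # first composite this prime will produce
        else:
            for p in composites.pop(n):  # slide each factor to its next multiple
                composites.setdefault(n + p, []).append(p)
        n += 1

print(list(itertools.islice(primes(), 10)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```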

Exercise 2: File Merger

Write a generator that merges several sorted files into a single sorted stream.
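One possible sketch, leaning on the standard library's lazy heapq.merge (the filenames and helper names below are just for the demo):

```python
import heapq

def read_sorted(filename):
    """Yield integers from a sorted file, one per line."""
    with open(filename) as f:
        for line in f:
            yield int(line)

def merge_files(*filenames):
    """Lazily merge several sorted files into one sorted stream."""
    yield from heapq.merge(*(read_sorted(name) for name in filenames))

# Create two small sorted files just for this example
with open('a.txt', 'w') as f:
    f.write('1\n4\n7\n')
with open('b.txt', 'w') as f:
    f.write('2\n3\n9\n')

print(list(merge_files('a.txt', 'b.txt')))  # [1, 2, 3, 4, 7, 9]
```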

Exercise 3: Moving Average

Write a generator that calculates the moving average of a data stream.
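A minimal sketch of one approach, reusing the deque-based window idea from the sliding-window pattern above (the function name is illustrative):

```python
from collections import deque

def moving_average(stream, n):
    """Yield the running average of the last n values seen so far."""
    window = deque(maxlen=n)
    total = 0.0
    for value in stream:
        if len(window) == n:
            total -= window[0]  # subtract the value about to fall out
        window.append(value)
        total += value
        yield total / len(window)

print(list(moving_average([1, 2, 3, 4, 5], 3)))
# [1.0, 1.5, 2.0, 3.0, 4.0]
```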

Exercise 4: XML Parser

Write a generator that parses a large XML file element by element.
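One possible starting point using the standard library's xml.etree.ElementTree.iterparse, which streams events instead of loading the whole tree (the tag and helper names are illustrative; a BytesIO stands in for a large file):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def iter_records(source, tag):
    """Yield attribute dicts for each element with the given tag."""
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == tag:
            yield elem.attrib.copy()
            elem.clear()  # free memory used by the processed subtree

xml = b"<items><item id='1'/><item id='2'/><item id='3'/></items>"
for record in iter_records(BytesIO(xml), 'item'):
    print(record)
# {'id': '1'}
# {'id': '2'}
# {'id': '3'}
```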

Exercise 5: Permutations Generator

Write a generator that produces all permutations of a list.
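One possible sketch (recursive, combining the yield from delegation pattern from earlier; itertools.permutations does the same job in production code):

```python
def permutations(items):
    """Yield every permutation of items, one list at a time."""
    if len(items) <= 1:
        yield list(items)
        return
    for i, head in enumerate(items):
        rest = items[:i] + items[i+1:]
        # Delegate to the recursive call, prepending the chosen head
        yield from ([head] + tail for tail in permutations(rest))

for p in permutations([1, 2, 3]):
    print(p)
# [1, 2, 3]
# [1, 3, 2]
# [2, 1, 3]
# [2, 3, 1]
# [3, 1, 2]
# [3, 2, 1]
```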

Summary

Generator: a function with yield; lazy evaluation
yield: pauses the function and returns a value
Generator expression: (expr for item in iterable)
yield from: delegates to a sub-generator
Memory efficient: stores state, not the data
Patterns: pipeline, state machine, sliding window
Methods: send(), throw(), close()
Real-world uses: file processing, streams, batching

Conclusion for Both Parts

Part 1 - Iterators:

  • The iteration protocol (__iter__, __next__)
  • Custom iterators and iterable classes
  • Iterator patterns and built-in functions

Part 2 - Generators:

  • The yield keyword and generator functions
  • Generator expressions
  • yield from and advanced patterns
  • Memory efficiency and performance

Key Takeaways:

  • Generators are usually simpler to write than custom iterator classes
  • Use generators for lazy evaluation
  • Memory efficient for large data
  • Pipeline pattern for data processing
  • Generator expressions for simple cases

Next Lesson

Lesson 4: Context Managers - the with statement, __enter__, __exit__, and contextlib! 🚀


Remember:

  • Generators = lazy iterators
  • yield pauses the function
  • Generator expressions for simple cases
  • Pipeline pattern for data flow
  • Memory efficiency wins! 🎯