Lesson 6: Comprehensions - Concise Construction Expressions

Lesson Objectives

After completing this lesson, you will be able to:

  • ✅ Master list comprehensions
  • ✅ Use dict comprehensions
  • ✅ Use set comprehensions
  • ✅ Work with generator expressions
  • ✅ Understand nested comprehensions
  • ✅ Apply common comprehension patterns

What Are Comprehensions?

Comprehensions are a concise way to build collections (list, dict, set) from iterables.

Traditional vs Comprehension

# Traditional way - verbose
squares = []
for i in range(10):
    squares.append(i ** 2)

# Comprehension - concise
squares = [i ** 2 for i in range(10)]

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

List Comprehensions

A list comprehension builds a list from an iterable.

Basic Syntax

# Syntax: [expression for item in iterable]
numbers = [x for x in range(10)]
print(numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# With transformation
squares = [x ** 2 for x in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# String operations
words = ['hello', 'world', 'python']
uppercase = [word.upper() for word in words]
print(uppercase)  # ['HELLO', 'WORLD', 'PYTHON']

# Calling a function on each item
lengths = [len(word) for word in words]
print(lengths)  # [5, 5, 6]

With Filtering

# Syntax: [expression for item in iterable if condition]

# Even numbers only
evens = [x for x in range(20) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# Positive numbers
numbers = [-5, -3, -1, 0, 2, 4, 6]
positives = [x for x in numbers if x > 0]
print(positives)  # [2, 4, 6]

# Long words
words = ['a', 'hello', 'hi', 'world', 'python']
long_words = [word for word in words if len(word) > 3]
print(long_words)  # ['hello', 'world', 'python']

# Combined transformation and filtering
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]

Multiple Conditions

# Multiple if conditions (AND)
numbers = [x for x in range(100) if x % 2 == 0 if x % 5 == 0]
print(numbers)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

# Equivalent to:
numbers = [x for x in range(100) if x % 2 == 0 and x % 5 == 0]

# if-else expression (chooses a value for every item; it does not filter)
result = ['even' if x % 2 == 0 else 'odd' for x in range(10)]
print(result)  # ['even', 'odd', 'even', 'odd', ...]

# Conditional expression with a fallback value (negatives become 0)
values = [x if x > 0 else 0 for x in [-3, -1, 0, 2, 4]]
print(values)  # [0, 0, 0, 2, 4]
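The two `if` forms sit in different positions and do different jobs: a trailing `if` drops items, while a leading `value if ... else ...` keeps every item and only picks which value to emit. A minimal sketch that uses both side by side (the `readings` list is hypothetical sample data):

```python
# Hypothetical sample data for illustration only.
readings = [-3, 7, 0, 12, -1, 5]

# Trailing `if`: a filter. Negative readings are dropped entirely.
non_negative = [r for r in readings if r >= 0]

# Leading `if ... else`: a mapping. Every reading is kept, but labeled.
labels = ['high' if r > 5 else 'low' for r in readings]

# Both together: filter first, then choose the value to emit.
capped = [r if r <= 10 else 10 for r in readings if r >= 0]

print(non_negative)  # [7, 0, 12, 5]
print(labels)        # ['low', 'high', 'low', 'high', 'low', 'low']
print(capped)        # [7, 0, 10, 5]
```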

Nested Loops

# Flatten 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cartesian product
colors = ['red', 'blue']
sizes = ['S', 'M', 'L']
combinations = [(color, size) for color in colors for size in sizes]
print(combinations)
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
#  ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

# Multiplication table
table = [[i * j for j in range(1, 6)] for i in range(1, 6)]
for row in table:
    print(row)
# [1, 2, 3, 4, 5]
# [2, 4, 6, 8, 10]
# [3, 6, 9, 12, 15]
# [4, 8, 12, 16, 20]
# [5, 10, 15, 20, 25]
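A common stumbling block with multiple `for` clauses is their order. They read left to right exactly like nested loops, outermost first. The sketch below writes the flatten example both ways to show the correspondence:

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Comprehension: clauses read left to right, outermost loop first.
flat = [num for row in matrix for num in row]

# Equivalent explicit loops, in the same order as the clauses above.
flat_loop = []
for row in matrix:
    for num in row:
        flat_loop.append(num)

print(flat == flat_loop)  # True
```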

Dict Comprehensions

A dict comprehension builds a dictionary from an iterable.

Basic Syntax

# Syntax: {key_expr: value_expr for item in iterable}

# Square dictionary
squares = {x: x ** 2 for x in range(6)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Character frequency
text = "hello"
char_count = {char: text.count(char) for char in text}
print(char_count)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Enumerate to dict
words = ['apple', 'banana', 'cherry']
word_dict = {i: word for i, word in enumerate(words)}
print(word_dict)  # {0: 'apple', 1: 'banana', 2: 'cherry'}
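The character-frequency comprehension is fine for short strings, but it calls `text.count()` once per character, so the string is rescanned every time. For longer inputs, one possible alternative is `collections.Counter` from the standard library, which tallies in a single pass. A small sketch comparing the two:

```python
from collections import Counter

text = "hello"

# Comprehension version: text.count() rescans the string for each character.
char_count = {char: text.count(char) for char in text}

# Counter tallies everything in a single pass over the text.
counter = Counter(text)

print(dict(counter))                # {'h': 1, 'e': 1, 'l': 2, 'o': 1}
print(dict(counter) == char_count)  # True
```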

With Filtering

# Even numbers only
even_squares = {x: x ** 2 for x in range(10) if x % 2 == 0}
print(even_squares)  # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Long words
words = ['a', 'hello', 'hi', 'world', 'python']
long_words = {word: len(word) for word in words if len(word) > 3}
print(long_words)  # {'hello': 5, 'world': 5, 'python': 6}

Transform Existing Dict

# Swap keys and values
original = {'a': 1, 'b': 2, 'c': 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

# Transform values
prices = {'apple': 10, 'banana': 5, 'cherry': 15}
discounted = {item: price * 0.9 for item, price in prices.items()}
print(discounted)  # {'apple': 9.0, 'banana': 4.5, 'cherry': 13.5}

# Filter and transform
scores = {'alice': 85, 'bob': 65, 'charlie': 92, 'david': 58}
passed = {name: score for name, score in scores.items() if score >= 70}
print(passed)  # {'alice': 85, 'charlie': 92}
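Swapping keys and values only round-trips cleanly when the values are unique (and hashable). When two keys share a value, the later one silently overwrites the earlier one. The sketch below uses a hypothetical dict with a duplicate value to show the data loss, plus one possible way to keep every key:

```python
# Hypothetical data: two keys share the value 1.
original = {'a': 1, 'b': 2, 'c': 1}

# Plain swap: later keys overwrite earlier ones, so 'a' is lost.
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'c', 2: 'b'}

# One way to keep everything: map each value to a list of its keys.
grouped = {
    value: [k for k, v in original.items() if v == value]
    for value in set(original.values())
}
print(grouped[1])  # ['a', 'c']
print(grouped[2])  # ['b']
```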

From Two Lists

# Zip two lists into dict
keys = ['name', 'age', 'city']
values = ['Alice', 25, 'NYC']
person = {k: v for k, v in zip(keys, values)}
print(person)  # {'name': 'Alice', 'age': 25, 'city': 'NYC'}

# With transformation
names = ['alice', 'bob', 'charlie']
ages = [25, 30, 35]
people = {name.upper(): age for name, age in zip(names, ages)}
print(people)  # {'ALICE': 25, 'BOB': 30, 'CHARLIE': 35}

Set Comprehensions

A set comprehension builds a set from an iterable (elements are unique and unordered).

Basic Syntax

# Syntax: {expression for item in iterable}
# Note: sets are unordered, so the printed order may differ from the comments.

# Unique squares
squares = {x ** 2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

# Unique characters
text = "hello world"
chars = {char for char in text if char != ' '}
print(chars)  # {'h', 'e', 'l', 'o', 'w', 'r', 'd'}

# Remove duplicates with transformation
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_squares = {x ** 2 for x in numbers}
print(unique_squares)  # {1, 4, 9, 16}
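Because a set makes no promise about iteration order, printed output can differ between runs or Python versions. When order matters, two common options are `sorted()` for a predictable order and `dict.fromkeys()` to remove duplicates while keeping first-seen order. A short sketch with hypothetical input:

```python
numbers = [3, 1, 3, 2, 1, 4, 2]

# A set removes duplicates but makes no promise about iteration order.
unique = {x for x in numbers}

# sorted() returns a list in a predictable order.
print(sorted(unique))  # [1, 2, 3, 4]

# dict.fromkeys() keeps the first occurrence of each item, in input order.
in_order = list(dict.fromkeys(numbers))
print(in_order)  # [3, 1, 2, 4]
```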

With Filtering

# Even numbers only
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18} (display order may differ)

# Unique lengths
words = ['hello', 'world', 'hi', 'python', 'code']
lengths = {len(word) for word in words if len(word) > 2}
print(lengths)  # {4, 5, 6} (display order may differ)

Set Operations

# Find common characters
word1 = "hello"
word2 = "world"
common = {c for c in word1 if c in word2}
print(common)  # {'l', 'o'} (display order may differ)

# Different characters
diff = {c for c in word1 if c not in word2}
print(diff)  # {'h', 'e'} (display order may differ)
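The comprehensions above are equivalent to the built-in set operators `&` (intersection) and `-` (difference). The sketch below computes both forms and checks that they agree:

```python
word1 = "hello"
word2 = "world"

# Intersection: characters present in both words.
common = set(word1) & set(word2)

# Difference: characters in word1 that are not in word2.
diff = set(word1) - set(word2)

# Same results as the comprehension versions.
print(common == {c for c in word1 if c in word2})    # True
print(diff == {c for c in word1 if c not in word2})  # True
```

The operator form is usually shorter; the comprehension form is handy when each element also needs a transformation or an extra condition.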

Generator Expressions

A generator expression looks like a list comprehension but uses parentheses () and is evaluated lazily: values are produced one at a time, only when requested.

Basic Syntax

# Generator expression - lazy
gen = (x ** 2 for x in range(10))
print(gen)  # <generator object <genexpr> at 0x...>
print(type(gen))  # <class 'generator'>

# Iterate once
for num in gen:
    print(num, end=' ')  # 0 1 4 9 16 25 36 49 64 81
print()

# Convert to list if needed
gen2 = (x ** 2 for x in range(10))
squares = list(gen2)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
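"Iterate once" is worth taking literally: after a generator has been consumed, it yields nothing more. A short sketch of that behavior, along with `next()` for pulling single values:

```python
gen = (x ** 2 for x in range(5))

first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # nothing left to yield

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# next() pulls one value at a time; a default avoids StopIteration at the end.
gen = (x ** 2 for x in range(2))
print(next(gen))        # 0
print(next(gen))        # 1
print(next(gen, None))  # None
```

If the values are needed more than once, building a list (or recreating the generator) is the usual approach.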

Memory Efficiency

import sys

# List comprehension - stores all values
# (getsizeof counts the list object itself, not the integers it refers to;
# exact numbers depend on the Python version and platform)
list_comp = [x ** 2 for x in range(10000)]
print(f"List size: {sys.getsizeof(list_comp)} bytes")  # roughly 85 KB

# Generator expression - stores state only
gen_exp = (x ** 2 for x in range(10000))
print(f"Generator size: {sys.getsizeof(gen_exp)} bytes")  # roughly 100-200 bytes

# Use with sum (doesn't create list)
total = sum(x ** 2 for x in range(10000))
print(f"Sum: {total}")

With Functions

# any() and all() work well with generators
numbers = range(100)

# Check if any even
has_even = any(x % 2 == 0 for x in numbers)
print(has_even)  # True

# Check if all positive
numbers2 = [1, 2, 3, 4, 5]
all_positive = all(x > 0 for x in numbers2)
print(all_positive)  # True

# max/min with generator
max_square = max(x ** 2 for x in range(10))
print(max_square)  # 81

# String joining
words = ['hello', 'world', 'python']
sentence = ' '.join(word.upper() for word in words)
print(sentence)  # HELLO WORLD PYTHON
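Part of why `any()` and `all()` pair well with generators is short-circuiting: they stop pulling values as soon as the answer is known. The sketch below makes this visible with a hypothetical helper, `is_even`, that records every number it tests:

```python
checked = []

def is_even(n):
    """Record each number that gets tested, then test it."""
    checked.append(n)
    return n % 2 == 0

# any() stops at the first True, so the generator is never run to the end.
has_even = any(is_even(n) for n in [1, 3, 4, 5, 7])

print(has_even)  # True
print(checked)   # [1, 3, 4]
```

With a list comprehension inside `any([...])`, all five numbers would be tested before `any()` even starts.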

Nested Comprehensions

2D Lists

# Create 2D matrix
matrix = [[i * j for j in range(5)] for i in range(5)]
for row in matrix:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 2, 3, 4]
# [0, 2, 4, 6, 8]
# [0, 3, 6, 9, 12]
# [0, 4, 8, 12, 16]

# Matrix transposition
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in original] for i in range(len(original[0]))]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Flatten with filtering
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens = [num for row in matrix for num in row if num % 2 == 0]
print(evens)  # [2, 4, 6, 8]
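For transposition, an alternative to indexing is `zip(*rows)`, which groups the i-th element of every row. A minimal sketch comparing the two on a hypothetical non-square matrix:

```python
original = [[1, 2, 3], [4, 5, 6]]

# Index-based version, as in the transposition example above.
by_index = [[row[i] for row in original] for i in range(len(original[0]))]

# zip(*original) pairs up the i-th element of every row; each result is a
# tuple, so convert to a list to match the nested-list layout.
by_zip = [list(column) for column in zip(*original)]

print(by_zip)              # [[1, 4], [2, 5], [3, 6]]
print(by_zip == by_index)  # True
```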

Nested Dict Comprehensions

# Dict of dicts
users = ['alice', 'bob', 'charlie']
user_data = {
    user: {'id': i, 'active': True}
    for i, user in enumerate(users, 1)
}
print(user_data)
# {
#     'alice': {'id': 1, 'active': True},
#     'bob': {'id': 2, 'active': True},
#     'charlie': {'id': 3, 'active': True}
# }

# Group by key
data = [
    {'name': 'Alice', 'dept': 'IT'},
    {'name': 'Bob', 'dept': 'HR'},
    {'name': 'Charlie', 'dept': 'IT'}
]

by_dept = {
    dept: [item['name'] for item in data if item['dept'] == dept]
    for dept in {item['dept'] for item in data}
}
print(by_dept)
# {'IT': ['Alice', 'Charlie'], 'HR': ['Bob']} (key order may differ)
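The group-by comprehension rescans `data` once per department, which is fine for small inputs. For larger ones, a plain loop with `collections.defaultdict` groups everything in a single pass. This is a sketch of that alternative, not a replacement for the comprehension form:

```python
from collections import defaultdict

data = [
    {'name': 'Alice', 'dept': 'IT'},
    {'name': 'Bob', 'dept': 'HR'},
    {'name': 'Charlie', 'dept': 'IT'},
]

# One pass: append each name to the list for its department.
by_dept = defaultdict(list)
for item in data:
    by_dept[item['dept']].append(item['name'])

print(dict(by_dept))  # {'IT': ['Alice', 'Charlie'], 'HR': ['Bob']}
```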

Real-world Examples

1. Data Filtering and Transformation

# Filter and transform user data
users = [
    {'name': 'Alice', 'age': 25, 'active': True, 'score': 85},
    {'name': 'Bob', 'age': 30, 'active': False, 'score': 65},
    {'name': 'Charlie', 'age': 35, 'active': True, 'score': 92},
    {'name': 'David', 'age': 28, 'active': True, 'score': 58}
]

# Active users only
active_users = [u['name'] for u in users if u['active']]
print(active_users)  # ['Alice', 'Charlie', 'David']

# High scorers
high_scorers = [u['name'] for u in users if u['score'] >= 70]
print(high_scorers)  # ['Alice', 'Charlie']

# Age mapping
age_map = {u['name']: u['age'] for u in users}
print(age_map)  # {'Alice': 25, 'Bob': 30, 'Charlie': 35, 'David': 28}

# Complex filtering
qualified = [
    u['name'] for u in users
    if u['active'] and u['score'] >= 70 and u['age'] < 30
]
print(qualified)  # ['Alice']

2. String Processing

# Parse CSV data
csv_data = """
name,age,city
Alice,25,NYC
Bob,30,LA
Charlie,35,SF
"""

lines = csv_data.strip().split('\n')
headers = lines[0].split(',')

# Parse into list of dicts
records = [
    dict(zip(headers, line.split(',')))
    for line in lines[1:]
]
print(records)
# [
#     {'name': 'Alice', 'age': '25', 'city': 'NYC'},
#     {'name': 'Bob', 'age': '30', 'city': 'LA'},
#     {'name': 'Charlie', 'age': '35', 'city': 'SF'}
# ]

# Extract specific fields
names = [r['name'] for r in records]
print(names)  # ['Alice', 'Bob', 'Charlie']

# Clean and transform
text = "  Hello, World! Python is GREAT.  "
words = [
    word.strip('.,!').lower()
    for word in text.split()
    if word.strip('.,!')
]
print(words)  # ['hello', 'world', 'python', 'is', 'great']
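Splitting on commas works for simple data like this, but it breaks on quoted fields that contain commas. For real CSV input, the standard library's `csv.DictReader` handles those cases and still combines well with a comprehension. A sketch under the assumption that the `age` column always holds an integer:

```python
import csv
import io

csv_data = """name,age,city
Alice,25,NYC
Bob,30,LA
Charlie,35,SF"""

# io.StringIO lets csv.DictReader read from a string as if it were a file.
reader = csv.DictReader(io.StringIO(csv_data))

# Convert the age column to int while building each record.
records = [{**row, 'age': int(row['age'])} for row in reader]

print(records[0])                    # {'name': 'Alice', 'age': 25, 'city': 'NYC'}
print([r['name'] for r in records])  # ['Alice', 'Bob', 'Charlie']
```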

3. File Processing

# Create test file
with open('data.txt', 'w') as f:
    f.write("# Comment line\n")
    f.write("value1=100\n")
    f.write("\n")
    f.write("value2=200\n")
    f.write("# Another comment\n")
    f.write("value3=300\n")

# Read and parse configuration
with open('data.txt', 'r') as f:
    config = {
        line.split('=')[0]: int(line.split('=')[1])
        for line in f
        if line.strip() and not line.startswith('#') and '=' in line
    }

print(config)  # {'value1': 100, 'value2': 200, 'value3': 300}
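The config comprehension splits every line twice and assumes there is no whitespace around the key or before a `#`. When parsing gets this fiddly, one option is to move the per-line logic into a small helper and keep the comprehension itself trivial. The sketch below uses a hypothetical `parse_line` helper and in-memory lines instead of a file:

```python
# Hypothetical config lines, held in memory instead of a file.
lines = [
    "# Comment line\n",
    "timeout = 30\n",
    "\n",
    "retries=5\n",
    "   # indented comment\n",
]

def parse_line(line):
    """Return (key, value) for a 'key=value' line, or None to skip it."""
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    # partition() splits on the first '=' only and never raises.
    key, sep, value = stripped.partition('=')
    if not sep:
        return None
    return key.strip(), int(value)

# Lazily parse each line, then keep only the lines that produced a pair.
parsed = (parse_line(line) for line in lines)
config = {pair[0]: pair[1] for pair in parsed if pair is not None}

print(config)  # {'timeout': 30, 'retries': 5}
```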

4. Mathematical Operations

# Matrix operations
matrix1 = [[1, 2, 3], [4, 5, 6]]
matrix2 = [[7, 8, 9], [10, 11, 12]]

# Element-wise addition
result = [
    [a + b for a, b in zip(row1, row2)]
    for row1, row2 in zip(matrix1, matrix2)
]
print(result)  # [[8, 10, 12], [14, 16, 18]]

# Coordinates in range
coords = [(x, y) for x in range(5) for y in range(5) if x + y < 5]
print(coords[:10])
# [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
#  (1, 0), (1, 1), (1, 2), (1, 3), (2, 0)]

# Prime numbers (simple check)
def is_prime(n):
    if n < 2:
        return False
    return all(n % i != 0 for i in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(2, 50) if is_prime(n)]
print(primes)
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

5. API Response Processing

# Simulate API response
api_response = {
    'users': [
        {'id': 1, 'name': 'Alice', 'posts': [1, 2, 3]},
        {'id': 2, 'name': 'Bob', 'posts': [4, 5]},
        {'id': 3, 'name': 'Charlie', 'posts': []}
    ],
    'posts': [
        {'id': 1, 'title': 'Post 1', 'likes': 10},
        {'id': 2, 'title': 'Post 2', 'likes': 5},
        {'id': 3, 'title': 'Post 3', 'likes': 15},
        {'id': 4, 'title': 'Post 4', 'likes': 8},
        {'id': 5, 'title': 'Post 5', 'likes': 12}
    ]
}

# Extract user IDs
user_ids = [u['id'] for u in api_response['users']]
print(user_ids)  # [1, 2, 3]

# Users with posts
active_users = [
    u['name'] for u in api_response['users']
    if u['posts']
]
print(active_users)  # ['Alice', 'Bob']

# Popular posts
popular = [
    p['title'] for p in api_response['posts']
    if p['likes'] >= 10
]
print(popular)  # ['Post 1', 'Post 3', 'Post 5']

# User-post mapping
user_post_count = {
    u['name']: len(u['posts'])
    for u in api_response['users']
}
print(user_post_count)
# {'Alice': 3, 'Bob': 2, 'Charlie': 0}

Performance Considerations

import sys
import time

# List comprehension vs for loop
def with_loop():
    result = []
    for i in range(1000000):
        result.append(i ** 2)
    return result

def with_comprehension():
    return [i ** 2 for i in range(1000000)]

# Time comparison (perf_counter is a high-resolution clock meant for timing)
start = time.perf_counter()
with_loop()
print(f"Loop: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
with_comprehension()
print(f"Comprehension: {time.perf_counter() - start:.4f}s")
# Comprehension is usually faster!

# Memory: Generator vs List
def memory_test():
    # Generator - memory efficient
    gen = (x ** 2 for x in range(1000000))

    # List - stores everything
    lst = [x ** 2 for x in range(1000000)]

    print(f"Generator: {sys.getsizeof(gen)} bytes")
    print(f"List: {sys.getsizeof(lst)} bytes")

memory_test()
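A single timed run can be noisy. The standard library's `timeit` module repeats the measurement, which tends to give steadier numbers. The sketch below is one way to set that up; the smaller input size and the `number`/`repeat` values are arbitrary choices to keep the run short, and no specific timings are implied:

```python
import timeit

def with_loop(n=10_000):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def with_comprehension(n=10_000):
    return [i ** 2 for i in range(n)]

# repeat() runs several timing rounds; the minimum is the least noisy figure.
loop_time = min(timeit.repeat(with_loop, number=20, repeat=3))
comp_time = min(timeit.repeat(with_comprehension, number=20, repeat=3))

print(f"Loop:          {loop_time:.4f}s")
print(f"Comprehension: {comp_time:.4f}s")
```

Results depend on the machine and the Python version, so it is worth measuring on your own setup rather than relying on a general rule.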

Best Practices

# 1. Keep comprehensions simple and readable
# Good
squares = [x ** 2 for x in range(10)]

# Bad - too complex
result = [x ** 2 for x in range(100) if x % 2 == 0 if x % 3 == 0 if x > 10]

# 2. Use generator expressions for large data
# Memory efficient
total = sum(x ** 2 for x in range(1000000))

# 3. Break complex comprehensions into steps
# Bad - hard to read
result = [[j for j in row if j > 0] for row in matrix if any(j > 0 for j in row)]

# Better
non_zero_rows = [row for row in matrix if any(j > 0 for j in row)]
result = [[j for j in row if j > 0] for row in non_zero_rows]

# 4. Use meaningful variable names
# Bad
[x for x in y]

# Good
[user_name for user_name in user_list]

# 5. Consider readability over brevity
# Sometimes traditional loop is clearer
result = []
for item in complex_data:
    if some_complex_condition(item):
        transformed = complex_transformation(item)
        result.append(transformed)

Common Patterns

# Names such as data, transform, condition, default, pairs, and key_func
# are placeholders.

# 1. Filter and map
result = [transform(x) for x in data if condition(x)]

# 2. Flatten nested list
flat = [item for sublist in nested for item in sublist]

# 3. Unique values
unique = list({x for x in data})

# 4. Dict from list of tuples
d = {k: v for k, v in pairs}

# 5. Invert dictionary
inverted = {v: k for k, v in original.items()}

# 6. Group by attribute (groupby needs input sorted by the same key)
from itertools import groupby
grouped = {
    key: list(group)
    for key, group in groupby(sorted(data, key=key_func), key=key_func)
}

# 7. Conditional expression
result = [item if condition(item) else default for item in data]

# 8. Multiple iterables
pairs = [(a, b) for a in list1 for b in list2]

# 9. Enumerate with comprehension
indexed = {i: item for i, item in enumerate(data)}

# 10. Filter None values
clean = [x for x in data if x is not None]
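Pattern 6 sorts before grouping for a reason: `itertools.groupby` only groups consecutive items with equal keys. A runnable sketch with hypothetical sample words shows what goes wrong without the sort:

```python
from itertools import groupby

# Hypothetical sample data.
words = ['apple', 'bob', 'avocado', 'banana', 'cherry']

def first_letter(word):
    return word[0]

# Without sorting, 'a' appears in two separate runs and the later run
# overwrites the earlier one in the dict.
unsorted_groups = {k: list(g) for k, g in groupby(words, key=first_letter)}

# Sorting by the same key first puts equal keys next to each other.
sorted_groups = {
    k: list(g)
    for k, g in groupby(sorted(words, key=first_letter), key=first_letter)
}

print(unsorted_groups['a'])  # ['avocado']
print(sorted_groups['a'])    # ['apple', 'avocado']
print(sorted_groups['b'])    # ['bob', 'banana']
```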

Practice Exercises

Exercise 1: Word Analysis

Build a dict of word frequencies from a text file.

Exercise 2: Matrix Operations

Implement matrix multiplication using comprehensions.

Exercise 3: Data Validation

Filter and validate a list of dicts against multiple conditions.

Exercise 4: Nested Data Processing

Parse and transform a deeply nested JSON structure.

Exercise 5: Performance Comparison

Compare the performance of comprehensions against traditional loops.

Summary

List comprehension: [expr for item in iter if cond]
Dict comprehension: {k: v for item in iter if cond}
Set comprehension: {expr for item in iter if cond}
Generator expression: (expr for item in iter if cond)
Nested: Multiple for clauses, read left to right like nested loops
Performance: Comprehensions are usually faster than equivalent append loops
Memory: Use generator expressions for large datasets
Readability: Keep it simple and clear

Next Lesson

Lesson 7: Regular Expressions - pattern matching, regex operations, and text processing! 🚀


Remember:

  • Comprehensions are concise and readable
  • Use generators for memory efficiency
  • Keep comprehensions simple
  • Break complex logic into steps
  • Readability matters! 🎯