Lesson 6: Comprehensions - Concise Construction Expressions
Lesson Objectives
After completing this lesson, you will be able to:
- ✅ Master list comprehensions
- ✅ Use dict comprehensions
- ✅ Use set comprehensions
- ✅ Work with generator expressions
- ✅ Understand nested comprehensions
- ✅ Apply common comprehension patterns
What Are Comprehensions?
Comprehensions are a concise syntax for building collections (list, dict, set) from iterables.
Traditional vs Comprehension
# Traditional way - verbose
squares = []
for i in range(10):
    squares.append(i ** 2)

# Comprehension - concise
squares = [i ** 2 for i in range(10)]

print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
List Comprehensions
A list comprehension builds a list from an iterable.
Basic Syntax
# Syntax: [expression for item in iterable]
numbers = [x for x in range(10)]
print(numbers)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# With transformation
squares = [x ** 2 for x in range(10)]
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# String operations
words = ['hello', 'world', 'python']
uppercase = [word.upper() for word in words]
print(uppercase)  # ['HELLO', 'WORLD', 'PYTHON']

# Applying a function to each item
lengths = [len(word) for word in words]
print(lengths)  # [5, 5, 6]
With Filtering
# Syntax: [expression for item in iterable if condition]

# Even numbers only
evens = [x for x in range(20) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

# Positive numbers
numbers = [-5, -3, -1, 0, 2, 4, 6]
positives = [x for x in numbers if x > 0]
print(positives)  # [2, 4, 6]

# Long words
words = ['a', 'hello', 'hi', 'world', 'python']
long_words = [word for word in words if len(word) > 3]
print(long_words)  # ['hello', 'world', 'python']

# Combined transformation and filtering
even_squares = [x ** 2 for x in range(10) if x % 2 == 0]
print(even_squares)  # [0, 4, 16, 36, 64]
Multiple Conditions
# Multiple if conditions (AND)
numbers = [x for x in range(100) if x % 2 == 0 if x % 5 == 0]
print(numbers)  # [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

# Equivalent to:
numbers = [x for x in range(100) if x % 2 == 0 and x % 5 == 0]

# if-else expression
result = ['even' if x % 2 == 0 else 'odd' for x in range(10)]
print(result)  # ['even', 'odd', 'even', 'odd', ...]

# Replacing values conditionally (clamp negatives to 0)
values = [x if x > 0 else 0 for x in [-3, -1, 0, 2, 4]]
print(values)  # [0, 0, 0, 2, 4]
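The two uses of `if` above are easy to confuse, so here is a minimal side-by-side sketch. The list `numbers` and the labels `'big'` and `'small'` are illustrative values, not part of the lesson's data.

```python
numbers = [-2, -1, 0, 1, 2]

# Trailing `if` is a filter: items that fail the test are dropped.
filtered = [n for n in numbers if n > 0]

# Leading `a if test else b` is a conditional expression: every item is kept,
# but the stored value depends on the test.
mapped = [n if n > 0 else 0 for n in numbers]

# Both can be combined: the filter runs first, then the value is chosen.
labels = ['big' if n > 1 else 'small' for n in numbers if n > 0]

print(filtered)  # [1, 2]
print(mapped)    # [0, 0, 0, 1, 2]
print(labels)    # ['small', 'big']
```

A filter changes how many items come out; a conditional expression changes only what each item becomes.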
Nested Loops
# Flatten 2D list
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cartesian product
colors = ['red', 'blue']
sizes = ['S', 'M', 'L']
combinations = [(color, size) for color in colors for size in sizes]
print(combinations)
# [('red', 'S'), ('red', 'M'), ('red', 'L'),
#  ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]

# Multiplication table
table = [[i * j for j in range(1, 6)] for i in range(1, 6)]
for row in table:
    print(row)
# [1, 2, 3, 4, 5]
# [2, 4, 6, 8, 10]
# [3, 6, 9, 12, 15]
# [4, 8, 12, 16, 20]
# [5, 10, 15, 20, 25]
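The order of the `for` clauses often trips people up. As a rough illustration, the flatten example can be expanded into ordinary nested loops; the name `flat_loop` is introduced here only for comparison.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Comprehension form: the first `for` is the outer loop.
flat = [num for row in matrix for num in row]

# Equivalent nested loops, written in the same left-to-right order.
flat_loop = []
for row in matrix:
    for num in row:
        flat_loop.append(num)

print(flat == flat_loop)  # True
```

Reading the clauses left to right gives the same nesting as reading the loops top to bottom.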
Dict Comprehensions
A dict comprehension builds a dictionary from an iterable.
Basic Syntax
# Syntax: {key_expr: value_expr for item in iterable}

# Square dictionary
squares = {x: x ** 2 for x in range(6)}
print(squares)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Character frequency
text = "hello"
char_count = {char: text.count(char) for char in text}
print(char_count)  # {'h': 1, 'e': 1, 'l': 2, 'o': 1}

# Enumerate to dict
words = ['apple', 'banana', 'cherry']
word_dict = {i: word for i, word in enumerate(words)}
print(word_dict)  # {0: 'apple', 1: 'banana', 2: 'cherry'}
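The character-frequency comprehension calls `text.count()` once per character, which rescans the whole string each time. For longer inputs, one possible alternative (not part of the original lesson) is `collections.Counter` from the standard library, sketched here for comparison.

```python
from collections import Counter

text = "hello"

# Comprehension from the lesson: one full scan of `text` per character.
char_count = {char: text.count(char) for char in text}

# Counter tallies everything in a single pass over the text.
counted = Counter(text)

print(dict(counted) == char_count)  # True
print(counted['l'])                 # 2
```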
With Filtering
# Even numbers only
even_squares = {x: x ** 2 for x in range(10) if x % 2 == 0}
print(even_squares)  # {0: 0, 2: 4, 4: 16, 6: 36, 8: 64}

# Long words
words = ['a', 'hello', 'hi', 'world', 'python']
long_words = {word: len(word) for word in words if len(word) > 3}
print(long_words)  # {'hello': 5, 'world': 5, 'python': 6}
Transform Existing Dict
# Swap keys and values
original = {'a': 1, 'b': 2, 'c': 3}
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'a', 2: 'b', 3: 'c'}

# Transform values
prices = {'apple': 10, 'banana': 5, 'cherry': 15}
discounted = {item: price * 0.9 for item, price in prices.items()}
print(discounted)  # {'apple': 9.0, 'banana': 4.5, 'cherry': 13.5}

# Filter and transform
scores = {'alice': 85, 'bob': 65, 'charlie': 92, 'david': 58}
passed = {name: score for name, score in scores.items() if score >= 70}
print(passed)  # {'alice': 85, 'charlie': 92}
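Swapping keys and values is only lossless when the values are unique. The sketch below uses a made-up dictionary with a repeated value to show what happens otherwise, plus one possible way to keep every key; the name `grouped` is hypothetical.

```python
original = {'a': 1, 'b': 2, 'c': 1}  # 'a' and 'c' share the value 1

# Plain inversion: the later key silently overwrites the earlier one.
swapped = {v: k for k, v in original.items()}
print(swapped)  # {1: 'c', 2: 'b'}

# Keep every key by collecting them in a list per value.
grouped = {
    value: [k for k, v in original.items() if v == value]
    for value in set(original.values())
}
print(grouped[1])  # ['a', 'c']
print(grouped[2])  # ['b']
```

The values must also be hashable to serve as keys in the inverted dictionary.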
From Two Lists
# Zip two lists into dict
keys = ['name', 'age', 'city']
values = ['Alice', 25, 'NYC']
person = {k: v for k, v in zip(keys, values)}
print(person)  # {'name': 'Alice', 'age': 25, 'city': 'NYC'}

# With transformation
names = ['alice', 'bob', 'charlie']
ages = [25, 30, 35]
people = {name.upper(): age for name, age in zip(names, ages)}
print(people)  # {'ALICE': 25, 'BOB': 30, 'CHARLIE': 35}
Set Comprehensions
A set comprehension builds a set from an iterable (elements are unique and unordered).
Basic Syntax
# Syntax: {expression for item in iterable}
# Note: sets are unordered, so the printed order may differ from the comments.

# Unique squares
squares = {x ** 2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81} in some order

# Unique characters
text = "hello world"
chars = {char for char in text if char != ' '}
print(chars)  # {'h', 'e', 'l', 'o', 'w', 'r', 'd'} in some order

# Remove duplicates with transformation
numbers = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
unique_squares = {x ** 2 for x in numbers}
print(unique_squares)  # {1, 4, 9, 16} in some order
With Filtering
# Even numbers only
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

# Unique lengths
words = ['hello', 'world', 'hi', 'python', 'code']
lengths = {len(word) for word in words if len(word) > 2}
print(lengths)  # {4, 5, 6} (set order is not guaranteed)
Set Operations
# Find common characters
word1 = "hello"
word2 = "world"
common = {c for c in word1 if c in word2}
print(common)  # {'l', 'o'}

# Different characters
diff = {c for c in word1 if c not in word2}
print(diff)  # {'h', 'e'}
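The same results can also be obtained with the built-in set operators. The sketch below compares the two approaches; `common_op` and `diff_op` are names introduced here for illustration.

```python
word1 = "hello"
word2 = "world"

# Comprehension versions from the lesson.
common = {c for c in word1 if c in word2}
diff = {c for c in word1 if c not in word2}

# Equivalent built-in set operators.
common_op = set(word1) & set(word2)  # intersection
diff_op = set(word1) - set(word2)    # difference

print(common == common_op)  # True
print(diff == diff_op)      # True
```

The comprehension form becomes more useful once the condition is something the operators cannot express, such as a test on each character.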
Generator Expressions
A generator expression looks like a list comprehension but uses () and is evaluated lazily: values are produced one at a time, on demand.
Basic Syntax
# Generator expression - lazy
gen = (x ** 2 for x in range(10))
print(gen)        # <generator object ...>
print(type(gen))  # <class 'generator'>

# Iterate once
for num in gen:
    print(num, end=' ')  # 0 1 4 9 16 25 36 49 64 81
print()

# Convert to list if needed
gen2 = (x ** 2 for x in range(10))
squares = list(gen2)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
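"Iterate once" is meant literally: a generator is exhausted after a single pass. A small sketch of that behavior, using illustrative names `first_pass` and `second_pass`:

```python
gen = (x ** 2 for x in range(5))

first_pass = list(gen)   # consumes every value
second_pass = list(gen)  # nothing left to yield

print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []

# next() pulls one value at a time; a default avoids StopIteration.
gen = (x ** 2 for x in range(2))
print(next(gen))        # 0
print(next(gen))        # 1
print(next(gen, None))  # None
```

If the values are needed more than once, build a list instead or recreate the generator.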
Memory Efficiency
import sys

# List comprehension - stores all values
# (getsizeof counts only the list's own pointer array, not the integers it refers to)
list_comp = [x ** 2 for x in range(10000)]
print(f"List size: {sys.getsizeof(list_comp)} bytes")  # ~85 KB on 64-bit CPython

# Generator expression - stores state only
gen_exp = (x ** 2 for x in range(10000))
print(f"Generator size: {sys.getsizeof(gen_exp)} bytes")  # small and constant; exact value varies by Python version

# Use with sum (doesn't create a list)
total = sum(x ** 2 for x in range(10000))
print(f"Sum: {total}")
With Functions
# any() and all() work well with generators
numbers = range(100)

# Check if any even
has_even = any(x % 2 == 0 for x in numbers)
print(has_even)  # True

# Check if all positive
numbers2 = [1, 2, 3, 4, 5]
all_positive = all(x > 0 for x in numbers2)
print(all_positive)  # True

# max/min with generator
max_square = max(x ** 2 for x in range(10))
print(max_square)  # 81

# String joining
words = ['hello', 'world', 'python']
sentence = ' '.join(word.upper() for word in words)
print(sentence)  # HELLO WORLD PYTHON
Nested Comprehensions
2D Lists
# Create 2D matrix
matrix = [[i * j for j in range(5)] for i in range(5)]
for row in matrix:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 2, 3, 4]
# [0, 2, 4, 6, 8]
# [0, 3, 6, 9, 12]
# [0, 4, 8, 12, 16]

# Matrix transposition
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
transposed = [[row[i] for row in original] for i in range(len(original[0]))]
print(transposed)
# [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Flatten with filtering
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens = [num for row in matrix for num in row if num % 2 == 0]
print(evens)  # [2, 4, 6, 8]
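For transposition, a commonly used alternative to index-based nesting is `zip(*rows)`. This is offered as a sketch alongside the lesson's version, not a replacement for it; `transposed_zip` is a name introduced here.

```python
original = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Nested comprehension from the lesson.
transposed = [[row[i] for row in original] for i in range(len(original[0]))]

# zip(*original) pairs up the i-th items of every row. zip yields tuples,
# so each one is converted back to a list.
transposed_zip = [list(col) for col in zip(*original)]

print(transposed == transposed_zip)  # True
```

Note that `zip` stops at the shortest row, so the two versions only agree when all rows have the same length.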
Nested Dict Comprehensions
# Dict of dicts
users = ['alice', 'bob', 'charlie']
user_data = {
    user: {'id': i, 'active': True}
    for i, user in enumerate(users, 1)
}
print(user_data)
# {
#     'alice': {'id': 1, 'active': True},
#     'bob': {'id': 2, 'active': True},
#     'charlie': {'id': 3, 'active': True}
# }

# Group by key
data = [
    {'name': 'Alice', 'dept': 'IT'},
    {'name': 'Bob', 'dept': 'HR'},
    {'name': 'Charlie', 'dept': 'IT'}
]

by_dept = {
    dept: [item['name'] for item in data if item['dept'] == dept]
    for dept in {item['dept'] for item in data}
}
print(by_dept)
# {'IT': ['Alice', 'Charlie'], 'HR': ['Bob']}
# (key order may vary because the departments come from a set)
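The group-by comprehension above rescans `data` once per department. When that matters, one possible alternative (an addition, not the lesson's method) is a single pass with `collections.defaultdict`:

```python
from collections import defaultdict

data = [
    {'name': 'Alice', 'dept': 'IT'},
    {'name': 'Bob', 'dept': 'HR'},
    {'name': 'Charlie', 'dept': 'IT'},
]

# One pass over the data; departments appear in first-seen order.
by_dept = defaultdict(list)
for item in data:
    by_dept[item['dept']].append(item['name'])

print(dict(by_dept))  # {'IT': ['Alice', 'Charlie'], 'HR': ['Bob']}
```

This is a case where a plain loop is arguably clearer than a nested comprehension.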
Real-world Examples
1. Data Filtering and Transformation
# Filter and transform user data
users = [
    {'name': 'Alice', 'age': 25, 'active': True, 'score': 85},
    {'name': 'Bob', 'age': 30, 'active': False, 'score': 65},
    {'name': 'Charlie', 'age': 35, 'active': True, 'score': 92},
    {'name': 'David', 'age': 28, 'active': True, 'score': 58}
]

# Active users only
active_users = [u['name'] for u in users if u['active']]
print(active_users)  # ['Alice', 'Charlie', 'David']

# High scorers
high_scorers = [u['name'] for u in users if u['score'] >= 70]
print(high_scorers)  # ['Alice', 'Charlie']

# Age mapping
age_map = {u['name']: u['age'] for u in users}
print(age_map)  # {'Alice': 25, 'Bob': 30, 'Charlie': 35, 'David': 28}

# Complex filtering
qualified = [
    u['name']
    for u in users
    if u['active'] and u['score'] >= 70 and u['age'] < 30
]
print(qualified)  # ['Alice']
2. String Processing
# Parse CSV data
csv_data = """name,age,city
Alice,25,NYC
Bob,30,LA
Charlie,35,SF"""

lines = csv_data.strip().split('\n')
headers = lines[0].split(',')

# Parse into list of dicts
records = [
    dict(zip(headers, line.split(',')))
    for line in lines[1:]
]
print(records)
# [
#     {'name': 'Alice', 'age': '25', 'city': 'NYC'},
#     {'name': 'Bob', 'age': '30', 'city': 'LA'},
#     {'name': 'Charlie', 'age': '35', 'city': 'SF'}
# ]

# Extract specific fields
names = [r['name'] for r in records]
print(names)  # ['Alice', 'Bob', 'Charlie']

# Clean and transform
text = " Hello, World! Python is GREAT. "
words = [
    word.strip('.,!').lower()
    for word in text.split()
    if word.strip('.,!')
]
print(words)  # ['hello', 'world', 'python', 'is', 'great']
3. File Processing
# Create test file
with open('data.txt', 'w') as f:
    f.write("# Comment line\n")
    f.write("value1=100\n")
    f.write("\n")
    f.write("value2=200\n")
    f.write("# Another comment\n")
    f.write("value3=300\n")

# Read and parse configuration
with open('data.txt', 'r') as f:
    config = {
        line.split('=')[0]: int(line.split('=')[1])
        for line in f
        if line.strip() and not line.startswith('#') and '=' in line
    }

print(config)  # {'value1': 100, 'value2': 200, 'value3': 300}
4. Mathematical Operations
# Matrix operations
matrix1 = [[1, 2, 3], [4, 5, 6]]
matrix2 = [[7, 8, 9], [10, 11, 12]]

# Element-wise addition
result = [
    [a + b for a, b in zip(row1, row2)]
    for row1, row2 in zip(matrix1, matrix2)
]
print(result)  # [[8, 10, 12], [14, 16, 18]]

# Coordinates in range
coords = [(x, y) for x in range(5) for y in range(5) if x + y < 5]
print(coords[:10])
# [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
#  (1, 0), (1, 1), (1, 2), (1, 3), (2, 0)]

# Prime numbers (simple check)
def is_prime(n):
    if n < 2:
        return False
    return all(n % i != 0 for i in range(2, int(n ** 0.5) + 1))

primes = [n for n in range(2, 50) if is_prime(n)]
print(primes)
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
5. API Response Processing
# Simulate API response
api_response = {
    'users': [
        {'id': 1, 'name': 'Alice', 'posts': [1, 2, 3]},
        {'id': 2, 'name': 'Bob', 'posts': [4, 5]},
        {'id': 3, 'name': 'Charlie', 'posts': []}
    ],
    'posts': [
        {'id': 1, 'title': 'Post 1', 'likes': 10},
        {'id': 2, 'title': 'Post 2', 'likes': 5},
        {'id': 3, 'title': 'Post 3', 'likes': 15},
        {'id': 4, 'title': 'Post 4', 'likes': 8},
        {'id': 5, 'title': 'Post 5', 'likes': 12}
    ]
}

# Extract user IDs
user_ids = [u['id'] for u in api_response['users']]
print(user_ids)  # [1, 2, 3]

# Users with posts
active_users = [
    u['name']
    for u in api_response['users']
    if u['posts']
]
print(active_users)  # ['Alice', 'Bob']

# Popular posts
popular = [
    p['title']
    for p in api_response['posts']
    if p['likes'] >= 10
]
print(popular)  # ['Post 1', 'Post 3', 'Post 5']

# User-post mapping
user_post_count = {
    u['name']: len(u['posts'])
    for u in api_response['users']
}
print(user_post_count)
# {'Alice': 3, 'Bob': 2, 'Charlie': 0}
Performance Considerations
import sys
import time

# List comprehension vs for loop
def with_loop():
    result = []
    for i in range(1000000):
        result.append(i ** 2)
    return result

def with_comprehension():
    return [i ** 2 for i in range(1000000)]

# Time comparison (perf_counter is the clock intended for measuring intervals)
start = time.perf_counter()
with_loop()
print(f"Loop: {time.perf_counter() - start:.4f}s")

start = time.perf_counter()
with_comprehension()
print(f"Comprehension: {time.perf_counter() - start:.4f}s")
# The comprehension is usually faster

# Memory: generator vs list
def memory_test():
    # Generator - memory efficient
    gen = (x ** 2 for x in range(1000000))

    # List - stores everything
    lst = [x ** 2 for x in range(1000000)]

    print(f"Generator: {sys.getsizeof(gen)} bytes")
    print(f"List: {sys.getsizeof(lst)} bytes")

memory_test()
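A single timed run like the one above can be noisy. As a hedged sketch, the standard `timeit` module repeats the measurement. The input size `n=100_000` and the repeat counts below are arbitrary choices made here to keep the run short.

```python
import timeit

def with_loop(n=100_000):
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result

def with_comprehension(n=100_000):
    return [i ** 2 for i in range(n)]

# repeat() returns one total time per run; the minimum is the least noisy.
loop_time = min(timeit.repeat(with_loop, number=5, repeat=3))
comp_time = min(timeit.repeat(with_comprehension, number=5, repeat=3))

print(f"Loop:          {loop_time:.4f}s")
print(f"Comprehension: {comp_time:.4f}s")
```

Absolute numbers depend on the machine and Python version, so no expected timings are shown.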
Best Practices
# 1. Keep comprehensions simple and readable
# Good
squares = [x ** 2 for x in range(10)]

# Bad - too complex
result = [x ** 2 for x in range(100) if x % 2 == 0 if x % 3 == 0 if x > 10]

# 2. Use generator expressions for large data
# Memory efficient
total = sum(x ** 2 for x in range(1000000))

# 3. Break complex comprehensions into steps
# Bad - hard to read
result = [[j for j in row if j > 0] for row in matrix if any(j > 0 for j in row)]

# Better
non_zero_rows = [row for row in matrix if any(j > 0 for j in row)]
result = [[j for j in row if j > 0] for row in non_zero_rows]

# 4. Use meaningful variable names
# Bad
[x for x in y]

# Good
[user_name for user_name in user_list]

# 5. Consider readability over brevity
# Sometimes a traditional loop is clearer
result = []
for item in complex_data:
    if some_complex_condition(item):
        transformed = complex_transformation(item)
        result.append(transformed)
Common Patterns
# Templates: data, nested, pairs, transform, condition, key_func, default, etc. are placeholders.

# 1. Filter and map
result = [transform(x) for x in data if condition(x)]

# 2. Flatten nested list
flat = [item for sublist in nested for item in sublist]

# 3. Unique values
unique = list({x for x in data})

# 4. Dict from list of tuples
d = {k: v for k, v in pairs}

# 5. Invert dictionary
inverted = {v: k for k, v in original.items()}

# 6. Group by attribute
from itertools import groupby
grouped = {
    key: list(group)
    for key, group in groupby(sorted(data, key=key_func), key=key_func)
}

# 7. Conditional expression
result = [item if condition(item) else default for item in data]

# 8. Multiple iterables
pairs = [(a, b) for a in list1 for b in list2]

# 9. Enumerate with comprehension
indexed = {i: item for i, item in enumerate(data)}

# 10. Filter None values
clean = [x for x in data if x is not None]
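Pattern 6 depends on the input being sorted by the same key, because `itertools.groupby` only merges adjacent items. A minimal runnable sketch, with a made-up word list and a hypothetical `first_letter` key function:

```python
from itertools import groupby

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

def first_letter(word):
    return word[0]

# groupby only merges adjacent items, hence the sort on the same key.
grouped = {
    key: list(group)
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Without sorting, a repeated key overwrites its earlier group.
unsorted = ['apple', 'banana', 'avocado']
broken = {k: list(g) for k, g in groupby(unsorted, key=first_letter)}
print(broken)  # {'a': ['avocado'], 'b': ['banana']}
```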
Practice Exercises
Exercise 1: Word Analysis
Build a dict of word frequencies from a text file.
Exercise 2: Matrix Operations
Implement matrix multiplication using comprehensions.
Exercise 3: Data Validation
Filter and validate a list of dicts against multiple conditions.
Exercise 4: Nested Data Processing
Parse and transform a deeply nested JSON structure.
Exercise 5: Performance Comparison
Compare the performance of comprehensions and traditional loops.
Summary
✅ List comprehension: [expr for item in iter if cond]
✅ Dict comprehension: {k: v for item in iter if cond}
✅ Set comprehension: {expr for item in iter if cond}
✅ Generator expression: (expr for item in iter if cond)
✅ Nested: Multiple for clauses
✅ Performance: Comprehensions are usually faster than the equivalent append loops
✅ Memory: Generators for large datasets
✅ Readability: Keep it simple and clear
Next Lesson
Lesson 7: Regular Expressions - pattern matching, regex operations, and text processing! 🚀
Remember:
- Comprehensions are concise and readable
- Use generators for memory efficiency
- Keep comprehensions simple
- Break complex logic into steps
- Readability matters! 🎯