CBit 06
In this session, you will develop key programming skills required for sequence alignment using the Needleman-Wunsch algorithm. You will work with:
- Lists and nested lists to store biological sequences.
- For loops and nested loops to manipulate matrices.
- Functions to write reusable and modular code.
- Debugging techniques to ensure correctness.
By the end of this session, you will implement a function that initializes the Needleman-Wunsch alignment matrix.
Tip: You can peruse Think Python for more material and help with Python.
Lists and Indexing ¶
In Python, lists are ordered collections of elements. Biological sequences (such as DNA, RNA, or proteins) can be stored in lists.
# A DNA sequence represented as a list
dna_sequence = ["A", "G", "T", "C"]
# Accessing elements using indexing
print(dna_sequence[0]) # Output: 'A' (first element)
print(dna_sequence[2]) # Output: 'T' (third element)
print(dna_sequence[-1]) # Output: 'C' (last element)
Practice
-
Create a list called
protein_sequence
that stores the amino acids"M"
,"A"
,"D"
,"E"
,"L"
. - Print the second and second-to-last elements of your list.
Nested lists (Matrices) ¶
A matrix is a two-dimensional data structure, where each row contains multiple elements. In Python, we represent a matrix as a list of lists.
# A 3x3 matrix represented as a list of lists
matrix = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# Accessing elements
print(matrix[0][1]) # Output: 1 (first row, second column)
print(matrix[2][2]) # Output: 8 (third row, third column)
Practice
- Create a 4x4 matrix filled with zeros.
-
Modify the value at row index 2, column index 3 to be
5
, then print the updated matrix.
Using Nested Loops to Populate a Matrix ¶
To initialize a matrix, we need to iterate through its rows and columns. We use nested loops for this.
rows = 3
cols = 4
matrix = []
# Filling the matrix using nested loops
for i in range(rows):
row = [] # Create an empty row
for j in range(cols):
row.append(i + j) # Fill with a computed value
matrix.append(row) # Add row to the matrix
# Printing the matrix
for row in matrix:
print(row)
Practice
-
Modify the nested loop to create a 5×5 matrix where each value is
i * j
. - Print the resulting matrix.
Initializing the First Row and Column of Needleman-Wunsch ¶
The first row and first column of the Needleman-Wunsch matrix represent gap penalties. If the gap penalty is
-2
, the first row should be
[0, -2, -4, -6, ...]
.
def initialize_nw_matrix(seq_a: str, seq_b: str, gap_score: int) -> list[list[int]]:
"""
Initializes the Needleman-Wunsch alignment matrix.
"""
m = len(seq_a) + 1 # Number of rows
n = len(seq_b) + 1 # Number of columns
# Create a matrix filled with zeros
matrix = [[0 for _ in range(n)] for _ in range(m)]
# Initialize first row
for i in range(n):
matrix[0][i] = i * gap_score
# Initialize first column
for i in range(m):
matrix[i][0] = i * gap_score
return matrix
# Example usage
seq_a = "AGT"
seq_b = "AC"
gap_score = -2
matrix = initialize_nw_matrix(seq_a, seq_b, gap_score)
# Print the matrix
for row in matrix:
print(row)
Practice
- Modify the function to use a gap score of -1 instead of -2.
- Add a print statement inside the loop to track how the matrix is being filled.
Debugging and Understanding the Output ¶
Many errors in dynamic programming algorithms come from incorrect matrix indexing or loop iteration. To debug, print the matrix step by step.
Modify the function to print each step of matrix initialization:
def debug_initialize_nw_matrix(seq_a: str, seq_b: str, gap_score: int):
m = len(seq_a) + 1
n = len(seq_b) + 1
matrix = [[0 for _ in range(n)] for _ in range(m)]
# Initialize first row
for i in range(n):
matrix[0][i] = i * gap_score
print(f"Updated row 0, col {i}: {matrix[0]}") # Debugging print
# Initialize first column
for i in range(m):
matrix[i][0] = i * gap_score
print(
f"Updated col 0, row {i}: {[row[0] for row in matrix]}"
) # Debugging print
return matrix
# Test the debugging function
debug_initialize_nw_matrix("AGT", "AC", -2)
Practice
- Run the function and analyze how the first row and column are filled step by step.
-
Identify what happens if you mistakenly start the loop at
1
instead of0
.
Final Challenge ¶
Modify
initialize_nw_matrix()
to:
-
Accept a starting score instead of
0
. - Allow a custom function for computing penalties.
- Improve the format of printed output for readability.