BIOSC 1540 - Spring 2025
CB09
    oasci/pitt-biosc1540-2025s

    CBit 09


    With our experience in using NumPy and MatPlotLib, we're prepared to tackle a real problem in computational structural biology! Here's the setting of our problem for today.

    We've run MD simulations on two proteins (Chain Q and Chain P) in two different pH environments -- pH 7 and pH 5. After these simulations, we've calculated the RMSF (Root Mean Square Fluctuation) of each residue (from 21 to 293) of each protein in each environment. This leaves us with data in the format of prsa2-rmsf.csv:

    • A column, Residue ID, that contains the ID of each residue
    • A column, Q pH 7, that contains the RMSF of the residue in that row, for Chain Q, in pH 7
    • A column, P pH 7, that contains the RMSF of the residue in that row, for Chain P, in pH 7
    • A column, Q pH 5, that contains the RMSF of the residue in that row, for Chain Q, in pH 5
    • A column, P pH 5, that contains the RMSF of the residue in that row, for Chain P, in pH 5

    For example, the first row contains information relating to the 21st residue (21 in Residue ID) in Chain Q and Chain P. The 21st residue of Chain Q has an RMSF of 2.880 in pH 7 and 4.354 in pH 5. The 21st residue of Chain P has an RMSF of 2.221 in pH 7 and 1.186 in pH 5.
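For context, the RMSF of residue i is the root-mean-square distance of its position from its time-averaged position over the T frames of a trajectory: RMSF_i = sqrt((1/T) Σ_t |r_i(t) − ⟨r_i⟩|²). The CSV already contains these values, so nothing here is needed for the exercise; the sketch below only illustrates the formula, assuming a hypothetical coordinate array of shape (n_frames, n_residues, 3) that has already been superimposed on a reference structure (for example, one representative atom per residue).

```python
import numpy as np


def compute_rmsf(coords: np.ndarray) -> np.ndarray:
    """Per-residue RMSF from an aligned trajectory (illustrative sketch).

    coords: hypothetical array of shape (n_frames, n_residues, 3), assumed to be
    already superimposed on a reference structure.
    Returns an array of shape (n_residues,).
    """
    mean_pos = coords.mean(axis=0)                   # time-averaged position, (n_residues, 3)
    sq_dev = ((coords - mean_pos) ** 2).sum(axis=2)  # squared distance per frame, (n_frames, n_residues)
    return np.sqrt(sq_dev.mean(axis=0))              # average over frames, then square root


# Toy example: a single residue that alternates between x = 0 and x = 2
toy = np.array([[[0.0, 0.0, 0.0]],
                [[2.0, 0.0, 0.0]]])
print(compute_rmsf(toy))  # [1.]
```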

    Our task is to recreate the following plots using the aforementioned data:

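    [Figure 1: RMSF at pH 7 vs pH 5 for each residue, Chain Q and Chain P side by side]

    [Figure 2: difference in RMSF between pH settings for each residue, Chain Q and Chain P on one plot]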

    The first plot compares each residue's RMSF at pH 7 and pH 5, with one panel per chain placed side by side. The second plot shows, for each residue, the difference in RMSF between the two pH settings, with one line per chain on the same axes (note also the horizontal dotted line!).

    The first thing we need to worry about is loading our data. We'll want to load the .CSV file into a NumPy array. Don't stress about losing the column labels (however, we'll still need to account for them -- you'll see)! We know that the order of columns is Residue ID (index 0) -> Q pH 7 (index 1) -> P pH 7 (index 2) -> Q pH 5 (index 3) -> P pH 5 (index 4).

    import numpy as np
    
    # Load in the data here!
    # Look into the documentation of the NumPy function 'genfromtxt' -- .CSV files are COMMA (",") separated (delimited) values.
    
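One possible way to handle the loading step is sketched below, with an in-memory stand-in for the file so it runs on its own. np.genfromtxt with delimiter="," reads comma-separated values, and skip_header=1 drops the label row, which is one way to "account for" the column labels. The header text in the stand-in is an assumption based on the column names listed above; the data row is the residue-21 row described earlier.

```python
import io

import numpy as np

# In-memory stand-in for prsa2-rmsf.csv: an assumed header row plus the
# residue-21 row described above. Pass "prsa2-rmsf.csv" instead to read the real file.
csv_source = io.StringIO(
    "Residue ID,Q pH 7,P pH 7,Q pH 5,P pH 5\n"
    "21,2.880,2.221,4.354,1.186\n"
)

# skip_header=1 drops the label row, so every remaining field parses as a float.
# np.atleast_2d keeps the result 2-D even when the file holds a single data row.
data = np.atleast_2d(np.genfromtxt(csv_source, delimiter=",", skip_header=1))

print(data.shape)  # (1, 5)
print(data[:, 1])  # Q pH 7 column: [2.88]
```

With the real file, each row should correspond to one residue, so data[:, 0] would run from 21 to 293.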

    We will also want to compute the difference between the pH conditions for each chain and store those differences as extra columns in the same NumPy array. It's better to do this now; that way, later on we only have to worry about plot formatting (arguably the most finicky part).

    # You might be inclined to solve this with iteration -- and you'd be right in thinking that is a correct strategy!
    # However, remember that NumPy arrays (especially in this case, where they are numbers!) are **matrices**/**vectors**. That means
    # we can use matrix/vector math in place of iteration.
    
    # If you're wondering how you can add additional columns to an existing array, look into the NumPy function 'column_stack'.
    
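A minimal sketch of the vectorized approach, using only the residue-21 row so it runs standalone; with the full array loaded, the same lines apply unchanged because the subtraction acts element-wise on whole columns. The direction of the subtraction (pH 5 minus pH 7) is an assumption, so check it against the sign of the target plot.

```python
import numpy as np

# Residue-21 row from the description above; columns are
# Residue ID, Q pH 7, P pH 7, Q pH 5, P pH 5 (indices 0-4).
data = np.array([[21, 2.880, 2.221, 4.354, 1.186]])

# Element-wise subtraction over entire columns, no loop needed.
# Assumed sign convention: pH 5 minus pH 7.
q_diff = data[:, 3] - data[:, 1]
p_diff = data[:, 4] - data[:, 2]

# Append the two difference columns as indices 5 (Chain Q) and 6 (Chain P)
data = np.column_stack((data, q_diff, p_diff))

print(data.shape)                # (1, 7)
print(np.round(data[0, 5:], 3))  # [ 1.474 -1.035]
```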

    With our data properly loaded and processed, we can begin to worry about our figures. Let's first try and recreate the first figure -- the side-by-side plot.

    The hardest part will be making sure everything "fits" together correctly -- getting our plots side-by-side correctly, colors correct, etc. A lot of that may come down to fiddling around with MatPlotLib functions/methods, and that's perfectly okay! I have genuinely never gotten a plot correct the first time.

    What we can do now is get our "absolutes" out of the way -- features of the plot we know we will be able to just "drop in" to whatever MatPlotLib function we are using. Think of this as the "traits" of the plot: the title, the line labels (in the legend), the line colors, the data (both the x values and y values), etc.!

    # Create your variables here to hold your plot traits -- something like this!
    # y_label = ...
    # x_label_1 = ...
    # x_label_2 = ...
    # x_data = ...
    # chain_p_5_data = ...
    # x_ticks = ...
    # ... (there's more)
    # And so on!
    
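One way to keep these absolutes organized is to group the per-chain traits in a dictionary, so the plotting code can loop over chains instead of repeating itself. Since the target figures are not reproduced here, the titles, labels, and colors below are placeholders (assumptions), not values read off the originals; only the column indices come from the layout described earlier.

```python
# Placeholder plot traits; titles, labels, and colors are assumptions.
# Column indices follow the layout: 0 Residue ID, 1 Q pH 7, 2 P pH 7, 3 Q pH 5, 4 P pH 5.
chain_traits = {
    "Q": {"title": "Chain Q", "ph7_col": 1, "ph5_col": 3},
    "P": {"title": "Chain P", "ph7_col": 2, "ph5_col": 4},
}

# Line styling per pH condition (placeholder colors from Matplotlib's default cycle)
line_styles = {
    "pH 7": {"color": "tab:blue"},
    "pH 5": {"color": "tab:orange"},
}

x_label = "Residue ID"
y_label = "RMSF"

# Example of looping over the traits instead of hard-coding each chain
for name, traits in chain_traits.items():
    print(name, traits["title"], traits["ph7_col"], traits["ph5_col"])
```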

    And now, we can put things together! I won't give too many hints here, as this is great practice for looking into documentation and online resources -- 99% of the time taken to make a plot is looking at documentation/references of similar plots (at least for me).

    import matplotlib.pyplot as plt
    
    # For this, you'll want to look into the "plt.subplots" function and the "fig, ax = plt.subplots" notation. With this,
    # we can create our plots to be side-by-side!
    
    # Use plt.show() to see your plot!
    
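For anyone who gets stuck, here is one possible layout using plt.subplots with one row and two columns. It is a sketch, not the original solution: the titles, figure size, colors, and the choice of sharey=True (so both panels use the same RMSF scale) are assumptions. The function expects the 5-column array described above.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_rmsf_comparison(data: np.ndarray):
    """Side-by-side pH 7 vs pH 5 RMSF panels, one per chain (sketch).

    Assumes the column layout: 0 Residue ID, 1 Q pH 7, 2 P pH 7, 3 Q pH 5, 4 P pH 5.
    """
    residues = data[:, 0]
    # (panel title, pH 7 column, pH 5 column)
    chains = [("Chain Q", 1, 3), ("Chain P", 2, 4)]

    # One row, two columns of axes; sharey keeps both panels on the same y scale
    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for ax, (name, col_7, col_5) in zip(axes, chains):
        ax.plot(residues, data[:, col_7], label="pH 7")
        ax.plot(residues, data[:, col_5], label="pH 5")
        ax.set_title(name)
        ax.set_xlabel("Residue ID")
        ax.legend()
    axes[0].set_ylabel("RMSF")
    fig.tight_layout()
    return fig, axes


# Usage with the loaded array (named `data` here):
# fig, axes = plot_rmsf_comparison(data)
# plt.show()
```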

    Now we'll be doing a similar process for the other plot, depicting the difference in RMSF due to pH environments for our two chains. We'll want to get our absolutes out of the way, and then work on making things fit together.

    # Define your absolutes here!
    

    And then we can put things together! This plot will be a little easier than the last one, as we're only dealing with one subplot! However, there is that pesky dashed horizontal line -- how can we do that in MatPlotLib?

    # Bring it all together here!
    
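One answer to the horizontal-line question is Axes.axhline, which draws a line at a fixed y value across the full width of the axes. The sketch below is self-contained: it recomputes the differences from the 5-column array, again assuming pH 5 minus pH 7 as the sign convention. Labels, colors, and figure size are placeholders; linestyle=":" gives a dotted line and "--" a dashed one, so pick whichever matches the target figure.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_rmsf_difference(data: np.ndarray):
    """Single-panel plot of the pH-induced RMSF change for both chains (sketch).

    Assumes the 5-column layout above; the difference is pH 5 minus pH 7
    (an assumed sign convention).
    """
    residues = data[:, 0]
    q_diff = data[:, 3] - data[:, 1]
    p_diff = data[:, 4] - data[:, 2]

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(residues, q_diff, label="Chain Q")
    ax.plot(residues, p_diff, label="Chain P")
    # Reference line at y = 0 spanning the whole x range; ":" is dotted, "--" dashed
    ax.axhline(0, color="gray", linestyle=":")
    ax.set_xlabel("Residue ID")
    ax.set_ylabel("ΔRMSF (pH 5 - pH 7)")
    ax.legend()  # unlabeled axhline is left out of the legend
    fig.tight_layout()
    return fig, ax


# Usage with the loaded array (named `data` here):
# fig, ax = plot_rmsf_difference(data)
# plt.show()
```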
    2025-03-28 10:54:59
    CC BY-NC-SA 4.0 by OASCI