CBit 07

In this session, you will develop key skills for using NumPy for scientific Python.

Tip: You can peruse the following resources to help you with Python.

Getting started with NumPy

NumPy gives us many tools for working with arrays/matrices -- N-dimensional tables that we can use to hold values. These are useful for storage (having everything together in one place is nice!), and NumPy also optimizes operations on them to be very fast. This makes NumPy arrays a great tool -- you have compact storage, powerful methods associated with that storage, and on top of it all, it's quick!

But where the heck do we start?

Before we can start working with NumPy, we first need to import it into our Python environment. This is done using the following statement:

In [ ]:

import numpy as np

This line of code does a few important things:

It loads the NumPy library, making all of its powerful functionality available for use in our script or interactive environment.
The as np part is an alias, meaning that instead of typing out numpy every time we need to access one of its functions, we can simply use np. This makes our code cleaner and easier to read.
This alias is a convention in the Python community: most tutorials, documentation, and libraries assume that NumPy is imported as np. Following this convention makes your code more readable and understandable to others.

NumPy (np) allows you to create arrays from lists. This is great! We can apply the same list fundamentals that we already know to this new data structure. For example, say I wanted to create a 1D np array with the values 0, 10, 23, 547384, 28. I can do so like this:

In [ ]:

my_values = [0, 10, 23, 547384, 28]
my_np_array = np.array(my_values)
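Once an array exists, it can help to inspect it. Here is a minimal sketch with made-up values (the name temperature_array and its numbers are hypothetical, chosen only for illustration) showing a few attributes every np array carries:

```python
import numpy as np

# Hypothetical example values (not from the exercise above).
temperatures = [36.6, 37.1, 38.4]
temperature_array = np.array(temperatures)

print(type(temperature_array))  # the NumPy array type, numpy.ndarray
print(temperature_array.shape)  # (3,) -> one dimension holding three items
print(temperature_array.ndim)   # 1 -> the number of dimensions
print(temperature_array.dtype)  # float64 -> the type shared by every item
```

Unlike a list, an np array stores a single data type for all of its items, which is part of why operations on it are fast.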

Pretty much any list can become an np array, including lists of lists (as long as every inner list has the same length). This means that np arrays don't have to be one-dimensional -- they can become MATRICES! Now they can become REALLY powerful!

In [ ]:

# Create a 2x3 `np` array, with R1 = 0, 1, 2 and R2 = 2, 1, 0
my_values = [[0, 1, 2], ["What should go here?"]]
my_np_array = np.array(my_values)
print(my_np_array)
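For comparison, here is a small sketch of the same idea with different, made-up values (rows and matrix are hypothetical names). Each inner list becomes one row, and the shape attribute reports rows first, then columns:

```python
import numpy as np

# Hypothetical values: two inner lists (rows), three items each (columns).
rows = [[1, 2, 3], [4, 5, 6]]
matrix = np.array(rows)

print(matrix)
print(matrix.shape)  # (2, 3) -> 2 rows, 3 columns
```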

With that in mind, let's tackle an np array creation problem.

In [ ]:

# Give me a 3 x 3 `np` array, where each COLUMN follows the sequence 0, 1, 2.
# To begin, think of what this would look like! Draw it out!

# We know lists of lists are essentially lists of rows that we stack together -- let's break things down into ROWS.
row_1 = [] # What goes in each of these?
row_2 = []
row_3 = []
# We'll then want to combine them somehow to actually make our 3x3 matrix (with Python lists).
my_combined_rows = [] # What should go inside of the list?

# And now, let's make it into an `np` array!
my_epic_array = # And how can I make my 2D array?
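If the row-versus-column thinking feels slippery, here is a sketch of a related (but different) problem: a 3 x 2 array where each column reads 7, 8, 9 from top to bottom. The names and values are hypothetical; the point is only that a pattern running down the columns means each row repeats one value.

```python
import numpy as np

# Hypothetical target: a 3 x 2 array where each COLUMN reads 7, 8, 9 top to bottom.
# Each row therefore repeats a single value across the columns.
first_row = [7, 7]
second_row = [8, 8]
third_row = [9, 9]

# Stack the rows into a list of lists, then convert to an `np` array.
stacked_rows = [first_row, second_row, third_row]
column_pattern = np.array(stacked_rows)

print(column_pattern)
print(column_pattern.shape)  # (3, 2) -> 3 rows, 2 columns
```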

Storing stuff in arrays is great and all, but eventually, we'll want to get those items back! How can we do that? Well, just like with lists, you can use INDEXING!!! If list indexing is still a little fuzzy, don't worry! np uses a lot of the same concepts, and we'll be going through them 🥳.

Python is a 0-indexed programming language -- most are (unlike R) -- which means that when we store items in data structures such as lists (or arrays!), each item has a number associated with its position, starting from 0. Think of it like a house number.

In [ ]:

my_amazing_list = np.array(["Cesar", "is", "a", "TA"])

# If we're starting at 0, that means in this list of FOUR ELEMENTS, we'll have indices 0, 1, 2, and 3. In that exact order!
# The third item in this array is index #____
# The first item in this array is index #____
# The index of "TA" is index #____

# There are cool hardware reasons for why this is, so if you're interested in learning a little more, shoot me an email!
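One way to see the "house numbers" for yourself is to print each item next to its index. This is a small sketch using a hypothetical array of DNA bases; enumerate is a built-in Python function that pairs each item with its position.

```python
import numpy as np

# Hypothetical array, used only to illustrate positions.
bases = np.array(["A", "C", "G", "T"])

# enumerate pairs each item with its index, counting from 0.
for index, base in enumerate(bases):
    print(index, base)

# With four items, the last index is one less than the length.
print(len(bases) - 1)  # 3
```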

Knowing the position is half the battle -- now we want to be able to actually grab those items! We can do that with array indexing. The syntax for this is:

In [ ]:

# Ignore this!
the_house_number = 0
where_im_storing_it = np.array(["random shmrandom!"])

#######################################################
# TADA!
the_item_i_want = where_im_storing_it[the_house_number]
#######################################################

# We can then print it out!
print(the_item_i_want)

Those brackets are the bread and butter -- when we put them next to where we're storing our items (our np array 😀), it's telling our computer "Hey! I want an item from this storage!". Let's try some examples!

In [ ]:

# Let's create an `np` array that has these items: "villainy", "hijinks", "evil", "mischief"
the_evil_array = # Recall how we created them up above!

# Now, we're going to start getting some items!
the_first_item = the_evil_array[0]
the_second_item = the_evil_array["What goes here?"]
the_last_item = # What goes here?
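As a point of comparison, here is a sketch of the same bracket syntax on a different, hypothetical array. It also shows that negative indices count backwards from the end, which works for np arrays just as it does for lists.

```python
import numpy as np

# Hypothetical array (different from the exercise).
codons = np.array(["ATG", "GCC", "TGA"])

first_codon = codons[0]               # "ATG"
second_codon = codons[1]              # "GCC"
last_codon = codons[len(codons) - 1]  # "TGA" -> the last index is length minus 1
also_last_codon = codons[-1]          # "TGA" -> negative indices count from the end

print(first_codon, second_codon, last_codon, also_last_codon)
```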

But you might be wondering -- woah woah woah. This seems to work only for 1D arrays! In computational biology, I'll definitely have more dimensions than that (you probably will)! Well, with np, we're able to index through SEVERAL DIMENSIONS OF AN ARRAY!

np lets you do this by comma-separating indices in the bracket notation. Think of each individual index as corresponding to a dimension in your array -- the first is rows, the second columns, the third depth, and so on.

In [ ]:

# Let's create a 2x3 `np` array -- the first row should contain the values
# "Programming", "is", "awesome", and the second row should contain the
# values "I", "love", "BIOSC 1540".
the_awesome_array = # How should we do this? Recall the example from above!

# We should then have an array that looks a little like this (figuratively):
#          C0             C1         C2
#     -----------------------------------------
#  R0 | "Programming" |  "is"  |   "awesome"  |
#     -----------------------------------------
#  R1 |      "I"      | "love" | "BIOSC 1540" |
#     -----------------------------------------

# I can then grab specific values from the cells! Say I want "I",
# which is in the second row (index 1) and the first column (what index?).
item_i_want = the_awesome_array[1, 0]
print(item_i_want)

# BOOM!
# How would I grab "BIOSC 1540"? Let's think it through!
row_index = # What row is it in?
col_index = # What column is it in?
item_i_want = the_awesome_array[row_index, col_index]
print(item_i_want)

# AWESOME SAUCE! Let's do one more!
# How could I grab "awesome"?
item_i_want = # What do I put here?
print(item_i_want)

# WOOOHOOOO!!!!!
# Remember that this also applies to arrays with more dimensions (NumPy caps the
# number of dimensions at a few dozen, far more than you'll need). We usually deal with 2D,
# but if you're curious, you'd just keep comma-separating indices for each dimension. 
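To make the comma-separated indexing concrete, here is a sketch on hypothetical data: a 2 x 3 array of letters, then a 3D array built with np.arange and reshape (two NumPy functions not covered above, used here only to generate numbers quickly). You supply one index per dimension.

```python
import numpy as np

# Hypothetical 2 x 3 array of single letters.
letter_grid = np.array([["a", "b", "c"],
                        ["d", "e", "f"]])
print(letter_grid[0, 2])  # row index 0, column index 2 -> c
print(letter_grid[1, 1])  # row index 1, column index 1 -> e

# A 3D example: np.arange(24) makes the integers 0..23, and reshape
# arranges them into an array with shape (2, 3, 4).
cube = np.arange(24).reshape(2, 3, 4)
print(cube[1, 2, 3])  # the last position along every dimension -> 23
print(cube[0, 1, 2])  # 6
```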

        

But now, we can get even crazier. A lot of the time, we don't just want one item -- we want several! We can do this with SLICING!! Slicing is a powerful tool that lets us specify ranges of indices. The syntax for this is:

In [ ]:

the_new_and_improved_array = np.array([0, 1, 2, 3, 4])
start_index = 1  # Where do we want to start including items?
stop_index = 3  # At what index do we STOP including items? This index is NOT included!

# We get items from start_index, until right before stop_index!
the_item_i_want = the_new_and_improved_array[start_index:stop_index]
# In math, the notation would be like [start_index, stop_index), if that clarifies things!

# ALSO! If you want ALL of a dimension, you can just use ":"! You could use this to
# grab all rows, but a separate specific column index, for example.
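Here is a short sketch of those slicing rules on hypothetical data (readings and table are made-up names). It also shows that leaving out the start or the stop means "from the beginning" or "through the end".

```python
import numpy as np

# Hypothetical 1D array.
readings = np.array([5, 6, 7, 8, 9])
print(readings[1:4])  # indices 1, 2, 3 -> [6 7 8]
print(readings[:2])   # start omitted: from the beginning -> [5 6]
print(readings[3:])   # stop omitted: through the end -> [8 9]

# Hypothetical 2D array: ":" keeps every row, 1 picks the column at index 1.
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(table[:, 1])    # [2 5]
```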

Think of these slices as their own individual indices -- that means for indexing through several dimensions of our table, the same comma separation rule applies. Let's try some examples!

In [ ]:

amazing_array = np.array([[10, 20, 30, 40], [11, 22, 33, 44], [1, 2, 3, 4]])
# 10 20 30 40
# 11 22 33 44
# 1  2  3  4

# Let's try getting some specific chunks of values!
# Let's get the lower-left four -- 11, 22, 1, and 2.
# This would be rows 2 and 3, columns 1 and 2 (counting from 1).
# Rows: start at row index 1, go until (but not including) row index 3.
# Columns: start at col index 0, go until (but not including) col index 2.
our_chunk = amazing_array[1:3, 0:2]

# How about the other lower four? 33, 44, 3, and 4?
# What rows would this be?
row_index_start = #
row_index_end = #

# What columns would this be?
col_index_start = #
col_index_end = #

our_chunk = amazing_array[row_index_start:row_index_end, "What goes here?"]

# Final test! This one's for all the marbles (not really)
# Let's get the central upper four -- 20, 30, 22, and 33
our_chunk = # What goes here? Break it down like above!
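For another angle on the same technique, here is a sketch on a different, hypothetical grid (built with np.arange and reshape purely for convenience). One slice goes before the comma for rows and one after it for columns.

```python
import numpy as np

# Hypothetical 3 x 4 array holding 1..12, filled row by row:
#  1  2  3  4
#  5  6  7  8
#  9 10 11 12
grid = np.arange(1, 13).reshape(3, 4)

# Rows 0-1 and columns 0-1: the values 1, 2, 5, 6.
top_left = grid[0:2, 0:2]
print(top_left)

# Row 2 only and columns 1-2: the values 10, 11.
bottom_row_middle = grid[2:3, 1:3]
print(bottom_row_middle)
print(bottom_row_middle.shape)  # (1, 2) -> slicing keeps both dimensions
```

Note that a slice such as 2:3 still returns a 2D result with a single row, whereas a plain index such as 2 would drop that dimension.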

        

Now let's tackle a problem from start to finish.

In [ ]:

# Consider this array, representing blocks in New York City and their population density.
# Each individual cell is one block's population density.
nyc_density = np.array([
    [80, 36, 41, 18, 74],
    [68, 1,  1,  99, 61],
    [10, 10, 11, 21, 100],
    [70, 72, 41, 57, 28],
    [1,  2,  3,  4,  5],
])

# We're building a park, and we're hoping to put it right smack in the middle -- we want to occupy the central 9 squares.
# But the first thing we need to know is who is in there? Print out this subset of the array.

# Consider what row indices the middle 9 squares would occupy
# Consider what col indices the middle 9 squares would occupy
# Put it together!
potential_park = # What would go here?
print(potential_park)
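If you get stuck, it may help to work a smaller version first. This sketch uses a hypothetical 4 x 4 grid (not the NYC data) and pulls out its interior by skipping the outermost rows and columns.

```python
import numpy as np

# Hypothetical 4 x 4 grid (not the NYC data).
small_grid = np.array([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9,  10, 11, 12],
    [13, 14, 15, 16],
])

# The interior skips the first and last row (indices 0 and 3) and the
# first and last column, leaving indices 1 and 2 in each direction.
interior = small_grid[1:3, 1:3]
print(interior)        # the values 6, 7, 10, 11
print(interior.shape)  # (2, 2)
```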

        

Getting started with Matplotlib

Matplotlib (mpl) has a pretty consistent syntax for plotting y's against x's, or y's against categories! We'll walk through the syntax below.

In [ ]:

import matplotlib.pyplot as plt

In [ ]:

# We'll have some array for our x values!
x = np.array([1, 2, 3, 4, 5])
# And some array for our y values!
y = np.array([10, 20, 15, 25, 30])

# We'll set up our fig/ax -- this allows us to easily do more complex stuff down the line (which we won't touch on right now)
fig, ax = plt.subplots()

# And we can plot! .plot draws a line plot by default -- we can name the line with label
ax.plot(x, y, label="Sales Over Time")  # x-values first, then y-values

ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Monthly Sales Trend")
ax.legend()

plt.show()

But the biggest thing to remember with mpl is that you definitely do not have to know specifically how to implement what plot you want upfront! There are a multitude of resources to consult, such as the Python graph library, stack overflow, or good old documentation! So let's search -- how can we try plotting a bar graph with categorical labels? To Google!
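One possible result of that search is sketched below. It is not the only way to do it, and the categories and counts are made up for illustration. It follows the same fig/ax pattern as the line plot, swapping ax.plot for ax.bar, which accepts a list of category labels for the x positions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical categories and counts, made up for illustration.
categories = ["Apples", "Bananas", "Cherries"]
counts = np.array([12, 7, 9])

fig, ax = plt.subplots()

# ax.bar takes the category labels for x and the bar heights for y.
bars = ax.bar(categories, counts, label="Fruit counts")

ax.set_xlabel("Fruit")
ax.set_ylabel("Count")
ax.set_title("Fruit Inventory")
ax.legend()

# In a script, finish with plt.show() as in the line-plot example.
```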

In [ ]: