QTM 151 - Introduction to Statistical Computing II

Installing Packages, Variables, and Lists

Danilo Freire

Emory University

09 September, 2024

Nice to see you all again! 😊

Brief Recap 📚

In the last lecture, we learned:

  • How git and GitHub work
  • Why they are important for reproducibility and collaboration
  • How to fork a repository, clone it, and push changes
  • How to run a Jupyter notebook on your computer
  • How to edit text and code cells
  • Feel free to email your assignments to me at
  • We will mark them and provide feedback on your work
  • Assignment 02 is already online as well :)

Questions? Please let us know!

Today’s Agenda

Installing packages and working with variables and lists

  • Python is a versatile programming language, but it doesn’t come with all the tools we need
  • Packages are collections of functions that extend Python’s capabilities
  • There are thousands of packages available, and we can install them using conda install
  • We will also learn about variables and lists
  • Variables are containers that store data values
  • Lists are collections of items that can be of different types
  • Today, we will learn how to create, access, and modify variables and lists

Any questions about installing Anaconda and Jupyter? 🐍

Installing Packages

  • There are several ways to install packages in Python
  • The two most common ways are pip and conda
  • pip is the Python package installer, which comes pre-installed with Python
  • conda is the package manager that comes with Anaconda, and it is even more user-friendly
  • We will use conda to install packages in this course
  • You can install packages using the command conda install package in the terminal or go to the Anaconda Navigator and install them from there
  • In Anaconda Navigator, you can search for packages in the “Environments” tab
  • The main packages we will use are:
    • numpy: for numerical computing
    • pandas: for data manipulation
    • matplotlib: for data visualisation
  • Try to install these packages on your computer!
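Once the installation has finished, it can help to confirm that Python actually finds the packages. The sketch below is one possible check, not part of the original slides: the names course_packages and package_status are made up for illustration, and it uses a loop and a dictionary, which the course has not covered yet. It relies only on find_spec from the standard library, so it runs even when a package is missing.

```python
# Check which of the course packages this Python installation can find.
from importlib.util import find_spec

# Hypothetical helper names, chosen only for this illustration
course_packages = ["numpy", "pandas", "matplotlib"]

# find_spec returns None when a top-level package is not installed
package_status = {}
for package in course_packages:
    package_status[package] = find_spec(package) is not None

for package, installed in package_status.items():
    if installed:
        print(package + ": installed")
    else:
        print(package + ": missing, try 'conda install " + package + "'")
```

If a package is reported as missing, check that the notebook kernel is the same environment (for example, base) in which you ran conda install.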

Now let’s open Jupyter and start coding! 🚀

Creating a new notebook in VS Code

It’s easier than you think!

  • There are two easy ways to create a new notebook in VS Code
  • The first way is to click on “File” > “New File” and save it as a .ipynb file

  • The other way is to press Cmd + Shift + P (Ctrl + Shift + P on Windows and Linux) and type “Create: New Jupyter Notebook”
  • This is also how you can do many other things in VS Code
  • Then select the kernel you want to use (base)

Is everyone ready to start coding? 🤓

Loading packages

  • The first thing we need to do is to load the packages we installed
  • We can do this using the import command
  • For example, to load the numpy package, we use import numpy as np
  • This command loads the package and assigns it the alias np
  • We can then use the functions in the package by typing np.function()
  • Why np? Because it is a common alias for numpy. You can use any alias you want, but it is good practice to use common ones as they make your code more readable
  • Let’s load the matplotlib package as well
  • We will use this package to create plots later on
  • To load the matplotlib package, we use import matplotlib.pyplot as plt
  • We import the pyplot module because it contains the plotting functions we need, and the alias plt saves us from typing matplotlib.pyplot every time we call one of them
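The alias pattern is not specific to numpy. As a minimal sketch, the example below applies it to math, a module that ships with Python. Using math and the alias m is an assumption made here so that the code runs before any package is installed; with numpy, the calls would start with np. instead.

```python
# The standard-library math module stands in for numpy here (an assumption,
# so that the sketch runs without installing anything).
import math as m

# Same pattern as np.function(): alias, dot, function name
root = m.sqrt(16)
print(root)   # 4.0

# Modules also provide constants, accessed through the same alias
print(m.pi)
```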

Let’s see how this works in practice!

Loading packages

  • Open the 03-variables-lists.ipynb notebook in VS Code (or create your own)
  • Select the Python kernel you want to use
  • To execute your code, press Shift + Enter or click on the “Run” button
  • If you want to create a new Python cell, press + Code in the toolbar
# Load the numpy and matplotlib packages
import numpy as np
import matplotlib.pyplot as plt
  • If you do not see any errors, you have successfully loaded the packages!

Variables and data types 📊

Variables

A container that stores data values

  • Variables can be of different types, such as:
    • integers: whole numbers
    • floats: numbers with decimals
    • strings: text
    • booleans: True or False
  • We identify the type of a variable using the type() function
  • We can use the print() function to display the value of a variable
type(3)
type(3.5)

print(type(3))
print(type(3.5))
print(type("Danilo's car"))
print(type(True))

# You can define strings using single or double quotes
type("hello")
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>
str
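The comment about single and double quotes can be made concrete with a short sketch. The variable names single, double, car, and car_escaped are chosen only for this illustration.

```python
# Single and double quotes create the same kind of object
single = 'hello'
double = "hello"
print(single == double)   # True
print(type(single))       # <class 'str'>

# Double quotes let the text contain an apostrophe without any extra work
car = "Danilo's car"

# With single quotes, the apostrophe must be escaped with a backslash
car_escaped = 'Danilo\'s car'
print(car == car_escaped)  # True
```

Whichever style you pick, it is easier to read code that uses it consistently.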

Store variables in memory

  • We can store variables in memory using the assignment operator =
  • For example, to store the value 3 in the variable x, we use x = 3
  • We can then access the value of x by typing x
  • Please note:
    • Variable names are case-sensitive
    • Variable names cannot start with a number
    • Variable names cannot contain spaces
    • Variable names cannot contain special characters, except for _
  • Click “Variables” in the top panel of Jupyter Notebooks
  • Install the Data Wrangler extension to see the values of your variables
number3 = 3
number3andhalf = 3.5
message_hello = "hello"

number3
3
# Use the print function to display the value of a variable
print(number3)
print(number3andhalf)
3
3.5
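The naming rules listed above can be checked directly in Python. The sketch below is an illustration rather than part of the original notebook: it uses the built-in string method isidentifier(), and the example names are hypothetical.

```python
# Names are case-sensitive: these are two different variables
number = 3
Number = 30
print(number)   # 3
print(Number)   # 30

# isidentifier() reports whether a piece of text follows the naming rules
print("number3".isidentifier())     # True
print("3number".isidentifier())     # False: starts with a number
print("my number".isidentifier())   # False: contains a space
print("my-number".isidentifier())   # False: contains a special character
print("my_number".isidentifier())   # True: underscores are allowed
```

One caveat: reserved words such as for or if pass this check, but Python still does not allow them as variable names.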
  • Now try it yourself! Create a variable with your favourite movie (see Appendix 01)

Basic operations with variables

  • We can perform basic operations with variables:
  • Addition: +, Subtraction: -, Multiplication: *, Division: /, Exponentiation: **
  • These operators are meant for numbers: adding a number to a string, for example, raises a TypeError
    • Try it! Type print("QTM" + 151)
print(3*2)
print(3+2)
print(3-2)
print(3/2)
print(3**2)
6
5
1
1.5
9
  • Use parentheses for order of operations
(3 + 4) / 5
1.4
(number3 + 4)/5
1.4
  • Concatenation “adds” two strings:
name = str("Danilo")

"My name is" + " "  + name
'My name is Danilo'
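Concatenation only works when both sides are strings, which is why print("QTM" + 151) fails. A minimal sketch of the error and one way around it is shown below. The names course, number, and label are chosen for illustration, and the try/except block is used here only to keep the cell running after the error (it has not been covered in the course yet).

```python
course = "QTM"
number = 151

# Adding a string and an integer raises a TypeError
try:
    print(course + number)
except TypeError as error:
    print("TypeError:", error)

# Converting the number to a string first makes the concatenation work
label = course + " " + str(number)
print(label)   # QTM 151
```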
  • Try it yourself! Define a variable with your name, define a new variable with your major, and print a concatenated string with your name and major (see Appendix 02)

Lists 📝

Lists

  • Lists are collections of items
  • We can store different types of items in a list
  • Lists are always enclosed in square brackets []
  • Elements are separated by commas ,
  • We can access elements in a list using their index
  • Indexes start at 0
# List of numbers
list_numbers = [1,2,3,4,5]
list_numbers_sqr = [1,4,9,16,25]
print(list_numbers)
print(type(list_numbers))
[1, 2, 3, 4, 5]
<class 'list'>
# List with strings
# Example: Suppose you ask 5 people about their favourite colour.
# The results:
list_colours = ["red","yellow","yellow", "green","red"]
print(list_colours)

# List with mixed types
list_mixed = ["red",1,"yellow",4,5, 3.5]

# Lists can be nested too
another_list = [list_mixed, 3, 'h']
['red', 'yellow', 'yellow', 'green', 'red']

Extracting elements from a list

Remember, indexes start at 0! (Yes, it’s annoying!)

  • Use square brackets [] to access elements in a list
  • For instance, to access the first element in a list, we use list[0]
floors_england = ["ground", "floor1", "floor2"]

floors_england[0]
floors_england[1]
'floor1'
print(another_list)
print(another_list[0])

print(another_list[0][2]) # What will this return? And why?
[['red', 1, 'yellow', 4, 5, 3.5], 3, 'h']
['red', 1, 'yellow', 4, 5, 3.5]
yellow
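Indexes can also be used to change a list, not only to read from it. The sketch below reuses the floors_england list. The len() function and the negative index -1 do not appear in the slides and are shown here as additions.

```python
floors_england = ["ground", "floor1", "floor2"]

# len() returns the number of elements in the list
print(len(floors_england))   # 3

# The last element sits at index len - 1, or equivalently at index -1
print(floors_england[2])     # floor2
print(floors_england[-1])    # floor2

# Assigning to an index replaces that element in place
floors_england[0] = "floor0"
print(floors_england)        # ['floor0', 'floor1', 'floor2']
```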
  • Now try it yourself!
  • Create a list with your three favourite movies and print the last one (see Appendix 03)

Visualising data with matplotlib 📊

Visualising lists with histograms

  • We can use the matplotlib package to create plots
  • The hist() function creates a histogram
  • We can pass a list as an argument to the hist() function
  • We can also customise the plot by adding labels, titles, and changing the colour (more on that later)
  • You display the graph by using the show() function


  • Try it yourself!
  • Create a list with repeated string values and compute your own histogram (see Appendix 04)
# Create a new list
list_list = list_colours + ['red']
print(list_list)

# Create a histogram of the list of colours
plt.hist(x = list_list)
plt.show()
['red', 'yellow', 'yellow', 'green', 'red', 'red']
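For a list of strings, the height of each histogram bar is simply the number of times that value appears. One way to check the plot is to count the values directly. The sketch below uses Counter from the standard library, which is not part of the slides, and colour_counts is a name chosen for illustration.

```python
from collections import Counter

list_colours = ["red", "yellow", "yellow", "green", "red"]
list_list = list_colours + ["red"]

# Count how many times each colour appears in the list
colour_counts = Counter(list_list)
print(colour_counts["red"])     # 3
print(colour_counts["yellow"])  # 2
print(colour_counts["green"])   # 1
```

These counts should match the bar heights that plt.hist() draws for the same list.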

Scatter plots

  • We can also create scatter plots using the scatter() function
  • The scatter() function takes two lists as arguments
    • The first list contains the x-coordinates
    • The second list contains the y-coordinates
  • We use them to visualise the relationship between two continuous variables
  • Here, we will use the xlabel() and ylabel() functions to label the axes
print(list_numbers)
print(list_numbers_sqr)

# Create a scatter plot
plt.scatter(x = list_numbers, y = list_numbers_sqr)
plt.xlabel("A meaningful name for the X-axis") 
plt.ylabel("Favourite name for Y-axis") 
plt.show()
[1, 2, 3, 4, 5]
[1, 4, 9, 16, 25]
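The y-values in this plot were typed by hand, but they could also be computed from the x-values with the ** operator. The sketch below is one possible way to do that. It uses a list comprehension, which has not been introduced in the course yet, so treat it as a preview rather than required material.

```python
list_numbers = [1, 2, 3, 4, 5]

# Square every element of list_numbers to obtain the y-coordinates
list_numbers_sqr = [number ** 2 for number in list_numbers]
print(list_numbers_sqr)   # [1, 4, 9, 16, 25]

# scatter() pairs the lists element by element, so the lengths must match
print(len(list_numbers) == len(list_numbers_sqr))   # True
```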

Scatter plots

  • Try it yourself!
  • Create two lists with numbers, then create your own scatter plot (see Appendix 05)

And that’s it for today! 🎉

Summary

  • Today we learned to:
    • Install packages using conda install
    • Load packages using the import command
    • Create variables and lists
    • Access and modify variables and lists
    • Create histograms and scatter plots using the matplotlib package
  • Next time, we will learn how to:
    • Solve mathematical problems using numpy
    • Generate random numbers
    • (Maybe) do some matrix operations

Any questions? 🤔

Appendix 01

Create a variable with your favourite movie

movie = "The Godfather"

Appendix 02

Define a variable with your name and major

name = "Danilo"
major = "QSS"

print("My name is " + name + " and I am majoring in " + major)
My name is Danilo and I am majoring in QSS

Appendix 03

Create a list with your three favourite movies

movies = ["The Godfather", "The Godfather II", "The Godfather III"]
print(movies[2])
The Godfather III

Appendix 04

Create a list with repeated string values and compute your own histogram

favourite_books = ["The Odyssey", "Don Quijote", "The Iliad", "The Odyssey", "The Iliad", "The Iliad"]
plt.hist(x = favourite_books)
plt.show()

Appendix 05

Create two lists with numbers, then create your own scatter plot

list_x = [5, 10, 15, 20, 25]
list_y = [10, 20, 30, 40, 50]

plt.scatter(x = list_x, y = list_y)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
