Python Libraries for ML

08/29/22

Advanced Python

List comprehensions
Doc strings and help()
Exceptions

Bad Imports

from matplotlib.pyplot import *
from statistics import *
from numpy import *

Bad imports!

Hide functionality
Where did that function come from?

Good imports

import matplotlib.pyplot as plt
import statistics as stat
import numpy as np

Essential Libraries

Anaconda

Nice distribution that includes:

Scikit-learn
Numpy
Scipy
matplotlib
pandas
Jupyter Notebook

Scikit-learn

Widely used, open source machine learning library
You will need to read the documentation sometimes
http://scikit-learn.org

Numpy

Important library for scientific computing
Data structures like multidimensional arrays
Lots of linear algebra functions
Random number generators
etc.

Numpy arrays

import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print("x:\n{}".format(x))

x:
[[1 2 3]
 [4 5 6]]

Scipy

Collection of scientific computing functions
Advanced linear algebra
function optimization
Signal processing
Statistical distributions

Scipy sparse matrices

from scipy import sparse

# 2D NumPy array, identity matrix
eye = np.eye(4)
print("NumPy array:\n{}".format(eye))

NumPy array:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

Scipy sparse matrices

sm = sparse.csr_matrix(eye)
print("SciPy sparse CSR matrix:\n{}".format(sm))

SciPy sparse CSR matrix:
  (0, 0)	1.0
  (1, 1)	1.0
  (2, 2)	1.0
  (3, 3)	1.0

matplotlib

Scientific plotting library
Line charts, histograms, scatter plots, etc.

Line plot

import numpy as np
import matplotlib.pyplot as plt

# Generate sequence from -10 to 10, 100 steps in between
x = np.linspace(-10, 10, 100)
y = np.sin(x)
plt.plot(x, y, marker="x")

Pandas

Data "wrangling" and analysis library
DataFrame data structure, modeled on R's DataFrame
Basically a table, like an Excel spreadsheet as a varaible
pandas can read in many file formats, e.g., SQL, Excel files, CSV

Tables

import pandas as pd
# create simple dataset
data = {'Name': ["John", "Anna", "Peter", "Linda"],
      	'Location': ["New York", "Paris", "Berlin", "London"],
	      'Age': [24, 13, 53, 33]
       }

data_pandas = pd.DataFrame(data)
data_pandas

	Name	Location	Age
0	John	New York	24
1	Anna	Paris	13
2	Peter	Berlin	53
3	Linda	London	33

Tables

data_pandas[data_pandas.Age > 30]

	Name	Location	Age
2	Peter	Berlin	53
3	Linda	London	33

Jupyter Notebook

Interactive environment for editing/running code
Mix text and code
Run code and display results immediately