Python Libraries for ML

08/29/22

Advanced Python

  • List comprehensions
  • Docstrings and help()
  • Exceptions
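
The three topics above fit in a few lines. This is a minimal sketch, not taken from the slides: the function name safe_mean and the sample values are hypothetical.

```python
def safe_mean(values):
    """Return the arithmetic mean of values, or None if values is empty."""
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        # len(values) == 0, so there is nothing to average
        return None


# List comprehension: squares of the even numbers below 10
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]
print(even_squares)             # [0, 4, 16, 36, 64]
print(safe_mean(even_squares))  # 24.0
print(safe_mean([]))            # None
print(safe_mean.__doc__)        # the docstring that help(safe_mean) displays
```

Calling help(safe_mean) in an interactive session shows the same docstring along with the function signature.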

Bad Imports

from matplotlib.pyplot import *
from statistics import *
from numpy import *

Bad imports!

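  • Hide functionality: a later wildcard import silently replaces same-named functions from an earlier one (here numpy's mean replaces statistics' mean)
  • Where did that function come from? Nothing in the call tells you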

Good imports

import matplotlib.pyplot as plt
import statistics as stat
import numpy as np
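
With prefixed imports, functions that share a name can coexist. The sketch below assumes a small hypothetical list called values and calls the mean function from both modules.

```python
import statistics as stat
import numpy as np

values = [1, 2, 3, 4]

# Both modules define a function called mean; the prefix says which one runs.
print(stat.mean(values))  # implementation from the standard library
print(np.mean(values))    # NumPy implementation, returns a NumPy float

# The two names refer to different function objects.
print(stat.mean is np.mean)  # False
```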

Essential Libraries

Anaconda

A Python distribution that bundles:

  • Scikit-learn
  • NumPy
  • SciPy
  • matplotlib
  • pandas
  • Jupyter Notebook

Scikit-learn

  • Widely used, open source machine learning library
  • You will need to read the documentation sometimes
  • http://scikit-learn.org
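
Most scikit-learn estimators share the same fit / predict / score interface, so one small example shows the pattern that recurs throughout the documentation. This is a minimal sketch, not from the slides: the dataset (iris), the classifier (k-nearest neighbors), and the parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in dataset: 150 flowers, 4 measurements each
X, y = load_iris(return_X_y=True)

# Hold out part of the data to check the model on unseen samples
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a 1-nearest-neighbor classifier and measure accuracy on the held-out data
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print("Test set accuracy: {:.2f}".format(accuracy))
```

help(KNeighborsClassifier) prints the same parameter descriptions that appear on the documentation website.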

NumPy

  • Important library for scientific computing
  • Data structures like multidimensional arrays
  • Lots of linear algebra functions
  • Random number generators
  • etc.
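
Two of the features listed above, linear algebra and random number generation, can be sketched as follows. The matrix, the right-hand side, and the seed are arbitrary values chosen for illustration.

```python
import numpy as np

# Linear algebra: solve the system A @ x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print("Solution:", x)
print("Check A @ x == b:", np.allclose(A @ x, b))

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(seed=0)
samples = rng.normal(loc=0.0, scale=1.0, size=(2, 3))
print("Samples shape:", samples.shape)
```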

NumPy arrays

import numpy as np
x = np.array([[1,2,3],[4,5,6]])
print("x:\n{}".format(x))
x:
[[1 2 3]
 [4 5 6]]
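
A few common operations on the same array, as a short sketch. All of them are standard NumPy attributes and methods.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

print(x.shape)        # (2, 3): two rows, three columns
print(x[0, 1])        # 2: row 0, column 1
print(x[:, 0])        # [1 4]: first column
print(x * 2)          # elementwise multiplication, no loop needed
print(x.sum(axis=0))  # [5 7 9]: column sums
print(x.T.shape)      # (3, 2): transpose swaps the axes
```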

SciPy

  • Collection of scientific computing functions
  • Advanced linear algebra
  • Function optimization
  • Signal processing
  • Statistical distributions
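
As a rough illustration of two items from the list, the sketch below minimizes a one-dimensional function and queries the standard normal distribution. The function being minimized is a made-up example.

```python
from scipy import optimize, stats

# Function optimization: find the x that minimizes (x - 2)^2
result = optimize.minimize_scalar(lambda x: (x - 2) ** 2)
print("Minimum at x = {:.4f}".format(result.x))

# Statistical distributions: the standard normal distribution
print(stats.norm.cdf(0.0))    # probability mass below 0, which is 0.5
print(stats.norm.ppf(0.975))  # 97.5th percentile, roughly 1.96
```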

SciPy sparse matrices

import numpy as np
from scipy import sparse

# 2D NumPy array, identity matrix
eye = np.eye(4)
print("NumPy array:\n{}".format(eye))
NumPy array:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

SciPy sparse matrices

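# Convert the NumPy array to a SciPy sparse matrix in CSR format
# Only the nonzero entries are stored
sm = sparse.csr_matrix(eye)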
print("SciPy sparse CSR matrix:\n{}".format(sm))
SciPy sparse CSR matrix:
  (0, 0)	1.0
  (1, 1)	1.0
  (2, 2)	1.0
  (3, 3)	1.0
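
Large sparse data usually cannot be built as a dense array first. One alternative, sketched here under the assumption that the positions and values of the nonzero entries are known, is to construct the matrix directly in COO (coordinate) format.

```python
import numpy as np
from scipy import sparse

# Row index, column index, and value of each nonzero entry
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)

# Build the 4x4 identity directly in COO (coordinate) format
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))
print("Stored entries:", eye_coo.nnz)  # 4 of the 16 cells

# Convert back to a dense array to confirm it matches np.eye(4)
print(np.array_equal(eye_coo.toarray(), np.eye(4)))  # True
```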

matplotlib

  • Scientific plotting library
  • Line charts, histograms, scatter plots, etc.

Line plot

import numpy as np
import matplotlib.pyplot as plt

# Generate 100 evenly spaced points from -10 to 10 (endpoints included)
x = np.linspace(-10, 10, 100)
y = np.sin(x)
plt.plot(x, y, marker="x")

[Figure lineplot.png: sin(x) on [-10, 10], drawn with "x" markers]
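
The other chart types mentioned above follow the same pattern. This sketch uses matplotlib's figure/axes interface to draw a histogram and a scatter plot side by side. The random data and the axis labels are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)
values = rng.normal(size=500)  # synthetic data for illustration

# One figure with two side-by-side axes
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

# Histogram of the sampled values
ax_hist.hist(values, bins=20)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")
ax_hist.set_title("Histogram")

# Scatter plot of sin(x) at 30 evenly spaced points
x = np.linspace(-10, 10, 30)
ax_scatter.scatter(x, np.sin(x), marker="x")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("sin(x)")
ax_scatter.set_title("Scatter plot")

fig.tight_layout()
```

In a Jupyter Notebook the figure renders below the cell. In a plain script, call plt.show() at the end to open the plot window.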

Pandas

  • Data "wrangling" and analysis library
  • DataFrame data structure, modeled on R's data.frame
  • Basically a table, like an Excel spreadsheet stored in a variable
  • pandas can read in many file formats, e.g., SQL, Excel files, CSV
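
Reading a CSV file takes one call. To keep this sketch self-contained, the CSV content is held in a string (a stand-in for a real file) and reuses the small dataset from the Tables slides.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; pd.read_csv also accepts a file path
csv_text = """Name,Location,Age
John,New York,24
Anna,Paris,13
Peter,Berlin,53
Linda,London,33
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (4, 3): four rows, three columns
print(df.dtypes)  # Age is parsed as an integer column
```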

Tables

import pandas as pd
# create simple dataset
data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location': ["New York", "Paris", "Berlin", "London"],
        'Age': [24, 13, 53, 33]
       }

data_pandas = pd.DataFrame(data)
data_pandas
    Name  Location  Age
0   John  New York   24
1   Anna     Paris   13
2  Peter    Berlin   53
3  Linda    London   33

Tables

data_pandas[data_pandas.Age > 30]
    Name  Location  Age
2  Peter    Berlin   53
3  Linda    London   33
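
The same boolean-mask idea extends to several conditions, column selection, and sorting. This is a short sketch on the same dataset. The variable names adults_not_berlin and by_age are hypothetical.

```python
import pandas as pd

data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location': ["New York", "Paris", "Berlin", "London"],
        'Age': [24, 13, 53, 33]}
data_pandas = pd.DataFrame(data)

# Combine conditions with & (and) or | (or); each condition needs parentheses
adults_not_berlin = data_pandas[(data_pandas.Age >= 18)
                                & (data_pandas.Location != "Berlin")]
print(adults_not_berlin)

# Select two columns and sort the rows by age, oldest first
by_age = data_pandas[["Name", "Age"]].sort_values("Age", ascending=False)
print(by_age)
```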

Jupyter Notebook

  • Interactive environment for editing/running code
  • Mix text and code
  • Run code and display results immediately