\[ \hat{y} = \sgn{\vec{w}^T \vec{x} + b} = \sgn{\sum_{j=1}^{p} w_j x_j + b}\]
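A minimal NumPy sketch of this prediction rule. The weights, bias and input are made-up values, and mapping a score of exactly 0 to +1 is an assumed tie-break (the formula itself leaves that case open).

```python
import numpy as np

def predict_linear(w, b, x):
    """Return sign(w^T x + b) as +1/-1; a score of exactly 0 is mapped to +1 (assumed tie-break)."""
    score = w @ x + b
    return 1 if score >= 0 else -1

# Hypothetical weights, bias and one input with p = 3 features
w = np.array([0.5, -1.0, 2.0])
b = -0.25
x = np.array([1.0, 2.0, 0.5])

# The explicit sum over features gives the same score as the dot product
explicit = sum(w[j] * x[j] for j in range(len(w))) + b
print(np.isclose(w @ x + b, explicit))  # True
print(predict_linear(w, b, x))          # -1, since the score is -0.75
```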
\[ \hat{y} = \sgn{\vec{w}^T\vec{x} + b}\] \[ \min_{\vec{w} \in \mathbb{R}^p, b \in \mathbb{R}} \sum_{i=1}^m \mathbf{1}_{\{ y^{(i)} \ne \sgn{\vec{w}^T\vec{x}^{(i)} + b} \}} \]
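A small sketch of this 0-1 objective on a made-up dataset with labels in \(\{-1, +1\}\). Nudging \(\vec{w}\) leaves the count unchanged: the objective is piecewise constant, so its gradient is zero almost everywhere, which is why it is hard to minimize directly.

```python
import numpy as np

def zero_one_loss_count(w, b, X, y):
    """Number of samples where sign(w^T x + b) disagrees with y in {-1, +1}."""
    scores = X @ w + b
    y_hat = np.where(scores >= 0, 1, -1)  # score 0 mapped to +1 (assumed tie-break)
    return int(np.sum(y_hat != y))

# Tiny hypothetical dataset: m = 4 samples, p = 2 features
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, 1])
w = np.array([1.0, 0.0])
b = 0.0

print(zero_one_loss_count(w, b, X, y))                     # 1 (the last sample)
print(zero_one_loss_count(np.array([1.1, 0.0]), b, X, y))  # still 1 after nudging w
```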
\[ \ln\left( \frac{p(y=1 | \vec{x})}{p(y=-1 | \vec{x})} \right) = \vec{w}^T \vec{x} + b \] \[ p(y=1 | \vec{x}) = \frac{1}{1 + \e^{-\vec{w}^T\vec{x}-b}} \] \[ \minw \sum_{i=1}^m \ln\left(\exp\left(-y^{(i)}(\vec{w}^T\vec{x}^{(i)}+b)\right) + 1\right) \] \[ \hat{y} = \sgn{\vec{w}^T\vec{x} + b} \]
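A NumPy sketch of the logistic model and its loss, assuming labels in \(\{-1, +1\}\) and made-up parameters. It checks that \(\ln(1 + \e^{-y(\vec{w}^T\vec{x}+b)})\) summed over the data equals the negative log-likelihood of the sigmoid model; `np.logaddexp(0, -z)` is used as a numerically stable \(\ln(1 + \e^{-z})\).

```python
import numpy as np

def p_positive(w, b, x):
    """p(y = +1 | x) = 1 / (1 + exp(-(w^T x + b)))."""
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

def log_loss(w, b, X, y):
    """Sum over samples of ln(1 + exp(-y (w^T x + b))), labels y in {-1, +1}."""
    margins = y * (X @ w + b)
    return float(np.sum(np.logaddexp(0.0, -margins)))  # stable ln(1 + e^{-z})

# Hypothetical parameters and data
w = np.array([1.0, -2.0])
b = 0.5
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
y = np.array([1, -1, 1])

# Probability the model assigns to each true label, then the negative log-likelihood
probs_pos = np.array([p_positive(w, b, x) for x in X])
probs_true = np.where(y == 1, probs_pos, 1.0 - probs_pos)
nll = -np.sum(np.log(probs_true))

print(np.isclose(nll, log_loss(w, b, X, y)))    # True
print(p_positive(w, b, np.array([-0.5, 0.0])))  # 0.5, since w^T x + b = 0 there
```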
\[ \minw C \summ \logloss + \ltwo{\vec{w}}^2 \] \[ \minw C \summ \logloss + \lone{\vec{w}} \]
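In scikit-learn these two objectives correspond to `LogisticRegression(penalty="l2")` and `LogisticRegression(penalty="l1")`, with the same `C` multiplying the loss term (the library documents its L2 term as \(\frac{1}{2}\|\vec{w}\|_2^2\), which only rescales `C`). A sketch comparing how many coefficients each penalty keeps; the breast cancer dataset, `C=0.05`, the `liblinear` solver and the scaling step are illustrative choices, not from the slides.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)  # one scale for all features, so the penalty treats them alike

C = 0.05  # small C = strong regularization (illustrative value)
l2 = LogisticRegression(penalty="l2", C=C, solver="liblinear").fit(X, y)
l1 = LogisticRegression(penalty="l1", C=C, solver="liblinear").fit(X, y)

# L2 shrinks coefficients; L1 typically sets some of them exactly to zero
print("nonzero coefficients, L2:", np.count_nonzero(l2.coef_), "of", l2.coef_.size)
print("nonzero coefficients, L1:", np.count_nonzero(l1.coef_), "of", l1.coef_.size)
```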
\[ \minw C \summ \max(0, 1-y^{(i)}(\vec{w}^T \vec{x}^{(i)}+b)) + \ltwo{\vec{w}}^2 \] \[ \minw C \summ \max(0, 1-y^{(i)}(\vec{w}^T \vec{x}^{(i)}+b)) + \lone{\vec{w}} \]
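A NumPy sketch that evaluates these hinge-loss objectives for given parameters (made-up values, labels in \(\{-1, +1\}\)). Points with margin \(y^{(i)}(\vec{w}^T\vec{x}^{(i)}+b) \ge 1\) contribute nothing to the data term.

```python
import numpy as np

def hinge_objective(w, b, X, y, C=1.0, penalty="l2"):
    """C * sum max(0, 1 - y (w^T x + b)) + regularizer, labels y in {-1, +1}."""
    margins = y * (X @ w + b)
    data_term = C * np.sum(np.maximum(0.0, 1.0 - margins))
    if penalty == "l2":
        reg = np.sum(w ** 2)      # ||w||_2^2
    elif penalty == "l1":
        reg = np.sum(np.abs(w))   # ||w||_1
    else:
        raise ValueError("penalty must be 'l2' or 'l1'")
    return float(data_term + reg)

# Hypothetical parameters and data: margins are 4, 1 and -1
w = np.array([2.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.5, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1])

print(hinge_objective(w, b, X, y, penalty="l2"))  # 2 (hinge) + 5 (||w||_2^2) = 7.0
print(hinge_objective(w, b, X, y, penalty="l1"))  # 2 (hinge) + 3 (||w||_1)   = 5.0
```

Note that scikit-learn's `LinearSVC` defaults to `loss="squared_hinge"`, and `penalty="l1"` is only supported with the squared hinge and `dual=False`, so the L1 + hinge objective above has no exact `LinearSVC` counterpart.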
\[ \minw C \summ \logloss + \ltwo{\vec{w}}^2 \] \[ \minw C \summ \max(0, 1-y^{(i)}(\vec{w}^T \vec{x}^{(i)}+b)) + \ltwo{\vec{w}}^2 \]
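Written as functions of the margin \(z = y(\vec{w}^T\vec{x}+b)\), the two data terms differ mainly on correctly classified points: the hinge loss is exactly zero once \(z \ge 1\), while the log loss keeps shrinking but never reaches zero. A small comparison at a few arbitrary margins:

```python
import numpy as np

z = np.array([-2.0, 0.0, 1.0, 2.0])  # margins y * (w^T x + b), arbitrary values
log_loss = np.logaddexp(0.0, -z)      # ln(1 + e^{-z})
hinge = np.maximum(0.0, 1.0 - z)      # max(0, 1 - z)

for zi, l, h in zip(z, log_loss, hinge):
    print(f"z={zi:+.1f}  log loss={l:.3f}  hinge={h:.3f}")
```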
\[ p(y=c | \vec{x}) = \frac{\e^{(\vec{w}^{(c)})^T \vec{x} + b^{(c)}}}{\sum_{j=1}^k \e^{(\vec{w}^{(j)})^T \vec{x} + b^{(j)}}} \] \[ \min_{\vec{w}\in\mathbb{R}^{pk},b\in\mathbb{R}^k} -\sum_{i=1}^m \ln(p(y=y^{(i)} | \vec{x}^{(i)})) \] \[ \hat{y} = \text{argmax}_{c \in \{1,\ldots,k\}} ((\vec{w}^{(c)})^T \vec{x} + b^{(c)}) \]
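A NumPy sketch of the multinomial (softmax) model with made-up parameters for \(k = 3\) classes. Classes are indexed \(0, \ldots, k-1\) here (Python indexing) rather than \(1, \ldots, k\). Because the exponential is monotonic, the class with the highest probability is the class with the highest score, which is why prediction only needs the argmax of the linear scores.

```python
import numpy as np

def softmax_probs(W, b, x):
    """p(y = c | x) for c = 0..k-1; W has shape (k, p), b has shape (k,)."""
    scores = W @ x + b
    scores = scores - scores.max()  # shift for numerical stability; probabilities unchanged
    e = np.exp(scores)
    return e / e.sum()

def multinomial_nll(W, b, X, y):
    """-sum_i ln p(y = y_i | x_i), with class labels y_i in {0, ..., k-1}."""
    return float(-sum(np.log(softmax_probs(W, b, x)[yi]) for x, yi in zip(X, y)))

# Hypothetical parameters: k = 3 classes, p = 2 features
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.5, 0.0])
x = np.array([2.0, 1.0])  # scores are [2.0, 1.5, -3.0]

probs = softmax_probs(W, b, x)
print(probs.sum())                               # 1.0 up to rounding
print(np.argmax(probs) == np.argmax(W @ x + b))  # True

# Objective value on a tiny hypothetical training set
X_train = np.array([[2.0, 1.0], [0.0, 3.0]])
y_train = np.array([0, 1])
print(multinomial_nll(W, b, X_train, y_train))
```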
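LogisticRegression:
clf.decision_function gives the scores \(\vec{w}^T \vec{x} + b\)
clf.predict_proba gives probabilities for each class
SVC(probability=True) also provides predict_proba, but the estimates are not great (they are fit with an extra internal cross-validation and can disagree with predict)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
iris = load_iris()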
X,y = iris.data, iris.target
print(X.shape)
print(np.bincount(y))
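# lbfgs fits a multinomial (softmax) model for multiclass targets by default;
# recent scikit-learn versions deprecate the explicit multi_class="multinomial" argument
logreg = LogisticRegression(random_state=0,
                            solver="lbfgs").fit(X, y)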
linearsvm = LinearSVC().fit(X,y)
print(logreg.coef_.shape)
print(linearsvm.coef_.shape)
(150, 4)
[50 50 50]
(3, 4)
(3, 4)
print(logreg.coef_)
[[-0.41815181  0.96640966 -2.52143555 -1.08402204]
 [ 0.53103513 -0.31447032 -0.19924552 -0.94919389]
 [-0.11288332 -0.65193934  2.72068108  2.03321592]]
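A sketch checking how the fitted multinomial model's outputs relate to its coefficients: `decision_function` is \((\vec{w}^{(c)})^T\vec{x} + b^{(c)}\) for each class, `predict_proba` is the softmax of those scores, and `predict` takes their argmax. Setting `max_iter=1000` is an added assumption to avoid convergence warnings; the relationships checked do not depend on it.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# lbfgs fits the multinomial (softmax) model for multiclass targets by default
logreg = LogisticRegression(solver="lbfgs", max_iter=1000, random_state=0).fit(X, y)

scores = logreg.decision_function(X)             # shape (150, 3)
manual = X @ logreg.coef_.T + logreg.intercept_  # w_c^T x + b_c for each class c
print(np.allclose(scores, manual))

# Softmax of the scores reproduces predict_proba for the multinomial model
exp_s = np.exp(scores - scores.max(axis=1, keepdims=True))
softmax = exp_s / exp_s.sum(axis=1, keepdims=True)
print(np.allclose(softmax, logreg.predict_proba(X)))

# predict returns the class with the highest score
print(np.array_equal(logreg.classes_[scores.argmax(axis=1)], logreg.predict(X)))
```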
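Practical tips (for linear models):
Instead of SVC(kernel='linear'), use LinearSVC (it scales better to large numbers of samples)
LinearSVC, LogisticRegression: set dual=False when \(m \gg p\) (in LogisticRegression, dual only applies to the liblinear solver)
LogisticRegression(solver="sag") when \(m\) is really big (100,000+ samples)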
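A sketch of these settings on synthetic data with many more samples than features. The dataset sizes and `make_classification` parameters are illustrative assumptions. Scaling is included because scikit-learn documents that the `sag` solver's fast convergence is only guaranteed when features have approximately the same scale.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data with m >> p (sizes are illustrative, not from the slides)
X, y = make_classification(n_samples=20000, n_features=20, n_informative=10,
                           n_redundant=0, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)

# Primal formulation for LinearSVC when m >> p
svm = make_pipeline(StandardScaler(), LinearSVC(dual=False)).fit(X, y)

# Stochastic average gradient solver for large m, on standardized features
logreg = make_pipeline(StandardScaler(),
                       LogisticRegression(solver="sag", max_iter=1000)).fit(X, y)

print("LinearSVC(dual=False) train accuracy:", svm.score(X, y))
print("LogisticRegression(solver='sag') train accuracy:", logreg.score(X, y))
```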