svm ¶
ai.svm ¶
Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.
The advantages of support vector machines are:
- Effective in high dimensional spaces.
- Still effective in cases where number of dimensions is greater than the number of samples.
- Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
- Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
The disadvantages of support vector machines include:
- If the number of features is much greater than the number of samples, avoiding over-fitting when choosing kernel functions and the regularization term is crucial.
- SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).
Types of SVMs:
- Linear SVM: Linear SVMs use a linear decision boundary to separate the data points of different classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes. A hyperplane that maximizes the margin between the classes is the decision boundary.
- Non-Linear SVM: Non-linear SVMs can be used to classify data when it cannot be separated into two classes by a straight line (in the case of 2D). By using kernel functions, non-linear SVMs can handle non-linearly separable data. The original input data is transformed by these kernel functions into a higher-dimensional feature space, where the data points can be linearly separated. A linear SVM then finds a separating hyperplane in this transformed space, which corresponds to a non-linear decision boundary in the original space.
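The kernel idea above can be illustrated with an explicit feature map. This is a minimal, self-contained sketch: the 1-D dataset and the map \(\phi(x) = (x, x^2)\) are made up for illustration and are not part of the ai library.

```python
import numpy as np

# A 1-D dataset that no single threshold can separate: the positive
# class sits on the outside, the negative class in the middle.
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([1, -1, -1, 1])

# Explicit feature map phi(x) = (x, x^2): lift the data into 2-D.
phi = np.column_stack([X, X**2])

# In the lifted space the second coordinate alone separates the classes
# with a simple threshold (a horizontal line in 2-D):
pred = np.where(phi[:, 1] > 2.5, 1, -1)
print(pred)  # same labels as y
```

Kernel functions achieve the same effect implicitly, without ever materializing the lifted coordinates.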
The following are the SVM implementations that ai includes:
ai.svm.classes.LinearSVC
LinearSVC
¶
Linear SVC classifier implementation.
Linear SVMs use a linear decision boundary to separate the data points of different classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes. A hyperplane that maximizes the margin between the classes is the decision boundary.
The objective of training a LinearSVC is to minimize the norm of the weight
vector \(||w||\), which is the slope of the decision function.
Note:
We are minimizing \(\dfrac{1}{2}w^{T}w\), which is equal to \(\dfrac{1}{2}||w||^{2}\), rather than minimizing \(||w||\). Indeed, \(\dfrac{1}{2}||w||^{2}\) has a nice and simple derivative (just \(w\)) while \(||w||\) is not differentiable at \(w = 0\). Optimization algorithms work much better on differentiable functions.
The cost function used by the Linear SVM classifier is the following:
\[J(w, b) = \lambda||w||^{2} + \dfrac{1}{n}\sum_{i=1}^{n}\max\left(0, 1 - y_{i}(w^{T}x_{i} - b)\right)\]
The loss function used is the Hinge Loss function that clips the value at \(0\):
\[\ell(x_{i}, y_{i}) = \max\left(0, 1 - y_{i}(w^{T}x_{i} - b)\right)\]
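The clipping behavior of the hinge loss can be seen numerically. The weights, bias, and data below are made-up values for illustration, not taken from the library:

```python
import numpy as np

# Made-up weights, bias, and data purely to illustrate the clipping.
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[0.5, 0.0], [0.0, 0.2]])
y = np.array([1, -1])

scores = X @ w + b                       # decision function f(x)
margins = y * scores                     # y * f(x): >= 1 means outside the margin
hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss, clipped at 0
print(hinge)  # first sample satisfies the margin (0.0), second violates it (1.3)
```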
Source code in ai/svm/classes.py
__init__(*, alpha=0.01, lambda_p=0.01, n_iters=1000)
¶
Initializes model's alpha, lambda parameter and number of
iterations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `alpha` | `Optional[float16]` | Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take forever to learn. | `0.01` |
| `lambda_p` | `Optional[float32]` | Lambda parameter for updating weights. | `0.01` |
| `n_iters` | `Optional[int64]` | Maximum number of updates to make over the weights and bias in order to reach an efficient prediction that minimizes the loss. | `1000` |
Source code in ai/svm/classes.py
fit(X, y)
¶
Fit LinearSVC according to X, y.
The objective of training a LinearSVC is to minimize the norm of the
weight vector \(||w||\), which is the slope of the decision function.
Note:
We are minimizing \(\dfrac{1}{2}w^{T}w\), which is equal to \(\dfrac{1}{2}||w||^{2}\), rather than minimizing \(||w||\). Indeed, \(\dfrac{1}{2}||w||^{2}\) has a nice and simple derivative (just \(w\)) while \(||w||\) is not differentiable at \(w = 0\). Optimization algorithms work much better on differentiable functions.
The cost function used by the Linear SVM classifier is the following:
\[J(w, b) = \lambda||w||^{2} + \dfrac{1}{n}\sum_{i=1}^{n}\max\left(0, 1 - y_{i}(w^{T}x_{i} - b)\right)\]
The loss function used is the Hinge Loss function that clips the value at \(0\):
\[\ell(x_{i}, y_{i}) = \max\left(0, 1 - y_{i}(w^{T}x_{i} - b)\right)\]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Training vectors. | required |
| `y` | `ndarray` | Target values. | required |
Returns:
| Type | Description |
|---|---|
| `LinearSVC` | Returns the instance itself. |
Source code in ai/svm/classes.py
predict(X)
¶
Predict for X using the previously calculated weights and bias.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `ndarray` | Feature vector. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Target vector. |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If |
| `ValueError` | If shape of the given |
Source code in ai/svm/classes.py
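The documented `__init__`/`fit`/`predict` interface can be illustrated with a from-scratch sketch that runs sub-gradient descent on the regularized hinge loss. This is an illustration only, not the ai library's actual implementation; `ai.svm.classes.LinearSVC` may differ in details such as label handling and validation, and the toy data below is made up:

```python
import numpy as np

class TinyLinearSVC:
    """Illustrative linear SVM: hinge loss + L2 penalty, sub-gradient descent."""

    def __init__(self, *, alpha=0.01, lambda_p=0.01, n_iters=1000):
        self.alpha = alpha        # learning rate
        self.lambda_p = lambda_p  # regularization strength
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        y_ = np.where(y <= 0, -1, 1)  # map labels to {-1, +1}
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.n_iters):
            for xi, yi in zip(X, y_):
                if yi * (xi @ self.w - self.b) >= 1:
                    # correct side of the margin: only the penalty term acts
                    self.w -= self.alpha * (2 * self.lambda_p * self.w)
                else:
                    # margin violation: sub-gradient of the hinge loss
                    self.w -= self.alpha * (2 * self.lambda_p * self.w - yi * xi)
                    self.b -= self.alpha * yi
        return self

    def predict(self, X):
        if self.w is None:
            raise RuntimeError("call fit() before predict()")
        return np.sign(X @ self.w - self.b)

# Linearly separable toy data
X = np.array([[4.0, 4.0], [5.0, 6.0], [0.0, 0.0], [1.0, 0.5]])
y = np.array([1, 1, -1, -1])
clf = TinyLinearSVC(alpha=0.01, lambda_p=0.01, n_iters=1000)
print(clf.fit(X, y).predict(X))
```

Note how `predict` raises `RuntimeError` when called before `fit`, mirroring the documented behavior above.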
classes
¶
Linear Support Vector Classification.
kernels
¶
Common kernel implementations for Support Vector Machines.
grbf(x, x_prime, sigma)
¶
Implements Gaussian Radial Basis Function (RBF).
The Gaussian RBF is a commonly used kernel function in Support Vector Machines and other machine learning algorithms. It's defined as:
\[K(x, x') = \exp\left(-\dfrac{||x - x'||^{2}}{2\sigma^{2}}\right)\]
Here's the derivation:
- The Gaussian RBF is defined as a function of two data points, \(x\) and \(x'\), with a parameter \(\sigma\) that controls the width of the kernel.
- We start by computing the Euclidean distance between the two data points \(x\) and \(x'\):
\[d(x, x') = ||x - x'||\]
Now, let's insert this distance into the Gaussian RBF formula:
\[K(x, x') = \exp\left(-\dfrac{d(x, x')^{2}}{2\sigma^{2}}\right)\]
We have an exponential term, and we can simplify it further by expanding the squared distance:
\[K(x, x') = \exp\left(-\dfrac{(x - x')^{T}(x - x')}{2\sigma^{2}}\right) = \exp\left(-\dfrac{x^{T}x - 2x^{T}x' + x'^{T}x'}{2\sigma^{2}}\right)\]
We can further generalize it using the property \(e^{a + b} = e^{a}e^{b}\) to separate the exponential factors:
\[K(x, x') = \exp\left(-\dfrac{x^{T}x}{2\sigma^{2}}\right)\exp\left(\dfrac{x^{T}x'}{\sigma^{2}}\right)\exp\left(-\dfrac{x'^{T}x'}{2\sigma^{2}}\right)\]
This is the mathematical implementation of the Gaussian RBF. It measures the similarity between two data points \(x\) and \(x'\) based on the Euclidean distance between them, with the parameter \(\sigma\) controlling the width of the kernel.
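A minimal sketch of the formula above as a Python function follows; the actual `ai.svm.kernels.grbf` may differ in details such as input validation or dtype handling:

```python
import numpy as np

def grbf(x, x_prime, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))."""
    sq_dist = np.sum((x - x_prime) ** 2)  # squared Euclidean distance
    return np.exp(-sq_dist / (2.0 * sigma**2))

# Identical points have distance 0, so the kernel value is exp(0) = 1:
print(grbf(np.array([1.0, 2.0]), np.array([1.0, 2.0]), sigma=1.0))  # 1.0
```

The value decays toward 0 as the two points move apart, at a rate controlled by \(\sigma\).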
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `ndarray` | The numpy vector \(x\). | required |
| `x_prime` | `ndarray` | The numpy vector \(x'\). | required |
| `sigma` | `float32` | The parameter \(\sigma\) to control the width of the kernel. | required |
Returns:
| Type | Description |
|---|---|
| `ndarray` | RBF value. |
Examples:
```python
# Example data points
x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

# Set the width parameter
sigma = 1.0

# Calculate the RBF value
rbf_result = grbf(x1, x2, sigma)
print("Gaussian RBF Value:", rbf_result)
```