
svm

ai.svm

Support vector machines (SVMs) are a set of supervised learning methods used for classification, regression and outliers detection.

The advantages of support vector machines are:

  • Effective in high dimensional spaces.
  • Still effective in cases where number of dimensions is greater than the number of samples.
  • Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
  • Versatile: different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

The disadvantages of support vector machines include:

  • If the number of features is much greater than the number of samples, avoiding over-fitting when choosing kernel functions and the regularization term is crucial.
  • SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see Scores and probabilities, below).

Types of SVMs:

  • Linear SVM

    Linear SVMs use a linear decision boundary to separate the data points of different classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes. A hyperplane that maximizes the margin between the classes is the decision boundary.

  • Non-Linear SVM

    Non-Linear SVMs can be used to classify data that cannot be separated into two classes by a straight line (in the case of 2D). By using kernel functions, nonlinear SVMs can handle nonlinearly separable data. These kernel functions transform the original input data into a higher-dimensional feature space, where the data points can be linearly separated. A linear decision boundary found in this transformed space corresponds to a nonlinear decision boundary in the original space.
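The kernel idea can be sketched with a tiny, self-contained Python example (illustrative only, not part of the ai library): data that no straight line can separate in the original space becomes linearly separable after an explicit feature map.

```python
import numpy as np

# Two classes that no straight line in 2D can separate:
# class -1 lies inside the unit circle, class +1 outside it.
inner = np.array([[0.3, 0.2], [-0.4, 0.1], [0.0, -0.5]])   # label -1
outer = np.array([[1.5, 0.0], [-1.2, 1.1], [0.0, -2.0]])   # label +1

# Explicit feature map phi(x1, x2) = (x1, x2, x1^2 + x2^2).
# In the mapped 3D space the classes ARE linearly separable:
# the plane z = 1 splits them.
def phi(X):
  return np.c_[X, (X ** 2).sum(axis=1)]

z_inner = phi(inner)[:, 2]   # squared radii of the inner class
z_outer = phi(outer)[:, 2]   # squared radii of the outer class

# A single threshold on the new coordinate separates the classes.
print(z_inner.max() < 1 < z_outer.min())   # True
```

Kernel functions achieve the same effect implicitly, without ever computing the mapped coordinates.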

The following SVM implementations are included in ai:

  • ai.svm.classes.LinearSVC

LinearSVC

Linear SVC classifier implementation.

Linear SVMs use a linear decision boundary to separate the data points of different classes. When the data can be precisely linearly separated, linear SVMs are very suitable. This means that a single straight line (in 2D) or a hyperplane (in higher dimensions) can entirely divide the data points into their respective classes. A hyperplane that maximizes the margin between the classes is the decision boundary.

The objective of training a LinearSVC is to minimize the norm of the weight vector \(||w||\), which is the slope of the decision function.

\[\min_{{w, b}} \dfrac{1}{2} w^{T}w = \dfrac{1}{2} ||w||^{2}\]
\[\mathrm{subject\ to}\ t^{i} (w^{T}x^{i} + b) \ge 1\]
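A brief justification for why minimizing \(||w||\) maximizes the margin: the two margin hyperplanes are \(w^{T}x + b = 1\) and \(w^{T}x + b = -1\), and the distance between two parallel hyperplanes \(w^{T}x + b = c_{1}\) and \(w^{T}x + b = c_{2}\) is \(|c_{1} - c_{2}|/||w||\), so

\[\mathrm{margin} = \dfrac{2}{||w||}\]

and a smaller \(||w||\) gives a wider margin.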

Note:

We are minimizing \(\dfrac{1}{2}w^{T}w\), which is equal to \(\dfrac{1}{2}||w||^{2}\), rather than minimizing \(||w||\). Indeed, \(\dfrac{1}{2}||w||^{2}\) has a nice and simple derivative (just \(w\)) while \(||w||\) is not differentiable at \(w = 0\). Optimization algorithms work much better on differentiable functions.

The cost function used by Linear SVM classifier is the following:

\[J(w, b) = \dfrac{1}{2}w^{T}w + C\sum_{i=1}^{m}\max(0, 1 - t^{i}(w^{T}x^{i} + b))\]

The loss function used is the Hinge Loss function that clips the value at \(0\):

\[\max(0, 1 - t^{i}(w^{T}x^{i} + b))\]
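As a quick numerical illustration (a minimal sketch, independent of the library source), the hinge loss can be evaluated directly with NumPy:

```python
import numpy as np

def hinge_loss(w, b, X, t):
  """Mean hinge loss max(0, 1 - t * (w.x + b)) over all samples."""
  margins = t * (X @ w + b)
  return np.maximum(0.0, 1.0 - margins).mean()

w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[2.0, 0.0],    # confidently correct: margin  2.0 -> loss 0
              [0.5, 0.0],    # inside the margin:   margin  0.5 -> loss 0.5
              [-1.0, 0.0]])  # misclassified:       margin -1.0 -> loss 2.0
t = np.array([1, 1, 1])

print(hinge_loss(w, b, X, t))  # (0 + 0.5 + 2.0) / 3
```

Note that correctly classified points with a margin of at least 1 contribute nothing to the loss; only margin violations are penalized.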
Source code in ai/svm/classes.py
class LinearSVC:
  """**Linear SVC classifier** implementation.

  Linear SVMs use a linear decision boundary to separate the data points of
  different classes. When the data can be precisely linearly separated, linear
  SVMs are very suitable. This means that a single straight line (in 2D) or a
  hyperplane (in higher dimensions) can entirely divide the data points into
  their respective classes. A hyperplane that maximizes the margin between the
  classes is the decision boundary.

  The objective of training a `LinearSVC` is to minimize the norm of the weight
  vector $||w||$, which is the slope of the decision function.

  $$\\min_{{w, b}} \\dfrac{1}{2} w^{T}w = \\dfrac{1}{2} ||w||^{2}$$

  $$\\mathrm{subject\\ to}\\ t^{i} (w^{T}x^{i} + b) \\ge 1$$

  Note:

    We are minimizing $\\dfrac{1}{2}w^{T}w$, which is equal to
    $\\dfrac{1}{2}||w||^{2}$, rather than minimizing $||w||$.
    Indeed, $\\dfrac{1}{2}||w||^{2}$ has a nice and simple derivative
    (just $w$) while $||w||$ is not differentiable at $w = 0$.
    Optimization algorithms work much better on differentiable functions.

  The cost function used by **Linear SVM classifier** is the following:

  $$J(w, b) = \\dfrac{1}{2}w^{T}w +
  C\\sum_{i=1}^{m}\\max(0, 1 - t^{i}(w^{T}x^{i} + b))$$

  The loss function used is the **Hinge Loss** function that clips the value at
  $0$:

  $$\\max(0, 1 - t^{i}(w^{T}x^{i} + b))$$
  """

  def __init__(
    self,
    *,
    alpha: Optional[np.float16] = .01,
    lambda_p: Optional[np.float32] = .01,
    n_iters: Optional[np.int64] = 1_000
  ):
    """Initializes model's `alpha`, `lambda parameter` and number of
    `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take forever to learn.
      lambda_p: Lambda parameter for updating weights.
      n_iters: Maximum number of updates to make over the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
    """
    self._alpha = alpha
    self._lambda_p = lambda_p
    self._n_iters = n_iters

    self._bias = None
    self._weights = None

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearSVC':  # pylint: disable=invalid-name
    """Fit `LinearSVC` according to X, y.

    The objective of training a `LinearSVC` is to minimize the norm of the
    weight vector $||w||$, which is the slope of the decision function.

    $$\\min_{{w, b}} \\dfrac{1}{2} w^{T}w = \\dfrac{1}{2} ||w||^{2}$$

    $$\\mathrm{subject\\ to}\\ t^{i} (w^{T}x^{i} + b) \\ge 1$$

    Note:

      We are minimizing $\\dfrac{1}{2}w^{T}w$, which is equal to
      $\\dfrac{1}{2}||w||^{2}$, rather than minimizing $||w||$.
      Indeed, $\\dfrac{1}{2}||w||^{2}$ has a nice and simple derivative
      (just $w$) while $||w||$ is not differentiable at
      $w = 0$. Optimization algorithms work much better on differentiable
      functions.

    The cost function used by **Linear SVM classifier** is the following:

    $$J(w, b) = \\dfrac{1}{2}w^{T}w +
    C\\sum_{i=1}^{m}\\max(0, 1 - t^{i}(w^{T}x^{i} + b))$$

    The loss function used is the **Hinge Loss** function that clips the value
    at $0$:

    $$\\max(0, 1 - t^{i}(w^{T}x^{i} + b))$$

    Args:
      X: Training vectors, where `n_samples` is the number of samples and
        `n_features` is the number of features.
      y: Target values.

    Returns:
      Returns the instance itself.
    """
    self._bias = 0
    self._weights = np.random.rand(X.shape[1])

    # We need the decision function to be greater than 1 for all positive
    # training instances, and lower than -1 for all negative training instances.
    # If we define ``t_i = -1`` for negative instances, and ``t_i = 1`` for
    # positive instances, then we can achieve fewer margin violations.
    t = np.where(y <= 0, -1, 1)
    for _ in range(self._n_iters):
      for idx, x_i in enumerate(X):
        # Since the cost function is different on either side of the margin, we
        # update the weights and bias differently.
        if t[idx] * (np.dot(x_i, self._weights) - self._bias) >= 1:
          self._weights -= self._alpha * (2 * self._lambda_p * self._weights)
        else:
          self._weights -= self._alpha * (
            2 * self._lambda_p * self._weights - np.dot(t[idx], x_i)
          )
          self._bias -= self._alpha * t[idx]
    return self

  def predict(self, X: np.ndarray) -> np.ndarray:  # pylint: disable=invalid-name
    """Predict for `X` using the previously calculated `weights` and `bias`.

    Args:
      X: Feature vector.

    Returns:
      Target vector.

    Raises:
      RuntimeError: If `predict` is called before `fit`.
      ValueError: If shape of the given `X` differs from the shape of the `X`
        given to the `fit` function.
    """
    if self._weights is None or self._bias is None:
      raise RuntimeError(
        f'{self.__class__.__name__}: predict called before fitting data'
      )

    if X.shape[1] != self._weights.shape[0]:
      raise ValueError(
        (
          f'Number of features {X.shape[1]} does not match previous data '
          f'{self._weights.shape[0]}.'
        )
      )

    return np.sign(np.dot(X, self._weights) - self._bias)
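The training loop above can be exercised with a compact, self-contained sketch on toy data (this reimplements the same update rule inline for illustration; it is not the library class itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data: label 1 in the upper-right, 0 in the lower-left.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, 0, 0])

alpha, lambda_p, n_iters = 0.01, 0.01, 1000
t = np.where(y <= 0, -1, 1)          # remap labels to {-1, +1}
w = rng.random(X.shape[1])
b = 0.0

for _ in range(n_iters):
  for x_i, t_i in zip(X, t):
    if t_i * (x_i @ w - b) >= 1:     # outside the margin: only shrink w
      w -= alpha * (2 * lambda_p * w)
    else:                            # margin violation: pull w, b toward x_i
      w -= alpha * (2 * lambda_p * w - t_i * x_i)
      b -= alpha * t_i

print(np.sign(X @ w - b))            # predictions in {-1, +1}
```

After training, the sign of the decision function recovers the remapped labels for all four points.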

__init__(*, alpha=0.01, lambda_p=0.01, n_iters=1000)

Initializes model's alpha, lambda parameter and number of iterations.

Parameters:

  • alpha (Optional[np.float16], default 0.01): Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take forever to learn.
  • lambda_p (Optional[np.float32], default 0.01): Lambda parameter for updating weights.
  • n_iters (Optional[np.int64], default 1000): Maximum number of updates to make over the weights and bias in order to reach an efficient prediction that minimizes the loss.
Source code in ai/svm/classes.py
def __init__(
  self,
  *,
  alpha: Optional[np.float16] = .01,
  lambda_p: Optional[np.float32] = .01,
  n_iters: Optional[np.int64] = 1_000
):
  """Initializes model's `alpha`, `lambda parameter` and number of
  `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take forever to learn.
    lambda_p: Lambda parameter for updating weights.
    n_iters: Maximum number of updates to make over the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
  """
  self._alpha = alpha
  self._lambda_p = lambda_p
  self._n_iters = n_iters

  self._bias = None
  self._weights = None

fit(X, y)

Fit LinearSVC according to X, y.

The objective of training a LinearSVC is to minimize the norm of the weight vector \(||w||\), which is the slope of the decision function.

\[\min_{{w, b}} \dfrac{1}{2} w^{T}w = \dfrac{1}{2} ||w||^{2}\]
\[\mathrm{subject\ to}\ t^{i} (w^{T}x^{i} + b) \ge 1\]

Note:

We are minimizing \(\dfrac{1}{2}w^{T}w\), which is equal to \(\dfrac{1}{2}||w||^{2}\), rather than minimizing \(||w||\). Indeed, \(\dfrac{1}{2}||w||^{2}\) has a nice and simple derivative (just \(w\)) while \(||w||\) is not differentiable at \(w = 0\). Optimization algorithms work much better on differentiable functions.

The cost function used by Linear SVM classifier is the following:

\[J(w, b) = \dfrac{1}{2}w^{T}w + C\sum_{i=1}^{m}\max(0, 1 - t^{i}(w^{T}x^{i} + b))\]

The loss function used is the Hinge Loss function that clips the value at \(0\):

\[\max(0, 1 - t^{i}(w^{T}x^{i} + b))\]

Parameters:

  • X (np.ndarray): Training vectors, where n_samples is the number of samples and n_features is the number of features. Required.
  • y (np.ndarray): Target values. Required.

Returns:

  • LinearSVC: The instance itself.

Source code in ai/svm/classes.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearSVC':  # pylint: disable=invalid-name
  """Fit `LinearSVC` according to X, y.

  The objective of training a `LinearSVC` is to minimize the norm of the
  weight vector $||w||$, which is the slope of the decision function.

  $$\\min_{{w, b}} \\dfrac{1}{2} w^{T}w = \\dfrac{1}{2} ||w||^{2}$$

  $$\\mathrm{subject\\ to}\\ t^{i} (w^{T}x^{i} + b) \\ge 1$$

  Note:

    We are minimizing $\\dfrac{1}{2}w^{T}w$, which is equal to
    $\\dfrac{1}{2}||w||^{2}$, rather than minimizing $||w||$.
    Indeed, $\\dfrac{1}{2}||w||^{2}$ has a nice and simple derivative
    (just $w$) while $||w||$ is not differentiable at
    $w = 0$. Optimization algorithms work much better on differentiable
    functions.

  The cost function used by **Linear SVM classifier** is the following:

  $$J(w, b) = \\dfrac{1}{2}w^{T}w +
  C\\sum_{i=1}^{m}\\max(0, 1 - t^{i}(w^{T}x^{i} + b))$$

  The loss function used is the **Hinge Loss** function that clips the value
  at $0$:

  $$\\max(0, 1 - t^{i}(w^{T}x^{i} + b))$$

  Args:
    X: Training vectors, where `n_samples` is the number of samples and
      `n_features` is the number of features.
    y: Target values.

  Returns:
    Returns the instance itself.
  """
  self._bias = 0
  self._weights = np.random.rand(X.shape[1])

  # We need the decision function to be greater than 1 for all positive
  # training instances, and lower than -1 for all negative training instances.
  # If we define ``t_i = -1`` for negative instances, and ``t_i = 1`` for
  # positive instances, then we can achieve fewer margin violations.
  t = np.where(y <= 0, -1, 1)
  for _ in range(self._n_iters):
    for idx, x_i in enumerate(X):
      # Since the cost function is different on either side of the margin, we
      # update the weights and bias differently.
      if t[idx] * (np.dot(x_i, self._weights) - self._bias) >= 1:
        self._weights -= self._alpha * (2 * self._lambda_p * self._weights)
      else:
        self._weights -= self._alpha * (
          2 * self._lambda_p * self._weights - np.dot(t[idx], x_i)
        )
        self._bias -= self._alpha * t[idx]
  return self

predict(X)

Predict for X using the previously calculated weights and bias.

Parameters:

  • X (np.ndarray): Feature vector. Required.

Returns:

  • np.ndarray: Target vector.

Raises:

  • RuntimeError: If predict is called before fit.
  • ValueError: If the shape of the given X differs from the shape of the X given to the fit function.

Source code in ai/svm/classes.py
def predict(self, X: np.ndarray) -> np.ndarray:  # pylint: disable=invalid-name
  """Predict for `X` using the previously calculated `weights` and `bias`.

  Args:
    X: Feature vector.

  Returns:
    Target vector.

  Raises:
    RuntimeError: If `predict` is called before `fit`.
    ValueError: If shape of the given `X` differs from the shape of the `X`
      given to the `fit` function.
  """
  if self._weights is None or self._bias is None:
    raise RuntimeError(
      f'{self.__class__.__name__}: predict called before fitting data'
    )

  if X.shape[1] != self._weights.shape[0]:
    raise ValueError(
      (
        f'Number of features {X.shape[1]} does not match previous data '
        f'{self._weights.shape[0]}.'
      )
    )

  return np.sign(np.dot(X, self._weights) - self._bias)

classes

Linear Support Vector Classification.


kernels

Common kernel implementations for Support Vector Machines.

grbf(x, x_prime, sigma)

Implements Gaussian Radial Basis Function (RBF).

The Gaussian RBF is a commonly used kernel function in Support Vector Machines and other machine learning algorithms. It's defined as:

\[K(x, x') = e^{-\dfrac{||x - x'||^{2}}{2\sigma^{2}}}\]

Here's the derivation:

  • The Gaussian RBF is defined as a function of two data points, \(x\) and \(x'\), with a parameter \(\sigma\) that controls the width of the kernel.
  • We start by computing the Euclidean distance between the two data points \(x\) and \(x'\):
\[||x - x'||^{2} = \sum_{i=1}^{n}(x_{i} - x'_{i})^2\]

Now, let's insert this distance into the Gaussian RBF formula:

\[K(x, x') = e^{-\dfrac{||x - x'||^{2}}{2\sigma^{2}}}\]

We have an exponential term, and we can simplify it further:

\[e^{-\dfrac{||x - x'||^{2}}{2\sigma^{2}}} = e^{-\dfrac{1}{2\sigma^{2}} \sum_{i=1}^{n}(x_{i} - x'_{i})^2}\]

Using the property \(e^{a + b} = e^{a}e^{b}\), we can separate the exponential factors:

\[K(x, x') = e^{-\dfrac{1}{2\sigma^{2}} \sum_{i=1}^{n}(x_{i} - x'_{i})^2} = \prod_{i=1}^{n}e^{-\dfrac{(x_{i} - x'_{i})^{2}}{2\sigma^{2}}}\]

This is the mathematical form of the Gaussian RBF. It measures the similarity between two data points \(x\) and \(x'\) based on the Euclidean distance between them, with the parameter \(\sigma\) controlling the width of the kernel.

Parameters:

  • x (np.ndarray): The numpy vector \(x\). Required.
  • x_prime (np.ndarray): The numpy vector \(x'\). Required.
  • sigma (np.float32): The parameter \(\sigma\) that controls the width of the kernel. Required.

Returns:

  • np.ndarray: RBF value.

Examples:

>>> # Example data points
>>> x1 = np.array([1, 2, 3])
>>> x2 = np.array([4, 5, 6])
>>> # Set the width parameter
>>> sigma = 1.0
>>> # Calculate the RBF value
>>> rbf_result = grbf(x1, x2, sigma)
>>> print("Gaussian RBF Value:", rbf_result)

Source code in ai/svm/kernels.py
def grbf(x: np.ndarray, x_prime: np.ndarray, sigma: np.float32) -> np.ndarray:
  """Implements Gaussian Radial Basis Function (RBF).

  The Gaussian RBF is a commonly used kernel function in Support Vector Machines
  and other machine learning algorithms. It's defined as:

  $$K(x, x') = e^{-\\dfrac{||x - x'||^{2}}{2\\sigma^{2}}}$$

  Here's the derivation:

  * The Gaussian RBF is defined as a function of two data points, $x$ and
  $x'$, with a parameter $\\sigma$ that controls the width of the
  kernel.
  * We start by computing the Euclidean distance between the two data points
  $x$ and $x'$:

  $$||x - x'||^{2} = \\sum_{i=1}^{n}(x_{i} - x'_{i})^2$$

  Now, let's insert this distance into the Gaussian RBF formula:

  $$K(x, x') = e^{-\\dfrac{||x - x'||^{2}}{2\\sigma^{2}}}$$

  We have an exponential term, and we can simplify it further:

  $$e^{-\\dfrac{||x - x'||^{2}}{2\\sigma^{2}}} =
  e^{-\\dfrac{1}{2\\sigma^{2}} \\sum_{i=1}^{n}(x_{i} - x'_{i})^2}$$

  Using the property $e^{a + b} = e^{a}e^{b}$, we can separate the
  exponential factors:

  $$K(x, x') = e^{-\\dfrac{1}{2\\sigma^{2}}
  \\sum_{i=1}^{n}(x_{i} - x'_{i})^2} =
  \\prod_{i=1}^{n}e^{-\\dfrac{(x_{i} - x'_{i})^{2}}{2\\sigma^{2}}}$$

  This is the mathematical form of the Gaussian RBF. It measures the
  similarity between two data points $x$ and $x'$ based on the
  Euclidean distance between them, with the parameter $\\sigma$
  controlling the width of the kernel.

  Args:
    x: The numpy vector $x$.
    x_prime: The numpy vector $x'$.
    sigma: The parameter $\\sigma$ that controls the width of the kernel.

  Returns:
    RBF value.

  Examples:

  >>> # Example data points
  >>> x1 = np.array([1, 2, 3])
  >>> x2 = np.array([4, 5, 6])
  ...
  >>> # Set the width parameter
  >>> sigma = 1.0
  ...
  >>> # Calculate the RBF value
  >>> rbf_result = grbf(x1, x2, sigma)
  >>> print("Gaussian RBF Value:", rbf_result)
  """
  sqr_dist = np.sum(np.power((x - x_prime), 2))
  grbf_val = np.exp(-sqr_dist / (2 * sigma**2))
  return grbf_val
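The formula can be verified with a quick self-contained check (this reimplements grbf inline for illustration, following the source above):

```python
import numpy as np

def grbf(x, x_prime, sigma):
  # K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2))
  sqr_dist = np.sum((x - x_prime) ** 2)
  return np.exp(-sqr_dist / (2 * sigma ** 2))

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 5.0, 6.0])

# ||x1 - x2||^2 = 9 + 9 + 9 = 27, so K = exp(-27 / 2) = exp(-13.5)
print(grbf(x1, x2, 1.0))

# Identical points give the maximum similarity of 1.
print(grbf(x1, x1, 1.0))
```

As expected for a similarity measure, the kernel value decays toward 0 as the points move apart and peaks at 1 when they coincide.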