linear_model

ai.linear_model

The term linear model implies that the model is specified as a linear combination of features. Based on training data, the learning process computes one weight for each feature to form a model that can predict or estimate the target value.

The ai.linear_model module implements a variety of linear models:

  • ai.linear_model.linear.LinearRegression
  • ai.linear_model.logistic.LogisticRegression
  • ai.linear_model.perceptron.Perceptron

LinearRegression

LinearRegression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

Source code in ai/linear_model/linear.py
class LinearRegression:
  """`LinearRegression` fits a linear model with coefficients w = (w1, ..., wp)
  to minimize the residual sum of squares between the observed targets in the
  dataset, and the targets predicted by the linear approximation.
  """
  def __init__(self, *, alpha: np.float16 = .01, n_iters: int = 1000):
    """Initializes model's `learning rate` and number of `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take too long to learn.
      n_iters: Maximum number of updates to make to the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
    """
    self._alpha = alpha
    self._n_iters = n_iters
    self._bias = None
    self._weights = None

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearRegression':
    """Fit the linear model on `X` given `y`.

    Hypothesis function for our `LinearRegression`: $\\hat y = b + wX$,
    where `b` is the model's intercept and `w` is the coefficient of `X`.

    The loss function we use is the Mean Squared Error (MSE) between the
    predicted values and the true values. The cost function `(J)` can be
    written as:

    $$J = \\dfrac{1}{n}\\sum_{i=1}^{n}(\\hat y_{i} - y_{i})^2$$

    To achieve the best-fit regression line, the model predicts the target
    value $\\hat Y$ so that the error between the predicted value $\\hat Y$
    and the true value $Y$ is minimal. It is therefore important to update
    `b` and `w` to the values that minimize the error between the predicted
    and the true `y` values.

    A linear regression model can be trained with the optimization algorithm
    gradient descent, which iteratively modifies the model's parameters to
    reduce the mean squared error (MSE) on the training dataset. The idea is
    to start with initial `b` and `w` values and then iteratively update
    them until the cost reaches its minimum.

    On differentiating cost function `J` with respect to `b`:

    $$\\dfrac{dJ}{db} = \\dfrac{2}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

    On differentiating cost function `J` with respect to `w`:

    $$\\dfrac{dJ}{dw} = \\dfrac{2}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

    The above derivative functions are used for updating `weights` and `bias`
    in each iteration (the implementation folds the constant factor 2 into
    the learning rate).

    Args:
      X: Training vectors, where `n_samples` is the number of samples and
        `n_features` is the number of features.
      y: Target vector.
    """
    n_samples, n_features = X.shape
    self._bias = 0
    self._weights = np.zeros(n_features)

    for _ in range(self._n_iters):
      y_pred = np.dot(X, self._weights) + self._bias

      weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
      bias_d = (1 / n_samples) * np.sum((y_pred - y))

      self._weights = self._weights - (self._alpha * weights_d)
      self._bias = self._bias - (self._alpha * bias_d)

    return self

  def predict(self, X: np.ndarray) -> np.ndarray:
    """Predict for `X` using the previously calculated `weights` and `bias`.

    Args:
      X: Feature vector.

    Returns:
      Target vector.

    Raises:
      RuntimeError: If `predict` is called before `fit`.
      ValueError: If shape of the given `X` differs from the shape of the `X`
        given to the `fit` function.
    """
    if self._weights is None or self._bias is None:
      raise RuntimeError(
        f'{self.__class__.__name__}: predict called before fitting data'
      )

    if X.shape[1] != self._weights.shape[0]:
      raise ValueError(
        (
          f'Number of features {X.shape[1]} does not match previous data '
          f'{self._weights.shape[0]}.'
        )
      )

    return np.dot(X, self._weights) + self._bias
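The fit loop above can be reproduced as a short standalone numpy sketch (hypothetical noise-free data; independent of the ai package) to check that the update rules recover a known line:

```python
import numpy as np

# Hypothetical noise-free data from a known line: y = 3x + 2.
X = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
y = 3.0 * X[:, 0] + 2.0

alpha, n_iters = 0.5, 2000
bias = 0.0
weights = np.zeros(X.shape[1])

for _ in range(n_iters):
    y_pred = X @ weights + bias
    # Same update rules as `fit`: mean residual times features for the
    # weights, plain mean residual for the bias.
    weights -= alpha * (X.T @ (y_pred - y)) / len(y)
    bias -= alpha * (y_pred - y).mean()

print(weights[0], bias)  # approaches 3.0 and 2.0
```

With noise-free data and enough iterations, gradient descent drives the weight toward the true slope and the bias toward the true intercept.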

__init__(*, alpha=0.01, n_iters=1000)

Initializes model's learning rate and number of iterations.

Parameters:

Name Type Description Default
alpha float16

Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take too long to learn.

0.01
n_iters int

Maximum number of updates to make to the weights and bias in order to reach an efficient prediction that minimizes the loss.

1000
Source code in ai/linear_model/linear.py
def __init__(self, *, alpha: np.float16 = .01, n_iters: int = 1000):
  """Initializes model's `learning rate` and number of `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take too long to learn.
    n_iters: Maximum number of updates to make to the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
  """
  self._alpha = alpha
  self._n_iters = n_iters
  self._bias = None
  self._weights = None

fit(X, y)

Fit the linear model on X given y.

Hypothesis function for our LinearRegression: \(\hat y = b + wX\), where b is the model's intercept and w is the coefficient of X.

The loss function we use is the Mean Squared Error (MSE) between the predicted values and the true values. The cost function (J) can be written as:

\[J = \dfrac{1}{n}\sum_{i=1}^{n}(\hat y_{i} - y_{i})^2\]

To achieve the best-fit regression line, the model predicts the target value \(\hat Y\) so that the error between the predicted value \(\hat Y\) and the true value \(Y\) is minimal. It is therefore important to update b and w to the values that minimize the error between the predicted and the true y values.

A linear regression model can be trained with the optimization algorithm gradient descent, which iteratively modifies the model's parameters to reduce the mean squared error (MSE) on the training dataset. The idea is to start with initial b and w values and then iteratively update them until the cost reaches its minimum.

On differentiating cost function J with respect to b:

\[\dfrac{dJ}{db} = \dfrac{2}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i})\]

On differentiating cost function J with respect to w:

\[\dfrac{dJ}{dw} = \dfrac{2}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i}) \cdot x_{i}\]

The above derivative functions are used for updating weights and bias in each iteration (the implementation folds the constant factor 2 into the learning rate).

Parameters:

Name Type Description Default
X ndarray

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required
y ndarray

Target vector.

required
Source code in ai/linear_model/linear.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearRegression':
  """Fit the linear model on `X` given `y`.

  Hypothesis function for our `LinearRegression`: $\\hat y = b + wX$,
  where `b` is the model's intercept and `w` is the coefficient of `X`.

  The loss function we use is the Mean Squared Error (MSE) between the
  predicted values and the true values. The cost function `(J)` can be
  written as:

  $$J = \\dfrac{1}{n}\\sum_{i=1}^{n}(\\hat y_{i} - y_{i})^2$$

  To achieve the best-fit regression line, the model predicts the target
  value $\\hat Y$ so that the error between the predicted value $\\hat Y$
  and the true value $Y$ is minimal. It is therefore important to update
  `b` and `w` to the values that minimize the error between the predicted
  and the true `y` values.

  A linear regression model can be trained with the optimization algorithm
  gradient descent, which iteratively modifies the model's parameters to
  reduce the mean squared error (MSE) on the training dataset. The idea is
  to start with initial `b` and `w` values and then iteratively update
  them until the cost reaches its minimum.

  On differentiating cost function `J` with respect to `b`:

  $$\\dfrac{dJ}{db} = \\dfrac{2}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

  On differentiating cost function `J` with respect to `w`:

  $$\\dfrac{dJ}{dw} = \\dfrac{2}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

  The above derivative functions are used for updating `weights` and `bias`
  in each iteration (the implementation folds the constant factor 2 into
  the learning rate).

  Args:
    X: Training vectors, where `n_samples` is the number of samples and
      `n_features` is the number of features.
    y: Target vector.
  """
  n_samples, n_features = X.shape
  self._bias = 0
  self._weights = np.zeros(n_features)

  for _ in range(self._n_iters):
    y_pred = np.dot(X, self._weights) + self._bias

    weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
    bias_d = (1 / n_samples) * np.sum((y_pred - y))

    self._weights = self._weights - (self._alpha * weights_d)
    self._bias = self._bias - (self._alpha * bias_d)

  return self

predict(X)

Predict for X using the previously calculated weights and bias.

Parameters:

Name Type Description Default
X ndarray

Feature vector.

required

Returns:

Type Description
ndarray

Target vector.

Raises:

Type Description
RuntimeError

If predict is called before fit.

ValueError

If shape of the given X differs from the shape of the X given to the fit function.

Source code in ai/linear_model/linear.py
def predict(self, X: np.ndarray) -> np.ndarray:
  """Predict for `X` using the previously calculated `weights` and `bias`.

  Args:
    X: Feature vector.

  Returns:
    Target vector.

  Raises:
    RuntimeError: If `predict` is called before `fit`.
    ValueError: If shape of the given `X` differs from the shape of the `X`
      given to the `fit` function.
  """
  if self._weights is None or self._bias is None:
    raise RuntimeError(
      f'{self.__class__.__name__}: predict called before fitting data'
    )

  if X.shape[1] != self._weights.shape[0]:
    raise ValueError(
      (
        f'Number of features {X.shape[1]} does not match previous data '
        f'{self._weights.shape[0]}.'
      )
    )

  return np.dot(X, self._weights) + self._bias
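The guard clauses in predict follow a simple pattern; a minimal standalone sketch (hypothetical function and data, independent of the ai package):

```python
import numpy as np

def predict(X, weights, bias):
    # Refuse to predict before the model has been fitted.
    if weights is None or bias is None:
        raise RuntimeError('predict called before fitting data')
    # Refuse to predict when the feature count differs from training.
    if X.shape[1] != weights.shape[0]:
        raise ValueError(
            f'Number of features {X.shape[1]} does not match previous '
            f'data {weights.shape[0]}.'
        )
    return X @ weights + bias

weights, bias = np.array([2.0, -1.0]), 0.5
print(predict(np.array([[1.0, 1.0]]), weights, bias))  # [1.5]

try:
    predict(np.ones((1, 3)), weights, bias)  # three features, expected two
except ValueError as err:
    print(err)
```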

LogisticRegression

Logistic Regression (aka logit) classifier.

The logistic regression model transforms the continuous output of the linear regression function into a categorical output using a sigmoid function, which maps any real-valued input to a value between 0 and 1. This function is known as the logistic function.

\[z = w \cdot X + b\]

We then apply the sigmoid function to z to obtain a probability between 0 and 1, i.e. the predicted y.

\[\sigma (z) = \dfrac{1}{1 + e^{-z}}\]
Source code in ai/linear_model/logistic.py
class LogisticRegression:
  """Logistic Regression (aka logit) classifier.

  The logistic regression model transforms the continuous output of the
  linear regression function into a categorical output using a sigmoid
  function, which maps any real-valued input to a value between 0 and 1.
  This function is known as the logistic function.

  $$z = w \\cdot X + b$$

  We then apply the sigmoid function to z to obtain a probability between 0
  and 1, i.e. the predicted y.

  $$\\sigma (z) = \\dfrac{1}{1 + e^{-z}}$$
  """
  def __init__(self, alpha: np.float16 = .01, n_iters: np.int64 = 1000):
    """Initializes model's `learning rate` and number of `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take too long to learn.
      n_iters: Maximum number of updates to make to the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
    """
    self._n_iters = n_iters
    self._alpha = alpha
    self._weights = None
    self._bias = None

  @staticmethod
  def _sigmoid(t: np.ndarray) -> np.ndarray:
    """Sigmoid function to find the probability of `t` between 0 and 1.

    Args:
      t: Model predictions.

    Returns:
      A value between 0 and 1 based on the sigmoid function.
    """
    return 1 / (1 + np.exp(-t))

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'LogisticRegression':
    """Fit Logistic Regression according to X, y.

    Hypothesis function for our `LogisticRegression` is the same as for the
    `LinearRegression`: $\\hat y = b + wX$, where `b` is the model's
    intercept and `w` is the coefficient of `X`.

    The loss function we use is the Mean Squared Error (MSE) between the
    predicted values and the true values. The cost function `(J)` can be
    written as:

    $$J = \\dfrac{1}{n}\\sum_{i=1}^{n}(\\hat y_{i} - y_{i})^2$$

    To achieve the best-fit regression line, the model predicts the target
    value $\\hat Y$ so that the error between the predicted value $\\hat Y$
    and the true value $Y$ is minimal. It is therefore important to update
    `b` and `w` to the values that minimize the error between the predicted
    and the true `y` values.

    A logistic regression model can be trained with the optimization
    algorithm gradient descent, which iteratively modifies the model's
    parameters to reduce the cost on the training dataset. The idea is to
    start with initial `b` and `w` values and then iteratively update them
    until the cost reaches its minimum.

    On differentiating cost function `J` with respect to `b`:

    $$\\dfrac{dJ}{db} = \\dfrac{2}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

    On differentiating cost function `J` with respect to `w`:

    $$\\dfrac{dJ}{dw} = \\dfrac{2}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

    The above derivative functions are used for updating `weights` and `bias`
    in each iteration (with the sigmoid applied to the predictions, these
    update rules match the gradients of the binary cross-entropy loss, up to
    the constant factor folded into the learning rate).

    The sigmoid function is then used for mapping the predictions between 0 and
    1.

    $$\\sigma (z) = \\dfrac{1}{1 + e^{-z}}$$

    where $z$ can be replaced with our hypothesis function
    $\\hat y = b + wX$.

    Args:
      X: Training vectors, where `n_samples` is the number of samples and
        `n_features` is the number of features.
      y: Target values.

    Returns:
      Returns the instance itself.
    """
    n_samples, n_features = X.shape
    self._bias = 0
    self._weights = np.zeros(n_features)

    for _ in range(self._n_iters):
      y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)

      weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
      bias_d = (1 / n_samples) * np.sum((y_pred - y))

      self._weights = self._weights - (self._alpha * weights_d)
      self._bias = self._bias - (self._alpha * bias_d)

    return self

  def predict(self, X: np.ndarray) -> np.ndarray:
    """Predict for `X` using the previously calculated `weights` and `bias`.

    Args:
      X: Feature vector.

    Returns:
      Target vector.

    Raises:
      RuntimeError: If `predict` is called before `fit`.
      ValueError: If shape of the given `X` differs from the shape of the `X`
        given to the `fit` function.
    """
    if self._weights is None or self._bias is None:
      raise RuntimeError(
        f'{self.__class__.__name__}: predict called before fitting data'
      )

    if X.shape[1] != self._weights.shape[0]:
      raise ValueError(
        (
          f'Number of features {X.shape[1]} does not match previous data '
          f'{self._weights.shape[0]}.'
        )
      )

    y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)
    c_pred = [0 if y <= .5 else 1 for y in y_pred]
    return np.array(c_pred)
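The sigmoid training loop above can be reproduced as a standalone numpy sketch (tiny hypothetical 1-D dataset, independent of the ai package):

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# Hypothetical 1-D binary dataset: class 0 below x = 1.5, class 1 above.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

alpha, n_iters = 0.5, 2000
bias = 0.0
weights = np.zeros(X.shape[1])

for _ in range(n_iters):
    y_pred = sigmoid(X @ weights + bias)
    # Same update rules as `fit`; with the sigmoid these are the
    # binary cross-entropy gradients.
    weights -= alpha * (X.T @ (y_pred - y)) / len(y)
    bias -= alpha * (y_pred - y).mean()

labels = (sigmoid(X @ weights + bias) > 0.5).astype(int)
print(labels)  # [0 0 1 1]
```

Thresholding the sigmoid output at 0.5 turns the probabilities into class labels, matching the list comprehension at the end of predict.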

__init__(alpha=0.01, n_iters=1000)

Initializes model's learning rate and number of iterations.

Parameters:

Name Type Description Default
alpha float16

Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take too long to learn.

0.01
n_iters int64

Maximum number of updates to make to the weights and bias in order to reach an efficient prediction that minimizes the loss.

1000
Source code in ai/linear_model/logistic.py
def __init__(self, alpha: np.float16 = .01, n_iters: np.int64 = 1000):
  """Initializes model's `learning rate` and number of `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take too long to learn.
    n_iters: Maximum number of updates to make to the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
  """
  self._n_iters = n_iters
  self._alpha = alpha
  self._weights = None
  self._bias = None

fit(X, y)

Fit Logistic Regression according to X, y.

Hypothesis function for our LogisticRegression is the same as for the LinearRegression: \(\hat y = b + wX\), where b is the model's intercept and w is the coefficient of X.

The loss function we use is the Mean Squared Error (MSE) between the predicted values and the true values. The cost function (J) can be written as:

\[J = \dfrac{1}{n}\sum_{i=1}^{n}(\hat y_{i} - y_{i})^2\]

To achieve the best-fit regression line, the model predicts the target value \(\hat Y\) so that the error between the predicted value \(\hat Y\) and the true value \(Y\) is minimal. It is therefore important to update b and w to the values that minimize the error between the predicted and the true y values.

A logistic regression model can be trained with the optimization algorithm gradient descent, which iteratively modifies the model's parameters to reduce the cost on the training dataset. The idea is to start with initial b and w values and then iteratively update them until the cost reaches its minimum.

On differentiating cost function J with respect to b:

\[\dfrac{dJ}{db} = \dfrac{2}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i})\]

On differentiating cost function J with respect to w:

\[\dfrac{dJ}{dw} = \dfrac{2}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i}) \cdot x_{i}\]

The above derivative functions are used for updating weights and bias in each iteration (with the sigmoid applied to the predictions, these update rules match the gradients of the binary cross-entropy loss, up to the constant factor folded into the learning rate).

The sigmoid function is then used for mapping the predictions between 0 and 1.

\[\sigma (z) = \dfrac{1}{1 + e^{-z}}\]

where \(z\) can be replaced with our hypothesis function \(\hat y = b + wX\).
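A quick numeric check of the sigmoid mapping (standalone sketch, independent of the ai package):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z into the open interval (0, 1).
    return 1 / (1 + np.exp(-z))

print(sigmoid(0.0))                    # 0.5, the decision boundary
print(sigmoid(np.array([-5.0, 5.0])))  # close to [0, 1]
```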

Parameters:

Name Type Description Default
X ndarray

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required
y ndarray

Target values.

required

Returns:

Type Description
LogisticRegression

Returns the instance itself.

Source code in ai/linear_model/logistic.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'LogisticRegression':
  """Fit Logistic Regression according to X, y.

  Hypothesis function for our `LogisticRegression` is the same as for the
  `LinearRegression`: $\\hat y = b + wX$, where `b` is the model's
  intercept and `w` is the coefficient of `X`.

  The loss function we use is the Mean Squared Error (MSE) between the
  predicted values and the true values. The cost function `(J)` can be
  written as:

  $$J = \\dfrac{1}{n}\\sum_{i=1}^{n}(\\hat y_{i} - y_{i})^2$$

  To achieve the best-fit regression line, the model predicts the target
  value $\\hat Y$ so that the error between the predicted value $\\hat Y$
  and the true value $Y$ is minimal. It is therefore important to update
  `b` and `w` to the values that minimize the error between the predicted
  and the true `y` values.

  A logistic regression model can be trained with the optimization
  algorithm gradient descent, which iteratively modifies the model's
  parameters to reduce the cost on the training dataset. The idea is to
  start with initial `b` and `w` values and then iteratively update them
  until the cost reaches its minimum.

  On differentiating cost function `J` with respect to `b`:

  $$\\dfrac{dJ}{db} = \\dfrac{2}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

  On differentiating cost function `J` with respect to `w`:

  $$\\dfrac{dJ}{dw} = \\dfrac{2}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

  The above derivative functions are used for updating `weights` and `bias`
  in each iteration (with the sigmoid applied to the predictions, these
  update rules match the gradients of the binary cross-entropy loss, up to
  the constant factor folded into the learning rate).

  The sigmoid function is then used for mapping the predictions between 0 and
  1.

  $$\\sigma (z) = \\dfrac{1}{1 + e^{-z}}$$

  where $z$ can be replaced with our hypothesis function
  $\\hat y = b + wX$.

  Args:
    X: Training vectors, where `n_samples` is the number of samples and
      `n_features` is the number of features.
    y: Target values.

  Returns:
    Returns the instance itself.
  """
  n_samples, n_features = X.shape
  self._bias = 0
  self._weights = np.zeros(n_features)

  for _ in range(self._n_iters):
    y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)

    weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
    bias_d = (1 / n_samples) * np.sum((y_pred - y))

    self._weights = self._weights - (self._alpha * weights_d)
    self._bias = self._bias - (self._alpha * bias_d)

  return self

predict(X)

Predict for X using the previously calculated weights and bias.

Parameters:

Name Type Description Default
X ndarray

Feature vector.

required

Returns:

Type Description
ndarray

Target vector.

Raises:

Type Description
RuntimeError

If predict is called before fit.

ValueError

If shape of the given X differs from the shape of the X given to the fit function.

Source code in ai/linear_model/logistic.py
def predict(self, X: np.ndarray) -> np.ndarray:
  """Predict for `X` using the previously calculated `weights` and `bias`.

  Args:
    X: Feature vector.

  Returns:
    Target vector.

  Raises:
    RuntimeError: If `predict` is called before `fit`.
    ValueError: If shape of the given `X` differs from the shape of the `X`
      given to the `fit` function.
  """
  if self._weights is None or self._bias is None:
    raise RuntimeError(
      f'{self.__class__.__name__}: predict called before fitting data'
    )

  if X.shape[1] != self._weights.shape[0]:
    raise ValueError(
      (
        f'Number of features {X.shape[1]} does not match previous data '
        f'{self._weights.shape[0]}.'
      )
    )

  y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)
  c_pred = [0 if y <= .5 else 1 for y in y_pred]
  return np.array(c_pred)

Perceptron

Perceptron classifier.

The perceptron is a simple supervised machine learning algorithm used to classify data into binary outcomes. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

\[y = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}\]
Source code in ai/linear_model/perceptron.py
class Perceptron:
  """Perceptron classifier.

  The perceptron is a simple supervised machine learning algorithm used to
  classify data into binary outcomes. It is a type of linear classifier, i.e. a
  classification algorithm that makes its predictions based on a linear
  predictor function combining a set of weights with the feature vector.

  $$y = \\begin{cases} 1 & \\text{if } w \\cdot x + b > 0 \\\\
  0 & \\text{otherwise} \\end{cases}$$
  """

  def __init__(self,
               *,
               alpha: np.float16 = np.float16(.01),
               n_iters: np.int64 = np.int64(1000),
               random_state: int = 1):
    """Initializes model's `learning rate` and number of `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take too long to learn.
      n_iters: Maximum number of updates to make to the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
      random_state: Seed to generate random weights and bias.
    """
    self.alpha = alpha
    self.n_iters = n_iters
    self.random_state = random_state

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'Perceptron':
    """Fit training data.

    Args:
      X: Training vectors, where n_samples is the number of samples and
        n_features is the number of features.
      y: Target values.

    Returns:
      self: An instance of self.
    """
    rgen = np.random.RandomState(self.random_state)
    self.weights = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
    self.b_ = np.float64(0.)
    self.errors_ = []
    for _ in range(self.n_iters):
      errors = 0
      for xi, target in zip(X, y):
        update = self.alpha * (target - self.predict(xi))
        self.weights += update * xi
        self.b_ += update
        errors += int(update != 0.0)
      self.errors_.append(errors)
    return self

  def net_input(self, X):
    """Calculate net input.

    Args:
      X: Training vectors, where n_samples is the number of samples and
        n_features is the number of features.

    Returns:
      The dot product of `X` and `weights` plus `b`.
    """
    return np.dot(X, self.weights) + self.b_

  def predict(self, X):
    """Return class label after unit step.

    Args:
      X: Training vectors, where n_samples is the number of samples and
        n_features is the number of features.

    Returns:
      The class label after unit step.
    """
    return np.where(self.net_input(X) >= 0.0, 1, 0)
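The perceptron update rule above can be checked on a tiny hypothetical dataset (the AND truth table) with a standalone sketch; the zero initialization and unit learning rate are simplifications, independent of the ai package:

```python
import numpy as np

# AND truth table: only (1, 1) belongs to class 1.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 0, 1])

alpha, n_iters = 1.0, 10
weights = np.zeros(X.shape[1])
bias = 0.0

for _ in range(n_iters):
    for xi, target in zip(X, y):
        # Unit-step prediction followed by the perceptron update rule.
        pred = 1 if xi @ weights + bias >= 0.0 else 0
        update = alpha * (target - pred)
        weights += update * xi
        bias += update

preds = np.where(X @ weights + bias >= 0.0, 1, 0)
print(preds)  # [0 0 0 1]
```

Because AND is linearly separable, the perceptron convergence theorem guarantees the loop stops making updates after finitely many mistakes.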

__init__(*, alpha=np.float16(0.01), n_iters=np.int64(1000), random_state=1)

Initializes model's learning rate and number of iterations.

Parameters:

Name Type Description Default
alpha float16

Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take too long to learn.

float16(0.01)
n_iters int64

Maximum number of updates to make to the weights and bias in order to reach an efficient prediction that minimizes the loss.

int64(1000)
random_state int

Seed to generate random weights and bias.

1
Source code in ai/linear_model/perceptron.py
def __init__(self,
             *,
             alpha: np.float16 = np.float16(.01),
             n_iters: np.int64 = np.int64(1000),
             random_state: int = 1):
  """Initializes model's `learning rate` and number of `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take too long to learn.
    n_iters: Maximum number of updates to make to the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
    random_state: Seed to generate random weights and bias.
  """
  self.alpha = alpha
  self.n_iters = n_iters
  self.random_state = random_state

fit(X, y)

Fit training data.

Parameters:

Name Type Description Default
X ndarray

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required
y ndarray

Target values.

required

Returns:

Name Type Description
self Perceptron

An instance of self.

Source code in ai/linear_model/perceptron.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'Perceptron':
  """Fit training data.

  Args:
    X: Training vectors, where n_samples is the number of samples and
      n_features is the number of features.
    y: Target values.

  Returns:
    self: An instance of self.
  """
  rgen = np.random.RandomState(self.random_state)
  self.weights = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
  self.b_ = np.float64(0.)
  self.errors_ = []
  for _ in range(self.n_iters):
    errors = 0
    for xi, target in zip(X, y):
      update = self.alpha * (target - self.predict(xi))
      self.weights += update * xi
      self.b_ += update
      errors += int(update != 0.0)
    self.errors_.append(errors)
  return self

net_input(X)

Calculate net input.

Parameters:

Name Type Description Default
X

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required

Returns:

Type Description

The dot product of X and weights plus b.

Source code in ai/linear_model/perceptron.py
def net_input(self, X):
  """Calculate net input.

  Args:
    X: Training vectors, where n_samples is the number of samples and
      n_features is the number of features.

  Returns:
    The dot product of `X` and `weights` plus `b`.
  """
  return np.dot(X, self.weights) + self.b_

predict(X)

Return class label after unit step.

Parameters:

Name Type Description Default
X

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required

Returns:

Type Description

The class label after unit step.

Source code in ai/linear_model/perceptron.py
def predict(self, X):
  """Return class label after unit step.

  Args:
    X: Training vectors, where n_samples is the number of samples and
      n_features is the number of features.

  Returns:
    The class label after unit step.
  """
  return np.where(self.net_input(X) >= 0.0, 1, 0)

linear

A numpy-compatible linear regression implementation.

Example:

```python
from sklearn import datasets
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

from ai.linear_model import LinearRegression

X, y = datasets.make_regression(
  n_samples=100, n_features=1, noise=20, random_state=4
)
X_train, X_test, y_train, y_test = train_test_split(
  X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

# accuracy_score is a classification metric and rejects continuous targets;
# use a regression metric such as the coefficient of determination instead.
print(r2_score(y_test, y_pred))
```

LinearRegression

LinearRegression fits a linear model with coefficients w = (w1, ..., wp) to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation.

Source code in ai/linear_model/linear.py
class LinearRegression:
  """`LinearRegression` fits a linear model with coefficients w = (w1, ..., wp)
  to minimize the residual sum of squares between the observed targets in the
  dataset, and the targets predicted by the linear approximation.
  """
  def __init__(self, *, alpha: np.float16 = .01, n_iters: int = 1000):
    """Initializes model's `learning rate` and number of `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take forever to learn.
      n_iters: Maximum number of updates to apply to the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
    """
    self._alpha = alpha
    self._n_iters = n_iters
    self._bias = None
    self._weights = None

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearRegression':
    """Fit the linear model on `X` given `y`.

    Hypothesis function for our `LinearRegression` $\\hat y = b + wX$,
    where `b` is the model's intercept and `w` is the coefficient of `X`.

    The cost function or the loss function that we use is the Mean Squared Error
    (MSE) between the predicted value and the true value. The cost function
    `(J)` can be written as:

    $$J = \\dfrac{1}{n}\\sum_{i=1}^{n}(\\hat y_{i} - y_{i})^2$$

    To achieve the best-fit regression line, the model aims to predict the
    target value $\\hat Y$ such that the error difference between the
    predicted value $\\hat Y$ and the true value $Y$ is minimum. So,
    it is very important to update the `b` and `w` values, to reach the best
    value that minimizes the error between the predicted `y` value and the true
    `y` value.

    A linear regression model can be trained using the optimization algorithm
    gradient descent by iteratively modifying the model’s parameters to reduce
    the mean squared error (MSE) of the model on a training dataset. To update
    `b` and `w` values in order to reduce the Cost function (minimizing RMSE
    value) and achieve the best-fit line the model uses Gradient Descent. The
    idea is to start with random `b` and `w` values and then iteratively update
    the values, reaching minimum cost.

    On differentiating cost function `J` with respect to `b`:

    $$\\dfrac{dJ}{db} = \\dfrac{2}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

    On differentiating cost function `J` with respect to `w`:

    $$\\dfrac{dJ}{dw} = \\dfrac{2}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

    The above derivative functions are used for updating `weights` and `bias`
    in each iteration (in the implementation, the constant factor 2 is
    absorbed into the learning rate `alpha`).

    Args:
      X: Training vectors, where `n_samples` is the number of samples and
        `n_features` is the number of features.
      y: Target vector.
    """
    n_samples, n_features = X.shape
    self._bias = 0
    self._weights = np.zeros(n_features)

    for _ in range(self._n_iters):
      y_pred = np.dot(X, self._weights) + self._bias

      weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
      bias_d = (1 / n_samples) * np.sum((y_pred - y))

      self._weights = self._weights - (self._alpha * weights_d)
      self._bias = self._bias - (self._alpha * bias_d)

    return self

  def predict(self, X: np.ndarray) -> np.ndarray:
    """Predict for `X` using the previously calculated `weights` and `bias`.

    Args:
      X: Feature vector.

    Returns:
      Target vector.

    Raises:
      RuntimeError: If `predict` is called before `fit`.
      ValueError: If shape of the given `X` differs from the shape of the `X`
        given to the `fit` function.
    """
    if self._weights is None or self._bias is None:
      raise RuntimeError(
        f'{self.__class__.__name__}: predict called before fitting data'
      )

    if X.shape[1] != self._weights.shape[0]:
      raise ValueError(
        (
          f'Number of features {X.shape[1]} does not match previous data '
          f'{self._weights.shape[0]}.'
        )
      )

    return np.dot(X, self._weights) + self._bias
__init__(*, alpha=0.01, n_iters=1000)

Initializes model's learning rate and number of iterations.

Parameters:

Name Type Description Default
alpha float16

Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take forever to learn.

0.01
n_iters int

Maximum number of updates to apply to the weights and bias in order to reach an efficient prediction that minimizes the loss.

1000
Source code in ai/linear_model/linear.py
def __init__(self, *, alpha: np.float16 = .01, n_iters: int = 1000):
  """Initializes model's `learning rate` and number of `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take forever to learn.
    n_iters: Maximum number of updates to apply to the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
  """
  self._alpha = alpha
  self._n_iters = n_iters
  self._bias = None
  self._weights = None
fit(X, y)

Fit the linear model on X given y.

Hypothesis function for our LinearRegression \(\hat y = b + wX\), where b is the model's intercept and w is the coefficient of X.

The cost function or the loss function that we use is the Mean Squared Error (MSE) between the predicted value and the true value. The cost function (J) can be written as:

\[J = \dfrac{1}{n}\sum_{i=1}^{n}(\hat y_{i} - y_{i})^2\]

To achieve the best-fit regression line, the model aims to predict the target value \(\hat Y\) such that the error difference between the predicted value \(\hat Y\) and the true value \(Y\) is minimum. So, it is very important to update the b and w values, to reach the best value that minimizes the error between the predicted y value and the true y value.

A linear regression model can be trained using the optimization algorithm gradient descent by iteratively modifying the model’s parameters to reduce the mean squared error (MSE) of the model on a training dataset. To update b and w values in order to reduce the Cost function (minimizing RMSE value) and achieve the best-fit line the model uses Gradient Descent. The idea is to start with random b and w values and then iteratively update the values, reaching minimum cost.

On differentiating cost function J with respect to b:

\[\dfrac{dJ}{db} = \dfrac{2}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i})\]

On differentiating cost function J with respect to w:

\[\dfrac{dJ}{dw} = \dfrac{2}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i}) \cdot x_{i}\]

The above derivative functions are used for updating weights and bias in each iteration (in the implementation, the constant factor 2 is absorbed into the learning rate alpha).

Parameters:

Name Type Description Default
X ndarray

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required
y ndarray

Target vector.

required
Source code in ai/linear_model/linear.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'LinearRegression':
  """Fit the linear model on `X` given `y`.

  Hypothesis function for our `LinearRegression` $\\hat y = b + wX$,
  where `b` is the model's intercept and `w` is the coefficient of `X`.

  The cost function or the loss function that we use is the Mean Squared Error
  (MSE) between the predicted value and the true value. The cost function
  `(J)` can be written as:

  $$J = \\dfrac{1}{n}\\sum_{i=1}^{n}(\\hat y_{i} - y_{i})^2$$

  To achieve the best-fit regression line, the model aims to predict the
  target value $\\hat Y$ such that the error difference between the
  predicted value $\\hat Y$ and the true value $Y$ is minimum. So,
  it is very important to update the `b` and `w` values, to reach the best
  value that minimizes the error between the predicted `y` value and the true
  `y` value.

  A linear regression model can be trained using the optimization algorithm
  gradient descent by iteratively modifying the model’s parameters to reduce
  the mean squared error (MSE) of the model on a training dataset. To update
  `b` and `w` values in order to reduce the Cost function (minimizing RMSE
  value) and achieve the best-fit line the model uses Gradient Descent. The
  idea is to start with random `b` and `w` values and then iteratively update
  the values, reaching minimum cost.

  On differentiating cost function `J` with respect to `b`:

  $$\\dfrac{dJ}{db} = \\dfrac{2}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

  On differentiating cost function `J` with respect to `w`:

  $$\\dfrac{dJ}{dw} = \\dfrac{2}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

  The above derivative functions are used for updating `weights` and `bias`
  in each iteration (in the implementation, the constant factor 2 is
  absorbed into the learning rate `alpha`).

  Args:
    X: Training vectors, where `n_samples` is the number of samples and
      `n_features` is the number of features.
    y: Target vector.
  """
  n_samples, n_features = X.shape
  self._bias = 0
  self._weights = np.zeros(n_features)

  for _ in range(self._n_iters):
    y_pred = np.dot(X, self._weights) + self._bias

    weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
    bias_d = (1 / n_samples) * np.sum((y_pred - y))

    self._weights = self._weights - (self._alpha * weights_d)
    self._bias = self._bias - (self._alpha * bias_d)

  return self
predict(X)

Predict for X using the previously calculated weights and bias.

Parameters:

Name Type Description Default
X ndarray

Feature vector.

required

Returns:

Type Description
ndarray

Target vector.

Raises:

Type Description
RuntimeError

If predict is called before fit.

ValueError

If shape of the given X differs from the shape of the X given to the fit function.

Source code in ai/linear_model/linear.py
def predict(self, X: np.ndarray) -> np.ndarray:
  """Predict for `X` using the previously calculated `weights` and `bias`.

  Args:
    X: Feature vector.

  Returns:
    Target vector.

  Raises:
    RuntimeError: If `predict` is called before `fit`.
    ValueError: If shape of the given `X` differs from the shape of the `X`
      given to the `fit` function.
  """
  if self._weights is None or self._bias is None:
    raise RuntimeError(
      f'{self.__class__.__name__}: predict called before fitting data'
    )

  if X.shape[1] != self._weights.shape[0]:
    raise ValueError(
      (
        f'Number of features {X.shape[1]} does not match previous data '
        f'{self._weights.shape[0]}.'
      )
    )

  return np.dot(X, self._weights) + self._bias
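
To make the gradient-descent loop concrete, here is a minimal, self-contained NumPy sketch (independent of the `ai` package, with made-up toy data) that applies the same update rule and recovers a known slope and intercept:

```python
import numpy as np

# Toy data generated from y = 3x + 4; the loop below uses the same
# per-iteration updates as `LinearRegression.fit` and should recover
# weights close to 3 and a bias close to 4.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 4

alpha, n_iters = 0.1, 2000
n_samples = X.shape[0]
weights, bias = np.zeros(X.shape[1]), 0.0

for _ in range(n_iters):
    y_pred = X @ weights + bias
    weights -= alpha * (1 / n_samples) * (X.T @ (y_pred - y))
    bias -= alpha * (1 / n_samples) * np.sum(y_pred - y)

print(weights, bias)
```

With noiseless data the loop converges to the generating parameters; with noisy data it converges to the least-squares estimate instead.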

logistic

LogisticRegression

Logistic Regression (aka logit) classifier.

The logistic regression model transforms the continuous output of the linear regression function into a categorical output using a sigmoid function, which maps any real-valued input into a value between 0 and 1. This function is known as the logistic function.

\[z = w \cdot X + b\]

Now we use the sigmoid function, where the input is z, to find a probability between 0 and 1, i.e. the predicted y.

\[\sigma (z) = \dfrac{1}{1 + e^{-z}}\]
Source code in ai/linear_model/logistic.py
class LogisticRegression:
  """Logistic Regression (aka logit) classifier.

  The logistic regression model transforms the continuous output of the
  linear regression function into a categorical output using a sigmoid
  function, which maps any real-valued input into a value between 0 and 1.
  This function is known as the logistic function.

  $$z = w \\cdot X + b$$

  Now we use the sigmoid function, where the input is z, to find a
  probability between 0 and 1, i.e. the predicted y.

  $$\\sigma (z) = \\dfrac{1}{1 + e^{-z}}$$
  """
  def __init__(self, alpha: np.float16 = .01, n_iters: np.int64 = 1000):
    """Initializes model's `learning rate` and number of `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take forever to learn.
      n_iters: Maximum number of updates to apply to the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
    """
    self._n_iters = n_iters
    self._alpha = alpha
    self._weights = None
    self._bias = None

  @staticmethod
  def _sigmoid(t: np.ndarray) -> np.ndarray:
    """Sigmoid function to find the probability of `t` between 0 and 1.

    Args:
      t: Model predictions.

    Returns:
      A value between 0 and 1 based on the sigmoid function.
    """
    return 1 / (1 + np.exp(-t))

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'LogisticRegression':
    """Fit Logistic Regression according to X, y.

    Hypothesis function for our `LogisticRegression` is the same as for the
    `LinearRegression` $\\hat y = b + wX$, where `b` is the model's
    intercept and `w` is the coefficient of `X`.

    The cost function or the loss function that we use is the binary
    cross-entropy (log loss) between the predicted probability and the true
    label. The cost function `(J)` can be written as:

    $$J = -\\dfrac{1}{n}\\sum_{i=1}^{n}\\left(y_{i}\\log \\hat y_{i}
    + (1 - y_{i})\\log (1 - \\hat y_{i})\\right)$$

    To achieve the best decision boundary, the model aims to predict the
    target value $\\hat Y$ such that the error difference between the
    predicted value $\\hat Y$ and the true value $Y$ is minimum. So,
    it is very important to update the `b` and `w` values, to reach the best
    value that minimizes the error between the predicted `y` value and the true
    `y` value.

    A logistic regression model can be trained using the optimization algorithm
    gradient descent by iteratively modifying the model’s parameters to reduce
    the cross-entropy loss of the model on a training dataset. The idea is to
    start with random `b` and `w` values and then iteratively update the
    values, reaching minimum cost.

    On differentiating cost function `J` with respect to `b`:

    $$\\dfrac{dJ}{db} = \\dfrac{1}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

    On differentiating cost function `J` with respect to `w`:

    $$\\dfrac{dJ}{dw} = \\dfrac{1}{n} \\cdot
    \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

    The above derivative functions are used for updating `weights` and `bias` in
    each iteration.

    The sigmoid function is then used for mapping the predictions between 0 and
    1.

    $$\\sigma (z) = \\dfrac{1}{1 + e^{-z}}$$

    where $z$ can be replaced with our hypothesis function
    $\\hat y = b + wX$.

    Args:
      X: Training vectors, where `n_samples` is the number of samples and
        `n_features` is the number of features.
      y: Target values.

    Returns:
      Returns the instance itself.
    """
    n_samples, n_features = X.shape
    self._bias = 0
    self._weights = np.zeros(n_features)

    for _ in range(self._n_iters):
      y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)

      weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
      bias_d = (1 / n_samples) * np.sum((y_pred - y))

      self._weights = self._weights - (self._alpha * weights_d)
      self._bias = self._bias - (self._alpha * bias_d)

    return self

  def predict(self, X: np.ndarray) -> np.ndarray:
    """Predict for `X` using the previously calculated `weights` and `bias`.

    Args:
      X: Feature vector.

    Returns:
      Target vector.

    Raises:
      RuntimeError: If `predict` is called before `fit`.
      ValueError: If shape of the given `X` differs from the shape of the `X`
        given to the `fit` function.
    """
    if self._weights is None or self._bias is None:
      raise RuntimeError(
        f'{self.__class__.__name__}: predict called before fitting data'
      )

    if X.shape[1] != self._weights.shape[0]:
      raise ValueError(
        (
          f'Number of features {X.shape[1]} does not match previous data '
          f'{self._weights.shape[0]}.'
        )
      )

    y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)
    c_pred = [0 if y <= .5 else 1 for y in y_pred]
    return np.array(c_pred)
__init__(alpha=0.01, n_iters=1000)

Initializes model's learning rate and number of iterations.

Parameters:

Name Type Description Default
alpha float16

Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take forever to learn.

0.01
n_iters int64

Maximum number of updates to apply to the weights and bias in order to reach an efficient prediction that minimizes the loss.

1000
Source code in ai/linear_model/logistic.py
def __init__(self, alpha: np.float16 = .01, n_iters: np.int64 = 1000):
  """Initializes model's `learning rate` and number of `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take forever to learn.
    n_iters: Maximum number of updates to apply to the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
  """
  self._n_iters = n_iters
  self._alpha = alpha
  self._weights = None
  self._bias = None
fit(X, y)

Fit Logistic Regression according to X, y.

Hypothesis function for our LogisticRegression is the same as for the LinearRegression \(\hat y = b + wX\), where b is the model's intercept and w is the coefficient of X.

The cost function or the loss function that we use is the binary cross-entropy (log loss) between the predicted probability and the true label. The cost function (J) can be written as:

\[J = -\dfrac{1}{n}\sum_{i=1}^{n}\left(y_{i}\log \hat y_{i} + (1 - y_{i})\log (1 - \hat y_{i})\right)\]

To achieve the best decision boundary, the model aims to predict the target value \(\hat Y\) such that the error difference between the predicted value \(\hat Y\) and the true value \(Y\) is minimum. So, it is very important to update the b and w values, to reach the best value that minimizes the error between the predicted y value and the true y value.

A logistic regression model can be trained using the optimization algorithm gradient descent by iteratively modifying the model’s parameters to reduce the cross-entropy loss of the model on a training dataset. The idea is to start with random b and w values and then iteratively update the values, reaching minimum cost.

On differentiating cost function J with respect to b:

\[\dfrac{dJ}{db} = \dfrac{1}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i})\]

On differentiating cost function J with respect to w:

\[\dfrac{dJ}{dw} = \dfrac{1}{n} \cdot \sum_{i=1}^{n}(\hat y_{i} - y_{i}) \cdot x_{i}\]

The above derivative functions are used for updating weights and bias in each iteration.

The sigmoid function is then used for mapping the predictions between 0 and 1.

\[\sigma (z) = \dfrac{1}{1 + e^{-z}}\]

where \(z\) can be replaced with our hypothesis function \(\hat y = b + wX\).

Parameters:

Name Type Description Default
X ndarray

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required
y ndarray

Target values.

required

Returns:

Type Description
LogisticRegression

Returns the instance itself.

Source code in ai/linear_model/logistic.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'LogisticRegression':
  """Fit Logistic Regression according to X, y.

  Hypothesis function for our `LogisticRegression` is the same as for the
  `LinearRegression` $\\hat y = b + wX$, where `b` is the model's
  intercept and `w` is the coefficient of `X`.

  The cost function or the loss function that we use is the binary
  cross-entropy (log loss) between the predicted probability and the true
  label. The cost function `(J)` can be written as:

  $$J = -\\dfrac{1}{n}\\sum_{i=1}^{n}\\left(y_{i}\\log \\hat y_{i}
  + (1 - y_{i})\\log (1 - \\hat y_{i})\\right)$$

  To achieve the best decision boundary, the model aims to predict the
  target value $\\hat Y$ such that the error difference between the
  predicted value $\\hat Y$ and the true value $Y$ is minimum. So,
  it is very important to update the `b` and `w` values, to reach the best
  value that minimizes the error between the predicted `y` value and the true
  `y` value.

  A logistic regression model can be trained using the optimization algorithm
  gradient descent by iteratively modifying the model’s parameters to reduce
  the cross-entropy loss of the model on a training dataset. The idea is to
  start with random `b` and `w` values and then iteratively update the
  values, reaching minimum cost.

  On differentiating cost function `J` with respect to `b`:

  $$\\dfrac{dJ}{db} = \\dfrac{1}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i})$$

  On differentiating cost function `J` with respect to `w`:

  $$\\dfrac{dJ}{dw} = \\dfrac{1}{n} \\cdot
  \\sum_{i=1}^{n}(\\hat y_{i} - y_{i}) \\cdot x_{i}$$

  The above derivative functions are used for updating `weights` and `bias` in
  each iteration.

  The sigmoid function is then used for mapping the predictions between 0 and
  1.

  $$\\sigma (z) = \\dfrac{1}{1 + e^{-z}}$$

  where $z$ can be replaced with our hypothesis function
  $\\hat y = b + wX$.

  Args:
    X: Training vectors, where `n_samples` is the number of samples and
      `n_features` is the number of features.
    y: Target values.

  Returns:
    Returns the instance itself.
  """
  n_samples, n_features = X.shape
  self._bias = 0
  self._weights = np.zeros(n_features)

  for _ in range(self._n_iters):
    y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)

    weights_d = (1 / n_samples) * np.dot(X.T, (y_pred - y))
    bias_d = (1 / n_samples) * np.sum((y_pred - y))

    self._weights = self._weights - (self._alpha * weights_d)
    self._bias = self._bias - (self._alpha * bias_d)

  return self
predict(X)

Predict for X using the previously calculated weights and bias.

Parameters:

Name Type Description Default
X ndarray

Feature vector.

required

Returns:

Type Description
ndarray

Target vector.

Raises:

Type Description
RuntimeError

If predict is called before fit.

ValueError

If shape of the given X differs from the shape of the X given to the fit function.

Source code in ai/linear_model/logistic.py
def predict(self, X: np.ndarray) -> np.ndarray:
  """Predict for `X` using the previously calculated `weights` and `bias`.

  Args:
    X: Feature vector.

  Returns:
    Target vector.

  Raises:
    RuntimeError: If `predict` is called before `fit`.
    ValueError: If shape of the given `X` differs from the shape of the `X`
      given to the `fit` function.
  """
  if self._weights is None or self._bias is None:
    raise RuntimeError(
      f'{self.__class__.__name__}: predict called before fitting data'
    )

  if X.shape[1] != self._weights.shape[0]:
    raise ValueError(
      (
        f'Number of features {X.shape[1]} does not match previous data '
        f'{self._weights.shape[0]}.'
      )
    )

  y_pred = self._sigmoid(np.dot(X, self._weights) + self._bias)
  c_pred = [0 if y <= .5 else 1 for y in y_pred]
  return np.array(c_pred)
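
The sigmoid-then-threshold pipeline used by `predict` can be sketched end to end on a toy one-dimensional problem. This is a standalone NumPy sketch with made-up data, not the package's own code:

```python
import numpy as np

def sigmoid(t):
    # Maps any real value into the open interval (0, 1).
    return 1 / (1 + np.exp(-t))

# Toy 1-D data: label 1 for x > 0, label 0 otherwise.
X = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

alpha, n_iters = 0.5, 5000
n = len(y)
weights, bias = np.zeros(1), 0.0

for _ in range(n_iters):
    p = sigmoid(X @ weights + bias)               # predicted probabilities
    weights -= alpha * (1 / n) * (X.T @ (p - y))  # cross-entropy gradient
    bias -= alpha * (1 / n) * np.sum(p - y)

# Threshold the probabilities at 0.5 to obtain class labels.
labels = np.where(sigmoid(X @ weights + bias) > 0.5, 1, 0)
print(labels)
```

On this separable toy set the learned boundary sits near x = 0, so the thresholded labels match the targets exactly.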

perceptron

Perceptron

Perceptron classifier.

The perceptron is a simple supervised machine learning algorithm used to classify data into binary outcomes. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.

\[y = \begin{cases} 1 & \text{if } w \cdot x + b > 0 \\ 0 & \text{otherwise} \end{cases}\]
Source code in ai/linear_model/perceptron.py
class Perceptron:
  """Perceptron classifier.

  The perceptron is a simple supervised machine learning algorithm used to
  classify data into binary outcomes. It is a type of linear classifier, i.e. a
  classification algorithm that makes its predictions based on a linear
  predictor function combining a set of weights with the feature vector.

  $$y = \\begin{cases} 1 & \\text{if } w \\cdot x + b > 0 \\\\
  0 & \\text{otherwise} \\end{cases}$$
  """

  def __init__(self,
               *,
               alpha: np.float16 = np.float16(.01),
               n_iters: np.int64 = np.int64(1000),
               random_state: int = 1):
    """Initializes model's `learning rate` and number of `iterations`.

    Args:
      alpha: Model's learning rate. A high value might overshoot the minimum
        loss, while a low value might make the model take forever to learn.
      n_iters: Maximum number of updates to apply to the weights and bias in
        order to reach an efficient prediction that minimizes the loss.
      random_state: Seed to generate random weights and bias.
    """
    self.alpha = alpha
    self.n_iters = n_iters
    self.random_state = random_state

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'Perceptron':
    """Fit training data.

    Args:
      X: Training vectors, where n_samples is the number of samples and
        n_features is the number of features.
      y: Target values.

    Returns:
      self: An instance of self.
    """
    rgen = np.random.RandomState(self.random_state)
    self.weights = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
    self.b_ = np.float64(0.)
    self.errors_ = []
    for _ in range(self.n_iters):
      errors = 0
      for xi, target in zip(X, y):
        update = self.alpha * (target - self.predict(xi))
        self.weights += update * xi
        self.b_ += update
        errors += int(update != 0.0)
      self.errors_.append(errors)
    return self

  def net_input(self, X):
    """Calculate net input.

    Args:
      X: Training vectors, where n_samples is the number of samples and
        n_features is the number of features.

    Returns:
      The dot product of `X` and `weights` plus `b`.
    """
    return np.dot(X, self.weights) + self.b_

  def predict(self, X):
    """Return class label after unit step.

    Args:
      X: Training vectors, where n_samples is the number of samples and
        n_features is the number of features.

    Returns:
      The class label after unit step.
    """
    return np.where(self.net_input(X) >= 0.0, 1, 0)
__init__(*, alpha=np.float16(0.01), n_iters=np.int64(1000), random_state=1)

Initializes model's learning rate and number of iterations.

Parameters:

Name Type Description Default
alpha float16

Model's learning rate. A high value might overshoot the minimum loss, while a low value might make the model take forever to learn.

float16(0.01)
n_iters int64

Maximum number of updates to apply to the weights and bias in order to reach an efficient prediction that minimizes the loss.

int64(1000)
random_state int

Seed to generate random weights and bias.

1
Source code in ai/linear_model/perceptron.py
def __init__(self,
             *,
             alpha: np.float16 = np.float16(.01),
             n_iters: np.int64 = np.int64(1000),
             random_state: int = 1):
  """Initializes model's `learning rate` and number of `iterations`.

  Args:
    alpha: Model's learning rate. A high value might overshoot the minimum
      loss, while a low value might make the model take forever to learn.
    n_iters: Maximum number of updates to apply to the weights and bias in
      order to reach an efficient prediction that minimizes the loss.
    random_state: Seed to generate random weights and bias.
  """
  self.alpha = alpha
  self.n_iters = n_iters
  self.random_state = random_state
fit(X, y)

Fit training data.

Parameters:

Name Type Description Default
X ndarray

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required
y ndarray

Target values.

required

Returns:

Name Type Description
self Perceptron

An instance of self.

Source code in ai/linear_model/perceptron.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'Perceptron':
  """Fit training data.

  Args:
    X: Training vectors, where n_samples is the number of samples and
      n_features is the number of features.
    y: Target values.

  Returns:
    self: An instance of self.
  """
  rgen = np.random.RandomState(self.random_state)
  self.weights = rgen.normal(loc=0.0, scale=0.01, size=X.shape[1])
  self.b_ = np.float64(0.)
  self.errors_ = []
  for _ in range(self.n_iters):
    errors = 0
    for xi, target in zip(X, y):
      update = self.alpha * (target - self.predict(xi))
      self.weights += update * xi
      self.b_ += update
      errors += int(update != 0.0)
    self.errors_.append(errors)
  return self
net_input(X)

Calculate net input.

Parameters:

Name Type Description Default
X

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required

Returns:

Type Description

The dot product of X and weights plus b.

Source code in ai/linear_model/perceptron.py
def net_input(self, X):
  """Calculate net input.

  Args:
    X: Training vectors, where n_samples is the number of samples and
      n_features is the number of features.

  Returns:
    The dot product of `X` and `weights` plus `b`.
  """
  return np.dot(X, self.weights) + self.b_
predict(X)

Return class label after unit step.

Parameters:

Name Type Description Default
X

Training vectors, where n_samples is the number of samples and n_features is the number of features.

required

Returns:

Type Description

The class label after unit step.

Source code in ai/linear_model/perceptron.py
def predict(self, X):
  """Return class label after unit step.

  Args:
    X: Training vectors, where n_samples is the number of samples and
      n_features is the number of features.

  Returns:
    The class label after unit step.
  """
  return np.where(self.net_input(X) >= 0.0, 1, 0)
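
As a sanity check of the learning rule above, here is a standalone sketch (separate from the `ai` package) that trains the same perceptron update on the AND gate, which is linearly separable:

```python
import numpy as np

# AND-gate truth table with 0/1 labels, matching the unit step in `predict`.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

rng = np.random.RandomState(1)                  # same init scheme as `fit`
weights = rng.normal(loc=0.0, scale=0.01, size=2)
bias, alpha = 0.0, 0.1

for _ in range(20):                             # a few epochs suffice here
    for xi, target in zip(X, y):
        pred = 1 if xi @ weights + bias >= 0.0 else 0
        update = alpha * (target - pred)        # zero when the prediction is correct
        weights += update * xi
        bias += update

preds = [1 if xi @ weights + bias >= 0.0 else 0 for xi in X]
print(preds)
```

Because the data is linearly separable, the perceptron convergence theorem guarantees the loop stops making updates after finitely many mistakes.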