ai.neighbors

ai.neighbors provides functionality for unsupervised and supervised neighbors-based learning methods. Unsupervised nearest neighbors is the foundation of many other learning methods, notably manifold learning and spectral clustering. Supervised neighbors-based learning comes in two flavors: classification for data with discrete labels, and regression for data with continuous labels.

The principle behind nearest neighbor methods is to find a predefined number of training samples closest in distance to the new point, and predict the label from these. The number of samples can be a user-defined constant (k-nearest neighbor learning), or vary based on the local density of points (radius-based neighbor learning). The distance can, in general, be any metric measure: standard Euclidean distance is the most common choice. Neighbors-based methods are known as non-generalizing machine learning methods, since they simply “remember” all of their training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree).

Despite its simplicity, nearest neighbors has been successful in a large number of classification and regression problems, including handwritten digits and satellite image scenes. Being a non-parametric method, it is often successful in classification situations where the decision boundary is very irregular.
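The whole procedure fits in a few lines of NumPy. The sketch below is a minimal, self-contained illustration of the principle (toy data and a hypothetical `knn_predict` helper, not the library's API):

```python
from collections import Counter

import numpy as np

# Toy training set: two well-separated clusters with discrete labels.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x, k=3):
  """Predict the label of `x` by majority vote among its k nearest neighbors."""
  # Euclidean distance from `x` to every training sample.
  distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
  # Indices of the k closest training samples.
  k_indices = np.argsort(distances)[:k]
  # The most common label among those neighbors wins.
  return Counter(y_train[k_indices]).most_common(1)[0][0]

print(knn_predict(np.array([0.1, 0.2])))  # near the first cluster: 0
print(knn_predict(np.array([5.1, 5.0])))  # near the second cluster: 1
```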

ai.neighbors implements the following nearest-neighbors algorithms:

  • ai.neighbors.knn.KNeighborsClassifier

KNeighborsClassifier

Bases: DistanceMetric

Classifier implementing the k-nearest neighbors vote.

The k-nearest neighbors algorithm is a non-parametric, supervised learning classifier, which uses proximity to make classifications or predictions about the grouping of an individual data point. It is typically used as a classification algorithm, working off the assumption that similar points can be found near one another.

For classification problems, a class label is assigned on the basis of a majority vote, i.e., the label that is most frequently represented around a given data point is used.

Parameters:

  n_neighbors (int): Number of neighbors to use. Default: 3.
  p (int): Power parameter for the Minkowski metric. When p = 1, this is
    equivalent to using manhattan_distance (l1), and euclidean_distance (l2)
    for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Default: 2.
  metric (str): Metric to use for distance computation. Default: 'euclidean'.
Source code in ai/neighbors/knn.py
class KNeighborsClassifier(DistanceMetric):
  """Classifier implementing the k-nearest neighbors vote.

  The k-nearest neighbors algorithm is a non-parametric, supervised learning
  classifier, which uses proximity to make classifications or predictions about
  the grouping of an individual data point. The k-nearest neighbors algorithm
  is typically used as a classification algorithm, working off the assumption
  that similar points can be found near one another.

  For classification problems, a class label is assigned on the basis of a
  majority vote, i.e., the label that is most frequently represented around a
  given data point is used.

  Args:
    n_neighbors: Number of neighbors to use. By default `3`.
    p: Power parameter for the Minkowski metric. When `p = 1`, this is
      equivalent to using `manhattan_distance (l1)`, and
      `euclidean_distance (l2)` for
      `p = 2`. For arbitrary `p`, `minkowski_distance (l_p)` is used.
    metric: Metric to use for distance computation. Default is `euclidean`.
  """
  _parameter_constraints: dict = {
    'metric': [
      ('euclidean', 'supported'), ('minkowski', 'not-supported'),
      ('manhattan', 'not-supported'), ('hamming', 'not-supported')
    ]
  }

  @staticmethod
  def _check_if_parameters_comply_to_constraints(**kwargs: dict) -> None:
    """Private static method to ensure the compatibility of the hyperparameters
    passed to the `KNeighborsClassifier`.

    Args:
      kwargs: Passed hyperparameters.

    Raises:
      ValueError: If any hyperparameter is not compatible.
    """
    for (metric_name, metric_status
         ) in KNeighborsClassifier._parameter_constraints['metric']:
      if kwargs['metric'] == metric_name:
        if metric_status != 'supported':
          raise ValueError(
            f'distance metric {metric_name} is not supported yet'
          )
        break

  def __init__(
    self, *, n_neighbors: int = 3, p: int = 2, metric: str = 'euclidean'
  ):
    """Initializes model's hyperparameters.

    Args:
      n_neighbors: Number of neighbors to use. By default `3`.
      p: Power parameter for the Minkowski metric. When `p = 1`, this is
        equivalent to using `manhattan_distance (l1)`, and
        `euclidean_distance (l2)` for `p = 2`. For arbitrary `p`,
        `minkowski_distance (l_p)` is used.
      metric: Metric to use for distance computation. Default is `euclidean`.
    """
    self._n_neighbors = n_neighbors
    self._p = p
    self._metric = metric
    self._is_fitted = False

    self._check_if_parameters_comply_to_constraints(metric=self._metric)
    super().__init__(self._metric, self._p)

  def fit(self, X: np.ndarray, y: np.ndarray) -> 'KNeighborsClassifier':
    """Fit the k-nearest neighbors classifier from the training dataset.

    Args:
      X: Sample vector.
      y: Target vector.

    Returns:
      The fitted `KNeighborsClassifier`.
    """
    self._X = X
    self._y = y
    self._is_fitted = True
    return self

  def predict(self, X: np.ndarray) -> np.ndarray:
    """Predict the class labels for the provided data.

    Args:
      X: Test samples.

    Returns:
      Class label for each data sample.

    Raises:
      RuntimeError: If predict method is called before fit.
    """
    if self._is_fitted is False:
      raise RuntimeError(
        f'{self.__class__.__name__}: predict called before fitting data'
      )

    preds = []
    for x in X:
      # Compute the distance of the new point `x` from all the points in the
      # training set using the initialized distance metric
      distances = [self.distance(x, x_train) for x_train in self._X]

      # Extract all `_n_neighbors` distances from a sorted list of `distances`
      k_indices = np.argsort(distances)[:self._n_neighbors]
      # Finally extract the labels using the above `k_indices`
      k_nearest_labels = [self._y[i] for i in k_indices]

      # Calculate the prediction using "plurality voting"
      preds.append(Counter(k_nearest_labels).most_common(1)[0][0])
    return np.array(preds)

__init__(*, n_neighbors=3, p=2, metric='euclidean')

Initializes model's hyperparameters.

Parameters:

  n_neighbors (int): Number of neighbors to use. Default: 3.
  p (int): Power parameter for the Minkowski metric. When p = 1, this is
    equivalent to using manhattan_distance (l1), and euclidean_distance (l2)
    for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Default: 2.
  metric (str): Metric to use for distance computation. Default: 'euclidean'.
Source code in ai/neighbors/knn.py
def __init__(
  self, *, n_neighbors: int = 3, p: int = 2, metric: str = 'euclidean'
):
  """Initializes model's hyperparameters.

  Args:
    n_neighbors: Number of neighbors to use. By default `3`.
    p: Power parameter for the Minkowski metric. When `p = 1`, this is
      equivalent to using `manhattan_distance (l1)`, and
      `euclidean_distance (l2)` for `p = 2`. For arbitrary `p`,
      `minkowski_distance (l_p)` is used.
    metric: Metric to use for distance computation. Default is `euclidean`.
  """
  self._n_neighbors = n_neighbors
  self._p = p
  self._metric = metric
  self._is_fitted = False

  self._check_if_parameters_comply_to_constraints(metric=self._metric)
  super().__init__(self._metric, self._p)

fit(X, y)

Fit the k-nearest neighbors classifier from the training dataset.

Parameters:

  X (ndarray): Sample vector. Required.
  y (ndarray): Target vector. Required.

Returns:

  KNeighborsClassifier: The fitted KNeighborsClassifier.

Source code in ai/neighbors/knn.py
def fit(self, X: np.ndarray, y: np.ndarray) -> 'KNeighborsClassifier':
  """Fit the k-nearest neighbors classifier from the training dataset.

  Args:
    X: Sample vector.
    y: Target vector.

  Returns:
    The fitted `KNeighborsClassifier`.
  """
  self._X = X
  self._y = y
  self._is_fitted = True
  return self

predict(X)

Predict the class labels for the provided data.

Parameters:

  X (ndarray): Test samples. Required.

Returns:

  ndarray: Class label for each data sample.

Raises:

  RuntimeError: If predict method is called before fit.

Source code in ai/neighbors/knn.py
def predict(self, X: np.ndarray) -> np.ndarray:
  """Predict the class labels for the provided data.

  Args:
    X: Test samples.

  Returns:
    Class label for each data sample.

  Raises:
    RuntimeError: If predict method is called before fit.
  """
  if self._is_fitted is False:
    raise RuntimeError(
      f'{self.__class__.__name__}: predict called before fitting data'
    )

  preds = []
  for x in X:
    # Compute the distance of the new point `x` from all the points in the
    # training set using the initialized distance metric
    distances = [self.distance(x, x_train) for x_train in self._X]

    # Extract all `_n_neighbors` distances from a sorted list of `distances`
    k_indices = np.argsort(distances)[:self._n_neighbors]
    # Finally extract the labels using the above `k_indices`
    k_nearest_labels = [self._y[i] for i in k_indices]

    # Calculate the prediction using "plurality voting"
    preds.append(Counter(k_nearest_labels).most_common(1)[0][0])
  return np.array(preds)
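The plurality vote inside `predict` relies on `collections.Counter`; a quick standalone illustration of how `most_common` picks the winning label:

```python
from collections import Counter

# Labels of the k nearest neighbors found for one query point.
k_nearest_labels = ['cat', 'dog', 'cat', 'cat', 'dog']

# most_common(1) returns a list with the single (label, count) pair that has
# the highest count; ties keep first-encountered order.
winner, count = Counter(k_nearest_labels).most_common(1)[0]
print(winner, count)  # cat 3
```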

knn

DistanceMetric

Distance metrics for computing k-nearest neighbors.

These distance metrics can be used for computing the distances between two np.ndarray and not just for KNeighborsClassifier. This DistanceMetric class supports Euclidean, Minkowski, Manhattan, and Hamming distance metrics to compute the distance between two data points.
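Each of the four metrics can be reproduced with plain NumPy. A standalone sketch of the definitions on one pair of vectors (not the class itself):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((x1 - x2) ** 2))          # sqrt(9 + 16 + 0) = 5.0
manhattan = np.sum(np.abs(x1 - x2))                  # 3 + 4 + 0 = 7.0
minkowski = np.sum(np.abs(x1 - x2) ** 2) ** (1 / 2)  # p = 2: same as Euclidean
hamming = np.sum(x1 != x2)                           # 2 coordinates differ

print(euclidean, manhattan, minkowski, hamming)  # 5.0 7.0 5.0 2
```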

Parameters:

  metric (str): The metric to use for computing distance. Default: 'minkowski'.
  minkowski_p (int): Power parameter for the Minkowski metric. Default: 2.
Source code in ai/neighbors/knn.py
class DistanceMetric:
  """Distance metrics for computing k-nearest neighbors.

  These distance metrics can be used for computing the distances between two
  `np.ndarray` and not just for `KNeighborsClassifier`. This `DistanceMetric`
  class supports `Euclidean`, `Minkowski`, `Manhattan`, and `Hamming` distance
  metrics to compute the distance between two data points.

  Args:
    metric: The metric to use for computing distance. Default `minkowski`.
    minkowski_p: Power parameter for the Minkowski metric.
  """
  _distance_func_cache = None

  def __init__(self, metric: str = 'minkowski', minkowski_p: int = 2):
    """Initializes metric parameters.

    Args:
      metric: The metric to use for computing distance. Default `minkowski`.
      minkowski_p: Power parameter for the Minkowski metric.
    """
    self._metric = metric
    self._minkowski_p = minkowski_p

  def euclidean(
    self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
  ) -> Union[np.float32, np.ndarray]:
    """Euclidean distance.

    This is the most commonly used distance measure, and it is limited to
    real-valued vectors. Using the below formula, it measures a straight line
    between the query point and the other point being measured.

    $$\\mathrm{Euclidean\\ Distance} =
    \\sqrt{\\sum_{i=1}^{n}(y_{i} - x_{i})^2}$$

    Args:
      x1: Query point vector.
      x2: Other point vector.

    Returns:
      Measured distance point vector.
    """
    return np.sqrt(np.sum(np.power(x1 - x2, 2)))

  def minkowski(
    self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
  ) -> Union[np.float32, np.ndarray]:
    """Minkowski distance.

    This distance measure is the generalized form of Euclidean and Manhattan
    distance metrics. The parameter, p, in the formula below, allows for the
    creation of other distance metrics. Euclidean distance is represented by
    this formula when p is equal to two, and Manhattan distance is denoted with
    p equal to one.

    $$\\mathrm{Minkowski\\ Distance} =
    \\left(\\sum_{i=1}^{n}|x_{i} - y_{i}|^{p}\\right)^{\\frac{1}{p}}$$

    Args:
      x1: Query point vector.
      x2: Other point vector.

    Returns:
      Measured distance point vector.
    """
    return np.power(
      np.sum(np.power(np.absolute(x1 - x2), self._minkowski_p)),
      1 / self._minkowski_p
    )

  def manhattan(
    self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
  ) -> Union[np.float32, np.ndarray]:
    """Manhattan distance.

    This is also another popular distance metric, which measures the absolute
    value between two points. It is also referred to as taxicab distance or city
    block distance as it is commonly visualized with a grid, illustrating how
    one might navigate from one address to another via city streets.

    $$\\mathrm{Manhattan\\ Distance} = \\sum_{i=1}^{m}|x_{i} - y_{i}|$$

    Args:
      x1: Query point vector.
      x2: Other point vector.

    Returns:
      Measured distance point vector.
    """
    return np.sum(np.absolute((x1 - x2)))

  def hamming(
    self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
  ) -> Union[np.float32, np.ndarray]:
    """Hamming distance.

    This technique is typically used with Boolean or string vectors, identifying
    the points where the vectors do not match. As a result, it has also been
    referred to as the overlap metric. This can be represented with the
    following formula:

    $$\\mathrm{Hamming\\ Distance} = \\sum_{i=1}^{k}|x_{i} - y_{i}|$$

    Args:
      x1: Query point vector.
      x2: Other point vector.

    Returns:
      Measured distance point vector.
    """
    # Count the positions where the two vectors disagree; this also works for
    # Boolean and string vectors, where subtraction is undefined.
    return np.sum(x1 != x2)

  def distance(
    self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
  ) -> Union[np.float32, np.ndarray]:
    """Distance function that uses one of `euclidean`, `minkowski`, `manhattan`,
    or `hamming` distance metric to measure distance between the given points.

    Args:
      x1: Query point vector.
      x2: Other point vector.

    Returns:
      Measured distance point vector.
    """
    if self._distance_func_cache is not None:
      return self._distance_func_cache(x1, x2)

    if self._metric == 'euclidean':
      self._distance_func_cache = self.euclidean
    elif self._metric == 'minkowski':
      self._distance_func_cache = self.minkowski
    elif self._metric == 'manhattan':
      self._distance_func_cache = self.manhattan
    elif self._metric == 'hamming':
      self._distance_func_cache = self.hamming
    else:
      raise RuntimeError(
        (
          f'{self.__class__.__name__}: {self._metric} is not one of '
          '["euclidean", "minkowski", "manhattan", "hamming"]'
        )
      )
    return self._distance_func_cache(x1, x2)
__init__(metric='minkowski', minkowski_p=2)

Initializes metric parameters.

Parameters:

  metric (str): The metric to use for computing distance. Default: 'minkowski'.
  minkowski_p (int): Power parameter for the Minkowski metric. Default: 2.
Source code in ai/neighbors/knn.py
def __init__(self, metric: str = 'minkowski', minkowski_p: int = 2):
  """Initializes metric parameters.

  Args:
    metric: The metric to use for computing distance. Default `minkowski`.
    minkowski_p: Power parameter for the Minkowski metric.
  """
  self._metric = metric
  self._minkowski_p = minkowski_p
distance(x1, x2)

Distance function that uses one of euclidean, minkowski, manhattan, or hamming distance metric to measure distance between the given points.

Parameters:

  x1 (Union[float32, ndarray]): Query point vector. Required.
  x2 (Union[float32, ndarray]): Other point vector. Required.

Returns:

  Union[float32, ndarray]: Measured distance point vector.

Source code in ai/neighbors/knn.py
def distance(
  self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
) -> Union[np.float32, np.ndarray]:
  """Distance function that uses one of `euclidean`, `minkowski`, `manhattan`,
  or `hamming` distance metric to measure distance between the given points.

  Args:
    x1: Query point vector.
    x2: Other point vector.

  Returns:
    Measured distance point vector.
  """
  if self._distance_func_cache is not None:
    return self._distance_func_cache(x1, x2)

  if self._metric == 'euclidean':
    self._distance_func_cache = self.euclidean
  elif self._metric == 'minkowski':
    self._distance_func_cache = self.minkowski
  elif self._metric == 'manhattan':
    self._distance_func_cache = self.manhattan
  elif self._metric == 'hamming':
    self._distance_func_cache = self.hamming
  else:
    raise RuntimeError(
      (
        f'{self.__class__.__name__}: {self._metric} is not one of '
        '["euclidean", "minkowski", "manhattan", "hamming"]'
      )
    )
  return self._distance_func_cache(x1, x2)
euclidean(x1, x2)

Euclidean distance.

This is the most commonly used distance measure, and it is limited to real-valued vectors. Using the below formula, it measures a straight line between the query point and the other point being measured.

\[\mathrm{Euclidean\ Distance} = \sqrt{\sum_{i=1}^{n}(y_{i} - x_{i})^2}\]

Parameters:

  x1 (Union[float32, ndarray]): Query point vector. Required.
  x2 (Union[float32, ndarray]): Other point vector. Required.

Returns:

  Union[float32, ndarray]: Measured distance point vector.

Source code in ai/neighbors/knn.py
def euclidean(
  self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
) -> Union[np.float32, np.ndarray]:
  """Euclidean distance.

  This is the most commonly used distance measure, and it is limited to
  real-valued vectors. Using the below formula, it measures a straight line
  between the query point and the other point being measured.

  $$\\mathrm{Euclidean\\ Distance} =
  \\sqrt{\\sum_{i=1}^{n}(y_{i} - x_{i})^2}$$

  Args:
    x1: Query point vector.
    x2: Other point vector.

  Returns:
    Measured distance point vector.
  """
  return np.sqrt(np.sum(np.power(x1 - x2, 2)))
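A standalone numeric check of the formula, using the classic 3-4-5 right triangle:

```python
import numpy as np

# The straight-line distance between (0, 0) and (3, 4) is
# sqrt(3**2 + 4**2) = 5.
p = np.array([0.0, 0.0])
q = np.array([3.0, 4.0])
dist = np.sqrt(np.sum(np.power(p - q, 2)))
print(dist)  # 5.0
```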
hamming(x1, x2)

Hamming distance.

This technique is typically used with Boolean or string vectors, identifying the points where the vectors do not match. As a result, it has also been referred to as the overlap metric. This can be represented with the following formula:

\[\mathrm{Hamming\ Distance} = \sum_{i=1}^{k}|x_{i} - y_{i}|\]

Parameters:

  x1 (Union[float32, ndarray]): Query point vector. Required.
  x2 (Union[float32, ndarray]): Other point vector. Required.

Returns:

  Union[float32, ndarray]: Measured distance point vector.

Source code in ai/neighbors/knn.py
def hamming(
  self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
) -> Union[np.float32, np.ndarray]:
  """Hamming distance.

  This technique is typically used with Boolean or string vectors, identifying
  the points where the vectors do not match. As a result, it has also been
  referred to as the overlap metric. This can be represented with the
  following formula:

  $$\\mathrm{Hamming\\ Distance} = \\sum_{i=1}^{k}|x_{i} - y_{i}|$$

  Args:
    x1: Query point vector.
    x2: Other point vector.

  Returns:
    Measured distance point vector.
  """
  # Count the positions where the two vectors disagree; this also works for
  # Boolean and string vectors, where subtraction is undefined.
  return np.sum(x1 != x2)
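For string vectors, counting mismatched positions directly sidesteps subtraction altogether. A standalone sketch:

```python
import numpy as np

# "karolin" and "kathrin" disagree in 3 positions; element-wise != works
# where subtraction would not.
a = np.array(list('karolin'))
b = np.array(list('kathrin'))
mismatches = np.sum(a != b)
print(mismatches)  # 3
```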
manhattan(x1, x2)

Manhattan distance.

This is also another popular distance metric, which measures the absolute value between two points. It is also referred to as taxicab distance or city block distance as it is commonly visualized with a grid, illustrating how one might navigate from one address to another via city streets.

\[\mathrm{Manhattan\ Distance} = \sum_{i=1}^{m}|x_{i} - y_{i}|\]

Parameters:

  x1 (Union[float32, ndarray]): Query point vector. Required.
  x2 (Union[float32, ndarray]): Other point vector. Required.

Returns:

  Union[float32, ndarray]: Measured distance point vector.

Source code in ai/neighbors/knn.py
def manhattan(
  self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
) -> Union[np.float32, np.ndarray]:
  """Manhattan distance.

  This is also another popular distance metric, which measures the absolute
  value between two points. It is also referred to as taxicab distance or city
  block distance as it is commonly visualized with a grid, illustrating how
  one might navigate from one address to another via city streets.

  $$\\mathrm{Manhattan\\ Distance} = \\sum_{i=1}^{m}|x_{i} - y_{i}|$$

  Args:
    x1: Query point vector.
    x2: Other point vector.

  Returns:
    Measured distance point vector.
  """
  return np.sum(np.absolute((x1 - x2)))
minkowski(x1, x2)

Minkowski distance.

This distance measure is the generalized form of Euclidean and Manhattan distance metrics. The parameter, p, in the formula below, allows for the creation of other distance metrics. Euclidean distance is represented by this formula when p is equal to two, and Manhattan distance is denoted with p equal to one.

\[\mathrm{Minkowski\ Distance} = \left(\sum_{i=1}^{n}|x_{i} - y_{i}|^{p}\right)^{\frac{1}{p}}\]

Parameters:

  x1 (Union[float32, ndarray]): Query point vector. Required.
  x2 (Union[float32, ndarray]): Other point vector. Required.

Returns:

  Union[float32, ndarray]: Measured distance point vector.

Source code in ai/neighbors/knn.py
def minkowski(
  self, x1: Union[np.float32, np.ndarray], x2: Union[np.float32, np.ndarray]
) -> Union[np.float32, np.ndarray]:
  """Minkowski distance.

  This distance measure is the generalized form of Euclidean and Manhattan
  distance metrics. The parameter, p, in the formula below, allows for the
  creation of other distance metrics. Euclidean distance is represented by
  this formula when p is equal to two, and Manhattan distance is denoted with
  p equal to one.

  $$\\mathrm{Minkowski\\ Distance} =
  \\left(\\sum_{i=1}^{n}|x_{i} - y_{i}|^{p}\\right)^{\\frac{1}{p}}$$

  Args:
    x1: Query point vector.
    x2: Other point vector.

  Returns:
    Measured distance point vector.
  """
  return np.power(
    np.sum(np.power(np.absolute(x1 - x2), self._minkowski_p)),
    1 / self._minkowski_p
  )
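The reduction to Manhattan (p = 1) and Euclidean (p = 2) can be checked numerically. The `minkowski` helper below is a standalone sketch, not the class method:

```python
import numpy as np

def minkowski(x1, x2, p):
  # (sum_i |x1_i - x2_i|^p) ** (1/p)
  return np.power(np.sum(np.power(np.abs(x1 - x2), p)), 1 / p)

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 6.0, 3.0])

print(minkowski(x1, x2, 1))  # 7.0: matches Manhattan, the sum of |diffs|
print(minkowski(x1, x2, 2))  # 5.0: matches Euclidean, sqrt(9 + 16)
```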
