stats

ai.stats

Statistical routines for numpy arrays.

ai.stats implements a variety of statistical methods for computing statistics of numpy arrays:

Averages and Variances
  • ai.stats.mean
  • ai.stats.median
  • ai.stats.std
  • ai.stats.var
  • ai.stats.zscore
  • ai.stats.varcoef
Correlating
  • ai.stats.cov
  • ai.stats.corrcoef

corrcoef(a, b, /, *, ddof=1, axis=None)

Compute the Pearson product-moment correlation coefficient between a and b along the specified axis.

Source code in ai/stats/stats.py
def corrcoef(
  a: np.ndarray,
  b: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  return _core.corrcoef(a, b, ddof=ddof, axis=axis)
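This wrapper forwards to `_core.corrcoef`. As a plain-numpy sketch of the quantity involved (not a call into `ai.stats` itself), the Pearson coefficient can be reproduced from its defining ratio; note that the `ddof` divisor cancels between numerator and denominator, so it does not change the result:

```python
import numpy as np

# Two exactly linearly related samples: y = 10 - 2*x, so r must be -1.0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([8.0, 6.0, 4.0, 2.0])

# r = cov(x, y) / (std(x) * std(y)); any ddof divisor cancels in the ratio.
r = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2)
)
```

The same value is returned by `np.corrcoef(x, y)[0, 1]`.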

cov(a, b, /, *, ddof=1, axis=None)

Compute the covariance between a and b along the specified axis.

Covariance measures the total variation of two random variables from their expected values.

\[cov_{x,y} = \dfrac{\sum_{i=1}^{N}(x_{i}-\bar x)(y_{i}-\bar y)}{N-1}\]

Parameters:

  • a (ndarray, required): An array-like object containing the sample data.
  • b (ndarray, required): An array-like object containing the sample data; it has the same form as a.
  • ddof (Optional[int], default 1): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
  • axis (Optional[int], default None): Axis along which the covariance is computed. The default is to compute the covariance of the flattened array.

Returns:

  • ndarray: A new array containing the covariance.

Note:

  • Positive covariance indicates that the two variables tend to move in the same direction.
  • Negative covariance indicates that the two variables tend to move in opposite directions.

Example:

>>> import numpy as np
>>> x, y = np.random.random((3, 3)), np.random.random((3, 3))
>>> x
array([[0.62809713, 0.81040891, 0.16158262],
       [0.82474163, 0.08633899, 0.60068869],
       [0.55120899, 0.31197217, 0.05694431]])
>>> y
array([[0.31343184, 0.54189237, 0.5759936 ],
       [0.47156163, 0.07193879, 0.88730511],
       [0.673533  , 0.28599424, 0.90187499]])
>>> from ai import stats
>>> stats.cov(x, y)
array([-0.99412023])
>>> stats.cov(x, y, axis=0)
array([-1.0114404 , -0.93096473, -1.01969051])
>>> stats.cov(x, y, axis=1)
array([-1.00575977, -0.94265263, -0.98948036])
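The formula above can be checked with plain numpy (a sketch of the `ddof=1` computation, independent of `ai.stats`):

```python
import numpy as np

x = np.array([2.1, 2.5, 3.6, 4.0])
y = np.array([8.0, 10.0, 12.0, 14.0])
n = x.size

# Sample covariance with ddof=1: sum of products of deviations over n - 1.
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)
```

This agrees with the off-diagonal entry of `np.cov(x, y)`.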
Source code in ai/stats/stats.py
def cov(
  a: np.ndarray,
  b: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the covariance between `a` and `b` along the specified axis.

  Covariance measures the total variation of two random variables from their
  expected values.

  $$cov_{x,y} = \\dfrac{\\sum_{i=1}^{N}(x_{i}-\\bar x)(y_{i}-\\bar y)}{N-1}$$

  Args:
    a: An array like object containing the sample data.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the covariance is computed. The default is to compute
      the covariance of the flattened array.

  Returns:
    A new array containing the covariance.

  Note:

    * Positive covariance indicates that the two variables tend to move in the
      same direction.
    * Negative covariance indicates that the two variables tend to move in
      opposite directions.

  Example:

      >>> import numpy as np
      >>> x, y = np.random.random((3, 3)), np.random.random((3, 3))
      >>> x
      array([[0.62809713, 0.81040891, 0.16158262],
             [0.82474163, 0.08633899, 0.60068869],
             [0.55120899, 0.31197217, 0.05694431]])
      >>> y
      array([[0.31343184, 0.54189237, 0.5759936 ],
             [0.47156163, 0.07193879, 0.88730511],
             [0.673533  , 0.28599424, 0.90187499]])
      >>> from ai import stats
      >>> stats.cov(x, y)
      array([-0.99412023])
      >>> stats.cov(x, y, axis=0)
      array([-1.0114404 , -0.93096473, -1.01969051])
      >>> stats.cov(x, y, axis=1)
      array([-1.00575977, -0.94265263, -0.98948036])

  """
  return _core.cov(a, b, ddof=ddof, axis=axis)

mean(a, /, *, axis=None)

Compute the mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis.

\[\bar x = \dfrac{\sum_{i=1}^{n}x_{i}}{n}\]

Parameters:

  • a (ndarray, required): A 2-D numpy array.
  • axis (Optional[int], default None): An integer value specifying the axis along which to compute the mean.

Returns:

  • ndarray: The mean for each vector.

Note:

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has.

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> from ai import stats
>>> stats.mean(a)
array([2.5])
>>> stats.mean(a, axis=0)
array([1.5, 3.5])
>>> stats.mean(a, axis=1)
array([2., 3.])

In single precision, `mean` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
       [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.mean(a)
array([0.54999924])
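With plain numpy, the single-precision drift can be reduced by accumulating in float64 (`ai.stats.mean` itself may not expose a dtype option; this is a numpy-only sketch):

```python
import numpy as np

a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

naive = np.mean(a)                       # accumulates at the input's precision
accurate = np.mean(a, dtype=np.float64)  # accumulates in double precision
```

The true mean is 0.55; the float64 accumulation recovers it to near machine precision.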
Source code in ai/stats/stats.py
def mean(a: np.ndarray, /, *, axis: Optional[int] = None) -> np.ndarray:
  """Compute the mean along the specified axis.

  Returns the average of the array elements. The average is taken over the
  flattened array by default, otherwise over the specified axis.

  $$\\bar x = \\dfrac{\\sum_{i=1}^{n}x_{i}}{n}$$

  Args:
    a: A 2-D numpy array.
    axis: An integer value specifying the axis along which to compute the mean.

  Returns:
    The mean for each vector.

  Raises:
    `ValueError`: If `a` has more than 2 dimensions.

  Note:

    The arithmetic mean is the sum of the elements along the axis divided by the
    number of elements.

    Note that for floating-point input, the mean is computed using the same
    precision the input has.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
             [3, 4]])
      >>> from ai import stats
      >>> stats.mean(a)
      array([2.5])
      >>> stats.mean(a, axis=0)
      array([1.5, 3.5])
      >>> stats.mean(a, axis=1)
      array([2., 3.])

      In single precision, `mean` can be inaccurate:

      >>> a = np.zeros((2, 512*512), dtype=np.float32)
      >>> a[0, :] = 1.0
      >>> a[1, :] = 0.1
      >>> a
      array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
             [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
      >>> stats.mean(a)
      array([0.54999924])

  """
  return _core.mean(a, axis=axis)

median(a, /, *, axis=None)

Compute the median along the specified axis.

The median is often compared with other descriptive statistics such as the mean (average), mode, and std (standard deviation) and is robust to outliers.

For a dataset with an odd number of elements the median is the middle value of the sorted data:

\[M = x_{(n+1)/2}\]

For an even number of elements it is the average of the two middle values:

\[M = \dfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)\]

Parameters:

  • a (ndarray, required): A 2-D numpy array.
  • axis (Optional[int], default None): An integer value specifying the axis along which to compute the median.

Returns:

  • ndarray: The median for each vector.

Note:

Given a vector V of length N, the median of V is the middle value of a sorted copy of V, V_sorted, i.e., V_sorted[(N-1)/2], when N is odd, and the average of the two middle values of V_sorted when N is even.

Example:

>>> import numpy as np
>>> x = np.random.random((3, 3))
>>> x
array([[0.1676058 , 0.21633727, 0.12763747],
       [0.36879157, 0.45505013, 0.06045118],
       [0.88213891, 0.95437981, 0.61791297]])
>>> from ai import stats
>>> stats.median(x)
array([0.36879157])
>>> stats.median(x, axis=0)
array([0.1676058 , 0.36879157, 0.88213891])
>>> stats.median(x, axis=1)
array([0.36879157, 0.45505013, 0.12763747])
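The odd/even rule from the Note can be sketched directly with a sorted copy (plain numpy, not `ai.stats`):

```python
import numpy as np

odd = np.array([7.0, 1.0, 3.0])        # n = 3, odd
even = np.array([7.0, 1.0, 3.0, 5.0])  # n = 4, even

# Odd n: the middle value of the sorted copy.
m_odd = np.sort(odd)[(odd.size - 1) // 2]

# Even n: the average of the two middle values of the sorted copy.
s = np.sort(even)
mid = even.size // 2
m_even = 0.5 * (s[mid - 1] + s[mid])
```

Both values agree with `np.median` on the same inputs.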
Source code in ai/stats/stats.py
def median(a: np.ndarray, /, *, axis: Optional[int] = None) -> np.ndarray:
  """Compute the median along the specified axis.

  The median is often compared with other descriptive statistics such as the
  `mean` (average), `mode`, and `std` (standard deviation) and is robust to
  outliers.

  For a dataset with an odd number of elements the median is the middle value
  of the sorted data:

  $$M = x_{(n+1)/2}$$

  For an even number of elements it is the average of the two middle values:

  $$M = \\dfrac{1}{2}\\left(x_{n/2} + x_{n/2+1}\\right)$$


  Args:
    a: A 2-D numpy array.
    axis: An integer value specifying the axis along which to compute the
      median.

  Returns:
    The median for each vector.

  Note:

    Given a vector `V` of length `N`, the median of `V` is the middle value of a
    sorted copy of `V`, `V_sorted`, i.e., `V_sorted[(N-1)/2]`, when `N` is odd,
    and the average of the two middle values of `V_sorted` when `N` is even.

  Example:

      >>> import numpy as np
      >>> x = np.random.random((3, 3))
      >>> x
      array([[0.1676058 , 0.21633727, 0.12763747],
             [0.36879157, 0.45505013, 0.06045118],
             [0.88213891, 0.95437981, 0.61791297]])
      >>> from ai import stats
      >>> stats.median(x)
      array([0.36879157])
      >>> stats.median(x, axis=0)
      array([0.1676058 , 0.36879157, 0.88213891])
      >>> stats.median(x, axis=1)
      array([0.36879157, 0.45505013, 0.12763747])

  """
  return _core.median(a, axis=axis)

std(a, /, *, ddof=1, axis=None)

Compute the standard deviation along the specified axis.

Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.

\[s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_{i} - \bar x)^{2}}{n - 1}}\]

Parameters:

  • a (ndarray, required): Calculate the standard deviation of these values.
  • ddof (Optional[int], default 1): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
  • axis (Optional[int], default None): Axis along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

Returns:

  • ndarray: A new array containing the standard deviation.

Note:

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(x)), where x = abs(a - a.mean())**2.

The average squared deviation is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> from ai import stats
>>> stats.std(a)
array([1.29099445])
>>> stats.std(a, axis=0)
array([0.70710678, 0.70710678])
>>> stats.std(a, axis=1)
array([1.41421356, 1.41421356])

In single precision, `std` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
       [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.std(a)
array([0.45000043])
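A numpy-only sketch of the `ddof=1` formula; the flattened example input `[1, 2, 3, 4]` reproduces the 1.29099445 shown above:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])  # the example array, flattened

# Sample standard deviation (ddof=1): root of squared deviations over n - 1.
s = np.sqrt(np.sum((a - a.mean()) ** 2) / (a.size - 1))
```

This matches `np.std(a, ddof=1)`.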
Source code in ai/stats/stats.py
def std(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the standard deviation along the specified axis.

  Returns the standard deviation, a measure of the spread of a distribution, of
  the array elements. The standard deviation is computed for the flattened array
  by default, otherwise over the specified axis.

  $$s = \\sqrt{\\dfrac{\\sum_{i=1}^{n}(x_{i} - \\bar x)^{2}}{n - 1}}$$

  Args:
    a: Calculate the standard deviation of these values.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the standard deviation is computed. The default is to
      compute the standard deviation of the flattened array.

  Returns:
    A new array containing the standard deviation.

  Note:

    The standard deviation is the square root of the average of the squared
    deviations from the `mean`, i.e., `std = sqrt(mean(x))`, where
    `x = abs(a - a.mean())**2`.

    The average squared deviation is typically calculated as `x.sum() / N`,
    where `N = len(x)`. If, however, `ddof` is specified, the divisor `N - ddof`
    is used instead. In standard statistical practice, `ddof=1` provides an
    unbiased estimator of the variance of the infinite population. `ddof=0`
    provides a maximum likelihood estimate of the variance for normally
    distributed variables. The standard deviation computed in this function is
    the square root of the estimated variance, so even with `ddof=1`, it will
    not be an unbiased estimate of the standard deviation per se.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
             [3, 4]])
      >>> from ai import stats
      >>> stats.std(a)
      array([1.29099445])
      >>> stats.std(a, axis=0)
      array([0.70710678, 0.70710678])
      >>> stats.std(a, axis=1)
      array([1.41421356, 1.41421356])

      In single precision, `std` can be inaccurate:

      >>> a = np.zeros((2, 512*512), dtype=np.float32)
      >>> a[0, :] = 1.0
      >>> a[1, :] = 0.1
      >>> a
      array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
             [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
      >>> stats.std(a)
      array([0.45000043])

  """
  return _core.std(a, ddof=ddof, axis=axis)

var(a, /, *, ddof=1, axis=None)

Compute the variance along the specified axis.

Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.

Parameters:

  • a (ndarray, required): Array containing numbers whose variance is desired.
  • ddof (Optional[int], default 1): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
  • axis (Optional[int], default None): Axis along which the variance is computed. The default is to compute the variance of the flattened array.

Returns:

  • ndarray: A new array containing the variance.

\[s^{2} = \dfrac{\sum_{i=1}^{n}(x_{i} - \bar x)^{2}}{n - 1}\]

Note:

The variance is the average of the squared deviations from the mean, i.e., var = mean(x), where x = abs(a - a.mean())**2.

The mean is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> from ai import stats
>>> stats.var(a)
array([1.66666667])
>>> stats.var(a, axis=0)
array([0.5, 0.5])
>>> stats.var(a, axis=1)
array([2., 2.])

In single precision, `var` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
       [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.var(a)
array([0.20250039])
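The `ddof` distinction from the Note can be sketched with plain numpy; on the flattened example input `[1, 2, 3, 4]` the `ddof=1` value matches the 1.66666667 shown above:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])  # the example array, flattened
dev_sq = (a - a.mean()) ** 2

var_unbiased = dev_sq.sum() / (a.size - 1)  # ddof=1: unbiased estimator
var_mle = dev_sq.sum() / a.size             # ddof=0: maximum likelihood estimate
```

`var_unbiased` agrees with `np.var(a, ddof=1)`; `var_mle` with `np.var(a)`.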
Source code in ai/stats/stats.py
def var(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the variance along the specified axis.

  Returns the variance of the array elements, a measure of the spread of a
  distribution. The variance is computed for the flattened array by default,
  otherwise over the specified axis.

  Args:
    a: Array containing numbers whose variance is desired.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the variance is computed. The default is to compute
      the variance of the flattened array.

  Returns:
    A new array containing the variance.

  $$s^{2} = \\dfrac{\\sum_{i=1}^{n}(x_{i} - \\bar x)^{2}}{n - 1}$$

  Note:

    The variance is the average of the squared deviations from the `mean`, i.e.,
    `var = mean(x)`, where `x = abs(a - a.mean())**2`.

    The mean is typically calculated as `x.sum() / N`, where `N = len(x)`. If,
    however, `ddof` is specified, the divisor `N - ddof` is used instead. In
    standard statistical practice, `ddof=1` provides an unbiased estimator of
    the variance of a hypothetical infinite population. `ddof=0` provides a
    maximum likelihood estimate of the variance for normally distributed
    variables.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
             [3, 4]])
      >>> from ai import stats
      >>> stats.var(a)
      array([1.66666667])
      >>> stats.var(a, axis=0)
      array([0.5, 0.5])
      >>> stats.var(a, axis=1)
      array([2., 2.])

      In single precision, `var` can be inaccurate:

      >>> a = np.zeros((2, 512*512), dtype=np.float32)
      >>> a[0, :] = 1.0
      >>> a[1, :] = 0.1
      >>> a
      array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
             [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
      >>> stats.var(a)
      array([0.20250039])

  """
  return _core.var(a, ddof=ddof, axis=axis)

varcoef(a, /, *, ddof=1, axis=None)

Compute the coefficients of variation along the specified axis.

The coefficient of variation (CV) is the ratio of the standard deviation to the mean. The higher the coefficient of variation, the greater the level of dispersion around the mean.

\[cv = \dfrac{s}{\bar x} * 100 \%\]

Parameters:

  • a (ndarray, required): An array-like object containing the sample data.
  • ddof (Optional[int], default 1): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
  • axis (Optional[int], default None): Axis along which the coefficient of variation is computed. The default is to compute the coefficient of variation of the flattened array.

Returns:

  • ndarray: A new array containing the coefficients of variation.

Note:

The coefficient of variation is the ratio of the std (standard deviation) to the mean, expressed as a percentage.

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> from ai import stats
>>> stats.varcoef(a)
array([51.63977795])
>>> stats.varcoef(a, axis=0)
array([47.14045208, 20.20305089])
>>> stats.varcoef(a, axis=1)
array([70.71067812, 47.14045208])
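A numpy-only sketch of the CV formula; on the flattened example input `[1, 2, 3, 4]` it reproduces the 51.63977795 shown above (assuming the sample standard deviation, `ddof=1`):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])  # the example array, flattened

# cv = s / mean * 100%, with the sample standard deviation (ddof=1).
cv = np.std(a, ddof=1) / np.mean(a) * 100.0
```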
Source code in ai/stats/stats.py
def varcoef(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the coefficients of variation along the specified axis.

  The coefficient of variation (CV) is the ratio of the standard deviation to
  the mean. The higher the coefficient of variation, the greater the level of
  dispersion around the mean.

  $$cv = \\dfrac{s}{\\bar x} * 100 \\%$$

  Args:
    a: An array like object containing the sample data.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the coefficient of variation is computed. The default
      is to compute the coefficient of variation of the flattened array.

  Returns:
    A new array containing the coefficients of variation.

  Note:

    The coefficient of variation is the ratio of the `std`
    (standard deviation) to the `mean`, expressed as a percentage.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
             [3, 4]])
      >>> from ai import stats
      >>> stats.varcoef(a)
      array([51.63977795])
      >>> stats.varcoef(a, axis=0)
      array([47.14045208, 20.20305089])
      >>> stats.varcoef(a, axis=1)
      array([70.71067812, 47.14045208])

  """
  return _core.varcoef(a, ddof=ddof, axis=axis)

zscore(a, /, *, ddof=1, axis=None)

Compute the z-score along the specified axis.

Compute the z score of each value in the sample, relative to the sample mean and standard deviation.

\[z = \dfrac{x - \bar x}{s}\]

Parameters:

  • a (ndarray, required): An array-like object containing the sample data.
  • ddof (Optional[int], default 1): Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
  • axis (Optional[int], default None): Axis along which the z-score is computed. The default is to compute the z-score of the flattened array.

Returns:

  • ndarray: A new array containing the z-scores.

Note:

The z-score for a data value is its difference from the sample mean divided by the sample std (standard deviation), i.e., z = (x - x.mean()) / x.std().

Example:

>>> import numpy as np
>>> a = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,
...                0.1954,  0.6307,  0.6599,  0.1065,  0.0508])
>>> a
array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599,
       0.1065, 0.0508])
>>> from ai import stats
>>> stats.zscore(a)
array([[ 1.06939901, -1.1830039 , -0.05258212,  1.03626165,  1.10660039,
        -0.81192795,  0.5488923 ,  0.64017636, -1.08984414, -1.26397161]])

Computing along a specified axis.

>>> b = np.array([[ 0.3148,  0.0478,  0.6243,  0.4608],
...               [ 0.7149,  0.0775,  0.6072,  0.9656],
...               [ 0.6341,  0.1403,  0.9759,  0.4064],
...               [ 0.5918,  0.6948,  0.904 ,  0.3721],
...               [ 0.0921,  0.2481,  0.1188,  0.1366]])
>>> b
array([[0.3148, 0.0478, 0.6243, 0.4608],
       [0.7149, 0.0775, 0.6072, 0.9656],
       [0.6341, 0.1403, 0.9759, 0.4064],
       [0.5918, 0.6948, 0.904 , 0.3721],
       [0.0921, 0.2481, 0.1188, 0.1366]])
>>> stats.zscore(b, axis=0)
array([[-0.19264823, -1.28415119,  1.07259584,  0.40420358],
       [ 0.33048416, -1.37380874,  0.04251374,  1.00081084],
       [ 0.26796377, -1.12598418,  1.23283094, -0.37481053],
       [-0.22095197,  0.24468594,  1.19042819, -1.21416216],
       [-0.82780366,  1.4457416 , -0.43867764, -0.1792603 ]])
>>> stats.zscore(b, axis=1)
array([[-0.59710641,  0.94678835,  0.63499955,  0.47177349, -1.45645498],
       [-0.73263586, -0.62041675, -0.3831319 ,  1.71200261,  0.0241819 ],
       [-0.0644368 , -0.11512076,  0.97769651,  0.76458677, -1.56272573],
       [-0.02464405,  1.63406492, -0.20339557, -0.31610104, -1.08992426]])
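A numpy-only sketch of the formula with explicit parentheses (the subtraction must happen before the division), assuming the sample standard deviation (`ddof=1`):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

# z = (a - mean) / std with ddof=1; the parentheses around the numerator matter.
z = (a - a.mean()) / np.std(a, ddof=1)
```

By construction the standardized values have mean 0 and unit sample standard deviation.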
Source code in ai/stats/stats.py
def zscore(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the z-score along the specified axis.

  Compute the `z` score of each value in the sample, relative to the sample mean
  and standard deviation.

  $$z = \\dfrac{x - \\bar x}{s}$$

  Args:
    a: An array like object containing the sample data.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the z-score is computed. The default is to compute
      the z-score of the flattened array.

  Returns:
    A new array containing the z-scores.

  Note:

    The `z-score` for a data value is its difference from the sample `mean`
    divided by the sample `std` (standard deviation), i.e.,
    `z = (x - x.mean()) / x.std()`.

  Example:

      >>> import numpy as np
      >>> a = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,
      ...                0.1954,  0.6307,  0.6599,  0.1065,  0.0508])
      >>> a
      array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599,
             0.1065, 0.0508])
      >>> from ai import stats
      >>> stats.zscore(a)
      array([[ 1.06939901, -1.1830039 , -0.05258212,  1.03626165,  1.10660039,
              -0.81192795,  0.5488923 ,  0.64017636, -1.08984414, -1.26397161]])

      Computing along a specified axis.

      >>> b = np.array([[ 0.3148,  0.0478,  0.6243,  0.4608],
      ...               [ 0.7149,  0.0775,  0.6072,  0.9656],
      ...               [ 0.6341,  0.1403,  0.9759,  0.4064],
      ...               [ 0.5918,  0.6948,  0.904 ,  0.3721],
      ...               [ 0.0921,  0.2481,  0.1188,  0.1366]])
      >>> b
      array([[0.3148, 0.0478, 0.6243, 0.4608],
             [0.7149, 0.0775, 0.6072, 0.9656],
             [0.6341, 0.1403, 0.9759, 0.4064],
             [0.5918, 0.6948, 0.904 , 0.3721],
             [0.0921, 0.2481, 0.1188, 0.1366]])
      >>> stats.zscore(b, axis=0)
      array([[-0.19264823, -1.28415119,  1.07259584,  0.40420358],
             [ 0.33048416, -1.37380874,  0.04251374,  1.00081084],
             [ 0.26796377, -1.12598418,  1.23283094, -0.37481053],
             [-0.22095197,  0.24468594,  1.19042819, -1.21416216],
             [-0.82780366,  1.4457416 , -0.43867764, -0.1792603 ]])
      >>> stats.zscore(b, axis=1)
      array([[-0.59710641,  0.94678835,  0.63499955,  0.47177349, -1.45645498],
             [-0.73263586, -0.62041675, -0.3831319 ,  1.71200261,  0.0241819 ],
             [-0.0644368 , -0.11512076,  0.97769651,  0.76458677, -1.56272573],
             [-0.02464405,  1.63406492, -0.20339557, -0.31610104, -1.08992426]])

  """
  return _core.zscore(a, ddof=ddof, axis=axis)

_core

correlation

An implementation of Pearson's correlation coefficient algorithm.

corrcoef(x, y=None, rowvar=True, *, dtype=None)

Return Pearson product-moment correlation coefficient.

The relationship between the correlation coefficient matrix, R, and the covariance matrix, C, is

\[R_{ij} = \frac{ C_{ij} } { \sqrt{ C_{ii} C_{jj} } }\]

The values of R are between -1 and 1, inclusive.

Parameters:

  • x (ndarray, required): A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.
  • y (ndarray, default None): An additional set of variables and observations. y has the same form as that of x.
  • rowvar (bool, default True): If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
  • dtype (np.dtype, default None): Data type of the result. By default the return data type will have at least numpy.float64 precision.

Returns:

  • ndarray: The correlation coefficient matrix of the variables.
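The identity \(R_{ij} = C_{ij} / \sqrt{C_{ii} C_{jj}}\) can be sketched with plain numpy (independent of this module's implementation):

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [4.0, 3.0, 2.0, 1.0]])  # two perfectly anti-correlated rows

c = np.cov(x)             # covariance matrix C
d = np.sqrt(np.diag(c))   # per-variable standard deviations sqrt(C_ii)
r = c / np.outer(d, d)    # R_ij = C_ij / sqrt(C_ii * C_jj)
```

The result agrees with `np.corrcoef(x)` and has entries in [-1, 1].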

Source code in ai/stats/correlation.py
def corrcoef(
  x: np.ndarray,
  y: np.ndarray = None,
  rowvar: bool = True,
  *,
  dtype: np.dtype = None
) -> np.ndarray:
  """Return Pearson product-moment correlation coefficient.

  The relationship between the correlation coefficient matrix, `R`, and the
  covariance matrix, `C`, is

  $$R_{ij} = \\frac{ C_{ij} } { \\sqrt{ C_{ii} C_{jj} } }$$

  The values of `R` are between -1 and 1, inclusive.

  Args:
    x: A 1-D or 2-D array containing multiple variables and observations. Each
      row of `x` represents a variable, and each column a single observation of
      all those variables. Also see `rowvar` below.
    y: An additional set of variables and observations. `y` has the same form as
      that of `x`.
    rowvar: If `rowvar` is True (default), then each row represents a variable,
      with observations in the columns. Otherwise, the relationship is
      transposed: each column represents a variable, while the rows contain
      observations.
    dtype: Data type of the result. By default the return data type will have at
      least `numpy.float64` precision.

  Returns:
    The correlation coefficient matrix of the variables.
  """
  c = cov(x, y, rowvar, dtype=dtype)
  try:
    d = np.diag(c)
  except ValueError:
    # Scalar covariance; NaN if incorrect value (NaN, Inf, 0), 1 otherwise.
    return c / c
  stddev = np.sqrt(d.real)
  c /= stddev[:, None]
  c /= stddev[None, :]

  # Clip real and imaginary parts to [-1, 1]. This does not guarantee
  # abs([i,j]) <= 1 for complex arrays, but is the best we can do without
  # excessive work.
  np.clip(c.real, -1, 1, out=c.real)
  if np.iscomplexobj(c):
    np.clip(c.imag, -1, 1, out=c.imag)
  return c

cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)

Estimate a covariance matrix, given data and weights.

Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, \(X = [x_1, x_2, ... x_N]^T\), then the covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\).

Parameters:

  • m (ndarray, required): A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.
  • y (ndarray, default None): An additional set of variables and observations. y has the same form as that of m.
  • rowvar (bool, default True): If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
  • bias (bool, default False): Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof.
  • ddof (int, default None): If not None then the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average.
  • fweights (ndarray, default None): 1-D array of integer frequency weights; the number of times each observation vector should be repeated.
  • aweights (ndarray, default None): 1-D array of observation vector weights. These relative weights are typically large for observations considered "important" and smaller for observations considered less "important". If ddof=0 the array of weights can be used to assign probabilities to observation vectors.
  • dtype (np.dtype, default None): Data type of the result. By default the return data type will have at least numpy.float64 precision.

Returns:

  • ndarray: The covariance matrix of the variables.

Note:

Assume that the observations are in the columns of the observation array
`m` and let ``f=fweights`` and ``a=aweights`` for brevity. The steps to
compute the weighted covariance are as follows:

>>> m = np.arange(10, dtype=np.float64)
>>> f = np.arange(10) * 2
>>> a = np.arange(10) ** 2
>>> ddof = 1
>>> w = f * a
>>> v1 = np.sum(w)
>>> v2 = np.sum(w * a)
>>> m -= np.sum(m * w, axis=None, keepdims=True) / v1
>>> cov = np.dot(m * w, m.T) * v1 / ((v1 ** 2) - (ddof * v2))

Note that when ``a == 1``, the normalization factor
``v1 / (v1**2 - ddof * v2)`` goes over to ``1 / (np.sum(f) - ddof)`` as it
should.
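The steps above mutate `m` in place; restated without mutation, they can be checked against `np.cov`, which accepts the same `fweights`/`aweights` arguments:

```python
import numpy as np

m = np.arange(10, dtype=np.float64)
f = np.arange(10) * 2    # fweights: integer repeat counts per observation
a = np.arange(10) ** 2   # aweights: relative observation weights
ddof = 1

w = f * a
v1 = np.sum(w)
v2 = np.sum(w * a)
centered = m - np.sum(m * w) / v1  # weighted centering, m left untouched
manual = np.dot(centered * w, centered) * v1 / (v1 ** 2 - ddof * v2)
```

`manual` matches `np.cov(m, fweights=f, aweights=a)` with the default `ddof` of 1.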
Source code in ai/stats/correlation.py
def cov(  # pylint: disable=too-many-positional-arguments
  m: np.ndarray,
  y: np.ndarray = None,
  rowvar: bool = True,
  bias: bool = False,
  ddof: int = None,
  fweights: np.ndarray = None,
  aweights: np.ndarray = None,
  *,
  dtype: np.dtype = None
) -> np.ndarray:
  """Estimate a covariance matrix, given data and weights.

  Covariance indicates the level to which two variables vary together.
  If we examine N-dimensional samples, $X = [x_1, x_2, ... x_N]^T$, then
  the covariance matrix element $C_{ij}$ is the covariance of $x_i$
  and $x_j$. The element $C_{ii}$ is the variance of $x_i$.

  Args:
    m: A 1-D or 2-D array containing multiple variables and observations. Each
      row of `m` represents a variable, and each column a single observation of
      all those variables. Also see `rowvar` below.
    y: An additional set of variables and observations. `y` has the same form as
      that of `m`.
    rowvar: If `rowvar` is True (default), then each row represents a variable,
      with observations in the columns. Otherwise, the relationship is
      transposed: each column represents a variable, while the rows contain
      observations.
    bias: Default normalization (False) is by ``(N - 1)``, where ``N`` is the
      number of observations given (unbiased estimate). If `bias` is True, then
      normalization is by ``N``. These values can be overridden by using the
      keyword ``ddof``.
    ddof: If not ``None`` then the default value implied by `bias` is
      overridden. Note that ``ddof=1`` will return the unbiased estimate, even
      if both `fweights` and `aweights` are specified, and ``ddof=0`` will
      return the simple average.
    fweights: 1-D array of integer frequency weights; the number of times each
      observation vector should be repeated.
    aweights: 1-D array of observation vector weights. These relative weights
      are typically large for observations considered "important" and smaller
      for observations considered less "important". If ``ddof=0`` the array of
      weights can be used to assign probabilities to observation vectors.
    dtype: Data type of the result. By default the return data type will have at
      least `numpy.float64` precision.

  Returns:
    The covariance matrix of the variables.

  Note:

      Assume that the observations are in the columns of the observation array
      `m` and let ``f=fweights`` and ``a=aweights`` for brevity. The steps to
      compute the weighted covariance are as follows:

      >>> m = np.arange(10, dtype=np.float64)
      >>> f = np.arange(10) * 2
      >>> a = np.arange(10) ** 2
      >>> ddof = 1
      >>> w = f * a
      >>> v1 = np.sum(w)
      >>> v2 = np.sum(w * a)
      >>> m -= np.sum(m * w, axis=None, keepdims=True) / v1
      >>> cov = np.dot(m * w, m.T) * v1 / ((v1 ** 2) - (ddof * v2))

      Note that when ``a == 1``, the normalization factor
      ``v1 / (v1**2 - ddof * v2)`` goes over to ``1 / (np.sum(f) - ddof)`` as it
      should.
  """
  if ddof is not None and ddof != int(ddof):
    raise ValueError('ddof must be integer')

  # Handling complex arrays too.
  m = np.asarray(m)
  if m.ndim > 2:
    raise ValueError('m has more than 2 dimensions')

  if y is not None:
    y = np.asarray(y)
    if y.ndim > 2:
      raise ValueError('y has more than 2 dimensions')

  if dtype is None:
    if y is None:
      dtype = np.result_type(m, np.float64)
    else:
      dtype = np.result_type(m, y, np.float64)

  x = np.array(m, ndmin=2, dtype=dtype)
  if not rowvar and x.shape[0] != 1:
    x = x.T
  if x.shape[0] == 0:
    return np.array([]).reshape(0, 0)
  if y is not None:
    y = np.array(y, ndmin=2, dtype=dtype)  # copy=False breaks under NumPy >= 2.0
    if not rowvar and y.shape[0] != 1:
      y = y.T
    x = np.concatenate((x, y), axis=0)

  if ddof is None:
    if bias == 0:
      ddof = 1
    else:
      ddof = 0

  # Get the product of frequencies and weights.
  w = None
  if fweights is not None:
    fweights = np.asarray(fweights, dtype=float)
    if not np.all(fweights == np.around(fweights)):
      raise TypeError('fweights must be integer')
    if fweights.ndim > 1:
      raise RuntimeError('Cannot handle multidimensional fweights')
    if fweights.shape[0] != x.shape[1]:
      raise RuntimeError('Incompatible number of samples and fweights')
    if any(fweights < 0):
      raise ValueError('fweights cannot be negative')
    w = fweights

  if aweights is not None:
    aweights = np.asarray(aweights, dtype=float)
    if aweights.ndim > 1:
      raise RuntimeError('Cannot handle multidimensional aweights')
    if aweights.shape[0] != x.shape[1]:
      raise RuntimeError('Incompatible number of samples and aweights')
    if any(aweights < 0):
      raise ValueError('aweights cannot be negative')
    if w is None:
      w = aweights
    else:
      w *= aweights

  avg, w_sum = np.average(x, axis=1, weights=w, returned=True)
  w_sum = w_sum[0]

  # Determine the normalization.
  if w is None:
    fact = x.shape[1] - ddof
  elif ddof == 0:
    fact = w_sum
  elif aweights is None:
    fact = w_sum - ddof
  else:
    fact = w_sum - ddof * sum(w * aweights) / w_sum

  if fact <= 0:
    warnings.warn(
      'Degrees of freedom <= 0 for slice', RuntimeWarning, stacklevel=3
    )
    fact = 0.0

  x -= avg[:, None]
  xtrans = None
  if w is None:
    xtrans = x.T
  else:
    xtrans = (x * w).T
  c = np.dot(x, xtrans.conj())
  c *= np.true_divide(1, fact)
  return c.squeeze()

stats

An implementation of statistical operations for ai.

Note:

We don't support numpy arrays of more than 2 dimensions, but we plan to add support in the future.

Also, unlike the numpy functions, our implementation assumes the given dataset is a sample, not a population, and hence uses ddof=1 by default.
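
The distinction matters for small arrays; a quick sketch with plain numpy (whose `var` defaults to the population divisor, `ddof=0`):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])  # mean = 2.5, squared deviations sum to 5.0

pop_var = np.var(a)             # population variance: 5.0 / 4 = 1.25
sample_var = np.var(a, ddof=1)  # sample variance:     5.0 / 3
```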

cov(a, b, /, *, ddof=1, axis=None)

Compute the relationship between a and b along the specified axis.

Covariance measures the total variation of two random variables from their expected values.

\[cov_{x,y} = \dfrac{\sum_{i=1}^{N}(x_{i}-\bar x)(y_{i}-\bar y)}{N-1}\]

Parameters:

Name Type Description Default
a ndarray

An array like object containing the sample data.

required
ddof Optional[int]

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is one.

1
axis Optional[int]

Axis along which the covariance is computed. The default is to compute the covariance of the flattened array.

None

Returns:

Type Description
ndarray

A new array containing the covariance.

Note:

  • Positive covariance indicates that the two variables tend to move in the same direction.
  • Negative covariance indicates that the two variables tend to move in opposite directions.

Example:

>>> import numpy as np
>>> x, y = np.random.random((3, 3)), np.random.random((3, 3))
>>> x
array([[0.62809713, 0.81040891, 0.16158262],
[0.82474163, 0.08633899, 0.60068869],
[0.55120899, 0.31197217, 0.05694431]])
>>> y
array([[0.31343184, 0.54189237, 0.5759936 ],
[0.47156163, 0.07193879, 0.88730511],
[0.673533  , 0.28599424, 0.90187499]])
>>> from ai import stats
>>> stats.cov(x, y)
array([-0.99412023])
>>> stats.cov(x, y, axis=0)
array([-1.0114404 , -0.93096473, -1.01969051])
>>> stats.cov(x, y, axis=1)
array([-1.00575977, -0.94265263, -0.98948036])
Source code in ai/stats/stats.py
def cov(
  a: np.ndarray,
  b: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the relationship between `a` and `b` along the specified axis.

  Covariance measures the total variation of two random variables from their
  expected values.

  $$cov_{x,y} = \\dfrac{\\sum_{i=1}^{N}(x_{i}-\\bar x)(y_{i}-\\bar y)}{N-1}$$

  Args:
    a: An array like object containing the sample data.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the covariance is computed. The default is to compute
      the covariance of the flattened array.

  Returns:
    A new array containing the covariance.

  Note:

    * Positive covariance indicates that the two variables tend to move in the
      same direction.
    * Negative covariance indicates that the two variables tend to move in
      opposite directions.

  Example:

      >>> import numpy as np
      >>> x, y = np.random.random((3, 3)), np.random.random((3, 3))
      >>> x
      array([[0.62809713, 0.81040891, 0.16158262],
      [0.82474163, 0.08633899, 0.60068869],
      [0.55120899, 0.31197217, 0.05694431]])
      >>> y
      array([[0.31343184, 0.54189237, 0.5759936 ],
      [0.47156163, 0.07193879, 0.88730511],
      [0.673533  , 0.28599424, 0.90187499]])
      >>> from ai import stats
      >>> stats.cov(x, y)
      array([-0.99412023])
      >>> stats.cov(x, y, axis=0)
      array([-1.0114404 , -0.93096473, -1.01969051])
      >>> stats.cov(x, y, axis=1)
      array([-1.00575977, -0.94265263, -0.98948036])

  """
  return _core.cov(a, b, ddof=ddof, axis=axis)

mean(a, /, *, axis=None)

Compute the mean along the specified axis.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis.

\[\bar x = \dfrac{\sum_{i=1}^{n}x_{i}}{n}\]

Parameters:

Name Type Description Default
a ndarray

A 2-D numpy vector.

required
axis Optional[int]

An integer value specifying along which axis to calculate the mean.

None

Returns:

Type Description
ndarray

The mean for each vector.

Note:

The arithmetic mean is the sum of the elements along the axis divided by the number of elements.

Note that for floating-point input, the mean is computed using the same precision the input has, so low-precision dtypes can accumulate rounding error.
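
The precision caveat can be reproduced with plain numpy; casting to `float64` before averaging recovers the expected value (a sketch, independent of this module):

```python
import numpy as np

a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

# float32 accumulation drifts slightly away from the true mean of 0.55 ...
drifted = float(np.mean(a))

# ... while accumulating in float64 stays accurate.
accurate = float(np.mean(a.astype(np.float64)))
```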

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.mean(a)
array([2.5])
>>> stats.mean(a, axis=0)
array([1.5, 3.5])
>>> stats.mean(a, axis=1)
array([2., 3.])

In single precision, `mean` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.mean(a)
array([0.54999924])
Source code in ai/stats/stats.py
def mean(a: np.ndarray, /, *, axis: Optional[int] = None) -> np.ndarray:
  """Compute the mean along the specified axis.

  Returns the average of the array elements. The average is taken over the
  flattened array by default, otherwise over the specified axis.

  $$\\bar x = \\dfrac{\\sum_{i=1}^{n}x_{i}}{n}$$

  Args:
    a: A 2-D numpy vector.
    axis: An integer value specifying along which axis to calculate the mean.

  Returns:
    The mean for each vector.

  Raises:
    ValueError: If `a` has more than 2 dimensions.

  Note:

    The arithmetic mean is the sum of the elements along the axis divided by the
    number of elements.

    Note that for floating-point input, the mean is computed using the same
    precision the input has.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
      [3, 4]])
      >>> from ai import stats
      >>> stats.mean(a)
      array([2.5])
      >>> stats.mean(a, axis=0)
      array([1.5, 3.5])
      >>> stats.mean(a, axis=1)
      array([2., 3.])

      In single precision, `mean` can be inaccurate:

      >>> a = np.zeros((2, 512*512), dtype=np.float32)
      >>> a[0, :] = 1.0
      >>> a[1, :] = 0.1
      >>> a
      array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
      [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
      >>> stats.mean(a)
      array([0.54999924])

  """
  return _core.mean(a, axis=axis)

median(a, /, *, axis=None)

Compute the median along the specified axis.

The median is often compared with other descriptive statistics such as the mean (average), mode, and std (standard deviation) and is robust to outliers.

For an odd number of values in a sorted dataset the median is the middle value:

\[M = x_{(n+1)/2}; \quad \text{odd } n\]

For an even number of values it is the average of the two middle values:

\[M = \dfrac{1}{2}\left(x_{n/2} + x_{n/2+1}\right); \quad \text{even } n\]

Parameters:

Name Type Description Default
a ndarray

A 2-D numpy vector.

required
axis Optional[int]

An integer value specifying along which axis to calculate the median.

None

Returns:

Type Description
ndarray

The median for each vector.

Note:

Given a vector V of length N, the median of V is the middle value of a sorted copy of V, V_sorted, i.e., V_sorted[(N-1)/2], when N is odd, and the average of the two middle values of V_sorted when N is even.
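
A minimal illustration of the odd and even cases, using `numpy.median` directly:

```python
import numpy as np

# Odd length: the median is the middle element of the sorted data.
odd = np.array([7, 1, 5])        # sorted: [1, 5, 7]
med_odd = np.median(odd)         # 5.0

# Even length: the median averages the two middle elements.
even = np.array([7, 1, 5, 3])    # sorted: [1, 3, 5, 7]
med_even = np.median(even)       # (3 + 5) / 2 = 4.0
```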

Example:

>>> import numpy as np
>>> x = np.random.random((3, 3))
>>> x
array([[0.1676058 , 0.21633727, 0.12763747],
[0.36879157, 0.45505013, 0.06045118],
[0.88213891, 0.95437981, 0.61791297]])
>>> from ai import stats
>>> stats.median(x)
array([0.36879157])
>>> stats.median(x, axis=0)
array([0.1676058 , 0.36879157, 0.88213891])
>>> stats.median(x, axis=1)
array([0.36879157, 0.45505013, 0.12763747])
Source code in ai/stats/stats.py
def median(a: np.ndarray, /, *, axis: Optional[int] = None) -> np.ndarray:
  """Compute the median along the specified axis.

  The median is often compared with other descriptive statistics such as the
  `mean` (average), `mode`, and `std` (standard deviation) and is robust to
  outliers.

  For an odd number of values in a sorted dataset the median is the middle
  value:

  $$M = x_{(n+1)/2}; \\quad \\text{odd } n$$

  For an even number of values it is the average of the two middle values:

  $$M = \\dfrac{1}{2}\\left(x_{n/2} + x_{n/2+1}\\right); \\quad \\text{even } n$$


  Args:
    a: A 2-D numpy vector.
    axis: An integer value specifying along which axis to calculate the median.

  Returns:
    The median for each vector.

  Note:

    Given a vector `V` of length `N`, the median of `V` is the middle value of a
    sorted copy of `V`, `V_sorted`, i.e., `V_sorted[(N-1)/2]`, when `N` is odd,
    and the average of the two middle values of `V_sorted` when `N` is even.

  Example:

      >>> import numpy as np
      >>> x = np.random.random((3, 3))
      >>> x
      array([[0.1676058 , 0.21633727, 0.12763747],
      [0.36879157, 0.45505013, 0.06045118],
      [0.88213891, 0.95437981, 0.61791297]])
      >>> from ai import stats
      >>> stats.median(x)
      array([0.36879157])
      >>> stats.median(x, axis=0)
      array([0.1676058 , 0.36879157, 0.88213891])
      >>> stats.median(x, axis=1)
      array([0.36879157, 0.45505013, 0.12763747])

  """
  return _core.median(a, axis=axis)

std(a, /, *, ddof=1, axis=None)

Compute the standard deviation along the specified axis.

Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.

\[s = \sqrt{\dfrac{\sum_{i=1}^{n}(x_{i} - \bar x)^{2}}{n - 1}}\]

Parameters:

Name Type Description Default
a ndarray

Calculate the standard deviation of these values.

required
ddof Optional[int]

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is one.

1
axis Optional[int]

Axis along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.

None

Returns:

Type Description
ndarray

Return a new array containing the standard deviation.

Note:

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(x)), where x = abs(a - a.mean())**2.

The average squared deviation is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
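
A short sketch contrasting the two divisors with plain numpy (whose `std` defaults to `ddof=0`):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # flattened: [1, 2, 3, 4], mean 2.5

s_sample = np.std(a, ddof=1)     # sqrt(5 / 3), divisor N - 1 = 3
s_pop = np.std(a)                # sqrt(5 / 4), divisor N = 4
```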

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.std(a)
array([1.29099445])
>>> stats.std(a, axis=0)
array([0.70710678, 0.70710678])
>>> stats.std(a, axis=1)
array([1.41421356, 1.41421356])

In single precision, `std` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.std(a)
array([0.45000043])
Source code in ai/stats/stats.py
def std(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the standard deviation along the specified axis.

  Returns the standard deviation, a measure of the spread of a distribution, of
  the array elements. The standard deviation is computed for the flattened array
  by default, otherwise over the specified axis.

  $$s = \\sqrt{\\dfrac{\\sum_{i=1}^{n}(x_{i} - \\bar x)^{2}}{n - 1}}$$

  Args:
    a: Calculate the standard deviation of these values.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the standard deviation is computed. The default is to
      compute the standard deviation of the flattened array.

  Returns:
    Return a new array containing the standard deviation.

  Note:

    The standard deviation is the square root of the average of the squared
    deviations from the `mean`, i.e., `std = sqrt(mean(x))`, where
    `x = abs(a - a.mean())**2`.

    The average squared deviation is typically calculated as `x.sum() / N`,
    where `N = len(x)`. If, however, `ddof` is specified, the divisor `N - ddof`
    is used instead. In standard statistical practice, `ddof=1` provides an
    unbiased estimator of the variance of the infinite population. `ddof=0`
    provides a maximum likelihood estimate of the variance for normally
    distributed variables. The standard deviation computed in this function is
    the square root of the estimated variance, so even with `ddof=1`, it will
    not be an unbiased estimate of the standard deviation per se.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
      [3, 4]])
      >>> from ai import stats
      >>> stats.std(a)
      array([1.29099445])
      >>> stats.std(a, axis=0)
      array([0.70710678, 0.70710678])
      >>> stats.std(a, axis=1)
      array([1.41421356, 1.41421356])

      In single precision, `std` can be inaccurate:

      >>> a = np.zeros((2, 512*512), dtype=np.float32)
      >>> a[0, :] = 1.0
      >>> a[1, :] = 0.1
      >>> a
      array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
      [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
      >>> stats.std(a)
      array([0.45000043])

  """
  return _core.std(a, ddof=ddof, axis=axis)

var(a, /, *, ddof=1, axis=None)

Compute the variance along the specified axis.

Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.

Parameters:

Name Type Description Default
a ndarray

Array containing numbers whose variance is desired.

required
ddof Optional[int]

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is one.

1
axis Optional[int]

Axis along which the variance is computed. The default is to compute the variance of the flattened array.

None

Returns:

Type Description
ndarray

A new array containing the variance.

\[s^{2} = \dfrac{\sum_{i=1}^{n}(x_{i} - \bar x)^{2}}{n - 1}\]

Note:

The variance is the average of the squared deviations from the mean, i.e., var = mean(x), where x = abs(a - a.mean())**2.

The mean is typically calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
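
As a quick check of how the two estimators relate, `var` with `ddof=1` is the square of the corresponding `std` (a numpy sketch):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Sample variance over the flattened array (divisor N - 1 = 3).
v = np.var(a, ddof=1)        # 5 / 3

# It is the square of the matching sample standard deviation.
s = np.std(a, ddof=1)
```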

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.var(a)
array([1.66666667])
>>> stats.var(a, axis=0)
array([0.5, 0.5])
>>> stats.var(a, axis=1)
array([2., 2.])

In single precision, `var` can be inaccurate:

>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.var(a)
array([0.20250039])
Source code in ai/stats/stats.py
def var(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the variance along the specified axis.

  Returns the variance of the array elements, a measure of the spread of a
  distribution. The variance is computed for the flattened array by default,
  otherwise over the specified axis.

  Args:
    a: Array containing numbers whose variance is desired.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the variance is computed. The default is to compute
      the variance of the flattened array.

  Returns:
    A new array containing the variance.

  $$s^{2} = \\dfrac{\\sum_{i=1}^{n}(x_{i} - \\bar x)^{2}}{n - 1}$$

  Note:

    The variance is the average of the squared deviations from the `mean`, i.e.,
    `var = mean(x)`, where `x = abs(a - a.mean())**2`.

    The mean is typically calculated as `x.sum() / N`, where `N = len(x)`. If,
    however, `ddof` is specified, the divisor `N - ddof` is used instead. In
    standard statistical practice, `ddof=1` provides an unbiased estimator of
    the variance of a hypothetical infinite population. `ddof=0` provides a
    maximum likelihood estimate of the variance for normally distributed
    variables.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
      [3, 4]])
      >>> from ai import stats
      >>> stats.var(a)
      array([1.66666667])
      >>> stats.var(a, axis=0)
      array([0.5, 0.5])
      >>> stats.var(a, axis=1)
      array([2., 2.])

      In single precision, `var` can be inaccurate:

      >>> a = np.zeros((2, 512*512), dtype=np.float32)
      >>> a[0, :] = 1.0
      >>> a[1, :] = 0.1
      >>> a
      array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
      [0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
      >>> stats.var(a)
      array([0.20250039])

  """
  return _core.var(a, ddof=ddof, axis=axis)

varcoef(a, /, *, ddof=1, axis=None)

Compute the coefficients of variation along the specified axis.

The coefficient of variation (CV) is the ratio of the standard deviation to the mean. The higher the coefficient of variation, the greater the level of dispersion around the mean.

\[cv = \dfrac{s}{\bar x} * 100 \%\]

Parameters:

Name Type Description Default
a ndarray

An array like object containing the sample data.

required
ddof Optional[int]

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is one.

1
axis Optional[int]

Axis along which the coefficient of variation is computed. The default is to compute the coefficient of variation of the flattened array.

None

Returns:

Type Description
ndarray

A new array containing the coefficients of variation.

Note:

The coefficient of variation is the ratio of the std (standard deviation) to the mean, expressed as a percentage.
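
The definition can be applied directly with plain numpy; with the sample standard deviation (`ddof=1`), this reproduces the flattened-array case:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# CV of the flattened array: (s / mean) * 100, with ddof=1.
cv = np.std(a, ddof=1) / np.mean(a) * 100
```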

Example:

>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.varcoef(a)
array([51.63977795])
>>> stats.varcoef(a, axis=0)
array([47.14045208, 20.20305089])
>>> stats.varcoef(a, axis=1)
array([70.71067812, 47.14045208])
Source code in ai/stats/stats.py
def varcoef(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the coefficients of variation along the specified axis.

  The coefficient of variation (CV) is the ratio of the standard deviation to
  the mean. The higher the coefficient of variation, the greater the level of
  dispersion around the mean.

  $$cv = \\dfrac{s}{\\bar x} * 100 \\%$$

  Args:
    a: An array like object containing the sample data.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the coefficient of variation is computed. The default
      is to compute the coefficient of variation of the flattened array.

  Returns:
    A new array containing the coefficients of variation.

  Note:

    The coefficient of variation is the ratio of the `std` (standard deviation)
    to the `mean`, expressed as a percentage.

  Example:

      >>> import numpy as np
      >>> a = np.array([[1, 2], [3, 4]])
      >>> a
      array([[1, 2],
      [3, 4]])
      >>> from ai import stats
      >>> stats.varcoef(a)
      array([51.63977795])
      >>> stats.varcoef(a, axis=0)
      array([47.14045208, 20.20305089])
      >>> stats.varcoef(a, axis=1)
      array([70.71067812, 47.14045208])

  """
  return _core.varcoef(a, ddof=ddof, axis=axis)

zscore(a, /, *, ddof=1, axis=None)

Compute the z-score along the specified axis.

Compute the z score of each value in the sample, relative to the sample mean and standard deviation.

\[z = \dfrac{x - \bar x}{s}\]

Parameters:

Name Type Description Default
a ndarray

An array like object containing the sample data.

required
ddof Optional[int]

Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is one.

1
axis Optional[int]

Axis along which the z-score is computed. The default is to compute the z-score of the flattened array.

None

Returns:

Type Description
ndarray

A new array containing the z-scores.

Note:

The z-score for a data value is its difference from the mean divided by the std (standard deviation), i.e., z = (x - x.mean()) / x.std().
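
Applying the definition with plain numpy; standardized values have mean 0 and (sample) standard deviation 1, which makes a convenient sanity check:

```python
import numpy as np

x = np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])

# z-score from the definition, with the sample standard deviation.
z = (x - x.mean()) / x.std(ddof=1)

# Standardized data: mean ~0 and sample std ~1.
z_mean = z.mean()
z_std = z.std(ddof=1)
```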

Example:

>>> import numpy as np
>>> a = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,
...                0.1954,  0.6307,  0.6599,  0.1065,  0.0508])
>>> a
array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599,
0.1065, 0.0508])
>>> from ai import stats
>>> stats.zscore(a)
array([[ 1.06939901, -1.1830039 , -0.05258212,  1.03626165,  1.10660039,
-0.81192795,  0.5488923 ,  0.64017636, -1.08984414, -1.26397161]])

Computing along a specified axis.

>>> b = np.array([[ 0.3148,  0.0478,  0.6243,  0.4608],
...               [ 0.7149,  0.0775,  0.6072,  0.9656],
...               [ 0.6341,  0.1403,  0.9759,  0.4064],
...               [ 0.5918,  0.6948,  0.904 ,  0.3721],
...               [ 0.0921,  0.2481,  0.1188,  0.1366]])
>>> b
array([[0.3148, 0.0478, 0.6243, 0.4608],
[0.7149, 0.0775, 0.6072, 0.9656],
[0.6341, 0.1403, 0.9759, 0.4064],
[0.5918, 0.6948, 0.904 , 0.3721],
[0.0921, 0.2481, 0.1188, 0.1366]])
>>> stats.zscore(b, axis=0)
array([[-0.19264823, -1.28415119,  1.07259584,  0.40420358],
[ 0.33048416, -1.37380874,  0.04251374,  1.00081084],
[ 0.26796377, -1.12598418,  1.23283094, -0.37481053],
[-0.22095197,  0.24468594,  1.19042819, -1.21416216],
[-0.82780366,  1.4457416 , -0.43867764, -0.1792603 ]])
>>> stats.zscore(b, axis=1)
array([[-0.59710641,  0.94678835,  0.63499955,  0.47177349, -1.45645498],
[-0.73263586, -0.62041675, -0.3831319 ,  1.71200261,  0.0241819 ],
[-0.0644368 , -0.11512076,  0.97769651,  0.76458677, -1.56272573],
[-0.02464405,  1.63406492, -0.20339557, -0.31610104, -1.08992426]])
Source code in ai/stats/stats.py
def zscore(
  a: np.ndarray,
  /,
  *,
  ddof: Optional[int] = 1,
  axis: Optional[int] = None
) -> np.ndarray:
  """Compute the z-score along the specified axis.

  Compute the `z` score of each value in the sample, relative to the sample mean
  and standard deviation.

  $$z = \\dfrac{x - \\bar x}{s}$$

  Args:
    a: An array like object containing the sample data.
    ddof: Delta Degrees of Freedom. The divisor used in calculations is
      `N - ddof`, where `N` represents the number of elements. By default `ddof`
      is one.
    axis: Axis along which the z-score is computed. The default is to compute
      the z-score of the flattened array.

  Returns:
    A new array containing the z-scores.

  Note:

    The `z-score` for a data value is its difference from the `mean` divided by
    the `std` (standard deviation), i.e., `z = (x - x.mean()) / x.std()`.

  Example:

      >>> import numpy as np
      >>> a = np.array([ 0.7972,  0.0767,  0.4383,  0.7866,  0.8091,
      ...                0.1954,  0.6307,  0.6599,  0.1065,  0.0508])
      >>> a
      array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599,
      0.1065, 0.0508])
      >>> from ai import stats
      >>> stats.zscore(a)
      array([[ 1.06939901, -1.1830039 , -0.05258212,  1.03626165,  1.10660039,
      -0.81192795,  0.5488923 ,  0.64017636, -1.08984414, -1.26397161]])

      Computing along a specified axis.

      >>> b = np.array([[ 0.3148,  0.0478,  0.6243,  0.4608],
      ...               [ 0.7149,  0.0775,  0.6072,  0.9656],
      ...               [ 0.6341,  0.1403,  0.9759,  0.4064],
      ...               [ 0.5918,  0.6948,  0.904 ,  0.3721],
      ...               [ 0.0921,  0.2481,  0.1188,  0.1366]])
      >>> b
      array([[0.3148, 0.0478, 0.6243, 0.4608],
      [0.7149, 0.0775, 0.6072, 0.9656],
      [0.6341, 0.1403, 0.9759, 0.4064],
      [0.5918, 0.6948, 0.904 , 0.3721],
      [0.0921, 0.2481, 0.1188, 0.1366]])
      >>> stats.zscore(b, axis=0)
      array([[-0.19264823, -1.28415119,  1.07259584,  0.40420358],
      [ 0.33048416, -1.37380874,  0.04251374,  1.00081084],
      [ 0.26796377, -1.12598418,  1.23283094, -0.37481053],
      [-0.22095197,  0.24468594,  1.19042819, -1.21416216],
      [-0.82780366,  1.4457416 , -0.43867764, -0.1792603 ]])
      >>> stats.zscore(b, axis=1)
      array([[-0.59710641,  0.94678835,  0.63499955,  0.47177349, -1.45645498],
      [-0.73263586, -0.62041675, -0.3831319 ,  1.71200261,  0.0241819 ],
      [-0.0644368 , -0.11512076,  0.97769651,  0.76458677, -1.56272573],
      [-0.02464405,  1.63406492, -0.20339557, -0.31610104, -1.08992426]])

  """
  return _core.zscore(a, ddof=ddof, axis=axis)