stats
ai.stats
Statistical routines for NumPy arrays.
ai.stats implements a variety of statistical methods for computing statistics of
NumPy arrays:
Averages and Variances
- ai.stats.mean
- ai.stats.median
- ai.stats.std
- ai.stats.var
- ai.stats.zscore
- ai.stats.varcoef
Correlating
- ai.stats.cov
- ai.stats.corrcoef
corrcoef(a, b, /, *, ddof=1, axis=None)
cov(a, b, /, *, ddof=1, axis=None)
Compute the covariance of a and b along the specified axis.
Covariance measures the total variation of two random variables from their expected values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | An array-like object containing the sample data. | required |
| ddof | Optional[int] | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N is the number of elements. | 1 |
| axis | Optional[int] | Axis along which the covariance is computed. The default is to compute the covariance of the flattened array. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | A new array containing the covariance. |
Note:
- Positive covariance indicates that the two variables tend to move in the same direction.
- Negative covariance indicates that the two variables tend to move in opposite directions.
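The sign convention above can be checked against the textbook sample covariance, which sums the products of each pair's deviations from their means and divides by N - ddof. The sketch below computes it for two flattened arrays in plain NumPy; `sample_cov` is a hypothetical helper, not part of `ai.stats`, and it is not claimed to reproduce `ai.stats.cov` exactly.

```python
import numpy as np


def sample_cov(a, b, ddof=1):
    """Sample covariance of two flattened arrays (hypothetical helper, not ai.stats.cov)."""
    x = np.asarray(a, dtype=np.float64).ravel()
    y = np.asarray(b, dtype=np.float64).ravel()
    if x.size != y.size:
        raise ValueError("a and b must contain the same number of elements")
    # Products of each pair's deviations from the respective sample means.
    products = (x - x.mean()) * (y - y.mean())
    # Divide by N - ddof; ddof=1 gives the unbiased sample estimate.
    return float(products.sum() / (x.size - ddof))


x = np.array([1.0, 2.0, 3.0, 4.0])
print(sample_cov(x, 2 * x + 1))  # positive: both move in the same direction
print(sample_cov(x, -x))         # negative: they move in opposite directions
```

With ddof=1 the helper agrees with NumPy's default `np.cov(x, y)[0, 1]`.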
Example:
>>> import numpy as np
>>> x, y = np.random.random((3, 3)), np.random.random((3, 3))
>>> x
array([[0.62809713, 0.81040891, 0.16158262],
[0.82474163, 0.08633899, 0.60068869],
[0.55120899, 0.31197217, 0.05694431]])
>>> y
array([[0.31343184, 0.54189237, 0.5759936 ],
[0.47156163, 0.07193879, 0.88730511],
[0.673533 , 0.28599424, 0.90187499]])
>>> from ai import stats
>>> stats.cov(x, y)
array([-0.99412023])
>>> stats.cov(x, y, axis=0)
array([-1.0114404 , -0.93096473, -1.01969051])
>>> stats.cov(x, y, axis=1)
array([-1.00575977, -0.94265263, -0.98948036])
Source code in ai/stats/stats.py
mean(a, /, *, axis=None)
Compute the mean along the specified axis.
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | A NumPy array with at most two dimensions. | required |
| axis | Optional[int] | An integer specifying the axis along which to calculate the mean. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | The mean along the specified axis, or a one-element array holding the mean of the flattened array when `axis` is None. |
Note:
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has.
Example:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.mean(a)
array([2.5])
>>> stats.mean(a, axis=0)
array([1.5, 3.5])
>>> stats.mean(a, axis=1)
array([2., 3.])
In single precision, `mean` can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.mean(a)
array([0.54999924])
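One common way around this single-precision inaccuracy is to accumulate the sum in float64 while leaving the data in float32. The sketch shows the idea with NumPy's own `np.mean`, whose `dtype` argument selects the accumulator type; `ai.stats.mean` as documented here has no `dtype` parameter, so this is an illustration at the NumPy level only.

```python
import numpy as np

# Same data as the single-precision example: one row of 1.0, one row of 0.1.
a = np.zeros((2, 512 * 512), dtype=np.float32)
a[0, :] = 1.0
a[1, :] = 0.1

# Mean accumulated in float32, the input's own precision.
mean32 = np.mean(a, dtype=np.float32)
# Mean accumulated in float64: the data stays float32, the running sum does not.
mean64 = np.mean(a, dtype=np.float64)

print(mean32, mean64)
```

The float64 result is very close to 0.55; the tiny remaining offset comes from 0.1 itself not being exactly representable in float32.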
Source code in ai/stats/stats.py
median(a, /, *, axis=None)
Compute the median along the specified axis.
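The median is often compared with other descriptive statistics, such as the
mean (average), mode, and standard deviation; unlike the mean, it is robust
to outliers.
Let \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\) be the sorted values of a dataset of \(n\) numbers. For an odd \(n\), the median is calculated using:
\[\operatorname{median}(x) = x_{((n+1)/2)}\]
For an even \(n\), the median is calculated using:
\[\operatorname{median}(x) = \frac{x_{(n/2)} + x_{(n/2+1)}}{2}\]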
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | A NumPy array with at most two dimensions. | required |
| axis | Optional[int] | An integer specifying the axis along which to calculate the median. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | The median along the specified axis, or a one-element array holding the median of the flattened array when `axis` is None. |
Note:
Given a vector V of length N, the median of V is the middle value of a
sorted copy of V, V_sorted, i.e., V_sorted[(N-1)/2] when N is odd,
and the average of the two middle values of V_sorted when N is even.
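The note translates directly into a sort-based computation. A minimal sketch for a flattened array follows; `median_by_sorting` is a hypothetical helper, and a production implementation would more likely use a selection step (such as the partition-based approach behind `np.median`) than a full sort.

```python
import numpy as np


def median_by_sorting(v):
    """Median of a flattened array via a sorted copy (hypothetical helper)."""
    v_sorted = np.sort(np.asarray(v, dtype=np.float64).ravel())
    n = v_sorted.size
    if n == 0:
        raise ValueError("the median of an empty array is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd length: the single middle value, V_sorted[(N - 1) / 2].
        return float(v_sorted[mid])
    # Even length: the average of the two middle values.
    return float((v_sorted[mid - 1] + v_sorted[mid]) / 2.0)


print(median_by_sorting([3, 1, 2]))        # 2.0 (odd length)
print(median_by_sorting([4, 1, 3, 2]))     # 2.5 (even length)
print(median_by_sorting([1, 2, 3, 1000]))  # 2.5: the outlier does not move it
```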
Example:
>>> import numpy as np
>>> x = np.random.random((3, 3))
>>> x
array([[0.1676058 , 0.21633727, 0.12763747],
[0.36879157, 0.45505013, 0.06045118],
[0.88213891, 0.95437981, 0.61791297]])
>>> from ai import stats
>>> stats.median(x)
array([0.36879157])
>>> stats.median(x, axis=0)
array([0.1676058 , 0.36879157, 0.88213891])
>>> stats.median(x, axis=1)
array([0.36879157, 0.45505013, 0.12763747])
Source code in ai/stats/stats.py
std(a, /, *, ddof=1, axis=None)
Compute the standard deviation along the specified axis.
Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | The values whose standard deviation is calculated. | required |
| ddof | Optional[int] | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N is the number of elements. | 1 |
| axis | Optional[int] | Axis along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | A new array containing the standard deviation. |
Note:
The standard deviation is the square root of the average of the squared
deviations from the mean, i.e., std = sqrt(mean(x)), where
x = abs(a - a.mean())**2.
The average squared deviation is normally calculated as x.sum() / N,
where N = len(x). If ddof is given, the divisor N - ddof is used instead;
this function defaults to ddof=1. In standard statistical practice, ddof=1
provides an unbiased estimator of the variance of a hypothetical infinite
population. ddof=0 provides a maximum likelihood estimate of the variance
for normally distributed variables. The standard deviation computed in this
function is the square root of the estimated variance, so even with ddof=1,
it is not an unbiased estimate of the standard deviation per se.
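Written out step by step, the formula in the note looks like the sketch below. `std_ddof` is a hypothetical helper that works on the flattened array; it is meant only to make the role of ddof concrete, not to mirror the `ai.stats.std` source.

```python
import numpy as np


def std_ddof(a, ddof=1):
    """Standard deviation of the flattened array with divisor N - ddof (hypothetical helper)."""
    x = np.asarray(a, dtype=np.float64).ravel()
    # Squared deviations from the mean: x = abs(a - a.mean())**2 in the note's notation.
    sq_dev = np.abs(x - x.mean()) ** 2
    # Average them with divisor N - ddof, then take the square root.
    return float(np.sqrt(sq_dev.sum() / (x.size - ddof)))


a = np.array([[1, 2], [3, 4]])
print(std_ddof(a))          # ddof=1, the default used by ai.stats: about 1.29099
print(std_ddof(a, ddof=0))  # maximum likelihood form: about 1.11803
```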
Example:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.std(a)
array([1.29099445])
>>> stats.std(a, axis=0)
array([0.70710678, 0.70710678])
>>> stats.std(a, axis=1)
array([1.41421356, 1.41421356])
In single precision, `std` can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.std(a)
array([0.45000043])
Source code in ai/stats/stats.py
var(a, /, *, ddof=1, axis=None)
Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | Array containing numbers whose variance is desired. | required |
| ddof | Optional[int] | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N is the number of elements. | 1 |
| axis | Optional[int] | Axis along which the variance is computed. The default is to compute the variance of the flattened array. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | A new array containing the variance. |
Note:
The variance is the average of the squared deviations from the mean, i.e.,
var = mean(x), where x = abs(a - a.mean())**2.
The mean is normally calculated as x.sum() / N, where N = len(x). If ddof
is given, the divisor N - ddof is used instead; this function defaults to
ddof=1. In standard statistical practice, ddof=1 provides an unbiased
estimator of the variance of a hypothetical infinite population. ddof=0
provides a maximum likelihood estimate of the variance for normally
distributed variables.
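The difference between ddof=1 and ddof=0 is only a rescaling by N / (N - 1). The sketch below makes that explicit for the 2x2 array [[1, 2], [3, 4]]; `var_ddof` is a hypothetical helper on the flattened array, not the `ai.stats.var` implementation.

```python
import numpy as np


def var_ddof(a, ddof=1):
    """Variance of the flattened array with divisor N - ddof (hypothetical helper)."""
    x = np.asarray(a, dtype=np.float64).ravel()
    n = x.size
    if n - ddof <= 0:
        raise ValueError("ddof must be smaller than the number of elements")
    # Sum of squared deviations from the mean, divided by N - ddof instead of N.
    return float(np.sum((x - x.mean()) ** 2) / (n - ddof))


a = np.array([[1, 2], [3, 4]])
v_sample = var_ddof(a)        # ddof=1: 5/3
v_ml = var_ddof(a, ddof=0)    # ddof=0: 5/4
# The two estimates differ only by the factor N / (N - 1).
print(v_sample, v_ml, v_ml * a.size / (a.size - 1))
```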
Example:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.var(a)
array([1.66666667])
>>> stats.var(a, axis=0)
array([0.5, 0.5])
>>> stats.var(a, axis=1)
array([2., 2.])
In single precision, `var` can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> a
array([[1. , 1. , 1. , ..., 1. , 1. , 1. ],
[0.1, 0.1, 0.1, ..., 0.1, 0.1, 0.1]], dtype=float32)
>>> stats.var(a)
array([0.20250039])
Source code in ai/stats/stats.py
varcoef(a, /, *, ddof=1, axis=None)
Compute the coefficients of variation along the specified axis.
The coefficient of variation (CV) is the ratio of the standard deviation to the mean. The higher the coefficient of variation, the greater the level of dispersion around the mean.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | An array-like object containing the sample data. | required |
| ddof | Optional[int] | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N is the number of elements. | 1 |
| axis | Optional[int] | Axis along which the coefficient of variation is computed. The default is to compute the coefficient of variation of the flattened array. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | A new array containing the coefficients of variation. |
Note:
The coefficient of variation is the ratio of the standard deviation to the
mean, expressed as a percentage.
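A minimal sketch of the percentage form for a flattened array follows. `coefficient_of_variation` is a hypothetical helper; it uses the sample standard deviation (ddof=1, matching this module's default) and simply raises an error for a zero mean, where the ratio is undefined.

```python
import numpy as np


def coefficient_of_variation(a, ddof=1):
    """Coefficient of variation of the flattened array, in percent (hypothetical helper)."""
    x = np.asarray(a, dtype=np.float64).ravel()
    mean = x.mean()
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    # Standard deviation with divisor N - ddof, divided by the mean, times 100.
    return float(100.0 * x.std(ddof=ddof) / mean)


print(coefficient_of_variation(np.array([[1, 2], [3, 4]])))  # about 51.64
```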
Example:
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
[3, 4]])
>>> from ai import stats
>>> stats.varcoef(a)
array([51.63977795])
>>> stats.varcoef(a, axis=0)
array([47.14045208, 20.20305089])
>>> stats.varcoef(a, axis=1)
array([70.71067812, 47.14045208])
Source code in ai/stats/stats.py
zscore(a, /, *, ddof=1, axis=None)
Compute the z-score along the specified axis.
Compute the z-score of each value in the sample, relative to the sample mean
and standard deviation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| a | ndarray | An array-like object containing the sample data. | required |
| ddof | Optional[int] | Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N is the number of elements. | 1 |
| axis | Optional[int] | Axis along which the z-score is computed. The default is to compute the z-score of the flattened array. | None |
Returns:
| Type | Description |
|---|---|
| ndarray | A new array containing the z-scores. |
Note:
The z-score of a data value is its difference from the mean divided by
the standard deviation, i.e., z = (x - x.mean()) / x.std(ddof=ddof).
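The parentheses in the formula matter: the mean is subtracted before dividing. The sketch below is a hypothetical `zscores` helper for a flattened array using ddof=1 by default; it returns a 1-D result and does not reproduce the 2-D shape that `stats.zscore` returns in the example.

```python
import numpy as np


def zscores(a, ddof=1):
    """z-scores of the flattened array (hypothetical helper)."""
    x = np.asarray(a, dtype=np.float64).ravel()
    # Subtract the mean first, then divide by the standard deviation.
    return (x - x.mean()) / x.std(ddof=ddof)


z = zscores(np.array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
                      0.1954, 0.6307, 0.6599, 0.1065, 0.0508]))
print(z.round(4))  # first value is about 1.0694
```

By construction the result has mean 0 and (with the same ddof) standard deviation 1.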
Example:
>>> import numpy as np
>>> a = np.array([ 0.7972, 0.0767, 0.4383, 0.7866, 0.8091,
... 0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
>>> a
array([0.7972, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599,
0.1065, 0.0508])
>>> from ai import stats
>>> stats.zscore(a)
array([[ 1.06939901, -1.1830039 , -0.05258212, 1.03626165, 1.10660039,
-0.81192795, 0.5488923 , 0.64017636, -1.08984414, -1.26397161]])
Computing along a specified axis.
>>> b = np.array([[ 0.3148, 0.0478, 0.6243, 0.4608],
... [ 0.7149, 0.0775, 0.6072, 0.9656],
... [ 0.6341, 0.1403, 0.9759, 0.4064],
... [ 0.5918, 0.6948, 0.904 , 0.3721],
... [ 0.0921, 0.2481, 0.1188, 0.1366]])
>>> b
array([[0.3148, 0.0478, 0.6243, 0.4608],
[0.7149, 0.0775, 0.6072, 0.9656],
[0.6341, 0.1403, 0.9759, 0.4064],
[0.5918, 0.6948, 0.904 , 0.3721],
[0.0921, 0.2481, 0.1188, 0.1366]])
>>> stats.zscore(b, axis=0)
array([[-0.19264823, -1.28415119, 1.07259584, 0.40420358],
[ 0.33048416, -1.37380874, 0.04251374, 1.00081084],
[ 0.26796377, -1.12598418, 1.23283094, -0.37481053],
[-0.22095197, 0.24468594, 1.19042819, -1.21416216],
[-0.82780366, 1.4457416 , -0.43867764, -0.1792603 ]])
>>> stats.zscore(b, axis=1)
array([[-0.59710641, 0.94678835, 0.63499955, 0.47177349, -1.45645498],
[-0.73263586, -0.62041675, -0.3831319 , 1.71200261, 0.0241819 ],
[-0.0644368 , -0.11512076, 0.97769651, 0.76458677, -1.56272573],
[-0.02464405, 1.63406492, -0.20339557, -0.31610104, -1.08992426]])
Source code in ai/stats/stats.py
_core
correlation
An implementation of Pearson's correlation coefficient algorithm.
corrcoef(x, y=None, rowvar=True, *, dtype=None)
Return Pearson product-moment correlation coefficient.
The relationship between the correlation coefficient matrix, \(R\), and the
covariance matrix, \(C\), is
\[R_{ij} = \frac{C_{ij}}{\sqrt{C_{ii} C_{jj}}}\]
The values of \(R\) are between -1 and 1, inclusive.
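The relation \(R_{ij} = C_{ij} / \sqrt{C_{ii} C_{jj}}\) can be applied directly to any covariance matrix. In the sketch below, `corr_from_cov` is a hypothetical helper; NumPy's `np.cov` and `np.corrcoef` are used only to produce the input and to check the result, not as a description of how this module implements `corrcoef`.

```python
import numpy as np


def corr_from_cov(c):
    """Normalize a covariance matrix C into a correlation matrix R (hypothetical helper)."""
    c = np.asarray(c, dtype=np.float64)
    d = np.sqrt(np.diag(c))   # sqrt(C_ii) for every variable
    r = c / np.outer(d, d)    # R_ij = C_ij / sqrt(C_ii * C_jj)
    # Guard against tiny floating-point excursions outside [-1, 1].
    return np.clip(r, -1.0, 1.0)


rng = np.random.default_rng(0)
data = rng.random((3, 50))    # 3 variables (rows), 50 observations each
r = corr_from_cov(np.cov(data))
print(np.allclose(r, np.corrcoef(data)))  # True
```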
Parameters:
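| Name | Type | Description | Default |
|---|---|---|---|
| x | | A 1-D or 2-D array containing multiple variables and observations. Each row of `x` represents a variable, and each column a single observation of all those variables. Also see `rowvar` below. | required |
| y | ndarray | An additional set of variables and observations. | None |
| rowvar | bool | If `rowvar` is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations. | True |
| dtype | dtype | Data type of the result. By default, the return data type will have at least `numpy.float64` precision. | None |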
Returns:
| Type | Description |
|---|---|
| ndarray | The correlation coefficient matrix of the variables. |
Source code in ai/stats/correlation.py
cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None, *, dtype=None)
Estimate a covariance matrix, given data and weights.
Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, \(X = [x_1, x_2, ... x_N]^T\), then the covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\).
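For unweighted data laid out with one variable per row, each element \(C_{ij}\) is the sum of products of the centered rows \(i\) and \(j\) divided by N - ddof, which a single matrix product computes for all pairs at once. `covariance_matrix` below is a hypothetical sketch of that idea, checked against `np.cov`; it ignores the weighting options described in the parameters.

```python
import numpy as np


def covariance_matrix(x, ddof=1):
    """Covariance matrix of row variables (hypothetical helper, rowvar=True layout)."""
    x = np.atleast_2d(np.asarray(x, dtype=np.float64))
    n_obs = x.shape[1]
    # Center every variable (row) on its own mean.
    centered = x - x.mean(axis=1, keepdims=True)
    # C_ij = sum_k (x_ik - mean_i) * (x_jk - mean_j) / (N - ddof)
    return centered @ centered.T / (n_obs - ddof)


x = np.array([[0.0, 1.0, 2.0],
              [2.0, 1.0, 0.0]])
c = covariance_matrix(x)
print(c)  # diagonal holds each variable's variance; off-diagonal is their covariance
```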
Parameters:
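| Name | Type | Description | Default |
|---|---|---|---|
| m | ndarray | A 1-D or 2-D array containing multiple variables and observations. Each row of `m` represents a variable, and each column a single observation of all those variables. Also see `rowvar` below. | required |
| y | ndarray | An additional set of variables and observations. | None |
| rowvar | bool | If `rowvar` is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations. | True |
| bias | bool | Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If `bias` is True, normalization is by N. | False |
| ddof | int | If not None, the default value implied by `bias` is overridden. | None |
| fweights | ndarray | 1-D array of integer frequency weights; the number of times each observation vector should be repeated. | None |
| aweights | ndarray | 1-D array of observation vector weights. These relative weights are typically large for observations considered "important" and smaller for observations considered less "important". If `ddof=0`, the array of weights can be used to assign probabilities to observation vectors. | None |
| dtype | dtype | Data type of the result. By default, the return data type will have at least `numpy.float64` precision. | None |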
Returns:
| Type | Description |
|---|---|
| ndarray | The covariance matrix of the variables. |
Note:
Assume that the observations are in the columns of the observation array
`m` and let `f=fweights` and `a=aweights` for brevity. The steps to
compute the weighted covariance are as follows:
>>> import numpy as np
>>> m = np.arange(10, dtype=np.float64)
>>> f = np.arange(10) * 2
>>> a = np.arange(10) ** 2
>>> ddof = 1
>>> w = f * a
>>> v1 = np.sum(w)
>>> v2 = np.sum(w * a)
>>> m -= np.sum(m * w, axis=None, keepdims=True) / v1
>>> cov = np.dot(m * w, m.T) * v1 / ((v1 ** 2) - (ddof * v2))
When `a == 1`, the normalization factor `v1 / (v1**2 - ddof * v2)`
reduces to `1 / (np.sum(f) - ddof)`, as it should.
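The same steps generalize to a 2-D `m` with one variable per row by taking the weighted mean along each row. The sketch below wraps them in a hypothetical `weighted_cov` helper and compares the result with `np.cov`; it illustrates the note and is not the module's source.

```python
import numpy as np


def weighted_cov(m, fweights=None, aweights=None, ddof=1):
    """Weighted covariance following the steps in the note (hypothetical helper).

    Rows of m are variables and columns are observations.
    """
    m = np.atleast_2d(np.asarray(m, dtype=np.float64))
    n_obs = m.shape[1]
    f = np.ones(n_obs) if fweights is None else np.asarray(fweights, dtype=np.float64)
    a = np.ones(n_obs) if aweights is None else np.asarray(aweights, dtype=np.float64)
    w = f * a                 # combined weight per observation
    v1 = np.sum(w)
    v2 = np.sum(w * a)
    # Subtract the weighted mean of each variable (row).
    centered = m - np.sum(m * w, axis=1, keepdims=True) / v1
    # Weighted product of the centered data, scaled by v1 / (v1**2 - ddof * v2).
    return np.dot(centered * w, centered.T) * v1 / (v1 ** 2 - ddof * v2)


rng = np.random.default_rng(1)
m = rng.random((2, 6))
f = np.array([1, 2, 1, 3, 1, 2])
a = np.array([0.5, 1.0, 2.0, 1.0, 0.5, 1.5])
print(np.allclose(weighted_cov(m, f, a), np.cov(m, fweights=f, aweights=a)))  # True
```

Without weights, `w` is all ones, so `v1 = v2 = N` and the factor becomes `1 / (N - ddof)`, the ordinary sample covariance.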
Source code in ai/stats/correlation.py
stats
An implementation of statistical operations for ai.
Note:
We don't support NumPy arrays with more than two dimensions yet, but plan to
add support in the future.
Also, unlike the NumPy functions, our implementation assumes that the given
dataset is a sample rather than a population, and hence uses ddof=1 by default.