====== Range ======
  * Definition: the range is the difference between the upper real limit of the largest score and the lower real limit of the smallest score in a distribution. It is denoted R.
  * Note: if the scores are continuous, the real (exact) upper and lower limits must be used.
  * Example: for scores running from 5 to 10, R = 10 − 5 = 5 if X is discrete, and R = 10.5 − 4.5 = 6 if X is continuous.
  * The range is a weak summary of variability because it depends only on the two extreme values.

----

====== Standard deviation ======
  * Definition: the standard deviation describes how far the individual scores in a distribution lie from a reference point, and that reference point is the mean.
  * It is the most important and most commonly used measure of variability.
  * It uses every score in the distribution, so it is highly representative.
  - **Deviation**
    * Definition: the distance from a data point to the mean.
    * Deviation = X − μ
    * A deviation consists of a sign and a magnitude. If the score is greater than the mean, the deviation is positive; if the score is less than the mean, the deviation is negative.
    * The deviations of all individuals in any distribution must sum to zero.
  - **Sum of squares**
    * Definition: SS = Σ(X − μ)² = ΣX² − (ΣX)²/N
    * Squaring solves the sign problem, since positive and negative deviations would otherwise cancel.
    * There are two ways to remove the influence of the signs when summing deviations: take absolute values or take squares. Squaring is much simpler to handle computationally, so it is the widely used choice.
  - **Variance and standard deviation of the population**
    * Definition: the population variance is the sum of squares divided by the population size N, and is also called the mean square. The population standard deviation is the square root of the population variance.
    * Population variance: σ² = SS/N
    * Population standard deviation: σ = √(SS/N)
  - **Variance and standard deviation of the sample**
    * A sample is a part drawn from the population, and its variability tends to be smaller than that of the population.
    * If a sample statistic systematically overestimates or underestimates a population parameter, it is called a biased estimate. Dividing the sample sum of squares by n underestimates the population variance, so it is a biased estimate.
    * The sample variance uses n − 1 as the denominator: s² = SS/(n − 1), and the sample standard deviation is s = √(SS/(n − 1)).
    * Dividing by n − 1, the degrees of freedom, corrects the sample deviations so that s² is an unbiased estimate of the population variance.
  - **Properties of the standard deviation**
    * Rule of thumb: for symmetrical distributions, the mean is usually near the midpoint of the distribution, and the standard deviation is usually about 1/4 of the range.
    * Adding a constant to every score in a distribution does not change its standard deviation.
    * Multiplying every score in a distribution by a constant multiplies the standard deviation by the absolute value of that constant.

----

====== Interquartile range ======
  * Definition: the range of the middle 50% of the data. It is often used when the median is the measure of central tendency.
  * IQR = Q3 − Q1
  * Q1 is the first (lower) quartile: 25% of the data fall below Q1. Q3 is the third (upper) quartile: 75% of the data fall below Q3. The interquartile range is the distance between the 25% and 75% points (2Q).
  * The semi-interquartile range, also called the quartile deviation, is half of the interquartile range: SIQR = (Q3 − Q1)/2
  * Compared with the range, the interquartile range is less affected by extreme scores, and it is suitable for data that contain undetermined values.
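
The measures on this page can be checked numerically with a short script. The following is a minimal sketch, not a reference implementation: the data set ''scores'', the measurement unit of 1, and all function names are assumptions made for illustration. The quartiles come from Python's ''statistics.quantiles'' with its default method, which is only one of several quartile conventions.

```python
import math
import statistics


def real_limit_range(scores, unit=1.0):
    """Range from real limits: upper limit of the max minus lower limit of the min.

    `unit` is the assumed measurement precision (1 = whole numbers).
    """
    return (max(scores) + unit / 2) - (min(scores) - unit / 2)


def sum_of_squares(scores):
    """Definitional formula: SS = sum of squared deviations from the mean."""
    mean = statistics.fmean(scores)
    return sum((x - mean) ** 2 for x in scores)


def sum_of_squares_computational(scores):
    """Computational formula: SS = sum(X^2) - (sum(X))^2 / N."""
    n = len(scores)
    return sum(x * x for x in scores) - sum(scores) ** 2 / n


def population_sd(scores):
    """sigma = sqrt(SS / N): treats the scores as the whole population."""
    return math.sqrt(sum_of_squares(scores) / len(scores))


def sample_sd(scores):
    """s = sqrt(SS / (n - 1)): treats the scores as a sample."""
    return math.sqrt(sum_of_squares(scores) / (len(scores) - 1))


def interquartile_range(scores):
    """IQR = Q3 - Q1, using the default quartile method of statistics.quantiles."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return q3 - q1


# Hypothetical data: whole-number scores from 5 to 10.
scores = [5, 6, 7, 8, 9, 10]
mean = statistics.fmean(scores)

print("discrete range:   ", max(scores) - min(scores))
print("real-limit range: ", real_limit_range(scores))
print("sum of deviations:", sum(x - mean for x in scores))
print("SS (definitional):", sum_of_squares(scores))
print("SS (computational):", sum_of_squares_computational(scores))
print("population SD:    ", population_sd(scores))
print("sample SD:        ", sample_sd(scores))
print("IQR:              ", interquartile_range(scores))

# Adding a constant leaves the SD unchanged; multiplying scales it by |c|.
shifted = [x + 100 for x in scores]
scaled = [-3 * x for x in scores]
print("SD after +100:    ", population_sd(shifted))
print("SD after *(-3):   ", population_sd(scaled))
```

For the same data, the sample standard deviation is larger than the population one because the sum of squares is divided by n − 1 instead of N. Other quartile conventions can give a different IQR, so the IQR printed here should be read as specific to the default method.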