Descriptive statistics summarize the characteristics of a data set; inferential statistics are used to make probabilistic statements about a population based on a sample.
A population includes all members of a specified group, while a sample is a subset of the population used to draw inferences about the population.
Nominal scale – data is put into categories that have no particular order.
Ordinal scale – data is put into categories that can be ordered with respect to some characteristic.
Interval scale – differences in data values are meaningful, but ratios, such as twice as much or twice as large, are not meaningful.
Ratio scale – ratios of values, such as twice as much or half as large, are meaningful, and zero represents the complete absence of the characteristic being measured.
Any measurable characteristic of a population is called a parameter.
A characteristic of a sample is given by a sample statistic.
An interval is a range of values.
A frequency distribution groups observations into classes or intervals.
Relative frequency is the percentage of total observations falling within an interval; cumulative relative frequency for an interval is the sum of the relative frequencies for that interval and all intervals below it.
Relative frequency is found by dividing the frequency of the interval by the total number of observations.
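The frequency calculations above can be sketched in Python. This is a minimal illustration, not a standard routine: the function name `frequency_table`, the interval edges, and the return data are all made up for the example, and it assumes every value falls inside the given edges.

```python
def frequency_table(values, edges):
    """Return rows of (interval, frequency, relative, cumulative relative).

    Intervals are [lower, upper), except the last one, which also
    includes its upper edge. Assumes every value lies within the edges.
    """
    n = len(values)
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lower, upper = edges[i], edges[i + 1]
            is_last = i == len(counts) - 1
            if lower <= v < upper or (is_last and v == upper):
                counts[i] += 1
                break

    rows = []
    cumulative = 0.0
    for i, count in enumerate(counts):
        relative = count / n        # frequency / total number of observations
        cumulative += relative      # running sum of relative frequencies
        rows.append(((edges[i], edges[i + 1]), count, relative, cumulative))
    return rows


returns = [-8.0, -3.5, -1.2, 0.4, 1.1, 2.6, 3.3, 4.8, 6.0, 9.7]  # hypothetical returns, in percent
table = frequency_table(returns, edges=[-10, -5, 0, 5, 10])
for interval, count, relative, cumulative in table:
    print(interval, count, f"{relative:.0%}", f"{cumulative:.0%}")
```

The cumulative relative frequency of the last interval should come out at 100%, which is a quick check that no observation was dropped.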
Histograms and frequency polygons are graphical tools used to illustrate frequency distributions.
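Median – midpoint of the dataset when the observations are sorted; half of the observations lie above it and half below.
Mode – most frequent value; a dataset can have more than one mode.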
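As a quick illustration, Python's standard `statistics` module computes both measures. The observations below are made up for the example.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 12]        # hypothetical observations (odd count)
median_odd = statistics.median(data)  # middle value of the sorted data
mode_single = statistics.mode(data)   # most frequent value

even = [2, 3, 5, 7]                   # even count: no single middle value
median_even = statistics.median(even) # average of the two middle values

bimodal = [1, 1, 2, 2, 3]
modes = statistics.multimode(bimodal) # every value tied for most frequent

print(median_odd, mode_single, median_even, modes)
```

With an even number of observations the median is taken as the average of the two middle values, and `multimode` is the safer call when the data may have more than one mode.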
Quantile is the general term for a value at or below which a stated proportion of the data in a distribution lies. Examples of quantiles include:
- Quartiles – distribution is divided into quarters
- Quintiles – distribution is divided into fifths
- Deciles – distribution is divided into tenths
- Percentiles – distribution is divided into hundredths
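The quantiles listed above can be computed with `statistics.quantiles`, which returns the n - 1 cut points that split the data into n groups. The sketch below uses a made-up sample and the function's default `'exclusive'` method; other interpolation conventions exist and can give slightly different cut points.

```python
import statistics

data = list(range(1, 12))  # hypothetical sample: 1, 2, ..., 11

# n groups are separated by n - 1 cut points.
quartiles = statistics.quantiles(data, n=4)   # 3 cut points
quintiles = statistics.quantiles(data, n=5)   # 4 cut points
deciles = statistics.quantiles(data, n=10)    # 9 cut points

print(quartiles)  # [3.0, 6.0, 9.0]
print(quintiles)
print(deciles)
```

The second quartile is the median. Passing `n=100` gives percentiles in the same way, although with a sample this small the extreme percentiles are extrapolated and not very informative.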
The range is the difference between the largest and smallest values in the dataset.
Mean absolute deviation (MAD) is the average of the absolute values of the deviations from the arithmetic mean: MAD = (1/n) × Σ|X_i − X̄|, where n is the number of observations and X̄ is the arithmetic mean.
Standard deviation is the positive square root of the variance and is frequently used as a quantitative measure of risk.
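The dispersion measures above can be sketched as follows. The data are made up, and the example shows both the population and the sample standard deviation, since the notes do not say which one is meant.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

value_range = max(data) - min(data)                 # largest minus smallest
mean = statistics.fmean(data)                       # arithmetic mean
mad = sum(abs(x - mean) for x in data) / len(data)  # mean absolute deviation

population_variance = statistics.pvariance(data)    # divides by n
population_std = statistics.pstdev(data)            # square root of the above
sample_std = statistics.stdev(data)                 # based on dividing by n - 1

print(value_range, mad, population_variance, population_std, sample_std)
```

For this sample the MAD (1.5) is smaller than the population standard deviation (2.0), because squaring gives large deviations more weight.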
Chebyshev’s inequality states that the proportion of the observations within k standard deviations of the mean is at least 1 − 1/k² for all k > 1.
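A short sketch of the bound, compared against a made-up sample. The helper names `chebyshev_lower_bound` and `share_within` are illustrative only.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed proportion of values within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation of the data
    inside = sum(1 for x in values if abs(x - mean) <= k * std)
    return inside / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
for k in (1.5, 2, 3):
    print(k, chebyshev_lower_bound(k), share_within(data, k))
```

The bound holds for any distribution, so the observed share is usually well above it: for k = 2 the guarantee is 75%, while every observation in this sample falls within two standard deviations.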
The Sharpe ratio measures excess return per unit of risk: (portfolio return − risk-free rate) / standard deviation of portfolio returns.
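A minimal sketch of the calculation, assuming the mean of a series of periodic returns stands in for the portfolio return and the sample standard deviation of those returns stands in for risk. The return series and risk-free rate are made up.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate):
    """Excess mean return divided by the sample standard deviation of returns."""
    excess_return = statistics.fmean(returns) - risk_free_rate
    return excess_return / statistics.stdev(returns)


portfolio_returns = [0.10, 0.04, 0.12, 0.06, 0.08]  # hypothetical annual returns
risk_free = 0.03                                     # hypothetical risk-free rate

sharpe = sharpe_ratio(portfolio_returns, risk_free)
print(round(sharpe, 2))
```

Here the mean return is 8%, so the excess return is 5%, earned with a standard deviation of roughly 3.2%. A higher ratio means more excess return per unit of risk taken.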
Skewness describes the degree to which a distribution is not symmetric about its mean.
- A right-skewed distribution has positive sample skewness, and its mean is greater than its median, which is greater than its mode.
- A left-skewed distribution has negative sample skewness, and its mean is less than its median, which is less than its mode.
- Sample skewness with an absolute value greater than 0.5 is considered significantly different from zero.
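Sample skewness can be sketched as below. The notes do not give a formula, so this assumes the common large-sample approximation: the average cubed deviation from the mean divided by the cubed sample standard deviation. Some texts apply a small-sample correction factor, which changes the value slightly. The data sets are made up.

```python
import statistics


def sample_skewness(values):
    """Average cubed deviation divided by the cubed sample standard deviation.

    Assumed formula (large-sample approximation); no small-sample correction.
    """
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    return sum((x - mean) ** 3 for x in values) / n / s ** 3


symmetric = [1, 2, 3, 4, 5]                # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # one large value pulls the mean up
left_skewed = [-x for x in right_skewed]   # mirror image of the data above

for name, data in [("symmetric", symmetric),
                   ("right", right_skewed),
                   ("left", left_skewed)]:
    print(name, round(sample_skewness(data), 3),
          statistics.fmean(data), statistics.median(data))
```

In the right-skewed sample the mean (3.5) sits above the median (3), matching the ordering described above; mirroring the data flips both the sign of the skewness and that ordering.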
Kurtosis measures the peakedness of a distribution and the probability of extreme outcomes (thickness of tails).
- Excess kurtosis is measured relative to a normal distribution, which has a kurtosis of 3.
- Positive values of excess kurtosis indicate a distribution that is leptokurtic (fat tails, more peaked), so that the probability of extreme outcomes is greater than for a normal distribution.
- Negative values of excess kurtosis indicate a platykurtic distribution (thin tails, less peaked).
- Excess kurtosis with an absolute value greater than 1 is considered significant.
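Excess kurtosis can be sketched in the same way. As the notes give no formula, this assumes the large-sample approximation: the average fourth-power deviation from the mean divided by the fourth power of the sample standard deviation, minus 3. The data sets are made up to show one fat-tailed and one thin-tailed case.

```python
import statistics


def excess_kurtosis(values):
    """Average fourth-power deviation over s^4, minus 3 (the normal benchmark).

    Assumed formula (large-sample approximation); no small-sample correction.
    """
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.stdev(values)
    kurtosis = sum((x - mean) ** 4 for x in values) / n / s ** 4
    return kurtosis - 3


# Mostly small values with two extreme outcomes: fat tails.
fat_tailed = [-10, -1, -1, 0, 0, 0, 0, 0, 0, 1, 1, 10]
# Evenly spread values with no extremes: thin tails.
thin_tailed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(round(excess_kurtosis(fat_tailed), 2))   # positive: leptokurtic
print(round(excess_kurtosis(thin_tailed), 2))  # negative: platykurtic
```

Both results exceed 1 in absolute value, so under the rule of thumb above each sample would be treated as having significant excess kurtosis.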