# CFA – Ethical & Professional Standards & Quantitative Methods – Statistical concepts and market returns

# LOS 7a

Descriptive statistics summarize the characteristics of a data set; inferential statistics are used to make probabilistic statements about a population based on a sample.

A population includes all members of a specified group, while a sample is a subset of the population used to draw inferences about the population.

Nominal scale – data is put into categories that have no particular order.

Ordinal scale – data is put into categories that can be ordered with respect to some characteristic.

Interval scale – differences in data values are meaningful, but ratios, such as twice as much or twice as large, are not meaningful.

Ratio scale – ratios of values, such as twice as much or half as large, are meaningful, and zero represents the complete absence of the characteristic being measured.

# LOS 7b

Any measurable characteristic of a population is called a parameter.

A characteristic of a sample is given by a sample statistic.

An interval is a range of values.

A frequency distribution groups observations into classes or intervals.

# LOS 7c

Relative frequency is the percentage of total observations falling within an interval; cumulative relative frequency for an interval is the sum of the relative frequencies of that interval and all intervals below it, i.e. the percentage of observations less than or equal to the interval's upper bound.

Relative frequency is found by dividing the frequency of the interval by the total number of observations.
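The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the equal-width intervals, and the return figures are assumptions made for the example, not part of the curriculum text.

```python
from collections import Counter

def frequency_table(values, lower, width, n_intervals):
    """Group values into equal-width intervals and compute frequencies.

    Intervals are [lo, hi), except the last one, which also includes its
    upper bound. Assumes every value is >= lower.
    """
    counts = Counter()
    for v in values:
        # Index of the interval the value falls in; clamp the maximum
        # value into the last interval.
        idx = min(int((v - lower) // width), n_intervals - 1)
        counts[idx] += 1

    total = len(values)
    rows, cumulative = [], 0.0
    for i in range(n_intervals):
        freq = counts.get(i, 0)
        rel = freq / total      # relative frequency
        cumulative += rel       # cumulative relative frequency
        rows.append((lower + i * width, lower + (i + 1) * width, freq, rel, cumulative))
    return rows

# Hypothetical annual returns in percent, for illustration only.
returns = [-8, -3, 1, 2, 4, 5, 7, 9, 12, 18]
table = frequency_table(returns, lower=-10, width=10, n_intervals=3)
for lo, hi, freq, rel, cum in table:
    print(f"{lo:>4} to {hi:>3}: freq={freq}, rel={rel:.2f}, cum={cum:.2f}")
```

The cumulative relative frequency of the last interval should always come out to 1, which is a quick sanity check on any frequency table.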

Histograms and frequency polygons are graphical tools used to illustrate frequency distributions.

# LOS 7d

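Median – the midpoint of a dataset once the observations are sorted; half the observations lie above it and half below.

Mode – the most frequently occurring value; a dataset can have more than one mode or none at all.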
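As a small illustration, both measures are available in Python's standard `statistics` module. The data values below are made up for the example.

```python
import statistics

# Hypothetical observations, for illustration only.
data = [3, 5, 5, 7, 9, 12]

# With an even number of observations, the median is the average of the
# two middle values (here 5 and 7).
median = statistics.median(data)   # 6.0

# multimode returns every value tied for the highest frequency,
# so it also handles datasets with more than one mode.
modes = statistics.multimode(data)  # [5]

print(median, modes)
```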

# LOS 7e

Quantile is the general term for a value at or below which a stated proportion of the data in a distribution lies. Examples of quantiles include:

- Quartiles – distribution is divided into quarters
- Quintiles – distribution is divided into fifths
- Deciles – distribution is divided into tenths
- Percentiles – distribution is divided into hundredths
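Because every quantile can be expressed as a percentile, one helper covers all four cases. The sketch below assumes one common convention that is not stated in the notes above: the y-th percentile sits at position (n + 1) × y / 100 in the sorted data, with linear interpolation between neighbouring observations. The function name and the data are hypothetical.

```python
def percentile(values, y):
    """Return the y-th percentile of values.

    Assumed convention: location L = (n + 1) * y / 100 in the sorted data
    (1-based), with linear interpolation between adjacent observations.
    """
    data = sorted(values)
    n = len(data)
    loc = (n + 1) * y / 100
    if loc <= 1:
        return data[0]
    if loc >= n:
        return data[-1]
    lower = int(loc)        # 1-based position just below the location
    frac = loc - lower      # how far to move toward the next observation
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

# Hypothetical observations, for illustration only.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

first_quartile = percentile(data, 25)   # location 3.0, third observation
first_quintile = percentile(data, 20)   # location 2.4
ninth_decile = percentile(data, 90)     # location 10.8
print(first_quartile, first_quintile, ninth_decile)
```

Other software may use a different location rule, so small differences between tools are expected on small samples.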

# LOS 7f

The range is the difference between the largest and smallest values in the dataset.

Mean absolute deviation (MAD) is the average of the absolute values of the deviations from the arithmetic mean:

$$\text{MAD} = \frac{\sum_{i=1}^{n} \left| X_i - \bar{X} \right|}{n}$$

Standard deviation is the positive square root of the variance and is frequently used as a quantitative measure of risk.
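The three dispersion measures can be computed side by side, as in the sketch below. It assumes the data are a sample, so the variance uses an n - 1 divisor; the notes above do not say which divisor applies, and a population calculation would divide by n instead. The function name and the return figures are hypothetical.

```python
import math

def dispersion(values):
    """Return range, MAD, sample variance and sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    value_range = max(values) - min(values)
    # MAD: average absolute deviation from the arithmetic mean.
    mad = sum(abs(x - mean) for x in values) / n
    # Assumption: sample variance, hence the n - 1 divisor.
    sample_variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return {
        "range": value_range,
        "mad": mad,
        "sample_variance": sample_variance,
        "sample_std": math.sqrt(sample_variance),
    }

# Hypothetical annual returns in percent, for illustration only.
returns = [2, 4, 6, 8, 10]
stats = dispersion(returns)
print(stats)  # range 8, MAD 2.4, sample variance 10.0
```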

# LOS 7g

Chebyshev’s inequality states that the proportion of the observations within $k$ standard deviations of the mean is at least $1 - 1/k^{2}$ for all $k > 1$.
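A quick numerical check makes the bound concrete. The sketch below computes the minimum proportion for a few values of k and compares it with the actual proportion in a made-up dataset; the function names and data are hypothetical, and the standard deviation uses an n divisor, which is an assumption of this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

def proportion_within(values, k):
    """Actual proportion of values within k standard deviations of the mean."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population-style standard deviation (divisor n).
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return sum(1 for x in values if abs(x - mean) <= k * std) / n

# Hypothetical, deliberately non-normal observations.
data = [-20, -5, 0, 2, 3, 4, 5, 6, 8, 40]

for k in (1.5, 2, 3):
    bound = chebyshev_lower_bound(k)
    actual = proportion_within(data, k)
    print(f"k={k}: at least {bound:.2%}, observed {actual:.2%}")
```

The bound is 75% for k = 2 and about 89% for k = 3, regardless of the shape of the distribution; the observed proportion can be higher but never lower.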

# LOS 7h

The coefficient of variation (CV) for sample data is the ratio of the sample standard deviation to the sample mean. It expresses dispersion relative to the mean, i.e. risk per unit of return.

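The Sharpe ratio measures excess return per unit of risk: the mean portfolio return in excess of the risk-free rate, divided by the standard deviation of portfolio returns.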
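Both ratios reduce to one line each once the mean and standard deviation are known. The sketch below uses Python's `statistics` module, whose `stdev` function computes the sample standard deviation; the return series and the 1% risk-free rate are made up for illustration.

```python
import statistics

def coefficient_of_variation(returns):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(returns) / statistics.mean(returns)

def sharpe_ratio(returns, risk_free_rate):
    """Mean return in excess of the risk-free rate, per unit of standard deviation."""
    excess_return = statistics.mean(returns) - risk_free_rate
    return excess_return / statistics.stdev(returns)

# Hypothetical annual portfolio returns and risk-free rate, both in percent.
portfolio_returns = [2, 4, 6, 8, 10]
risk_free = 1

cv = coefficient_of_variation(portfolio_returns)
sharpe = sharpe_ratio(portfolio_returns, risk_free)
print(f"CV = {cv:.3f}, Sharpe ratio = {sharpe:.3f}")
```

A lower CV means less risk per unit of return, while a higher Sharpe ratio means more excess return per unit of risk, so the two measures point in opposite directions.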

# LOS 7i

Skewness describes the degree to which a distribution is not symmetric about its mean.

- A right-skewed distribution has positive sample skewness; its mean is greater than its median, which is greater than its mode.
- A left-skewed distribution has negative sample skewness; its mean is less than its median, which is less than its mode.
- Sample skewness with an absolute value greater than 0.5 is considered significantly different from zero.
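The points above can be checked numerically. The notes do not give a formula, so this sketch assumes one widely used bias-adjusted sample skewness, n / ((n - 1)(n - 2)) × Σ(x - mean)³ / s³, where s is the sample standard deviation; the function name and both datasets are hypothetical.

```python
import statistics

def sample_skewness(values):
    """Bias-adjusted sample skewness (an assumed convention, see lead-in)."""
    n = len(values)
    mean = sum(values) / n
    s = (sum((x - mean) ** 2 for x in values) / (n - 1)) ** 0.5
    cubed = sum((x - mean) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3

# Hypothetical datasets, for illustration only.
symmetric = [2, 4, 6, 8, 10]
right_skewed = [1, 2, 2, 2, 3, 4, 5, 9, 17]

print(sample_skewness(symmetric))      # 0.0, no skew
print(sample_skewness(right_skewed))   # positive, well above 0.5

# For the right-skewed data: mean 5 > median 3 > mode 2.
print(statistics.mean(right_skewed),
      statistics.median(right_skewed),
      statistics.mode(right_skewed))
```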

# LOS 7j

Kurtosis measures the peakedness of a distribution and the probability of extreme outcomes (thickness of tails).

- Excess kurtosis is measured relative to a normal distribution, which has a kurtosis of 3.
- Positive values of excess kurtosis indicate a distribution that is leptokurtic (fat tails, more peaked), so that the probability of extreme outcomes is greater than for the normal distribution.
- Negative values of excess kurtosis indicate a platykurtic distribution (thin tails, less peaked).
- Excess kurtosis with an absolute value greater than 1 is considered significant.
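A short calculation shows how the sign of excess kurtosis separates fat-tailed from thin-tailed data. The notes do not give a formula, so this sketch assumes a common bias-adjusted sample estimator: n(n + 1) / ((n - 1)(n - 2)(n - 3)) × Σ(x - mean)⁴ / s⁴ minus 3(n - 1)² / ((n - 2)(n - 3)). The function name and datasets are hypothetical.

```python
def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (an assumed convention, see lead-in)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)  # s squared
    fourth = sum((x - mean) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth / variance ** 2 - correction

# Hypothetical datasets, for illustration only.
fat_tailed = [-15, -1, -1, 0, 0, 0, 0, 1, 1, 15]   # mostly near zero, two extremes
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # no extremes at all

print(sample_excess_kurtosis(fat_tailed))      # positive: leptokurtic
print(sample_excess_kurtosis(evenly_spread))   # negative: platykurtic
```

With only ten observations these estimates are noisy, so the example is meant to show the direction of the effect rather than a reliable measurement.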