Statistics

Statistics allow us to collect, organize, and analyze quantitative data, often after the data are converted, or standardized, into test statistics.

The point of most statistical tests is to determine how likely it is that a given test statistic would occur within a relevant reference distribution, typically the distribution expected under the null hypothesis.

ANOVA

ANOVA, or analysis of variance, is used to compare a continuous variable across the levels of one or more categorical variables. With only two levels, ANOVA is equivalent to a t-test (the F ratio is the square of the t statistic).

The test statistic is the F ratio, and significance is judged by the p value.
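
As a minimal sketch (SciPy assumed, with hypothetical group data), a one-way ANOVA can be run like this:

    from scipy import stats

    # Hypothetical measurements from three treatment groups
    group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
    group_b = [5.6, 5.8, 5.5, 5.9, 5.7]
    group_c = [4.8, 4.6, 4.9, 4.7, 5.0]

    # f_oneway returns the F ratio and its p value
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")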

 


Chi-Square Tests

Chi-square tests examine relationships between categorical (often dichotomous) variables. A 2x2 table is normally used, with observed frequencies compared against expected frequencies to examine whether the difference is statistically significant.

The test statistic is χ2, and significance is judged by the p value.

Fisher's exact test likewise compares two dichotomous variables, but is used when one or more cells have small expected frequencies (conventionally, under 5).
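
A minimal sketch (SciPy assumed) with a hypothetical 2x2 table of observed counts:

    from scipy import stats

    # Hypothetical 2x2 table: rows = treatment/control, columns = outcome yes/no
    table = [[20, 30],
             [10, 40]]

    # Chi-square test compares observed with expected frequencies
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")

    # Fisher's exact test is preferred when expected cell counts are small
    odds_ratio, p_exact = stats.fisher_exact(table)
    print(f"Fisher's exact p = {p_exact:.4f}")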

 


Confidence Intervals

Confidence intervals are used to show the precision with which the mean has been estimated. Given a 95% CI, 5% of intervals constructed this way would miss the population mean, 2.5% in each tail.
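
As a sketch (SciPy assumed, hypothetical data), a 95% CI for a sample mean can be computed from the t distribution:

    from scipy import stats

    data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6, 5.3]  # hypothetical sample

    mean = sum(data) / len(data)
    se = stats.sem(data)  # standard error of the mean

    # 95% CI: mean +/- t * se, with n - 1 degrees of freedom
    low, high = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=se)
    print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")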


Correlation

Correlation measures how strongly one continuous variable varies with another continuous variable, e.g. systolic BP and heart rate. Correlation is closely related to simple linear regression: r is the standardized slope.

The test statistic is r, the correlation coefficient: r = 0 means no linear correlation, while r = -1 or +1 are the extremes of perfect negative or positive correlation. P values are used to judge statistical significance.
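
A minimal sketch (SciPy assumed, hypothetical paired measurements):

    from scipy import stats

    # Hypothetical paired measurements: systolic BP and heart rate
    systolic_bp = [118, 125, 132, 140, 121, 135, 128, 145]
    heart_rate = [68, 72, 75, 82, 70, 78, 74, 85]

    # pearsonr returns the correlation coefficient r and its p value
    r, p = stats.pearsonr(systolic_bp, heart_rate)
    print(f"r = {r:.2f}, p = {p:.4f}")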

 


Linear Regression

Linear regression examines a continuous dependent variable and one or more (usually continuous) independent variables, and shows how useful the independent variables are in predicting the dependent variable.

 

The main outcome is R2, or the coefficient of determination, which describes the proportion of variation in the dependent variable explained by variation in the independent variables.

Slope coefficients describe the effect of each independent variable. For example, a coefficient of 0.5 means that for every 1.0 increase in the independent variable, there is a 0.5 increase in the dependent variable.
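
As a sketch (SciPy assumed, hypothetical dose-response data), a simple linear regression reporting the slope and R2:

    from scipy import stats

    # Hypothetical data: dose (independent) vs response (dependent)
    dose = [1, 2, 3, 4, 5, 6, 7, 8]
    response = [2.1, 2.9, 3.6, 4.4, 5.2, 5.8, 6.7, 7.3]

    result = stats.linregress(dose, response)

    # The slope is the change in the dependent variable per unit change
    # in the independent variable; squaring r gives R2
    print(f"slope = {result.slope:.2f}")
    print(f"R2 = {result.rvalue ** 2:.2f}")
    print(f"p = {result.pvalue:.4f}")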

 


Logistic Regression

Logistic regression is used for dichotomous dependent variables and any type of independent variables.

The great thing about logistic regression is that, for each independent variable, exponentiating the slope coefficient gives an estimate of the odds ratio, adjusted for all other variables in the regression.
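
A minimal sketch using statsmodels (assumed available), with hypothetical data; exponentiating the fitted coefficients yields adjusted odds ratios:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: age and smoking status predicting disease (0/1)
    age = [34, 45, 52, 60, 38, 49, 65, 58, 41, 70]
    smoker = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
    disease = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]

    X = sm.add_constant(np.column_stack([age, smoker]))
    model = sm.Logit(disease, X).fit()

    # Exponentiated slope coefficients are adjusted odds ratios
    print(np.exp(model.params))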

 


Measures of Central Tendency

 

Mean: average

Median: middle value of the ordered data

Mode: most common value
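
Python's built-in statistics module covers all three (hypothetical data):

    import statistics

    data = [2, 3, 3, 5, 7, 8, 3, 9, 5]

    print(statistics.mean(data))    # average: 5
    print(statistics.median(data))  # middle value of the ordered data: 5
    print(statistics.mode(data))    # most common value: 3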

 

 


Standard Deviation

Standard deviation measures the spread of individual observations around the mean. In a normal distribution, about 68% of values fall within one standard deviation of the mean and about 95% within two.


Standard Error

Standard error measures the precision with which a sample statistic (usually the mean) estimates its population value. The standard error of the mean is the standard deviation divided by the square root of the sample size.

 

Standard Deviation vs Standard Error

Standard deviation describes variability among individual observations, while standard error describes variability of a sample statistic (such as the mean) across repeated samples. Because SE = SD / sqrt(n), the standard error shrinks as the sample size grows.
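
A minimal sketch of the relationship (standard library only, hypothetical sample):

    import math
    import statistics

    data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6, 5.3]  # hypothetical sample

    sd = statistics.stdev(data)     # sample standard deviation
    se = sd / math.sqrt(len(data))  # standard error of the mean

    print(f"SD = {sd:.3f}, SE = {se:.3f}")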

 

 

Survival Analysis

Survival analysis examines time-to-event data, accounting for censored subjects who are lost to follow-up or remain event-free when the study ends.


T Test

T-tests are used for a continuous dependent variable across two groups (two levels of a categorical independent variable), with the group means being compared. If there are more than two groups, ANOVA is used instead.

 

The 'paired' t-test is used to compare two sets of measurements taken on the same subjects - a before/after type of comparison.

 

A one-tailed t-test is not as rigorous as a two-tailed test: all of α is placed in one tail, which relaxes the criterion for rejecting the null hypothesis in the expected direction.
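
A minimal sketch of all three variants (SciPy assumed; the alternative parameter requires a recent SciPy, and all data are hypothetical):

    from scipy import stats

    # Hypothetical measurements from two independent groups
    group_a = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8]
    group_b = [5.6, 5.8, 5.5, 5.9, 5.7, 6.0]

    # Independent-samples (unpaired), two-tailed t-test
    t_stat, p_two = stats.ttest_ind(group_a, group_b)

    # Paired t-test: before/after measurements on the same subjects
    before = [140, 135, 150, 145, 138, 142]
    after = [132, 130, 144, 139, 133, 138]
    t_paired, p_paired = stats.ttest_rel(before, after)

    # One-tailed test: all of alpha in one tail
    t_one, p_one = stats.ttest_ind(group_a, group_b, alternative='less')

    print(p_two, p_paired, p_one)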

 


Z Scores

Z scores are used to test sample means of normally distributed, continuous data against a population mean; they describe how far a sample mean is from the population mean, measured in units of standard error. After standardization, the mean is 0 and the standard deviation is 1.

Z score = (sample mean - population mean) / se

A positive z score is a value above the mean, while a negative z score is a value below.

Standard normal tables provide the area under the curve corresponding to a given z score, which translates the distance of the sample mean from the reference mean into a probability.

P values represent the probability of obtaining a value at least as extreme as the one observed, due to sampling variation alone.
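
A minimal sketch (SciPy assumed, hypothetical numbers) computing a z score and its two-tailed p value:

    import math
    from scipy import stats

    sample_mean = 105.0
    population_mean = 100.0
    population_sd = 15.0
    n = 36

    se = population_sd / math.sqrt(n)         # standard error
    z = (sample_mean - population_mean) / se  # z = 2.0 here

    # Two-tailed p value: area in both tails beyond |z|
    p = 2 * stats.norm.sf(abs(z))
    print(f"z = {z:.2f}, p = {p:.4f}")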

 
