Statistics

Statistics allow us to collect, organize, and analyze quantitative data, often after the data are converted, or standardized, into test statistics.

The point of most statistical tests is to determine how likely it is that a given test statistic would occur within a relevant reference distribution, typically the distribution expected under the null hypothesis.

ANOVA

ANOVA, or analysis of variance, is used to compare a continuous variable across the levels of one or more categorical variables. With only two levels, ANOVA is equivalent to a t-test (the F ratio is the square of the t statistic).

The test statistic is the F ratio, and significance is judged by the p value.
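
As a minimal sketch (SciPy assumed, with hypothetical group data), a one-way ANOVA can be run like this:

    from scipy import stats

    # Hypothetical measurements from three treatment groups
    group_a = [5.1, 4.9, 5.4, 5.0, 5.2]
    group_b = [5.6, 5.8, 5.5, 5.9, 5.7]
    group_c = [4.8, 4.6, 4.9, 4.7, 5.0]

    # f_oneway returns the F ratio and its p value
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")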

 


Chi-Square Tests

Chi-square tests examine relationships between categorical (often dichotomous) variables. A 2x2 table is normally used, with observed frequencies compared against expected frequencies to examine whether the difference is statistically significant.

The test statistic is χ2, and significance is judged by the p value.

Fisher's exact test likewise compares two dichotomous variables, but is used when one or more cells have small expected frequencies (conventionally, under 5).
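
A minimal sketch (SciPy assumed) with a hypothetical 2x2 table of observed counts:

    from scipy import stats

    # Hypothetical 2x2 table: rows = treatment/control, columns = outcome yes/no
    table = [[20, 30],
             [10, 40]]

    # Chi-square test compares observed with expected frequencies
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")

    # Fisher's exact test is preferred when expected cell counts are small
    odds_ratio, p_exact = stats.fisher_exact(table)
    print(f"Fisher's exact p = {p_exact:.4f}")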

 


Confidence Intervals

Confidence intervals are used to show the precision with which the mean has been estimated. Given a 95% CI, 5% of intervals constructed this way would miss the population mean, 2.5% in each tail.
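
As a sketch (SciPy assumed, hypothetical data), a 95% CI for a sample mean can be computed from the t distribution:

    from scipy import stats

    data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6, 5.3]  # hypothetical sample

    mean = sum(data) / len(data)
    se = stats.sem(data)  # standard error of the mean

    # 95% CI: mean +/- t * se, with n - 1 degrees of freedom
    low, high = stats.t.interval(0.95, df=len(data) - 1, loc=mean, scale=se)
    print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")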


Correlation

Correlation measures how strongly one continuous variable varies with another continuous variable, e.g. systolic BP and heart rate. Correlation is closely related to simple linear regression: r is the standardized slope.

The test statistic is r, the correlation coefficient: r = 0 means no linear correlation, while r = -1 or +1 are the extremes of perfect negative or positive correlation. P values are used to judge statistical significance.
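
A minimal sketch (SciPy assumed, hypothetical paired measurements):

    from scipy import stats

    # Hypothetical paired measurements: systolic BP and heart rate
    systolic_bp = [118, 125, 132, 140, 121, 135, 128, 145]
    heart_rate = [68, 72, 75, 82, 70, 78, 74, 85]

    # pearsonr returns the correlation coefficient r and its p value
    r, p = stats.pearsonr(systolic_bp, heart_rate)
    print(f"r = {r:.2f}, p = {p:.4f}")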

 


Linear Regression

Linear regression examines a continuous dependent variable and one or more (usually continuous) independent variables, and shows how useful the independent variables are in predicting the dependent variable.

 

The main outcome is R2, or the coefficient of determination, which describes the proportion of variation in the dependent variable explained by variation in the independent variables.

Slope coefficients describe the effect of each independent variable. For example, a coefficient of 0.5 means that for every 1.0 increase in the independent variable, there is a 0.5 increase in the dependent variable.
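
As a sketch (SciPy assumed, hypothetical dose-response data), a simple linear regression reporting the slope and R2:

    from scipy import stats

    # Hypothetical data: dose (independent) vs response (dependent)
    dose = [1, 2, 3, 4, 5, 6, 7, 8]
    response = [2.1, 2.9, 3.6, 4.4, 5.2, 5.8, 6.7, 7.3]

    result = stats.linregress(dose, response)

    # The slope is the change in the dependent variable per unit change
    # in the independent variable; squaring r gives R2
    print(f"slope = {result.slope:.2f}")
    print(f"R2 = {result.rvalue ** 2:.2f}")
    print(f"p = {result.pvalue:.4f}")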

 


Logistic Regression

Logistic regression is used for dichotomous dependent variables and any type of independent variables.

The great thing about logistic regression is that, for each independent variable, exponentiating the slope coefficient gives an estimate of the odds ratio, adjusted for all other variables in the regression.
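
A minimal sketch using statsmodels (assumed available), with hypothetical data; exponentiating the fitted coefficients yields adjusted odds ratios:

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: age and smoking status predicting disease (0/1)
    age = [34, 45, 52, 60, 38, 49, 65, 58, 41, 70]
    smoker = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]
    disease = [0, 1, 0, 1, 0, 0, 1, 1, 0, 1]

    X = sm.add_constant(np.column_stack([age, smoker]))
    model = sm.Logit(disease, X).fit()

    # Exponentiated slope coefficients are adjusted odds ratios
    print(np.exp(model.params))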

 


Measures of Central Tendency

 

Mean: average

Median: middle value of the ordered data

Mode: most common value
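
Python's built-in statistics module covers all three (hypothetical data):

    import statistics

    data = [2, 3, 3, 5, 7, 8, 3, 9, 5]

    print(statistics.mean(data))    # average: 5
    print(statistics.median(data))  # middle value of the ordered data: 5
    print(statistics.mode(data))    # most common value: 3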

 

 


Standard Deviation

Standard deviation measures the spread of individual observations around the mean. In a normal distribution, about 68% of values fall within one standard deviation of the mean and about 95% within two.


Standard Error

Standard error measures the precision with which a sample statistic (usually the mean) estimates its population value. The standard error of the mean is the standard deviation divided by the square root of the sample size.

 

Standard Deviation vs Standard Error

Standard deviation describes variability among individual observations, while standard error describes variability of a sample statistic (such as the mean) across repeated samples. Because SE = SD / sqrt(n), the standard error shrinks as the sample size grows.
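
A minimal sketch of the relationship (standard library only, hypothetical sample):

    import math
    import statistics

    data = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6, 5.3]  # hypothetical sample

    sd = statistics.stdev(data)     # sample standard deviation
    se = sd / math.sqrt(len(data))  # standard error of the mean

    print(f"SD = {sd:.3f}, SE = {se:.3f}")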

 

 

Survival Analysis

Survival analysis examines time-to-event data, accounting for censored subjects who are lost to follow-up or remain event-free when the study ends.


T Test

T-tests are used for a continuous dependent variable across two groups (two levels of a categorical independent variable), with the group means being compared. If there are more than two groups, ANOVA is used instead.

 

The 'paired' t-test is used to compare two sets of measurements taken on the same subjects - a before/after type of comparison.

 

A one-tailed t-test is not as rigorous as a two-tailed test: all of α is placed in one tail, which relaxes the criterion for rejecting the null hypothesis in the expected direction.
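
A minimal sketch of all three variants (SciPy assumed; the alternative parameter requires a recent SciPy, and all data are hypothetical):

    from scipy import stats

    # Hypothetical measurements from two independent groups
    group_a = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8]
    group_b = [5.6, 5.8, 5.5, 5.9, 5.7, 6.0]

    # Independent-samples (unpaired), two-tailed t-test
    t_stat, p_two = stats.ttest_ind(group_a, group_b)

    # Paired t-test: before/after measurements on the same subjects
    before = [140, 135, 150, 145, 138, 142]
    after = [132, 130, 144, 139, 133, 138]
    t_paired, p_paired = stats.ttest_rel(before, after)

    # One-tailed test: all of alpha in one tail
    t_one, p_one = stats.ttest_ind(group_a, group_b, alternative='less')

    print(p_two, p_paired, p_one)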

 


Z Scores

Z scores are used to test sample means of normally distributed, continuous data against a population mean; they describe how far a sample mean is from the population mean, measured in units of standard error. After standardization, the mean is 0 and the standard deviation is 1.

Z score = (sample mean - population mean) / se

A positive z score is a value above the mean, while a negative z score is a value below.

Standard normal tables provide the area under the curve corresponding to a given z score, which translates the distance of the sample mean from the reference mean into a probability.

P values represent the probability of obtaining a value at least as extreme as the one observed, due to sampling variation alone.
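
A minimal sketch (SciPy assumed, hypothetical numbers) computing a z score and its two-tailed p value:

    import math
    from scipy import stats

    sample_mean = 105.0
    population_mean = 100.0
    population_sd = 15.0
    n = 36

    se = population_sd / math.sqrt(n)         # standard error
    z = (sample_mean - population_mean) / se  # z = 2.0 here

    # Two-tailed p value: area in both tails beyond |z|
    p = 2 * stats.norm.sf(abs(z))
    print(f"z = {z:.2f}, p = {p:.4f}")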

 
