- Inference: Many statistical tests and methods assume that the data are normally distributed. Parametric tests such as t-tests, analysis of variance (ANOVA), and linear regression fall into this category. Violations of normality can affect the validity and reliability of the results these tests produce.
- Estimation: When data follows a normal distribution, the mean, median, and mode all coincide at the centre. This characteristic enables us to properly estimate population parameters and make reasonable forecasts.
- Simplification: Normality assumptions make data analysis easier by allowing the use of strong mathematical tools and statistical models. Normal distributions have well-defined features that many statistical methods rely on for accuracy and efficiency.
If the blood pressure changes are roughly normally distributed, the researcher can reliably use a paired t-test to assess the significance of the drug's effect. If the differences deviate markedly from normality, however, a non-parametric test (such as the Wilcoxon signed-rank test) may be preferable to avoid biased or misleading results.
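The scenario above can be sketched in R with simulated before/after readings (the sample size, means, and standard deviations here are hypothetical, chosen only for illustration):

```r
# Hypothetical illustration: simulated before/after blood pressure readings
set.seed(42)
before <- rnorm(30, mean = 140, sd = 10)        # simulated baseline readings
after  <- before - rnorm(30, mean = 5, sd = 4)  # simulated post-treatment readings
diffs  <- after - before

# Check normality of the differences before trusting the paired t-test
shapiro.test(diffs)

# If the differences look approximately normal, use the paired t-test
t.test(before, after, paired = TRUE)

# Otherwise, fall back to a non-parametric alternative
wilcox.test(before, after, paired = TRUE)
```

Note that the normality check applies to the paired differences, not to the raw before and after measurements separately.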
In summary, normality matters in statistics because it permits the use of parametric tests, supports accurate parameter estimation, and simplifies data interpretation. Before applying such tests, however, it is important to check the normality assumption, since violations can lead to incorrect conclusions.
In R, there are several ways to check the normality of a dataset. Here are four commonly used methods:
Histogram and QQ-Plot:
- Use the hist() function to create a histogram of your data. A histogram visually displays the distribution of your data.
- Use the qqnorm() function to create a quantile-quantile (QQ) plot. A QQ plot compares the quantiles of your data to the quantiles of a theoretical normal distribution. If the points in the plot closely follow a straight line, it suggests that your data are approximately normally distributed.
Example:
# Generate some data
data <- rnorm(100)

# Create a histogram
hist(data)

# Create a QQ plot
qqnorm(data)
qqline(data)
Shapiro-Wilk Test:
- Use the shapiro.test() function to perform the Shapiro-Wilk test, which tests the null hypothesis that a sample is drawn from a normal distribution.
- If the p-value of the test is greater than the chosen significance level (e.g., 0.05), you would fail to reject the null hypothesis, suggesting that your data are approximately normally distributed.
Example:
# Perform the Shapiro-Wilk test
shapiro.test(data)

Kolmogorov-Smirnov Test:
- Use the ks.test() function to perform the Kolmogorov-Smirnov test, which compares the empirical distribution function of your data to the theoretical cumulative distribution function of a normal distribution.
- If the p-value of the test is greater than the chosen significance level, you would fail to reject the null hypothesis, suggesting that your data are approximately normally distributed.
Example:
# Perform the Kolmogorov-Smirnov test
# Note: with "pnorm" and no further arguments, this compares the data against
# the standard normal N(0, 1); for data on another scale, supply estimates,
# e.g. ks.test(data, "pnorm", mean(data), sd(data))
ks.test(data, "pnorm")

Anderson-Darling Test:
- Use the ad.test() function from the nortest package to perform the Anderson-Darling test, which is another test for the null hypothesis of normality.
- The test provides a p-value, and if it exceeds the chosen significance level (e.g., 0.05), you would fail to reject the null hypothesis, suggesting approximate normality.
Example:
# Install and load the nortest package
install.packages("nortest")
library(nortest)

# Perform the Anderson-Darling test
ad.test(data)