- Inference: Many statistical tests and methods assume that the data are normally distributed. Parametric tests such as t-tests, analysis of variance (ANOVA), and linear regression fall into this category. Violations of normality can affect the validity and reliability of the results these tests produce.
- Estimation: When data follows a normal distribution, the mean, median, and mode all coincide at the centre. This characteristic enables us to properly estimate population parameters and make reasonable forecasts.
- Simplification: Normality assumptions make data analysis easier by allowing the use of strong mathematical tools and statistical models. Normal distributions have well-defined features that many statistical methods rely on for accuracy and efficiency.
If the blood pressure changes are roughly normally distributed, the researcher can reliably use a paired t-test to assess the significance of the drug's effect. If the differences deviate markedly from normality, however, a non-parametric test (such as the Wilcoxon signed-rank test) may be preferable to avoid biased or misleading results.
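The scenario above can be sketched in R with simulated before/after readings (the sample size, means, and standard deviations here are hypothetical, chosen only for illustration):

```r
# Hypothetical illustration: simulated before/after blood pressure readings
set.seed(42)
before <- rnorm(30, mean = 140, sd = 10)        # simulated baseline readings
after  <- before - rnorm(30, mean = 5, sd = 4)  # simulated post-treatment readings
diffs  <- after - before

# Check normality of the differences before trusting the paired t-test
shapiro.test(diffs)

# If the differences look approximately normal, use the paired t-test
t.test(before, after, paired = TRUE)

# Otherwise, fall back to a non-parametric alternative
wilcox.test(before, after, paired = TRUE)
```

Note that the normality check applies to the paired differences, not to the raw before and after measurements separately.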
In summary, normality matters in statistics because it permits the use of parametric tests, supports accurate parameter estimation, and simplifies data interpretation. Before applying such tests, however, it is important to check the normality assumption, since violations can lead to incorrect conclusions.
In R, there are several ways to check the normality of a dataset. Here are four commonly used methods:
Histogram and QQ-Plot:
- Use the hist() function to create a histogram of your data. A histogram visually displays the distribution of your data.
- Use the qqnorm() function to create a quantile-quantile (QQ) plot. A QQ plot compares the quantiles of your data to the quantiles of a theoretical normal distribution. If the points in the plot closely follow a straight line, it suggests that your data are approximately normally distributed.
Example:
# Generate some data
data <- rnorm(100)

# Create a histogram
hist(data)

# Create a QQ plot
qqnorm(data)
qqline(data)
Shapiro-Wilk Test:
- Use the shapiro.test() function to perform the Shapiro-Wilk test, which tests the null hypothesis that a sample is drawn from a normal distribution.
- If the p-value of the test is greater than the chosen significance level (e.g., 0.05), you would fail to reject the null hypothesis, suggesting that your data are approximately normally distributed.
Example:
# Perform the Shapiro-Wilk test
shapiro.test(data)

Kolmogorov-Smirnov Test:
- Use the ks.test() function to perform the Kolmogorov-Smirnov test, which compares the empirical distribution function of your data to the theoretical cumulative distribution function of a normal distribution.
- If the p-value of the test is greater than the chosen significance level, you would fail to reject the null hypothesis, suggesting that your data are approximately normally distributed.
Example:
# Perform the Kolmogorov-Smirnov test
# Note: with "pnorm" and no further arguments, this compares the data against
# the standard normal N(0, 1); for data on another scale, supply estimates,
# e.g. ks.test(data, "pnorm", mean(data), sd(data))
ks.test(data, "pnorm")

Anderson-Darling Test:
- Use the ad.test() function from the nortest package to perform the Anderson-Darling test, which is another test for the null hypothesis of normality.
- The test provides a p-value, and if it exceeds the chosen significance level (e.g., 0.05), you would fail to reject the null hypothesis, suggesting approximate normality.
Example:
# Install and load the nortest package
install.packages("nortest")
library(nortest)

# Perform the Anderson-Darling test
ad.test(data)