Introduction
In this post and the ones that follow, we will talk about the Shapiro-Wilk, Kruskal-Wallis, Mann-Whitney and Wilcoxon rank tests, along with a few others. These tests are crucial for checking the assumptions behind samples, statistical tests and models. Let's start the series with this post on the Shapiro-Wilk test.
Shapiro-Wilk test:
The first building blocks of statistics are samples and the assumptions we make about them. Introductory statistics courses assume normality almost everywhere. As we go into more detail, we have to drop this assumption from time to time, and that is where the difficulties start. The test at hand, the Shapiro-Wilk test, checks whether a sample is consistent with a normal distribution.
The test was developed by S. S. Shapiro and M. B. Wilk in a 1965 paper, "An analysis of variance test for normality (complete samples)", published in Biometrika. Here is a link to the paper (it may not open if you or your institution does not have a JSTOR subscription).
Description of the test:
The test computes a test statistic, W, from the random sample x1, x2, ..., xn. As described in the paper, W is obtained by dividing the square of an appropriate linear combination of the ordered sample values by the usual estimate of variance obtainable from the sample (the sum of squared deviations from the sample mean). Small values of W indicate a departure from the normal distribution, while values close to 1 are consistent with normality.
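The statistic can be written compactly using the notation that is standard in the literature. This notation is assumed here, because the post's text does not define it: x_(1) ≤ x_(2) ≤ ... ≤ x_(n) are the sample values sorted in increasing order, x̄ is the sample mean, and a_1, ..., a_n are the coefficients of the linear combination.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^2}{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2},
\qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

The coefficients are antisymmetric (a_i = -a_{n+1-i}) and normalized so that their squares sum to 1. The Cauchy-Schwarz inequality therefore gives W ≤ 1, which is why values near 1 point towards normality.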
Below are excerpts from the original paper covering the derivation and justification of the test:
After reading these excerpts, it should be clear that the coefficients of the linear combination are derived from the expected values of the standard normal order statistics and their variance-covariance matrix. High values of W are consistent with normality, while low values indicate a departure from it.
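For reference, the Shapiro and Wilk paper builds the coefficient vector from two quantities. The first is m, the vector of expected values of the order statistics of a sample of size n from a standard normal distribution. The second is V, the covariance matrix of those order statistics:

```latex
a^{\top} = (a_1, \dots, a_n) = \frac{m^{\top} V^{-1}}{\left( m^{\top} V^{-1} V^{-1} m \right)^{1/2}}
```

The denominator is the length of the vector V^{-1}m, so it only rescales the coefficients to unit length. The original test was tabulated for small samples (n up to 50), and software implementations generally rely on approximations of these coefficients rather than computing V^{-1} exactly.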
How to report a Shapiro-Wilk test:
To report a Shapiro-Wilk test, give both the value of the W statistic and the p-value of the hypothesis test. The null hypothesis is that the sample comes from a normal distribution. If the p-value is greater than the chosen significance level, you cannot reject the null hypothesis, so normality of the data cannot be rejected either. A p-value at or below the significance level is evidence that the data are not normally distributed.
Shapiro-Wilk test in Python with scipy.stats.shapiro:
Python has a direct function for the Shapiro-Wilk test. Below is a screenshot of a Shapiro-Wilk test performed in a Jupyter notebook:
The shapiro function from scipy.stats performs the test. Pass it the sample as a NumPy array, and it returns the W statistic followed by the p-value. In the example, W is close to 1, which already suggests that the data come from a normal distribution, and the p-value gives further confirmation. You can adapt this code to your own data to check it for normality.
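The sketch below puts the workflow into code with NumPy and SciPy. It runs the test on a simulated normal sample and a simulated Poisson sample, then reports both W and the p-value with a decision for each. The data, random seed, sample size of 200 and 5% significance level are illustrative assumptions, not the data from the screenshot.

```python
import numpy as np
from scipy import stats

# Simulated data (assumed for illustration): a continuous normal sample
# and a discrete, right-skewed Poisson sample, with a fixed seed.
rng = np.random.default_rng(42)
samples = {
    "normal": rng.normal(loc=10.0, scale=2.0, size=200),
    "poisson": rng.poisson(lam=2.0, size=200),
}

alpha = 0.05  # chosen significance level
results = {}

for name, data in samples.items():
    # stats.shapiro returns the W statistic first, then the p-value.
    w_stat, p_value = stats.shapiro(data)
    results[name] = (w_stat, p_value)

    # H0: the sample comes from a normal distribution.
    if p_value > alpha:
        decision = "cannot reject normality"
    else:
        decision = "normality rejected"

    # Report both W and the p-value.
    print(f"{name:>7}: W = {w_stat:.4f}, p = {p_value:.3g} -> {decision}")
```

The Poisson sample is discrete and skewed, so its W comes out noticeably lower than the normal sample's and its p-value falls far below 0.05. The exact numbers depend on the random seed.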
Shapiro-Wilk test in R:
R has a shapiro.test function: pass it the data as a numeric vector (with between 3 and 5000 non-missing values), and it returns the W value and the p-value. The official shapiro.test documentation is linked here. Follow the code to see how I test a normal sample and a Poisson sample in R with shapiro.test, and how the results compare.
Normality test in Minitab:
I looked for a Shapiro-Wilk test in Minitab, but Minitab does not seem to support it. Its normality test offers the Anderson-Darling, Kolmogorov-Smirnov and Ryan-Joiner tests instead; the Ryan-Joiner test is similar to Shapiro-Wilk. To perform a normality test in Minitab, follow this Simplilearn Minitab article.
Kruskal-Wallis test:
We will post about the Kruskal-Wallis test soon. In the meantime, if you want to know what we will discuss, see:
(1) wiki link