Chi-square goodness of fit

Chi-square test 

The chi-square test is one of the most important non-parametric statistics. It is used to determine whether observed frequencies differ significantly from expected frequencies. Because the test can be used for several purposes, Guilford (1956) called it the general-purpose statistic. It is a non-parametric statistic because it involves no assumption regarding the normality of the distribution or the homogeneity of variance. This statistical tool was first discovered by Helmert in 1875 and then rediscovered independently by Karl Pearson in 1900.

Pearson's chi-square (χ²) test is used when the data are expressed as frequencies, or as proportions or percentages that can be reduced to frequencies. In other words, the test is a useful method of comparing experimentally obtained results with those expected theoretically under some hypothesis. Unlike z or other parametric tests, it does not require the assumption of a normal distribution. Chi-square (χ²), a distribution-free non-parametric test, is used for two major purposes: first, as a test of ‘goodness of fit’, and second, as a test of independence.

Goodness of fit- As a test of goodness of fit, χ² tries to determine how well the observed results of an experiment or study fit the results expected theoretically under some hypothesis, such as the hypothesis of chance, the hypothesis of equal probability, or the hypothesis of normal distribution. The hypothesis of equal probability requires that the total number of frequencies be distributed equally among the response categories. Under the normal distribution hypothesis, the expected frequencies are determined on the basis of a normal distribution of the observed frequencies in the entire population.

Test of independence- As a test of independence, χ² is usually applied to test the relationship between two variables in two ways: first, by testing the null hypothesis of independence, which states that the two given variables are independent of each other, and second, by computing the contingency coefficient, a measure of the relationship existing between the two variables. The formula for calculating χ² is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Where, $\chi^2$ = Chi-square; 
$f_o$ = obtained or observed frequency; and 
$f_e$ = expected frequency or theoretical frequency. 

Assumptions of the chi-square test

1. Chi-square is used as a test of significance when we have data that are expressed in frequencies or in terms of percentages or proportions that can be reduced to frequencies.
 
2. Usually the test is used with discrete data. When continuous data are reduced to categories, the chi-square test can also be applied.

3. Tests of significance such as z and t are based on the assumption of a normal distribution in the population studied and are referred to as parametric tests, whereas χ² is free from any such assumption. It can be used with any type of distribution. That is why it is usually called a distribution-free or non-parametric test of significance.
 
4. The test demands that individual observations be independent of each other. The response that one individual gives to an item should have no influence on the response of any other individual in the study.
 
5. The total number of observations should be large. The chi-square test should not be used if n < 50.
 
6. The sum of the expected frequencies must always be equal to the sum of the observed frequencies in a χ² test.
 
7. In the case of a 2x2 table (df = 1) with small cell frequencies (less than five), Yates’ correction should be applied.
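As an illustration of the goodness-of-fit use under the equal-probability hypothesis, here is a minimal Python sketch. The counts are hypothetical and chosen only for the example, the function names are my own, and the degrees of freedom are taken as the number of categories minus one, which is the usual convention for this hypothesis rather than something stated above. The statistic is the usual Pearson sum of (fo - fe)² / fe over all categories.

```python
# Hypothetical data: 120 respondents choosing among three options.
observed = [30, 50, 40]


def equal_probability_expected(observed_counts):
    """Spread the total count evenly over the categories."""
    total = sum(observed_counts)
    k = len(observed_counts)
    return [total / k] * k


def chi_square(observed_counts, expected_counts):
    """Pearson statistic: sum of (fo - fe)^2 / fe over all categories."""
    return sum((fo - fe) ** 2 / fe
               for fo, fe in zip(observed_counts, expected_counts))


expected = equal_probability_expected(observed)  # 40 in each category
chi2 = chi_square(observed, expected)
df = len(observed) - 1  # assumed convention: categories minus one

# Assumption 6 above: expected and observed totals must agree.
assert abs(sum(expected) - sum(observed)) < 1e-9

CRITICAL_05_DF2 = 5.991  # standard table value for df = 2 at the .05 level
print(f"chi-square = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 > CRITICAL_05_DF2 else "retain H0")
```

With these made-up counts the statistic is 5.0 on 2 degrees of freedom, which falls short of the tabled value of 5.991, so the equal-probability hypothesis would be retained at the .05 level.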

Uses of the chi-square test
1. It is used as a test of the equal probability hypothesis. 
2. It is used in testing the significance of the independence hypothesis. 
3. It is used in testing a hypothesis regarding the normal shape of a frequency distribution. In this sense, it is called a test of goodness of fit. 
4. It is used in testing the significance of several statistics, e.g., values of the phi coefficient and the coefficient of contingency are converted into chi-square values for a test of significance.
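The conversions mentioned in the last item can be written out explicitly. These are the standard textbook identities, not formulas given in this post: N is the total number of observations, φ is the phi coefficient of a 2x2 table, and C is the coefficient of contingency.

$$\chi^2 = N\phi^2 \qquad \text{(2x2 table, df = 1)}$$

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \quad\Longrightarrow\quad \chi^2 = \frac{N C^2}{1 - C^2}$$

The resulting χ² value is then compared with the tabled value for the appropriate degrees of freedom.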

Important terms related to chi-square (χ²) – 

Non-parametric test – A statistic worked out without using any precomputed statistic as an estimate of a parameter. 

Degrees of freedom – The number of values that are free to vary, assuming that the sum of the values and the number of values are fixed. For a contingency table with R rows and C columns, df = (C-1)(R-1). 

Contingency table – A two-way table constructed for classifying data, with the major objective of determining whether the two directions of classifications are dependent upon one another.
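A contingency-table analysis can be sketched in a few lines of Python. This is only an illustrative sketch: the 2x3 table is hypothetical, the function names are my own, and the expected frequency of each cell is computed in the usual way as (row total × column total) / N. The contingency coefficient uses the standard formula C = sqrt(χ² / (N + χ²)).

```python
from math import sqrt


def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square, df) for an R x C table of observed counts."""
    expected = expected_frequencies(table)
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)  # (R-1)(C-1)
    return chi2, df


def contingency_coefficient(chi2, n):
    """Coefficient of contingency C = sqrt(chi2 / (N + chi2))."""
    return sqrt(chi2 / (n + chi2))


# Hypothetical 2 x 3 table: two groups, three response categories.
table = [[10, 20, 30],
         [30, 20, 10]]

chi2, df = chi_square_independence(table)
n = sum(sum(row) for row in table)
c = contingency_coefficient(chi2, n)
print(f"chi-square = {chi2:.3f}, df = {df}, C = {c:.3f}")
```

For this made-up table every expected frequency is 20, the statistic is 20.0 on 2 degrees of freedom, and C is about 0.378.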
 
Yates’ correction – A correction for the discreteness of the data that is made in the chi-square test. The correction is applied to each difference between the observed and the expected frequencies, reducing the absolute difference by 0.5, if any expected frequency is less than 5 and the chi-square has only 1 df.
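A minimal Python sketch of the correction for a 2x2 table follows. The table is hypothetical and the function names are my own. The corrected statistic subtracts 0.5 from each absolute difference |fo - fe| before squaring; flooring the corrected difference at zero is a defensive choice of this sketch, not part of the definition above.

```python
def expected_2x2(table):
    """Expected counts for a 2 x 2 table: row total * column total / N."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def chi_square_2x2(table, yates=False):
    """Pearson chi-square for a 2 x 2 table, optionally Yates-corrected."""
    expected = expected_2x2(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for fo, fe in zip(obs_row, exp_row):
            diff = abs(fo - fe)
            if yates:
                # Reduce each absolute difference by 0.5 (never below zero).
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / fe
    return total


# Hypothetical small-sample table; one pair of expected counts is 4.5 (< 5).
table = [[3, 7],
         [8, 2]]

uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(f"uncorrected = {uncorrected:.3f}, Yates-corrected = {corrected:.3f}")
```

With these made-up counts the uncorrected value (about 5.05) exceeds the tabled value of 3.841 for df = 1 at the .05 level, while the corrected value (about 3.23) does not, which shows how the correction can change the conclusion when cell frequencies are small.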
