chi square goodness of fit
Chi-square test
The chi-square test is one of the most important non-parametric statistics; it is used to
determine whether observed frequencies differ significantly from expected
frequencies. Because the test serves several purposes, Guilford (1956) called it the
general-purpose statistic. It is a non-parametric statistic because it involves no assumption
regarding the normality of the distribution or the homogeneity of variance. This statistical tool
was first discovered by Helmert in 1875 and then rediscovered independently by Karl
Pearson in 1900.
The Pearson chi-square (χ²) test is encountered when the data are expressed in
terms of frequencies, proportions or percentages. In other words, the test is a
useful method of comparing experimentally obtained results with those expected
theoretically on some hypothesis. Unlike z or other parametric tests, it does not
require the assumption of a normal distribution. As a distribution-free,
non-parametric test, chi-square (χ²) is used for two major purposes: first, as a test of
'goodness of fit', and second, as a test of independence.
Goodness of fit: As a test of goodness of fit, χ² determines how well the observed
results of an experiment or study fit the results expected theoretically on some
hypothesis, such as the hypothesis of chance, the hypothesis of equal probability, or the
hypothesis of normal distribution. The hypothesis of equal probability demands that the
total number of frequencies be distributed equally among the response categories. Under the
normal distribution hypothesis, the expected frequencies are determined on the basis of the
normal distribution of observed frequencies in the entire population.
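The equal probability case can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `chi_square_gof` and the response counts are hypothetical, and the critical value 5.991 is the standard table value for df = 2 at the 0.05 level. The statistic sums the squared difference between each observed and expected frequency, divided by the expected frequency.

```python
def chi_square_gof(observed, expected=None):
    """Return (chi_square, df) for a goodness-of-fit test.

    If no expected frequencies are given, the hypothesis of equal
    probability is assumed: the total is split evenly across categories.
    """
    n = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [n / k] * k
    # The expected frequencies must add up to the observed total.
    if abs(sum(expected) - n) > 1e-9:
        raise ValueError("expected and observed totals differ")
    chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
    return chi_sq, k - 1


# Hypothetical data: 90 respondents choosing among three answers.
observed = [40, 30, 20]
chi_sq, df = chi_square_gof(observed)

CRITICAL_05_DF2 = 5.991  # table value for df = 2 at the 0.05 level
reject_null = chi_sq > CRITICAL_05_DF2
print(round(chi_sq, 3), df, reject_null)
```

With these made-up counts each expected frequency is 30, so the statistic is 200/30, which exceeds the table value, and the hypothesis of equal probability would be rejected at the 0.05 level.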
Test of independence: As a test of independence, χ² is usually applied to test the
relationship between two variables in two ways: first, by testing the null hypothesis of
independence, which states that the two variables are independent of each other; and second,
by computing the contingency coefficient, a measure of the relationship existing
between the two variables.
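Both steps can be sketched together in Python. This is an illustrative sketch under stated assumptions: the function name `chi_square_independence` and the 2x3 table are hypothetical, each expected frequency is taken as (row total x column total) / N, and the contingency coefficient uses the usual textbook form C = sqrt(χ² / (N + χ²)), which the text above does not spell out.

```python
import math


def chi_square_independence(table):
    """Return (chi_square, df, contingency_coefficient) for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            # Expected frequency under the null hypothesis of independence.
            fe = row_totals[i] * col_totals[j] / n
            chi_sq += (fo - fe) ** 2 / fe

    df = (len(table) - 1) * (len(table[0]) - 1)
    # Assumed textbook form of the contingency coefficient.
    c = math.sqrt(chi_sq / (n + chi_sq))
    return chi_sq, df, c


# Hypothetical 2 x 3 table: two groups by three response categories.
table = [
    [30, 20, 10],
    [20, 30, 40],
]
chi_sq, df, c = chi_square_independence(table)
print(round(chi_sq, 3), df, round(c, 4))
```

For this made-up table the expected frequencies are 20 in every cell of the first row and 30 in every cell of the second, so the statistic is 50/3 with df = 2 and C = sqrt(0.1).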
The formula for calculating χ² is

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

where
fo = obtained or observed frequency; and
fe = expected or theoretical frequency.
Assumptions of the chi-square test
1. Chi-square is used as a test of significance when the data are expressed
in frequencies, or in percentages or proportions that can be reduced to
frequencies.
2. The test is usually used with discrete data. It can also be applied when
continuous data are reduced to categories.
3. Whereas tests of significance like z and t are based upon the assumption of a normal
distribution in the population studied and are referred to as parametric tests, χ² is
altogether free from such an assumption. It can be used with any type of distribution,
which is why it is usually called a distribution-free or non-parametric test of
significance.
4. The test demands that individual observations be independent of each other. The
response that one individual gives to an item should have no influence on the
response of any other individual in the study.
5. The total number of observations should be large. The chi-square test should not
be used if n < 50.
6. The sum of the expected frequencies must always be equal to the sum of the
observed frequencies in a χ² test.
7. In the case of a 2x2 table (df = 1) with any expected cell frequency less than five,
Yates' correction should be applied.
Uses of the chi-square test
1. It is used as a test of the equal probability hypothesis.
2. It is used in testing the significance of the independence hypothesis.
3. It is used in testing a hypothesis regarding the normal shape of a frequency
distribution. In this sense, it is called a test of goodness of fit.
4. It is used in testing the significance of several statistics; e.g., the
values of the phi coefficient and the coefficient of contingency are converted into
chi-square values for a test of significance.
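The conversions mentioned in the last item are not given in the text. As a hedged sketch, the usual textbook relations are shown below, where N is the total number of observations, φ is the phi coefficient of a 2x2 table, and C is the contingency coefficient, assumed here to be defined as C = sqrt(χ² / (N + χ²)):

```latex
\chi^2 = N\,\phi^2
\qquad\text{and}\qquad
\chi^2 = \frac{N\,C^2}{1 - C^2}
```

The resulting χ² value is then compared with the table value for the appropriate degrees of freedom.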
Important terms related to chi-square (χ²)
Non-parametric test – A statistic worked out without using any precomputed
statistic as an estimate of a parameter.
Degrees of freedom – The number of values that are free to vary, given that
the sum of the values and the number of values are fixed. For a contingency table
with R rows and C columns,
df = (C-1) (R-1)
Contingency table – A two-way table constructed for classifying data, with the
major objective of determining whether the two directions of classifications are
dependent upon one another.
Yates' correction – A correction for the discreteness of the data that is made in
the chi-square test. The correction is applied to each difference between the
observed and the expected frequencies if any expected
frequency is less than 5 and the chi-square has only 1 degree of freedom.
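The effect of the correction can be sketched in Python. This is a minimal illustration under stated assumptions: the function name `chi_square_2x2` and the cell counts are hypothetical, and the correction is taken in its usual form of reducing each absolute difference |fo - fe| by 0.5 before squaring. The value 3.841 is the standard table value for df = 1 at the 0.05 level.

```python
def chi_square_2x2(table, yates=True):
    """Chi-square for a 2 x 2 table, optionally with Yates' correction."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    chi_sq = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            diff = abs(fo - fe)
            if yates:
                # Assumed form of the correction: shrink each absolute
                # difference by 0.5, never letting it fall below zero.
                diff = max(diff - 0.5, 0.0)
            chi_sq += diff ** 2 / fe
    return chi_sq


# Hypothetical table with n = 50; two expected frequencies equal 4 (< 5).
table = [
    [1, 7],
    [24, 18],
]
uncorrected = chi_square_2x2(table, yates=False)
corrected = chi_square_2x2(table, yates=True)

CRITICAL_05_DF1 = 3.841  # table value for df = 1 at the 0.05 level
print(round(uncorrected, 3), round(corrected, 3))
```

With these made-up counts the uncorrected statistic exceeds the table value while the corrected one does not, which shows how the correction guards against overstating significance when expected frequencies are small.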