- There's no way to tell chisq.test() the correct degrees of freedom ("df") - it cannot figure this out itself. My stats text tells me to reduce df by 1 for each parameter estimated from the observations. If I have 10 buckets of counts, the base df is 9 (buckets minus 1), AND since I have calculated the mean and sd from the data, df = 7, NOT the 9 assumed by chisq.test().
- In the test distribution, again according to my text, no expected count should be less than 5. All chisq.test() does is warn you that the results may be unreliable - it doesn't fix them, leaving the pre-processing up to you. (Both problems show up in the snippet after this list.)
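A quick demonstration of both points, with made-up counts:

```r
# df is always buckets - 1, regardless of how many parameters
# were estimated from the data:
chisq.test(c(12, 18, 25, 20, 15, 10))$parameter
# df
#  5

# Expected counts below 5 (here 4 in every bucket) only trigger
# a warning - nothing is fixed:
chisq.test(c(1, 2, 9))
# Warning: Chi-squared approximation may be incorrect
```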
So here's how to use R's chisq.test():
First, I use a little function to set up some likely buckets for the observations "obs":
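The original helper isn't reproduced here, so this is a minimal sketch of one way to write it; the name make.buckets and its default bucket count are my stand-ins:

```r
# A minimal bucketing helper (name and defaults are stand-ins).
# hist() with plot = FALSE returns a "histogram" object whose $breaks
# give the bucket boundaries and $counts the observed count per bucket.
make.buckets <- function(obs, n.buckets = 10) {
  hist(obs, breaks = n.buckets, plot = FALSE)
}

obs.hist <- make.buckets(obs)
```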
This returns an object of class "histogram", which can be passed to the next step once the distribution to be tested for goodness of fit is set up. For example, test.cdf <- pnorm(obs.hist$breaks, mean(obs), sd(obs)) evaluates the normal cdf at each bucket boundary of the histogram of the observations.
Passing the histogram of the observations and the cdf values to the next function accumulates adjacent buckets until every bucket in the expected distribution has a count of at least 5, accumulating the buckets of the observed distribution in parallel. Then chisq.test() is run with rescale.p=TRUE, because counts, not probabilities, are being passed. The chisq.test() results are returned with a reminder to think about df.
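Again, the original function isn't reproduced here; this sketch (run.chisq and the left-to-right merging strategy are my stand-ins) shows the idea:

```r
# One way to write the merging step (name and merge order are stand-ins)
run.chisq <- function(obs.hist, test.cdf) {
  observed <- obs.hist$counts
  n        <- sum(observed)
  # Expected count per bucket: probability mass between successive
  # breaks, scaled up by the number of observations
  expected <- diff(test.cdf) * n

  # Accumulate adjacent buckets until each expected count reaches 5,
  # accumulating the observed counts in parallel
  obs.m <- exp.m <- numeric(0)
  o.acc <- e.acc <- 0
  for (i in seq_along(expected)) {
    o.acc <- o.acc + observed[i]
    e.acc <- e.acc + expected[i]
    if (e.acc >= 5) {
      obs.m <- c(obs.m, o.acc)
      exp.m <- c(exp.m, e.acc)
      o.acc <- e.acc <- 0
    }
  }
  # Fold any under-5 remainder into the last bucket
  if (e.acc > 0) {
    obs.m[length(obs.m)] <- obs.m[length(obs.m)] + o.acc
    exp.m[length(exp.m)] <- exp.m[length(exp.m)] + e.acc
  }

  # rescale.p = TRUE because expected counts, not probabilities, are passed
  message("Check df: chisq.test() assumes buckets - 1")
  chisq.test(obs.m, p = exp.m, rescale.p = TRUE)
}

result <- run.chisq(obs.hist, test.cdf)
```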
If df is actually different, then pass the whole set of test results plus the correct df to the last function:
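A minimal sketch of that last step, assuming the standard "htest" object that chisq.test() returns (fix.df is my stand-in name):

```r
# Recompute the p-value from the stored chi-squared statistic,
# this time against the corrected degrees of freedom
fix.df <- function(test.result, correct.df) {
  pchisq(unname(test.result$statistic), df = correct.df, lower.tail = FALSE)
}

# e.g. with the mean and sd estimated from the data, the correct df
# is merged buckets minus 1, minus 2 more:
# fix.df(result, correct.df = length(result$observed) - 1 - 2)
```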
The correct p-value is returned!