Richardthughes

Posts: 9769
Joined: Jan. 2006

for a friend, actually:

"I’ve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to tell if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not."

Halp!

Henry J

Posts: 3914
Joined: Mar. 2005

Maybe draw bell curves for both, and see if the subset curve straddles the mean of the whole, or is lumped on one side of it?

Richardthughes

Posts: 9769
Joined: Jan. 2006

I need an actual statistical test, probably with confidence and whatnot.

Henry J

Posts: 3914
Joined: Mar. 2005

Oh. Well, I'm confident that I don't have one of those.

Erasmus, FCD

Posts: 6349
Joined: June 2007

What does he mean by "population is a subset of the other"? Counts and means of what?

Richardthughes

Posts: 9769
Joined: Jan. 2006

say there's 100 (count) thingies with a mean of 5 and a standard deviation of 3. He's pulled out 10 (count) of those (a subset) with a mean of 4.5 and a standard deviation of 2.

*made up example/ edited & enhanced for moar better clarity and diplomatic relations.

Schroedinger's Dog

Posts: 1686
Joined: Jan. 2009

Rich: could you correct spelling and such so that we non-US can have a clear picture of what you need?

:)

Damn; I feel like a grammar nazi today!

Richardthughes

Posts: 9769
Joined: Jan. 2006

Zut alors! Done.

Dr.GH

Posts: 1922
Joined: May 2002

If I understand your question, T-test. Delete the subsample from the initial sample.
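This suggestion can be carried out from the summary statistics alone. Below is a minimal sketch, assuming the made-up numbers used earlier in the thread (100 values with mean 5 and SD 3, and a subset of 10 with mean 4.5 and SD 2) and assuming both SDs are sample SDs (n - 1 denominator). The function names are illustrative, and using Welch's unequal-variance form of the t-test is an assumption rather than anything specified above.

```python
import math

def remove_subset(n_all, mean_all, sd_all, n_sub, mean_sub, sd_sub):
    """Summary stats of what is left after deleting the subset from the whole."""
    n_rest = n_all - n_sub
    mean_rest = (n_all * mean_all - n_sub * mean_sub) / n_rest
    # Recover raw sums of squares from the sample SDs (n - 1 denominator).
    sumsq_all = (n_all - 1) * sd_all ** 2 + n_all * mean_all ** 2
    sumsq_sub = (n_sub - 1) * sd_sub ** 2 + n_sub * mean_sub ** 2
    # Sum of squared deviations of the remainder about its own mean.
    ss_rest = (sumsq_all - sumsq_sub) - n_rest * mean_rest ** 2
    sd_rest = math.sqrt(ss_rest / (n_rest - 1))
    return n_rest, mean_rest, sd_rest

def welch_t(n1, m1, s1, n2, m2, s2):
    """Welch's t statistic and its approximate degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Made-up example from the thread: whole = (100, 5, 3), subset = (10, 4.5, 2).
n_rest, mean_rest, sd_rest = remove_subset(100, 5.0, 3.0, 10, 4.5, 2.0)
t, df = welch_t(10, 4.5, 2.0, n_rest, mean_rest, sd_rest)
print(n_rest, round(mean_rest, 3), round(sd_rest, 3), round(t, 3), round(df, 1))
```

The t statistic and degrees of freedom can then be looked up in a t table or passed to a t-distribution routine to get a p-value.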

Richardthughes

Posts: 9769
Joined: Jan. 2006

 Quote (Dr.GH @ Mar. 10 2011,18:37) If I understand your question, T-test. Delete the subsample from the initial sample.

Thank you sir. Would Anova make any sense?

George

Posts: 310
Joined: Feb. 2006

A one-way ANOVA with two groups and a (pooled-variance) two-sample t-test are equivalent: F = t^2.

Raevmo

Posts: 235
Joined: Oct. 2006

Quote (Richardthughes @ Mar. 10 2011,11:59) for a friend, actually: "I’ve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to tell if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not." Halp!

If the mean (say mu) of the total population is known (say mu=mu_0), and the "counts" in the total population are normally distributed (the word "counts" actually suggests that the data are count data, i.e. non-negative integers, rather than continuous normal data), then a one-sample t-test could be used to test whether the sample is from a population with mu=mu_0.

Alternatively, take bootstrap samples from the sample and see how far out  mu_0 is in the bootstrap distribution of the mean.
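A rough sketch of the bootstrap idea, which needs the raw values of the subset rather than just its summary statistics. The ten values below are invented for illustration (chosen so their mean is 4.5), `mu_0 = 5` is the made-up population mean from earlier in the thread, and the percentile interval is just one common way of reading the bootstrap distribution.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=10_000, seed=1):
    """Means of n_boot resamples drawn with replacement from the sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Hypothetical raw data for the subset (not from the thread); mean is 4.5.
sample = [2.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 5.5, 6.0, 7.5]
mu_0 = 5.0  # known mean of the whole population (made-up example)

means = sorted(bootstrap_means(sample))
lo = means[int(0.025 * len(means))]       # 2.5th percentile
hi = means[int(0.975 * len(means)) - 1]   # 97.5th percentile
frac_above = sum(m >= mu_0 for m in means) / len(means)

print(f"95% percentile interval for the mean: ({lo:.2f}, {hi:.2f})")
print(f"share of bootstrap means at or above mu_0: {frac_above:.3f}")
```

If mu_0 falls well outside the percentile interval, the subset mean looks inconsistent with the population mean; if it sits comfortably inside, there is no evidence of a difference.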

Bob O'H

Posts: 1917
Joined: Oct. 2005

A practical suggestion: if the subset is a small subset of the full data, don't worry and treat it as independent.

Means usually become normally distributed very quickly, so bog standard t- and z- tests are fine.
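With only the summary statistics, a plain z-test of this kind can be sketched as follows. It uses the made-up numbers from earlier in the thread and treats the whole-population mean (5) and SD (3) as known. The optional finite-population correction is an addition for illustration, not something suggested above; it shows why a small subset can be treated as independent, since the correction factor is then close to 1.

```python
import math
from statistics import NormalDist

def z_test(mean_sub, n_sub, mu_0, sd_pop, n_pop=None):
    """Two-sided z-test of a subset mean against a known population mean.

    If n_pop is given, the standard error is shrunk by the
    finite-population correction sqrt((N - n) / (N - 1)).
    """
    se = sd_pop / math.sqrt(n_sub)
    if n_pop is not None:
        se *= math.sqrt((n_pop - n_sub) / (n_pop - 1))
    z = (mean_sub - mu_0) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

# Made-up example from the thread: subset of 10 with mean 4.5, whole mean 5, SD 3.
z_plain, p_plain = z_test(4.5, 10, 5.0, 3.0)
z_fpc, p_fpc = z_test(4.5, 10, 5.0, 3.0, n_pop=100)
print(f"no correction:   z = {z_plain:.3f}, p = {p_plain:.3f}")
print(f"with correction: z = {z_fpc:.3f}, p = {p_fpc:.3f}")
```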

George

Posts: 310
Joined: Feb. 2006

Another thought: count data are often Poisson distributed and not suitable for t-tests / ANOVA. However, a simple transformation should do the trick. IIRC, a square root transformation is often best for count data.

Raevmo

Posts: 235
Joined: Oct. 2006

The problem is a bit weird. Suppose it turns out that a t-test says it's very unlikely that the sample was taken from a population with mean mu=mu_0, i.e. has a very small p-value, even though we know for sure that the sample was taken from a population with mean mu_0. Then what? The only sensible conclusion then seems to be that the sampling procedure was "non-random" in some sense. Does that make sense, Oh Bob?

Raevmo

Posts: 235
Joined: Oct. 2006

Quote (George @ Mar. 11 2011,07:15) Another thought: count data are often Poisson distributed and not suitable for t-tests / ANOVA. However, a simple transformation should do the trick. IIRC, a square root transformation is often best for count data.

Or a log-transformation. Or run a glm with a log-link and family=poisson option [R code: glm(counts~1,family=poisson,data=teh.sample)] and test whether the intercept is significantly different from what's expected on the log scale: |intercept - log(mu_0)|/se(intercept) ~ N(0,1).
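For an intercept-only Poisson model the fit has a closed form, so the Wald test can be sketched without a GLM routine: the intercept estimate is the log of the sample mean, and its standard error is 1/sqrt(sum of the counts). The counts below are invented for illustration and `mu_0 = 5` is the made-up mean from earlier in the thread. Because of the log link, the null value for the intercept is log(mu_0).

```python
import math
from statistics import NormalDist

def poisson_intercept_wald(counts, mu_0):
    """Wald z-test for an intercept-only Poisson model with a log link.

    MLE of the intercept is log(mean(counts)); the Fisher information
    is sum(counts), so the standard error is 1 / sqrt(sum(counts)).
    """
    total = sum(counts)
    intercept = math.log(total / len(counts))
    se = 1.0 / math.sqrt(total)
    z = (intercept - math.log(mu_0)) / se
    p = 2 * NormalDist().cdf(-abs(z))
    return intercept, se, z, p

# Hypothetical count data (not from the thread); mean is 4.5.
counts = [3, 6, 4, 2, 7, 5, 4, 3, 6, 5]
intercept, se, z, p = poisson_intercept_wald(counts, mu_0=5.0)
print(f"intercept = {intercept:.3f}, se = {se:.3f}, z = {z:.3f}, p = {p:.3f}")
```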

Erasmus, FCD

Posts: 6349
Joined: June 2007

DO NOT LOG TRANSFORM COUNT DATA HOMO

Schroedinger's Dog

Posts: 1686
Joined: Jan. 2009

I just realised that the "mafs" in the topic heading was standing for "maths". I should have avoided this topic from the very start. Head is aching now...

Raevmo

Posts: 235
Joined: Oct. 2006

Quote (Erasmus, FCD @ Mar. 11 2011,07:26) DO NOT LOG TRANSFORM COUNT DATA HOMO

Here in the modern day Sodom (the Netherlands) we consider that perfectly acceptable. (Admittedly, we sometimes add 1 to the counts to prevent plugging zero-counts into the log).

Bob O'H

Posts: 1917
Joined: Oct. 2005

 Quote (Raevmo @ Mar. 11 2011,07:16) The problem is a bit weird. Suppose it turns out that a t-test says it's very unlikely that the sample was taken from a population with mean mu=mu_0, i.e. has a very small p-value, even though we know for sure that the sample was taken from a population with mean mu_0. Then what? The only sensible conclusion then seems to be that the sampling procedure was "non-random" in some sense. Does that make sense, Oh Bob?

Yes, that makes sense.

Assuming there's a decent amount of data, I wouldn't worry too much about the distribution: if you've got the means and standard errors, you'll be fine. If I had the original data, I would have used a GLM. But if the original data were to hand, we wouldn't have this problem!

I'll let Erasmus explain the sins of log-transformation, and how it relates to cricket.

Richardthughes

Posts: 9769
Joined: Jan. 2006

Thanks all, you has been moast halpfull.

Did you hear about when Bob O'H was constipated? He worked it out with a pencil.

Richardthughes

Posts: 9769
Joined: Jan. 2006

Quote (Raevmo @ Mar. 11 2011,05:37)
Quote (Richardthughes @ Mar. 10 2011,11:59) for a friend, actually: "I’ve got one population that is a sub-set of another. I know the counts in both, the means and the standard deviations. What stat test can I use to tell if the mean of the sub-set is statistically different from the known mean of the whole? I think a t-test assumes two independent sets of data which these are not." Halp!

If the mean (say mu) of the total population is known (say mu=mu_0), and the "counts" in the total population are normally distributed (the word "counts" actually suggests that the data are count data, i.e. non-negative integers, rather than continuous normal data), then a one-sample t-test could be used to test whether the sample is from a population with mu=mu_0.

Alternatively, take bootstrap samples from the sample and see how far out mu_0 is in the bootstrap distribution of the mean.

Just to clarify the above: counts are the population sizes. Oh pivot tables, you harsh mistress!

Dr.GH

Posts: 1922
Joined: May 2002

Quote (Richardthughes @ Mar. 10 2011,16:39)
 Quote (Dr.GH @ Mar. 10 2011,18:37) If I understand your question, T-test. Delete the subsample from the initial sample.

Thank you sir. Would Anova make any sense?

Not for such a small sample (10?).

Richardthughes

Posts: 9769
Joined: Jan. 2006

Quote (Dr.GH @ Mar. 11 2011,11:50)
Quote (Richardthughes @ Mar. 10 2011,16:39)
 Quote (Dr.GH @ Mar. 10 2011,18:37) If I understand your question, T-test. Delete the subsample from the initial sample.

Thank you sir. Would Anova make any sense?

Not for such a small sample (10?).

Gotcha. I'm not sure how big his sample is (ooh-er); I was just trying to come up with an example.

Dr.GH

Posts: 1922
Joined: May 2002

I am still not clear on what the real question is. For example, if the question is "Have I a large enough sample to accurately estimate the mean and standard deviation of the population?", you might split the data set, calculate t and F for each half, and compare them to the total sample's parameters. If the results indicate that the sub-samples are basically the same, then problem solved.

ETA: There is a statistic to test if one has sampled the SD of a population that is an application of Chebyshev's Theorem, but I am having a caffeine deficit disorder.

Edited by Dr.GH on Mar. 11 2011,10:13

fnxtr

Posts: 1959
Joined: June 2006

