Newsgroup sci.stat.consult 21761

Subject: Re: Help : Mann-Whitney U test or t-test ?
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 31 Dec 1996 15:31:10 GMT

Tony Mak (tonymak@sco1.med.cuhk.edu.hk) wrote:
: I need to compare a parameter in two independent groups (a new
: 'cancer-marker' in patients with and without cancer).  I have 29 and 31
: patients in the two groups.  The parameter is skewed to the right in both
: groups.  I used the Mann-Whitney U test; the p value is 0.010.  I have
: submitted the paper to a scientific journal, and one of the referees commented
: that I should be able to use a t-test because of the sample size.
: My question is: I understand that the parametric test is more powerful,
: but how can I tell whether the distribution is suitable for the t-test?
 -- Is the distribution suitable for a t-test?  If you are
comfortable using the *means* to describe the outcomes and to
compare them, then the t-test is proper and appropriate.  If the
means do not give a good representation, then mere sample size can
hardly compensate, even though the t-test is rather robust
(especially with equal n).  If you considered the means useful enough
to present, then there is implicit support for using a t-test.
Ideally, you would perform the appropriate transformation and then
carry out a t-test: for instance, if the biological parameter is
usually looked at as the log of what is measured, and the log produces
a symmetric distribution, test the logged values.
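A minimal standard-library sketch of the transform-then-test idea, assuming the marker values are strictly positive so the log is defined. It computes a simple moment skewness (to see whether the log makes each group roughly symmetric) and a Welch two-sample t statistic with its approximate degrees of freedom. The group data are simulated lognormal stand-ins of sizes 29 and 31, not data from the study, and the function names are hypothetical; a p-value would still come from a t table or a statistics package.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (0 for symmetric data)."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

def welch_t(x, y):
    """Welch t statistic for mean(x) - mean(y) and its Welch-Satterthwaite df."""
    vx, vy = sample_var(x) / len(x), sample_var(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Simulated, right-skewed stand-ins for the two groups (hypothetical values).
rng = random.Random(7)
cancer = [rng.lognormvariate(0.6, 0.8) for _ in range(29)]
no_cancer = [rng.lognormvariate(0.0, 0.8) for _ in range(31)]

for label, a, b in [("raw", cancer, no_cancer),
                    ("log", [math.log(v) for v in cancer],
                     [math.log(v) for v in no_cancer])]:
    t, df = welch_t(a, b)
    print(label, "skewness:", round(skewness(a), 2), round(skewness(b), 2),
          "t:", round(t, 2), "df:", round(df, 1))
```

Welch's version is used here because it does not assume equal variances; the classical pooled-variance t-test is the other common choice, and with nearly equal group sizes the two statistics are close.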
: (I also used ROC curve analysis to test the usefulness of the marker.)
 -- You might try to explain to the referee that a rank-order test
was used because (inexplicable?) outliers make the means unreliable,
regardless of the sample size, so the ranks were considered
appropriate for the testing.  If what you cited was the whole of the
referee's reasoning, then you may hope that the editor recognizes
that a referee is not always a good source on statistics.
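To make the outlier point concrete, here is a small sketch with hypothetical numbers: replacing one value by a wild outlier moves the mean a long way but leaves the Mann-Whitney U statistic unchanged, because U depends only on how values from the two groups are ordered relative to each other. The pairwise count is O(n1*n2), which is fine for samples of about 30.

```python
def mann_whitney_u(x, y):
    """U for sample x: each pair with x > y counts 1, each tie counts 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical illustrative values, not data from the study.
x = [1.2, 2.0, 2.5, 3.1]
y = [0.8, 1.0, 1.5, 2.2]
x_outlier = [1.2, 2.0, 2.5, 310.0]   # largest value replaced by an outlier

print(mann_whitney_u(x, y), mann_whitney_u(x_outlier, y))   # 13.0 13.0
print(round(mean(x), 3), round(mean(x_outlier), 3))         # 2.2 78.925
```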
Rich Ulrich, biostatistician                wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/index.html   Univ. of Pittsburgh

Subject: uniform distribution: distrib. of frequencies as f. of frq.value
From: Jozef Verhulst
Date: Wed, 1 Jan 1997 15:34:10 +0100

Suppose that I have n frequency values that constitute a uniform
distribution. What is the expected distribution of the number of
frequencies as a function of the frequency value?
For instance, I throw a die 600 times and
find the following frequencies for the 6 possible outcomes: 112, 100,
92, 84, 94, 116. If the six possible outcomes of a throw have equal
probabilities, the distribution is uniform (this can be tested with the
chi-square test or the Kolmogorov-Smirnov test).
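The chi-square check mentioned above can be sketched in a few lines. This minimal version takes the expected count per face as the observed total divided by six (the six listed counts sum to 598, not 600) and returns the Pearson statistic with its degrees of freedom; the function name is hypothetical. The result is compared with a chi-square table, where the 5% point for 5 df is about 11.07.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal cell probabilities.
    Expected count per cell = total / number of cells; df = cells - 1."""
    total = sum(counts)
    expected = total / len(counts)
    stat = sum((o - expected) ** 2 / expected for o in counts)
    return stat, len(counts) - 1

stat, df = chi_square_uniform([112, 100, 92, 84, 94, 116])
print(round(stat, 2), df)   # 7.58 5
```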
But what about the distribution of the 6
frequencies as a function of the frequency value? I guess that it is
not a normal distribution centered on the expected value of
100, because the six values are not independent. What is the true nature
of this distribution, and how can it be tested?
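For a fair die the six counts jointly follow a multinomial distribution: each count on its own is Binomial(600, 1/6), with mean 100 and variance 600 × (1/6) × (5/6) ≈ 83.3, and any two counts have covariance -600/36 ≈ -16.7 because all six must add up to 600. The usual chi-square test accounts for this dependence through its 5 (not 6) degrees of freedom. A small simulation sketch (seed and number of replicates are arbitrary choices) shows the negative dependence:

```python
import random

def simulate_counts(n_throws=600, n_faces=6, n_reps=5000, seed=1):
    """Throw a fair die n_throws times, n_reps times; return the face counts per run."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_reps):
        counts = [0] * n_faces
        for face in rng.choices(range(n_faces), k=n_throws):
            counts[face] += 1
        runs.append(counts)
    return runs

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)

runs = simulate_counts()
face1 = [r[0] for r in runs]
face2 = [r[1] for r in runs]
print("mean count:", round(mean(face1), 1))                      # theory: 100
print("variance:", round(cov(face1, face1), 1))                  # theory: about 83.3
print("covariance of two faces:", round(cov(face1, face2), 1))   # theory: about -16.7
```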

Subject: Re: Monte Carlo methods and linear regression
From: eweiss@winchendon.com (Eric Weiss)
Date: Wed, 01 Jan 97 19:17:45 GMT

In article <32C42C49.75B6@telepost.no>, Bjørn Odvar Eriksen  wrote:
>In a multiple linear regression problem with n=422 and 14
>independent variables (13 of them dichotomous), I found non-constant
>variance of the residuals when inspecting the plot of residuals
>vs. predicted values. I tried various transformations of the dependent
>variable, but none of them worked.
>
>To find the distributions of the regression coefficients, I have instead
>tried a Monte Carlo method:
>        
>1. Select a random sample of n=422 with replacement from the original 422
>observations.
>2. Estimate the regression coefficients by the least-squares method.
>3. Repeat steps 1 and 2 a large number of times and find the
>distributions of the regression coefficients from all the runs.
>
>This method was described in Bradley Efron's "The Jackknife, the
>Bootstrap and Other Resampling Plans", but I would like to know more
>about its merits and drawbacks. Could someone point me in the right
>direction?
>
>Bjørn
>
>Tromsø,
>Norway
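The resampling scheme in the quoted steps is the case (or "pairs") bootstrap. Below is a minimal standard-library sketch, assuming each row of X already carries a leading 1 for the intercept. It fits least squares by solving the normal equations, which is adequate for a sketch but less numerically stable than the QR-based solvers in statistical packages. Resamples whose X'X is singular are skipped, which can happen when a rare dichotomous category drops out of a resample. The function names and the simulated data (error spread growing with x, to mimic non-constant variance) are hypothetical.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.
    Each row of X must already include a leading 1.0 for the intercept."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("X'X is singular: some predictors are perfectly collinear")
        A[col], A[piv] = A[piv], A[col]
        for k in range(col + 1, p):
            f = A[k][col] / A[col][col]
            for c in range(col, p + 1):
                A[k][c] -= f * A[col][c]
    # Back substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b

def bootstrap_coefs(X, y, n_boot=1000, seed=0):
    """Case bootstrap: resample whole rows (x_i, y_i) with replacement and refit."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]                        # step 1
        try:
            draws.append(ols([X[i] for i in idx], [y[i] for i in idx]))  # step 2
        except ValueError:
            pass  # skip a resample in which some columns became collinear
    return draws  # step 3: these draws form the bootstrap distribution

def percentile_interval(values, level=0.95):
    """Simple percentile interval from the sorted bootstrap values."""
    s = sorted(values)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi

# Hypothetical data whose error spread grows with x (non-constant variance).
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
X = [[1.0, x] for x in xs]
y = [2.0 + 3.0 * x + rng.gauss(0, 0.5 + 0.3 * x) for x in xs]
draws = bootstrap_coefs(X, y, n_boot=500)
slopes = [b[1] for b in draws]
print("95% percentile interval for the slope:", percentile_interval(slopes))
```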
First of all, with 13 dichotomous variables your regression can exactly
analyze 2^13 observations.  You need to look at your regression
coefficients to see which ones are perfectly collinear, and drop them.
Eric Weiss
eweiss@winchendon.com

Subject: INSECT AND PLANT: Needs suggestions for SAS!!!
From: Chun Liang
Date: Wed, 1 Jan 1997 15:10:40 EST

Dear Networkers:
I am a Ph.D. student in Forestry, doing research on the relationship
between an insect (a kind of moth) and a plant (loblolly pine). Last year I
ran a very large experiment. In one part of it I used 1-year-old seedlings
from 11 pine families (here, family means a different pine seed source)
with 11 replicate seedlings in each family. After growing them under the same
environmental conditions for one month, I brought all the pine seedlings to a
field site for two weeks of exposure to insect egg deposition. The purpose of
this part of the experiment is to determine whether the insects prefer some
pine families over others for egg deposition.
After field exposure, all seedlings were brought back to the lab for egg counting
on 9 shoots per seedling (3 old-growth shoots, 3 first-flush-growth shoots and
3 second-flush-growth shoots). In addition, shoot height, diameter and
needle length were measured for each shoot, while needle density was rated
on six subjective classes, 0 through 5 (higher numbers indicate denser
needles).
Right now, I am running SAS GLM and ANOVA to analyse the family differences, and
to build a REG model with egg count as the dependent variable and family and
the other measured features as independent variables. The results are very
interesting. There are significant differences in egg amount and egg density
(egg number per shoot area) among pine families, and needle density
seems to be an important contributing factor. However, needle density here is
a discrete quantitative variable. My question is: can I regard it as a continuous
variable? If not, is there another proper method I can use to handle the
discrete quantitative variable?
Any suggestions and comments are highly appreciated!
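Two common ways to enter a 0-5 rating such as needle density in a linear model are sketched below: as a single numeric score (which assumes equal steps between adjacent classes) or as indicator columns, one per class except a reference class. The second is the same idea as declaring the variable in a CLASS statement, though SAS GLM uses its own parameterization internally. Fitting both and comparing them, or simply looking at the mean response per class, is one way to judge whether the continuous treatment is reasonable. The function names and toy numbers are hypothetical.

```python
def linear_column(values):
    """Treat the 0-5 class as a numeric score: one column, one slope."""
    return [[float(v)] for v in values]

def indicator_columns(values, levels=range(6), reference=0):
    """Treat the class as categorical: one 0/1 column per non-reference level."""
    others = [lv for lv in levels if lv != reference]
    return [[1.0 if v == lv else 0.0 for lv in others] for v in values]

def class_means(values, response):
    """Mean response per class, to eyeball whether a straight line is plausible."""
    groups = {}
    for v, r in zip(values, response):
        groups.setdefault(v, []).append(r)
    return {v: sum(rs) / len(rs) for v, rs in sorted(groups.items())}

# Hypothetical density classes and egg counts, for illustration only.
density = [0, 2, 5, 2, 3]
eggs = [1, 4, 9, 6, 5]
print(linear_column(density[:3]))
print(indicator_columns(density[:3]))
print(class_means(density, eggs))
```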
John C. Liang
Email address: CLIANG@UGA.CC.UGA.EDU

Subject: listserv
From: mwarshaw@tiac.net (Meredith Warshaw)
Date: Thu, 02 Jan 1997 03:06:03 GMT

Could someone please send me the information on how to
subscribe to this list via the listserv?  I've lost it,
and know someone who wants to post a question but doesn't
have access to usenet.
TIA,
Meredith Warshaw
mwarshaw@tiac.net

Subject: Re: Help : Mann-Whitney U test or t-test ?
From: cannella@ozemail.com.au (Eddy Cannella)
Date: Thu, 02 Jan 1997 04:29:54 GMT

wpilib+@pitt.edu (Richard F Ulrich) wrote:
>Tony Mak (tonymak@sco1.med.cuhk.edu.hk) wrote:
>
>: I need to compare a parameter in two independent groups (a new
>: 'cancer-marker' in patients with and without cancer).  I have 29 and 31
>: patients in the two groups.  The parameter is skewed to the right in both
>: groups.  I used the Mann-Whitney U test; the p value is 0.010.  I have
>: submitted the paper to a scientific journal, and one of the referees commented
>: that I should be able to use a t-test because of the sample size.
>
>: My question is: I understand that the parametric test is more powerful,
>: but how can I tell whether the distribution is suitable for the t-test?
>
> -- Is the distribution suitable for a t-test?  If you are
>comfortable using the *means* to describe the outcomes and to
>compare them, then the t-test is proper and appropriate.  If the
>means do not give a good representation, then mere sample size can
>hardly compensate, even though the t-test is rather robust
>(especially with equal n).  If you considered the means useful enough
>to present, then there is implicit support for using a t-test.
>
>Ideally, you would perform the appropriate transformation and then
>carry out a t-test: for instance, if the biological parameter is
>usually looked at as the log of what is measured, and the log produces
>a symmetric distribution, test the logged values.
>
>: (I also used ROC curve analysis to test the usefulness of the marker.)
>
> -- You might try to explain to the referee that a rank-order test
>was used because (inexplicable?) outliers make the means unreliable,
>regardless of the sample size, so the ranks were considered
>appropriate for the testing.  If what you cited was the whole of the
>referee's reasoning, then you may hope that the editor recognizes
>that a referee is not always a good source on statistics.
>
>Rich Ulrich, biostatistician                wpilib+@pitt.edu
>http://www.pitt.edu/~wpilib/index.html   Univ. of Pittsburgh
Just an aside.
A colleague confronted me with a similar problem. He
collected frequency data on a 5-point scale at two sites,
e.g.:
Score   Site 1 (f1)   Site 2 (f2)
  1               0             5
  2               3            15
  3              10            40
  4              50            10
  5              30             5
The question posed is whether a t-test or an M-W U test should be used
to compare these two sites. Essentially this is a problem of
comparing distributions. Since the measurements are on an integer-valued
scale, I suggested that a t-test may mask the
distribution shapes of these two data sets. Instead, I suggested he
consider either the M-W U test to compare location
or, at a pinch, the Kolmogorov-Smirnov two-sample test to compare the
distributions (even though the scale of measurement is
not continuous).
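Both statistics can be computed directly from the frequency table. A minimal sketch, assuming the rows are the ordered scores 1 to 5 and giving half credit to ties (observations sharing a score) in U; the z uses the usual normal approximation with a tie correction. The KS statistic is the largest gap between the two cumulative proportions; with this many ties, p-values taken from the standard continuous-case tables tend to be conservative. The function names are hypothetical.

```python
import math

def mann_whitney_from_counts(f1, f2):
    """U for group 1 and a tie-corrected normal z, from counts on the same
    ordered categories (f1[k], f2[k] = number of observations in category k)."""
    n1, n2 = sum(f1), sum(f2)
    n = n1 + n2
    u1 = 0.0
    below2 = 0  # group-2 observations in lower categories so far
    for a, b in zip(f1, f2):
        u1 += a * (below2 + 0.5 * b)  # wins over lower group-2 values, half for ties
        below2 += b
    mean_u = n1 * n2 / 2
    tie_term = sum((a + b) ** 3 - (a + b) for a, b in zip(f1, f2)) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    return u1, (u1 - mean_u) / math.sqrt(var_u)

def ks_statistic_from_counts(f1, f2):
    """Largest absolute difference between the two cumulative proportions."""
    n1, n2 = sum(f1), sum(f2)
    c1 = c2 = 0
    d = 0.0
    for a, b in zip(f1, f2):
        c1 += a
        c2 += b
        d = max(d, abs(c1 / n1 - c2 / n2))
    return d

f1 = [0, 3, 10, 50, 30]   # site 1, scores 1..5
f2 = [5, 15, 40, 10, 5]   # site 2, scores 1..5
u1, z = mann_whitney_from_counts(f1, f2)
print(u1, round(z, 2))                          # 5862.5 7.91
print(round(ks_statistic_from_counts(f1, f2), 3))   # 0.66
```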
Cheers
Eddy
___________________________________________________
Eddy Cannella
cannella@ozemail.com.au
BIOSTAT
116 Carr St
West Perth  WA  6005
Australia

Downloaded by WWW Programs
Byron Palmer
