Tony Mak (tonymak@sco1.med.cuhk.edu.hk) wrote:

: I need to compare a parameter in two independent groups (a new
: 'cancer-marker' in patients with and without cancer). I have 29 and 31
: patients in each group. The parameter is skewed to the right in both
: groups. I used Mann-Whitney U test, the p value is 0.010. I have
: submitted the paper to a scientific journal, one of the referees commented
: that I should be able to use t-test because of the sample size.

: My question is: I understand that the parametric test is more powerful,
: but how can I tell if the distribution is suitable for the t-test or not?

 -- Is the distribution suitable for a t-test? If you are
comfortable in using the *means* to describe the outcomes, and to
compare them, then the t-test is proper and appropriate. If the
means do not give a good representation, then it is hard for mere
sample size to compensate, even though the t-test is rather robust
(especially with equal n). If you considered the means useful to
present, then there is implicit support for the use of a t-test.

What might be ideal would be to perform the appropriate transformation
(if it is usual, for instance, that the biological parameter should be
looked at as the log of what is measured, and that produces a symmetric
distribution), and then carry out a t-test.

: (I used ROC curve analysis to test the usefulness of the marker also)

 -- You might try to explain to the referee that a rank-order test
was used because (inexplicable?) outliers make these means unreliable,
regardless of the sample size, and so the ranks were considered to be
appropriate for the testing. If what you cited was the totality of the
reasoning of the referee, then you may hope that the editor realizes this
possibility, that a referee is not always a good source on statistics.

Rich Ulrich, biostatistician              wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/index.html   Univ. of Pittsburgh
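The transformation idea above can be sketched in a few lines of Python. The values below are made up for illustration (they are not the poster's data) and are chosen so that their logs are exactly symmetric; the helper names `skewness` and `welch_t` are likewise only illustrative. The sketch compares skewness before and after the log transform and computes Welch's t statistic on both scales.

```python
import math
from statistics import mean, variance

def skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical marker values: the logs are symmetric by construction,
# so the raw values are skewed to the right.
logs_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
logs_b = [-1.0, 0.0, 1.0, 2.0, 3.0]
raw_a = [math.exp(v) for v in logs_a]
raw_b = [math.exp(v) for v in logs_b]

skew_raw = skewness(raw_a)                          # positive: right skew
skew_log = skewness([math.log(v) for v in raw_a])   # essentially zero
t_raw = welch_t(raw_a, raw_b)
t_log = welch_t([math.log(v) for v in raw_a],
                [math.log(v) for v in raw_b])

print(f"skewness raw={skew_raw:.3f}  log={skew_log:.3f}")
print(f"Welch t  raw={t_raw:.3f}  log={t_log:.3f}")
```

With real data the logs will not be perfectly symmetric, so a histogram or skewness check on the transformed values is still worth doing before relying on the t-test.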
Suppose that I have n frequency values that constitute a uniform
distribution. What is the expected distribution for the number of
frequencies, as a function of the frequency value?

For instance, I throw a die 600 times, and I find for the 6 possible
outcomes the following frequencies: 112, 100, 92, 84, 94, 116.

If the six possible outcomes of a throw have equal probabilities, the
distribution is uniform (this can be tested with the chi square test, or
the Kolmogorov-Smirnov test). But what about the distribution of the 6
frequencies as a function of the frequency value?

I guess that it is not a normal distribution centered around the mean
expected value of 100, because the six values are not independent.

What is the true nature of this distribution, and what are the means to
test it?
In article <32C42C49.75B6@telepost.no>, Bjørn Odvar Eriksen wrote:
>In a problem of multivariate linear regression with n=422 and 14
>independent variables (13 of them dichotomous), I found non-constant
>variance of the dependent variable when inspecting the plot of residuals
>vs. predicted values. I tried various transformations of the dependent
>variable, but none of them worked.
>
>To find the distributions of the regression coefficients, I have instead
>tried a Monte Carlo method:
>
>1. Select a random sample of n=422 with replacement from the original 422
>observations.
>2. Estimate the regression coefficients by the least square method.
>3. Repeat steps 1 and 2 a great number of times and find the
>distributions of the regression coefficients from all the runs.
>
>This method was described in Bradley Efron's "The Jackknife, the
>Bootstrap and Other Resampling Plans", but I would like to know more
>about its merits and drawbacks. Could someone point me in the right
>direction?
>
>Bjørn
>
>Tromsø,
>Norway

First of all, with 13 dichotomous variables your regression can exactly
analyze 2^13 observations. You need to take a look at your regression
coefficients to see which ones are perfectly collinear and drop them.

Eric Weiss
eweiss@winchendon.com
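The three resampling steps quoted above can be sketched as follows. This is a minimal illustration, not the poster's analysis: it uses a single continuous predictor and simulated data with non-constant error variance, and the names `fit_line` and `bootstrap_slopes` are hypothetical. Because whole observations (x, y) are resampled together, the procedure does not rely on a constant-variance assumption.

```python
import random
from statistics import stdev

def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def bootstrap_slopes(xs, ys, n_boot, rng):
    """Steps 1-3: resample cases with replacement, refit, collect slopes."""
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]      # step 1
        bx = [xs[i] for i in idx]
        if max(bx) == min(bx):                          # degenerate resample
            continue
        by = [ys[i] for i in idx]
        slopes.append(fit_line(bx, by)[1])              # step 2
    return slopes                                       # step 3

# Simulated data (an assumption for illustration): error sd grows with x.
rng = random.Random(0)
n_obs = 422
xs = [rng.uniform(0, 10) for _ in range(n_obs)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 0.2 + 0.3 * x) for x in xs]

slope_hat = fit_line(xs, ys)[1]
slopes = sorted(bootstrap_slopes(xs, ys, 500, rng))
boot_se = stdev(slopes)
lo = slopes[int(0.025 * len(slopes))]        # rough 95% percentile interval
hi = slopes[int(0.975 * len(slopes)) - 1]

print(f"slope = {slope_hat:.3f}, bootstrap SE = {boot_se:.3f}")
print(f"95% percentile interval: ({lo:.3f}, {hi:.3f})")
```

With many dichotomous predictors, a resample can omit every case in a rare category and make the design matrix singular; the degenerate-resample check above is the one-predictor analogue of that problem.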
Dear Networkers:

I am a Ph.D. student in Forestry, doing some research on the
relationship between an insect (a kind of moth) and a plant (loblolly
pine). Last year, I had a very big experiment: in one part of it I used
1-year-old seedlings from 11 pine families (here, family means a
different seed source) with 11 seedlings as replicates in each family.
After growing them under the same environmental conditions for one month,
I brought all pine seedlings to a field site to get insect egg deposition
for two weeks. The purpose of this part of the experiment is to determine
whether there is any preference in insect egg deposition among different
pine families.

After field exposure, all seedlings were brought back to the lab for egg
counting, on 9 shoots (3 old-growth shoots, 3 first-flush-growth shoots
and 3 second-flush-growth shoots) per seedling. In addition, shoot
height, diameter and needle length were measured for each shoot, while
needle density was estimated on a subjective scale of classes 0, 1, 2, 3,
4 and 5 (a higher number indicates denser needles).

Right now, I am running SAS GLM and ANOVA to analyse the family
differences, and to establish the REG model using egg count as the
dependent variable and family and the other measured features as
independent variables. The results are very interesting. There is a
significant difference in egg amount and egg density (egg number per
shoot area) among different pine families, and needle density seems to be
an important contributing factor.

In fact, needle density here is a discrete quantitative variable. My
question is: Can I regard it as a continuous variable? If not, is there
another proper method that I can use to handle a discrete quantitative
variable?

Any suggestions and comments are highly appreciated!

John C. Liang
Email address: CLIANG@UGA.CC.UGA.EDU
Could someone please send me the information on how to subscribe to this
list via the listserv? I've lost it, and know someone who wants to post
a question but doesn't have access to usenet.

TIA,
Meredith Warshaw
mwarshaw@tiac.net
wpilib+@pitt.edu (Richard F Ulrich) wrote:

>Tony Mak (tonymak@sco1.med.cuhk.edu.hk) wrote:
>
>: I need to compare a parameter in two independent groups (a new
>: 'cancer-marker' in patients with and without cancer). I have 29 and 31
>: patients in each group. The parameter is skewed to the right in both
>: groups. I used Mann-Whitney U test, the p value is 0.010. I have
>: submitted the paper to a scientific journal, one of the referees commented
>: that I should be able to use t-test because of the sample size.
>
>: My question is: I understand that the parametric test is more powerful,
>: but how can I tell if the distribution is suitable for the t-test or not?
>
> -- Is the distribution suitable for a t-test? If you are
>comfortable in using the *means* to describe the outcomes, and to
>compare them, then the t-test is proper and appropriate. If the
>means do not give a good representation, then it is hard for mere
>sample size to compensate, even though the t-test is rather robust
>(especially with equal n). If you considered the means useful to
>present, then there is implicit support for the use of a t-test.
>
>What might be ideal would be to perform the appropriate transformation
>(if it is usual, for instance, that the biological parameter should be
>looked at as the log of what is measured, and that produces a symmetric
>distribution), and then carry out a t-test.
>
>: (I used ROC curve analysis to test the usefulness of the marker also)
>
> -- You might try to explain to the referee that a rank-order test
>was used because (inexplicable?) outliers make these means unreliable,
>regardless of the sample size, and so the ranks were considered to be
>appropriate for the testing. If what you cited was the totality of the
>reasoning of the referee, then you may hope that the editor realizes this
>possibility, that a referee is not always a good source on statistics.
>
>Rich Ulrich, biostatistician              wpilib+@pitt.edu
>http://www.pitt.edu/~wpilib/index.html   Univ. of Pittsburgh

Just an aside. I have been confronted with a similar problem by a
colleague. He collected frequency data on a 5-point scale for two sites,
e.g.,

  score   f1   f2
    1      0    5
    2      3   15
    3     10   40
    4     50   10
    5     30    5

The question posed is whether a t-test or a M-W U test should be used to
compare these two sites. Essentially this is a problem of comparing
distributions. Since the measurements are based on an integer-valued
scale, I suggested that the use of a t-test may mask the distribution
shape of these two data sets. Instead I suggested he consider either
using the M-W U test to compare the statistics of location or, at a
pinch, the Kolmogorov-Smirnov two-sample test to compare distributions
(even though the scale of measurement is non-continuous).

Cheers
Eddy
___________________________________________________
Eddy Cannella            cannella@ozemail.com.au
BIOSTAT
116 Carr St
West Perth WA 6005
Australia
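For frequency data like the two-site table above, the Mann-Whitney U statistic can be computed directly from the counts by giving every observation in a score class the midrank of that class. The Python below is a sketch under stated assumptions: it uses the tie-corrected normal approximation without a continuity correction, and the function name `mann_whitney_from_counts` is illustrative rather than taken from any library.

```python
import math

f1 = [0, 3, 10, 50, 30]    # site 1 counts for scores 1..5 (from the post)
f2 = [5, 15, 40, 10, 5]    # site 2 counts for scores 1..5

def mann_whitney_from_counts(f1, f2):
    """Return (U for group 1, z, two-sided p) using midranks for ties."""
    n1, n2 = sum(f1), sum(f2)
    n = n1 + n2
    rank_sum_1 = 0.0
    tie_term = 0
    start = 0                       # observations ranked below this class
    for c1, c2 in zip(f1, f2):
        t = c1 + c2                 # size of the tied block
        if t:
            midrank = start + (t + 1) / 2
            rank_sum_1 += c1 * midrank
            tie_term += t ** 3 - t
            start += t
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U corrected for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u1, z, p_two_sided

u1, z, p = mann_whitney_from_counts(f1, f2)
print(f"U1 = {u1}, z = {z:.2f}, two-sided p = {p:.3g}")
```

With only five distinct scores the ties are heavy, so the tie correction to the variance matters here; an exact or permutation-based p-value would be an alternative to the normal approximation.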