DOES A SIMPLE REGRESSION ANALYSIS ... REQUIRE A NORMALLY DISTRIBUTED DEPENDENT MEASURE? WHAT ABOUT THE INDEPENDENT VARIABLE?

At 10:46 PM 12/17/96 EST, you wrote:
> We need to do regression analysis with RATIO (e.g., cost/benefit) as a
>dependent variable, but I was told that I might violate normal distribution
>assumption if I do so. Any comments, suggestions?
>
>Thanks,
>
>Haiyi
YOU MEAN THERE IS NO ONE LOCAL ... A GRAD STUDENT OR SOMEONE ELSE ... WHO IS IN A POSITION TO PROVIDE SOME ASSISTANCE???? I WOULD FIND THAT HARD TO BELIEVE ...

At 12:40 AM 12/18/96 GMT, you wrote:
>I am in the middle of a business stats course that is killing me. I would
>like to find some help on-line to help me get through this course. Please
>e-mail me if you are willing to help
Please email me if you have any information. Thanks.

Liang
Responsibilities Include: Statistical Programming to write SAS programs for data management and analysis.

Minimum Skills Required Include: Base SAS and Statistical Procedures. Must have a minimum of two years of SAS programming experience. Candidates must have good statistical knowledge and communication skills. Ref-3659

Our regional offices, located in Waukegan, IL; Durham, NC; Palo Alto, CA; and Princeton, NJ, provide off-site and on-site services to clients in over 30 states. Our Clinical Trial Management group assists Research and Development organizations in the monitoring and quality assurance of clinical trial studies. Our Systems Professionals function in all aspects of the application development life cycle, manage data, and develop reports. In Research Statistics, we design statistical experiments and evaluate the results for the areas of Life Science, Finance, Marketing, Economics, and Engineering.

Please reply to: rtpresumes@trilogycnslt.com
Trilogy Consulting Corporation
1000 Park Forty Plaza, Suite 190
Research Triangle Park, NC 27713
http://www.trilogycnslt.com/TCC_Home
fax: 919.361.2415
In article <5990pt$l3l@news-central.tiac.net>, mwarshaw@tiac.net (Meredith Warshaw) writes:
|> I've been asked for help by someone who has paired nominal data, and
|> I'm not sure what to suggest. She's looking at working mothers and
|> has some hypotheses regarding differences in work/child-care for
|> first and second born kids. If these involved numerical data then
|> paired t-tests would be the obvious solution. Is there anything
|> analogous for either dichotomous or multi-level nominal variables?
|>
|> TIA,
|> Meredith Warshaw

Dichotomous response: McNemar's test
Polytomous response: Stuart's (1955, Biometrika) test (Cochran-Mantel-Haenszel general association statistic)
Ordered categorical: Agresti (1983, Biometrics); CMH mean score test

If there are additional covariates, conditional logistic regression for matched sets (Breslow and Day, 1980, _Statistical Methods in Cancer Research: Vol 1_) can be used. See also Cox and Snell (1989, _Analysis of Binary Data_), Lipsitz, Laird and Harrington (Statistics in Medicine, 1990).

Chuck Davis
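[Editor's sketch] For the dichotomous case, McNemar's test uses only the discordant pairs, so it is easy to compute by hand. A minimal version in plain Python (standard library only; the counts b = 10 and c = 2 are made-up illustration values, not from any post here):

```python
import math

def mcnemar(b, c):
    """McNemar's chi-square test for paired dichotomous data.

    b = number of pairs discordant in one direction (e.g. first
        child in day care, second not); c = discordant the other way.
    Returns (chi-square statistic, two-sided p-value), 1 df.
    The concordant cells drop out of the test entirely.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs -- test is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # Chi-square(1 df) upper tail: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Hypothetical example: 10 pairs discordant one way, 2 the other
chi2, p = mcnemar(10, 2)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")
```

For serious use, a continuity-corrected or exact (binomial) version may be preferable when b + c is small.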
I'm looking for references on performing interim analyses on equivalence trials - specifically for a time-to-failure analysis involving two types of surgery. Any references to current journal articles or macros would be very much appreciated. TIA and Happy Holidays!!

J. Cater
jrcr@phila.acr.org
P.S. I will post to the group a compilation of responses.
Warren (wlmay@umsmed.edu) wrote:
: What is the alternative? That Rho is less than 1? If you reject, then
: what could you say? That there is less than perfect correlation?
: Perfect correlation would imply a functional relationship instead of a
: statistical one.
: In large samples, I would think the null would almost always be rejected
<< rest deleted ...>

I think you must have missed the earliest commentary on the question. Yes, the question, worded as "Pearson r=1", only arises from naive misunderstandings of what testing is about. But the notion of "perfect correlation, if it were not for accidents and measurement error" arises in several areas.

Computed correlations are 'attenuated' by the unreliability of the component measures. You may adjust the observed correlation for the purpose of comparing to a theoretical value, for instance, 1.0. Thus, if two 'IQ' tests were to correlate at .95, then they are almost 100% measuring the same thing, because that is the limit of the reliability of any (either) IQ test.

However, for purposes of testing, one needs to work more directly with the variance components, and/or the OBSERVED correlations, as well as with the *evidence* for the reliability of the measures. If a correlation is .65 between two variables, then it still owns the variability associated with a correlation of .65, even if you think you have 'corrected' it to 1.0. When reliabilities are poor, adjusting for attenuation can produce some impressively big-sounding correlations, near 1.0, even though the original correlations might not manage to be nominally significant.

Nunnally has discussed attenuation in "Psychometric Theory", among other places.

Rich Ulrich, biostatistician wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic, Univ. of Pittsburgh
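[Editor's sketch] The classical correction for attenuation is just the observed correlation divided by the geometric mean of the two reliabilities (Spearman's formula, as discussed in Nunnally). The numbers below are illustrative, not from the post:

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation.

    r_xy: observed correlation between the two measures.
    rel_x, rel_y: reliability estimates for each measure
    (e.g. test-retest or internal-consistency coefficients).
    Returns the estimated correlation between the true scores.
    """
    return r_xy / math.sqrt(rel_x * rel_y)

# An observed r of .65 with reliabilities of .70 "corrects" to about .93,
# illustrating Ulrich's warning: the corrected value sounds impressive,
# but it still carries the sampling variability of the original .65.
print(round(disattenuate(0.65, 0.70, 0.70), 3))
```

Note the corrected value can exceed 1.0 when the reliability estimates are themselves too low, which is another sign the adjustment should not be treated as a test statistic.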
>I hope someone can take a little time to help us, as our whole >department of medical statistics seems to have disappeared for the >holidays!! > >We are looking at the effects of the amount of time paramedic personnel >spend at the scene of an accident. Our outcome measures are >death, intensive care stay, total hospital stay, and 12 month functional >impairment measure (0-100 integer scale) Don't do it! The amount of time that paramedical personnel spend on the scene of an accident is related to a number of properties of the accident itself. That is, the sort of accidents that they stay at for a long time are not the same as the ones where their stay is short. For instance, it may take time to cut people out of the wreckage of a really nasty accident. This means that your model is wrong about the direction of cause-and-effect; things that cause death will also cause differences in the time variable. I doubt the value of statistical adjustment for type of accident too - there are too many intangibles that are evident to the person on the scene which cannot be included in the model. (We looked at the decision to admit people with chest pain; in a significant number of instances where there were no positive clinical indications that the person was having a heart attack, the doctor on casualty admitted the person 'because they didn't look right' and the person indeed went on to develop a heart attack.) One potential solution is a controlled trial. You could try to retrain some units to minimise the delay to hospital admission and compare the outcome of their patients with patients attended by conventional teams. This is actually quite hard; a contamination effect will probably occur where conventional teams will 'race' the fast-track teams and thereby reduce their own delay times. Incidentally, this question reflects a whole side of the practice of statistics which is not maths but craft. 
It has to do with the appropriate mapping of mathematical models onto real-world processes. Statistical models, as opposed to mathematical models, are models in which this mapping is assumed. The utility of the model is as much to do with the assumptions of the mapping as with the assumptions of the mathematics it uses. The trouble is that it is easier to discuss the mathematics.

Ronan M Conroy
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
voice +353 1 402 2431  fax +353 1 402 2329
'Do not try to be a genius in every bar' [Brendan Behan? - No, Faure!]
Re: What do we mean by "The Null Hypothesis"? Or, the null as nil?

<< Clay Helberg Internet: helberg@execpc.com >> wrote:
: Richard F Ulrich wrote: << ... our details, deleted >>

Okay. Rather than engage in a dissection of the dialog, I will try to address the central issue. Clay is endorsing Hays, but he does not accept the fact that I find Hays less-than-satisfactory. With that in mind, I will try to convey my argument by re-writing part of Hays.

CH:
: Incidentally, there is an impression in some quarters that the term
: "null hypothesis" refers to the fact that in experimental work the
: parameter value specified in Ho is very often zero. Thus, in many
: experiments the hypothetical situation "no experimental effect" is
: represented by a statement that some mean or difference between means is
: exactly zero.
>>to be replaced>>
: However, as we have seen, the tested hypothesis can
: specify any of the possible values for one or more parameters, and this
: use of the word *null* is only incidental. It is far better for the
: student to think of the null hypothesis Ho as simply designating that
: hypothesis actually being tested, the one which, if true, determines the
: sampling distribution referred to in the test.

How about: "A tested hypothesis must specify a value that does have a particular meaning, or _gravitas_. Though that may be any of the possible values for one (or more) parameters, the use of the word *null* is always appropriate because the test is looking for 'no experimental effect' (in the words of Hays, above) - even though 'no-effect' sometimes is represented by a number.

"Metaphorically, the null is also reminiscent of a singularity, or a black hole, which is a sort of zero - it is what your conclusions have to collapse to, if your data come out totally noisy. It is certainly different from the way we regard 'alternate' hypotheses."

What we are discussing here is a pedagogical question, rather than a statistical one.
In TECHNICAL terms, I am right and Hays is wrong, I think, because every hypothesis *is* reduced to what Clay termed a 'tautological' form, where there is a zero. (At least, that is the way of writing formal, mathematical hypotheses for t-tests and ANOVAs, where you show that the computed term does have the intended distribution, of t or chi-squared. I don't really remember writing hypotheses for anything else.) Further, I am using "effect size" in the same technical sense that Hays uses the phrase, above, where the effect size *is* zero under the null. (Note: Clay has been saying it differently, using effect-size as synonymous with, say, raw-change-score. I would rather keep it as a technical term.)

For the sake of pedagogy, the Hays approach does de-emphasize zero as the COMPARISON value. Is that a major problem? Personally, I have not had trouble explaining the difference between effect-size and comparison-value. But I do my explaining to one or two persons at a time. Also, I have not read Hays, so I do not know what further use he might make of the ideas in the course of his presentation. If the citation came from his introduction, then maybe he had a lot more to say. If it came from his summary, then I think that he just made a meager point, where he could have argued more fruitfully.

Rich Ulrich, biostatistician wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic, Univ. of Pittsburgh
Barry Haworth (barryh@AGB.COM.AU) wrote: << concerning the sampling of 300 out of 3000 ... >>
: "Taking a sample of a sample is perfectly appropriate, so long as the second sample is drawn in a sensible way (a random sample of all the original responses, for example)"

Please, if you have to analyze the 300, then draw them SYSTEMATICALLY rather than randomly. For instance, it is VERY OFTEN useful to know if the early (fast?) respondents were different from those whose forms came in last. (In mortality follow-ups, one looks to see if the causes of death are different for those whose records were hardest to find - found by 'only one method' or found only by persistent checking.) You might draw 100 early + 100 middle + 100 late for your 300, so you could compare. Big changes across the strata suggest a big chance that your unsampled cases are even more different than early vs. late.

Rich Ulrich, biostatistician wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic, Univ. of Pittsburgh
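[Editor's sketch] The stratified draw Ulrich describes is straightforward, assuming the 3000 responses are kept in arrival order; all names and counts here are illustrative:

```python
import random

def stratified_subsample(responses, per_stratum=100, seed=42):
    """Split an arrival-ordered list into early/middle/late thirds
    and draw an equal simple random sample from each third, so that
    early vs. late respondents can be compared directly."""
    n = len(responses)
    third = n // 3
    strata = {
        "early": responses[:third],
        "middle": responses[third:2 * third],
        "late": responses[2 * third:],
    }
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {name: rng.sample(stratum, per_stratum)
            for name, stratum in strata.items()}

# 3000 questionnaires, indexed in the order they came back
questionnaires = list(range(3000))
sample = stratified_subsample(questionnaires)
print({name: len(s) for name, s in sample.items()})
```

Comparing a key outcome across the three strata then gives exactly the early-vs.-late check the post recommends.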
Joseph K. Lyou (CBGLyou@AOL.COM) wrote:
: I want to analyze whether there is a significant trend over time in the
: annual failure rate of a product. I have 20 years of measurements (i.e., n =
: 20). As I understand it, an ordinary regression analysis would be
: inappropriate because the residuals are not independent (i.e., the error
: associated with a failure rate for 1974 is more highly correlated with the
: 1975 failure rate than the 1994 failure rate). Is it appropriate to simply
: divide the data into two groups (the 1st 10 years vs. the 2nd 10 years) and
: do a between-groups ANOVA? Or is there some other (better) way to analyze
: these data?
: Should anyone be so inclined as to do the analysis, here are the data:
: Year  Failure Rate
: 1974  3.3
<< deleted, numbers between 1.3 and 5.7, across some years >>

If there were a simple trend, one *might* be tempted to draw a line and then draw conclusions. However, there is nothing simple. You do not provide the numbers on which the "Rates" are based. There is an inherent error-estimate in the number of events that occurred, rather than the "rate". From the number of events, it might be possible to say that there do *seem to be changes* taking place. Or else, not, depending on the numbers. For instance, if the "rates" really represent a low range of 2 events, going up to 6 events, in a year, then there was VERY LITTLE happening.

Rich Ulrich, biostatistician wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic, Univ. of Pittsburgh
In article <961216173408_1424980603@emout02.mail.aol.com>, "Joseph K. Lyou" wrote:
>I want to analyze whether there is a significant trend over time in the
>annual failure rate of a product. I have 20 years of measurements (i.e., n =
>20). As I understand it, an ordinary regression analysis would be
>inappropriate because the residuals are not independent (i.e., the error
>associated with a failure rate for 1974 is more highly correlated with the
>1975 failure rate than the 1994 failure rate). Is it appropriate to simply
>divide the data into two groups (the 1st 10 years vs. the 2nd 10 years) and
>do a between-groups ANOVA? Or is there some other (better) way to analyze
>these data?
>
>Should anyone be so inclined as to do the analysis, here are the data:
>
>Year  Failure Rate
>1974  3.3
>1975  2.5
>1976  2.7
>1977  2.4
>1978  5.7
>1979  3.2
>1980  1.6
>1981  5.2
>1982  2.8
>1983  2.4
>1984  2.7
>1985  1.3
>1986  4.5
>1987  4.5
>1988  1.4
>1989  3.6
>1990  1.5
>1991  1.4
>1992  1.6
>1993  1.6

Joseph,

Since your problem looked interesting, I ran it through my statistical package ELF. (FYI, it took all of 30 seconds including importing the data.) I did a regression using Durbin's technique for autocorrelated regressions. My conclusion, based on the first-order autocorrelation coefficient of -0.2 and a t of -0.8, is that you do not have a significant autocorrelation problem. Looking at the regression coefficients and t statistics, you don't have a significant trend either.

ELF 201 is not yet available, but you might visit our web site http://www.winchendon.com

Eric
Eric Weiss eweiss@winchendon.com
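[Editor's sketch] Weiss's two checks - fit a trend line, then look at the autocorrelation of the residuals - are easy to reproduce on the posted rates. This is plain ordinary least squares plus a simple lag-1 residual correlation, not Durbin's full procedure:

```python
import math

rates = [3.3, 2.5, 2.7, 2.4, 5.7, 3.2, 1.6, 5.2, 2.8, 2.4,
         2.7, 1.3, 4.5, 4.5, 1.4, 3.6, 1.5, 1.4, 1.6, 1.6]
years = list(range(1974, 1994))

n = len(rates)
xbar = sum(years) / n
ybar = sum(rates) / n
sxx = sum((x - xbar) ** 2 for x in years)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(years, rates))

slope = sxy / sxx
intercept = ybar - slope * xbar
residuals = [y - (intercept + slope * x) for x, y in zip(years, rates)]

# t statistic for the slope (n - 2 = 18 degrees of freedom)
s2 = sum(e ** 2 for e in residuals) / (n - 2)
t_slope = slope / math.sqrt(s2 / sxx)

# lag-1 autocorrelation of the residuals
num = sum(residuals[i] * residuals[i + 1] for i in range(n - 1))
den = sum(e ** 2 for e in residuals)
r1 = num / den

print(f"slope = {slope:.4f}, t = {t_slope:.2f}, lag-1 r = {r1:.2f}")
```

With 18 df the two-sided 5% critical t is about 2.10, so a |t| below that for the slope is consistent with Weiss's "no significant trend", and a small lag-1 residual correlation is consistent with his "no significant autocorrelation problem".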
She should consider the McNemar test, after casting the data into a k X k table. While k is commonly 2 for this procedure, the McNemar test can be expanded to k > 2.

Jerrold H. Zar
Department of Biological Sciences, Northern Illinois University
DeKalb, IL 60115 USA
jhzar@niu.edu

>>> Meredith Warshaw 12/18/96 09:10am >>>
I've been asked for help by someone who has paired nominal data, and I'm not sure what to suggest. She's looking at working mothers and has some hypotheses regarding differences in work/child-care for first and second born kids. If these involved numerical data then paired t-tests would be the obvious solution. Is there anything analogous for either dichotomous or multi-level nominal variables?

TIA,
Meredith Warshaw mwarshaw@tiac.net
Dept. of Psychiatry and Human Behavior
Brown University
Providence, RI
Hello,

I'm wondering if it is possible to calculate power values (1 - Beta) for one or all of the common contingency table tests (e.g. chi-square, G, Fisher's exact).

Thanks,
Peter Midford
Department of Zoology
U Wisconsin - Madison
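[Editor's sketch] Yes - for any of these tests, power at a specified alternative can be estimated by simulation: generate many tables under the alternative, run the test on each, and count the rejections. A sketch for the 2x2 chi-square case in plain Python (the group sizes and proportions are made-up illustration values):

```python
import math
import random

def chi2_2x2_p(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for a
    2x2 table [[a, b], [c, d]]; returns the two-sided p-value."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # degenerate table: cannot reject
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1 df) tail

def simulated_power(p1, p2, n_per_group, alpha=0.05, sims=2000, seed=1):
    """Estimate power for comparing two independent proportions."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        a = sum(rng.random() < p1 for _ in range(n_per_group))
        c = sum(rng.random() < p2 for _ in range(n_per_group))
        p = chi2_2x2_p(a, n_per_group - a, c, n_per_group - c)
        rejections += p < alpha
    return rejections / sims

# Hypothetical alternative: 30% vs 60% success, 50 subjects per group
print(f"estimated power: {simulated_power(0.3, 0.6, 50):.2f}")
```

Analytic approximations via the noncentral chi-square distribution exist as well, but the simulation approach carries over unchanged to the G-test or Fisher's exact test: only `chi2_2x2_p` needs swapping.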
In article <26222628@vixen.Dartmouth.EDU>, Haiyi Xie wrote:
> We need to do regression analysis with RATIO (e.g., cost/benefit) as a
>dependent variable, but I was told that I might violate normal distribution
>assumption if I do so. Any comments, suggestions?

The importance of normality in a regression is greatly overblown. However, when using the ratio as you have defined it, if benefit gets too close to 0 too often, you may very well have huge tails, and not even a finite variance. This is a serious problem. And as the distribution of the error from the prediction in ratio is likely to be non-symmetric, the so-called robust regression procedures are likely to be invalid.

I suggest you speak in person to a mathematical statistician about your real problem, which may not be what you have stated.

--
This address is for information only. I do not claim that these views are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu  Phone: (317)494-6054  FAX: (317)494-0558
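[Editor's sketch] Rubin's point about the tails is easy to see by simulation: when the denominator has appreciable mass near zero, the ratio behaves like a Cauchy-type variable, and a handful of observations dwarf all the rest. The distributions below are invented for illustration, not Haiyi's actual data:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed for a reproducible demonstration
n = 10_000

# Hypothetical cost and benefit: both roughly normal, with benefit
# centered at 1 but occasionally landing very close to 0.
costs = [rng.gauss(2.0, 0.5) for _ in range(n)]
benefits = [rng.gauss(1.0, 0.5) for _ in range(n)]
ratios = [c / b for c, b in zip(costs, benefits)]

abs_r = sorted(abs(r) for r in ratios)
median = statistics.median(abs_r)
p99 = abs_r[int(0.99 * n)]
worst = abs_r[-1]

# For a normal-ish variable, max/median would be modest (roughly 4-5);
# the near-zero denominators blow this ratio up enormously.
print(f"median |ratio| = {median:.2f}")
print(f"99th pct |ratio| = {p99:.2f}")
print(f"max |ratio| = {worst:.2f}")
```

This is exactly the "huge tails, maybe no finite variance" behaviour: sample moments of such a ratio never settle down as n grows, which is why a transformation or a direct model of cost and benefit separately is usually the better route.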
It is a little hard to understand what your friend wants to do without a little more information, but let me give it a shot...

You might convert the dichotomous data into rates: what percent of first children are sent to day care and what percent of second children. You could also try using crosstabs (also called a contingency table) if you have another dimension to turn it into a two-way table. I'd look at a stat package manual or maybe Hays' or Blalock's stat books. Finally, if you have a lot of other factors influencing the day care decision, you should investigate logistic regression.

Good luck.
Eric Weiss eweiss@winchendon.com
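[Editor's sketch] For the logistic-regression route, with a single 0/1 predictor the fitted slope is just the log odds ratio for day-care use between first and second children. A minimal fit by gradient ascent on the (concave) log-likelihood, in plain Python; the counts are invented for illustration, and a real analysis would use a stats package:

```python
import math

# Hypothetical data as (second_child, in_daycare) pairs:
# first children: 30 of 100 in day care; second children: 60 of 100.
data = ([(0, 1)] * 30 + [(0, 0)] * 70 +
        [(1, 1)] * 60 + [(1, 0)] * 40)

def fit_logistic(data, lr=1.0, iters=5000):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent
    on the mean log-likelihood. Concavity guarantees convergence to
    the maximum-likelihood estimates for this simple model."""
    b0 = b1 = 0.0
    n = len(data)
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += (y - p)
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

b0, b1 = fit_logistic(data)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
# The slope should equal the log odds ratio: log((60/40) / (30/70))
print(f"log odds ratio = {math.log((60 / 40) / (30 / 70)):.3f}")
```

The payoff of the logistic framing, as the post says, is that additional factors influencing the day-care decision can be added as further terms without changing the machinery.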