

Newsgroup sci.stat.consult 21686

Directory

Subject: Re: ratio as a dependent var. in regression -- From: Dennis Roberts
Subject: Re: Help -- From: Dennis Roberts
Subject: Question: Can Splus be ported on Linux -- From: Liang Lu
Subject: USA-N C-RTP-SAS Programmer -- From: rtpresumes@trilogycnslt.com (Employment)
Subject: Re: paired nominal data -- From: chuck@pmeh.uiowa.edu (Chuck Davis)
Subject: Equivalence Interim Analysis -- From: "Jacqueline R. Cater"
Subject: Re: Is there a test for H0:Pearson-Rho=1? -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Help with trauma outcome study -- From: Ronan Conroy
Subject: Re: What do we mean by "The Null Hypothesis"? -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: survey dilemna (sic) -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Time-Series Analysis -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Time-Series Analysis -- From: eweiss@winchendon.com (Eric Weiss)
Subject: paired nominal data -Reply -- From: Jerrold Zar
Subject: Power of contingency table tests -- From: pmidford@students.wisc.edu (Peter Midford)
Subject: Re: ratio as a dependent var. in regression -- From: hrubin@b.stat.purdue.edu (Herman Rubin)
Subject: Re: paired nominal data -- From: eweiss@winchendon.com (Eric Weiss)

Articles

Subject: Re: ratio as a dependent var. in regression
From: Dennis Roberts
Date: Tue, 17 Dec 1996 23:29:16 -0500
DOES A SIMPLE REGRESSION ANALYSIS ... REQUIRE A NORMALLY DISTRIBUTED
DEPENDENT MEASURE? WHAT ABOUT THE INDEPENDENT VARIABLE?
At 10:46 PM 12/17/96 EST, you wrote:
>        We need to do regression analysis with RATIO (e.g., cost/benefit) as a
>dependent variable, but I was told that I might violate normal distribution
>assumption if I do so.  Any comments, suggestions?
>
>Thanks,
>
>Haiyi
>
>
Return to Top
Subject: Re: Help
From: Dennis Roberts
Date: Tue, 17 Dec 1996 23:30:42 -0500
YOU MEAN THERE IS NO ONE LOCAL ... A GRAD STUDENT OR SOMEONE ELSE ... WHO IS
IN A POSITION TO PROVIDE SOME ASSISTANCE???? I WOULD FIND THAT HARD TO
BELIEVE ...
At 12:40 AM 12/18/96 GMT, you wrote:
>I am in the middle of a business stats course that is killing me.  I would
>like to find some help on-line to help me get through this course. Please
>e-mail me if you are willing to help
>
>
Return to Top
Subject: Question: Can Splus be ported on Linux
From: Liang Lu
Date: Wed, 18 Dec 1996 10:14:00 -0600
Please email me if you have any information.
Thanks.
Liang
Return to Top
Subject: USA-N C-RTP-SAS Programmer
From: rtpresumes@trilogycnslt.com (Employment)
Date: Wed, 18 Dec 1996 11:06:45 -0500
Responsibilities Include:  Statistical Programming to write SAS programs
for data management and analysis.     Minimum Skills Required Include: 
Base SAS and Statistical Procedures.  Must have a minimum of two years SAS
programming experience.  Candidates must have good statistical knowledge
and communication skills.    Ref-3659
Our regional offices located in Waukegan, IL, Durham NC, Palo Alto, CA,
and Princeton, NJ provide off-site and on-site services to clients in over
30 states. Our Clinical Trial Management group assists Research and
Development organizations in the monitoring and quality assurance of
clinical trial studies. Our Systems Professionals function in all aspects of
the application development life cycle, manage data, and develop reports.
In Research Statistics, we design statistical experiments and evaluate the
results for the areas of Life Science, Finance, Marketing, Economics, and
Engineering.
Please reply to: rtpresumes@trilogycnslt.com
Trilogy Consulting Corporation
1000 Park Forty Plaza, Suite 190
Research Triangle Park, NC  27713
http://www.trilogycnslt.com/TCC_Home
fax: 919.361.2415
Return to Top
Subject: Re: paired nominal data
From: chuck@pmeh.uiowa.edu (Chuck Davis)
Date: 18 Dec 1996 17:22:33 GMT
In article <5990pt$l3l@news-central.tiac.net>, mwarshaw@tiac.net (Meredith Warshaw) writes:
|> I've been asked for help by someone who has paired nominal data, and
|> I'm not sure what to suggest.  She's looking at working mothers and
|> has some hypotheses regarding differences in work/child-care for
|> first and second born kids.  If these involved numerical data then
|> paired t-tests would be the obvious solution.  Is there anything
|> analogous for either dichotomous or multi-level nominal variables?
|> 
|> TIA,
|> Meredith Warshaw
Dichotomous response: McNemar's test
Polytomous response:  Stuart's (1955, Biometrika) test (Cochran-Mantel-Haenszel
                      general association statistic)
Ordered categorical:  Agresti (1983, Biometrics); CMH mean score test
If there are additional covariates, conditional logistic regression for matched
sets (Breslow and Day, 1980, _Statistical Methods in Cancer Research: Vol 1_) can
be used.  See also Cox and Snell (1989, _Analysis of Binary Data_), Lipsitz,
Laird and Harrington (Statistics in Medicine, 1990).
Chuck Davis
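For the dichotomous case, McNemar's test needs only the two discordant cells of the paired 2x2 table; a minimal Python sketch (the counts below are made up for illustration):

```python
def mcnemar_chi2(b, c):
    """McNemar's chi-square statistic (no continuity correction),
    where b and c are the discordant cell counts of a paired 2x2
    table -- the pairs that changed in each direction."""
    return (b - c) ** 2 / (b + c)

# Hypothetical data: 15 mother-child pairs discordant one way, 5 the other.
stat = mcnemar_chi2(15, 5)
print(stat)  # 5.0 -- compare to chi-square with 1 df: 3.84 at alpha = 0.05
```

For the polytomous case, the Stuart and CMH statistics cited above are the generalizations of this idea.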
Return to Top
Subject: Equivalence Interim Analysis
From: "Jacqueline R. Cater"
Date: Wed, 18 Dec 1996 11:34:04 -0800
I'm looking for references on performing interim analyses on equivalence 
trials - specifically for a time to failure analysis involving two types 
of surgery.  Any references to current journal articles or macros would 
be very much appreciated.  
TIA and Happy Holidays!!
J. Cater
jrcr@phila.acr.org
P.S.  I will post to group a compilation of responses.
Return to Top
Subject: Re: Is there a test for H0:Pearson-Rho=1?
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 18 Dec 1996 20:59:50 GMT
Warren (wlmay@umsmed.edu) wrote:
: What is the alternative?  That Rho is less than 1?  If you reject, then 
: what could you say?  That there is less than perfect correlation?  
: Perfect correlation would imply a functional relationship instead of a 
: statistical one.
: In large samples, I would think the null would almost always be rejected 
 << rest deleted ...>
I think you must have missed the earliest commentary on the question.
Yes, the question, worded as "Pearson r=1", only arises from naive  
misunderstandings of what testing is about  -  
But the notion of "perfect correlation, if it were not for 
accidents and measurement error",  arises in several areas.  
Computed correlations are 'attenuated'  by the unreliability of the 
component measures.  You may adjust the observed correlation for the 
purpose of comparing to a theoretical value, for instance,  1.0.
Thus, if two 'IQ' tests were to correlate at .95, then they are
measuring almost entirely the same thing, because that is about the
limit of the reliability of either IQ test.
However, for purposes of testing, one needs to work more directly 
with the variance components, and/or the OBSERVED correlations, 
as well as with the *evidence*  for the reliability of the measures.
  -  If a correlation is .65 between two variables, then it still
owns the variability associated with a correlation of .65, even if
you think you have 'corrected' it  to 1.0.
When reliabilities are poor, adjusting for attenuation can produce
some impressively big-sounding correlations, near 1.0, even 
though the original correlations might not manage to be nominally
significant.
Nunnally has discussed attenuation in "Psychometric Theory", among
other places.
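The adjustment being described is Spearman's correction for attenuation: divide the observed correlation by the geometric mean of the two reliabilities. A quick sketch, with illustrative numbers:

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: the observed correlation
    r_xy divided by the geometric mean of the reliabilities r_xx, r_yy."""
    return r_xy / math.sqrt(r_xx * r_yy)

# An observed r of .65 between two measures, each with reliability .70:
print(round(disattenuate(0.65, 0.70, 0.70), 3))  # 0.929
```

As the post warns, the 'corrected' value still carries all the sampling variability of the original .65.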
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: Help with trauma outcome study
From: Ronan Conroy
Date: Wed, 18 Dec 1996 10:14:10 +0000
>I hope someone can take a little time to help us, as our whole
>department of medical statistics seems to have disappeared for the
>holidays!!
>
>We are looking at the effects of the amount of time paramedic personnel
>spend at the scene of an accident.  Our outcome measures are
>death, intensive care stay, total hospital stay, and 12 month functional
>impairment measure (0-100 integer scale)
Don't do it! The amount of time that paramedical personnel spend on the
scene of an accident is related to a number of properties of the accident
itself. That is, the sort of accidents that they stay at for a long time
are not the same as the ones where their stay is short. For instance, it
may take time to cut people out of the wreckage of a really nasty
accident. This means that your model is wrong about the direction of
cause-and-effect; things that cause death will also cause differences in
the time variable. I doubt the value of statistical adjustment for type
of accident too - there are too many intangibles that are evident to the
person on the scene which cannot be included in the model. (We looked at
the decision to admit people with chest pain; in a significant number of
instances where there were no positive clinical indications that the
person was having a heart attack, the doctor on casualty admitted the
person 'because they didn't look right' and the person indeed went on to
develop a heart attack.)
One potential solution is a controlled trial. You could try to retrain
some units to minimise the delay to hospital admission and compare the
outcome of their patients with patients attended by conventional teams.
This is actually quite hard; a contamination effect will probably occur
where conventional teams will 'race' the fast-track teams and thereby
reduce their own delay times.
Incidentally, this question reflects a whole side of the practice of
statistics which is not maths but craft. It has to do with the
appropriate mapping of mathematical models onto real-world processes.
Statistical models, as opposed to mathematical models, are models in
which this mapping is assumed. The utility of the model is as much to do
with the assumptions of the mapping as with the assumptions of the
mathematics it uses. The trouble is that it is easier to discuss the
mathematics.
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
    _/_/_/      _/_/     _/_/_/     _/     Ronan M Conroy
   _/    _/   _/   _/  _/          _/      Lecturer in Biostatistics
  _/_/_/    _/          _/_/_/    _/       Royal College of Surgeons
 _/   _/     _/              _/  _/        Dublin 2, Ireland
_/     _/     _/_/     _/_/_/   _/         voice +353 1 402 2431
                                           fax   +353 1 402 2329
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
'Do not try to be a genius in every bar' [Brendan Behan? - No, Faure!]
Return to Top
Subject: Re: What do we mean by "The Null Hypothesis"?
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 18 Dec 1996 21:50:30 GMT
 Re: What do we mean by "The Null Hypothesis"?
Or, the null as nil?
<< Clay Helberg   Internet: helberg@execpc.com >>  wrote:
: Richard F Ulrich wrote: 
  <<  ... our details, deleted >>
 -- Okay.  Rather than engage in a dissection of the dialog, I will try 
to address the central issue.  Clay is endorsing Hays, but he does not
accept the fact that I find Hays less-than-satisfactory.  With that
in mind, I will try to convey my argument by re-writing part of Hays.
CH:    
:      Incidentally, there is an impression in some quarters that the term
: "null hypothesis" refers to the fact that in experimental work the
: parameter value specified in Ho is very often zero. Thus, in many
: experiments the hypothetical situation "no experimental effect" is
: represented by a statement that some mean or difference between means is
: exactly zero. 
>>to be replaced>>    However, as we have seen, the tested hypothesis can
: specify any of the possible values for one or more parameters, and this
: use of the word *null* is only incidental. It is far better for the
: student to think of the null hypothesis Ho as simply designating that
: hypothesis actually being tested, the one which, if true, determines the
: sampling distribution referred to in the test.
: > :       
How about, "A tested hypothesis must specify a value that does have
a particular meaning, or _gravitas_.  Though that may be any of the
possible values for one (or more) parameters, the use of the word 
*null*  is always appropriate because the test is looking for "no 
experimental effect" (in the words of Hays, above)  -  even though
'no-effect'  sometimes is represented by a number.
"Metaphorically, the null is also reminiscent of a singularity, or 
a black-hole, which is a sort of zero  -  it is what your conclusions
have to collapse to, if your data come out totally noisy.  It is 
certainly different from the way we regard 'alternate' hypotheses."
What we are discussing here is a pedagogical question, rather
than a statistical one.  In the TECHNICAL terms, I am right and
Hays is wrong, I think, because every hypothesis *is*  reduced 
to what Clay termed a 'tautological' form, where there is a zero.
(At least, that is the way for writing formal, mathematical hypotheses
for t-tests and ANOVAs, where you show that the computed term does 
have the intended distribution, of t or chisquared.  I don't really
remember writing hypotheses  for anything else.)
Further, I am using "effect size"  in the same technical sense that
Hays uses the phrase, above, where the effect size *is*  zero under 
the null.  (Note: Clay has been saying it differently, using 
effect-size as synonymous with, say, raw-change-score.  I would
rather keep it as a technical term.)
For the sake of pedagogy, the Hays approach does de-emphasize 
zero as COMPARISON value.  Is that a major problem?  Personally, I
have not had trouble explaining the difference between effect-size
and comparison-value.  But I do my explaining to one or two
persons at a time.  Also, I have not read Hays, so I do not know
what further use he might make of the ideas in the course of
his presentation.  If the citation came from his introduction,
then maybe he had a lot more to say.  If it came from his summary,
then I think that he just made a meager point, where he could have
argued more fruitfully.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: survey dilemna (sic)
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 18 Dec 1996 22:08:46 GMT
Barry Haworth (barryh@AGB.COM.AU) wrote:
 << concerning the sampling of 300 out of 3000 ... >>
: "Taking a sample of a sample is perfectly appropriate, so long as
the second sample is drawn in a sensible way (a random sample of all
the original responses, for example)"
  -- Please, if you have to analyze the 300, then draw them 
SYSTEMATICALLY rather than randomly.  For instance, it is VERY OFTEN
useful to know if the early (fast?)  respondents were different
from those whose forms came in last.  (In mortality followups, 
one looks to see if the causes of death are different for those
whose records were hardest to find  -  found by 'only one method'
or found only by persistent checking.)  
You might draw 100 early+middle+late for your 300, so you could
compare.  Big changes across the strata suggest a big chance 
that your Unsampled cases are even more different than early vs late.
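The early/middle/late scheme can be sketched in a few lines; this assumes the 3000 returned forms are already numbered in arrival order, and the stratum names and counts here are just illustrative:

```python
def stratified_by_arrival(forms_in_arrival_order, n_per_stratum=100):
    """Split forms (sorted by arrival order) into early / middle /
    late thirds, then take a systematic (every k-th) sample of
    n_per_stratum forms from each third."""
    n = len(forms_in_arrival_order)
    third = n // 3
    strata = {
        "early": forms_in_arrival_order[:third],
        "middle": forms_in_arrival_order[third:2 * third],
        "late": forms_in_arrival_order[2 * third:],
    }
    sample = {}
    for name, stratum in strata.items():
        k = max(1, len(stratum) // n_per_stratum)
        sample[name] = stratum[::k][:n_per_stratum]
    return sample

# 3000 returned forms numbered in arrival order; draw 100 per stratum.
picked = stratified_by_arrival(list(range(3000)))
print({name: len(ids) for name, ids in picked.items()})
```

Comparing the three strata then gives a rough check on whether late responders differ from early ones, which is the point of the advice above.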
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: Time-Series Analysis
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 18 Dec 1996 22:19:00 GMT
Joseph K. Lyou (CBGLyou@AOL.COM) wrote:
: I want to analyze whether there is a significant trend over time in the
: annual failure rate of a product.  I have 20 years of measurements (i.e., n =
: 20).  As I understand it, an ordinary regression analysis would be
: inappropriate because the residuals are not independent (i.e., the error
: associated with a failure rate for 1974 is more highly correlated with the
: 1975 failure rate than the 1994 failure rate).  Is it appropriate to simply
: divide the data into two groups (the 1st 10 years vs. the 2nd 10 years) and
: do a between-groups ANOVA?  Or is there some other (better) way to analyze
: these data?
: Should anyone be so inclined as to do the analysis, here are the data:
: Year   Failure Rate
: 1974   3.3
 <<  deleted, numbers between 1.3 and 5.7, across some years  >>
  -- If there were a simple trend, one  *might*  be tempted to draw 
a line and then draw conclusions.  However, there is nothing simple.
You do not provide the numbers on which the "Rates"  are based.  There
is an inherent error-estimate in the number of events that occurred,
rather than the "rate".  From the number of events, it might be 
possible to say that there do  *seem to be changes*  taking place.
Or else, not, depending on the numbers.  For instance, if the 
"rates" really represent a low range of 2 events, going up to 6 events,
in a year, then there was VERY LITTLE happening.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: Time-Series Analysis
From: eweiss@winchendon.com (Eric Weiss)
Date: Wed, 18 Dec 96 22:49:27 GMT
In article <961216173408_1424980603@emout02.mail.aol.com>, "Joseph K. Lyou"  wrote:
>I want to analyze whether there is a significant trend over time in the
>annual failure rate of a product.  I have 20 years of measurements (i.e., n =
>20).  As I understand it, an ordinary regression analysis would be
>inappropriate because the residuals are not independent (i.e., the error
>associated with a failure rate for 1974 is more highly correlated with the
>1975 failure rate than the 1994 failure rate).  Is it appropriate to simply
>divide the data into two groups (the 1st 10 years vs. the 2nd 10 years) and
>do a between-groups ANOVA?  Or is there some other (better) way to analyze
>these data?
>
>Should anyone be so inclined as to do the analysis, here are the data:
>
>Year   Failure Rate
>1974   3.3
>1975   2.5
>1976   2.7
>1977   2.4
>1978   5.7
>1979   3.2
>1980   1.6
>1981   5.2
>1982   2.8
>1983   2.4
>1984   2.7
>1985   1.3
>1986   4.5
>1987   4.5
>1988   1.4
>1989   3.6
>1990   1.5
>1991   1.4
>1992   1.6
>1993   1.6
Joseph,
Since your problem looked interesting, I ran it through my statistical 
package ELF.  (FYI, it took all of 30 seconds including importing the
data.)  I did a regression using Durbin's technique for autocorrelated
regressions.  My conclusion, based on the first-order autocorrelation
coefficient of -0.2 and a t of -0.8, is that you do not have a significant 
autocorrelation problem.  
Looking at the regression coefficients and t statistics, you don't have a
significant trend either.
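The gist of this can be reproduced without any special package. A plain-Python sketch computes the OLS trend line and the lag-1 autocorrelation of the residuals for the data quoted above (Durbin's two-stage procedure does more than this, but the diagnostic idea is the same):

```python
# Failure-rate data quoted above: one observation per year, 1974-1993.
years = list(range(1974, 1994))
rates = [3.3, 2.5, 2.7, 2.4, 5.7, 3.2, 1.6, 5.2, 2.8, 2.4,
         2.7, 1.3, 4.5, 4.5, 1.4, 3.6, 1.5, 1.4, 1.6, 1.6]

n = len(years)
xbar = sum(years) / n
ybar = sum(rates) / n

# Ordinary least-squares slope and intercept.
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(years, rates))
         / sum((x - xbar) ** 2 for x in years))
intercept = ybar - slope * xbar

# Residuals from the fitted line and their lag-1 autocorrelation.
resid = [y - (intercept + slope * x) for x, y in zip(years, rates)]
r1 = (sum(resid[i] * resid[i - 1] for i in range(1, n))
      / sum(e * e for e in resid))

print(round(slope, 3))  # about -0.08 failures/year: a slight downward drift
print(round(r1, 2))     # lag-1 residual autocorrelation
```

Whether that small negative slope is "significant" still needs a standard error, as in the full regression output.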
ELF 201 is not yet available, but you might visit our web site 
http://www.winchendon.com
Eric
Eric Weiss
eweiss@winchendon.com
Return to Top
Subject: paired nominal data -Reply
From: Jerrold Zar
Date: Wed, 18 Dec 1996 12:36:31 -0600
She should consider the McNemar test, after casting the data into a k X k
table.  While k is commonly 2 for this procedure, the McNemar test can be
expanded to k > 2.
Jerrold H. Zar
Department of Biological Sciences, Northern Illinois University
DeKalb, IL 60115 USA   jhzar@niu.edu
>>> Meredith Warshaw  12/18/96 09:10am >>>
I've been asked for help by someone who has paired nominal data, and
I'm not sure what to suggest.  She's looking at working mothers and
has some hypotheses regarding differences in work/child-care for
first and second born kids.  If these involved numerical data then
paired t-tests would be the obvious solution.  Is there anything
analogous for either dichotomous or multi-level nominal variables?
TIA,
Meredith Warshaw
mwarshaw@tiac.net
Dept. of Psychiatry and Human Behavior
Brown University
Providence, RI
Meredith Warshaw
mwarshaw@tiac.net
Return to Top
Subject: Power of contingency table tests
From: pmidford@students.wisc.edu (Peter Midford)
Date: Wed, 18 Dec 1996 19:14:05 -0600
Hello,
       I'm wondering if it is possible to calculate power values (1 - Beta)
for one or all of the common contingency table tests (e.g. chi-square, G,
Fisher's exact).
Thanks,
Peter Midford
Department of Zoology
U Wisconsin - Madison
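Yes, at least for the Pearson chi-square (and approximately for G): power comes from the noncentral chi-square distribution, with noncentrality n*w^2 where w is Cohen's effect-size index. A sketch using SciPy; the effect size and n below are illustrative:

```python
from scipy.stats import chi2, ncx2

def chisq_power(w, n, df, alpha=0.05):
    """Power of the Pearson chi-square test for Cohen's effect size w,
    total sample size n, and df degrees of freedom."""
    crit = chi2.ppf(1 - alpha, df)  # critical value under H0
    nc = n * w ** 2                 # noncentrality parameter under H1
    return ncx2.sf(crit, df, nc)    # P(statistic exceeds crit | H1)

# A "medium" effect (w = 0.3) in a 2x2 table (df = 1) with n = 100:
print(round(chisq_power(0.3, 100, 1), 2))  # about 0.85
```

Fisher's exact test has no simple noncentral distribution; its power is usually obtained by enumerating tables or by simulation.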
Return to Top
Subject: Re: ratio as a dependent var. in regression
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Date: 18 Dec 1996 20:39:07 -0500
In article <26222628@vixen.Dartmouth.EDU>,
Haiyi Xie   wrote:
>        We need to do regression analysis with RATIO (e.g., cost/benefit) as a
>dependent variable, but I was told that I might violate normal distribution
>assumption if I do so.  Any comments, suggestions?
The importance of normality in a regression is greatly overblown.
However, when using the ratio as you have defined it, if benefit 
gets too close to 0 too often, you may very well have huge tails,
and not even a finite variance.  This is a serious problem.  And as the
distribution of the error from the prediction in ratio is likely to be
non-symmetric, the so-called robust regression procedures are likely to
be invalid.
I suggest you speak in person to a mathematical statistician about your
real problem, which may not be what you have stated.
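The heavy-tail warning is easy to see by simulation. In this made-up example the "benefit" denominator has real probability mass near 0, so the sample SD of the ratio never settles down as n grows:

```python
import random

random.seed(1)

# Hypothetical distributions for illustration only:
# cost ~ N(10, 2) and benefit ~ N(1, 1).
def sample_ratio_sd(n):
    """Sample SD of n simulated cost/benefit ratios."""
    ratios = [random.gauss(10, 2) / random.gauss(1, 1) for _ in range(n)]
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / (n - 1)
    return var ** 0.5

# The sample SD keeps jumping around as n grows, instead of
# converging toward a finite population value.
for n in (100, 10_000, 100_000):
    print(n, round(sample_ratio_sd(n), 1))
```

Occasional draws with benefit near 0 produce enormous ratios, which is exactly the infinite-variance behavior described above.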
-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu         Phone: (317)494-6054   FAX: (317)494-0558
Return to Top
Subject: Re: paired nominal data
From: eweiss@winchendon.com (Eric Weiss)
Date: Thu, 19 Dec 96 01:17:55 GMT
It is a little hard to understand what your friend wants
to do without a little more information, but let me give it
a shot...
You might convert the dichotomous data into rates:  what
percent of first children are sent to day care and what
percent of second children.
You could also try using crosstabs (also called a 
contingency table) if you have another dimension to turn
it into a two-way table.  I'd look at a stat package 
manual or maybe Hays' or Blalock's stats books.
Finally, if you have a lot of other factors influencing
the day care decision, you should investigate logistic
regression.
Good luck.
Eric Weiss
eweiss@winchendon.com
Return to Top

Downloaded by WWW Programs
Byron Palmer