

Newsgroup sci.stat.consult 21346

Directory

Subject: Re: Thanks ! re: my question about calculating the chi-square function -- From: Mark Myatt
Subject: Re: Analyzing missing values -- From: Mark Myatt
Subject: Re: Are the two regression lines too many? -- From: Hans-Peter Piepho
Subject: SPSS and GLM: HELP!!! -- From: Pawel Michalak
Subject: Re: Q: correlation in two dimensions -- From: hamer@rci.rutgers.edu (Robert Hamer)
Subject: how to analyze weighted data? -- From: parilis@rci.rutgers.edu (Gary Parilis)
Subject: (no subject given) -- From: Jan Winchell
Subject: Re: What do we mean by "The Null Hypothesis"? -- From: Clay Helberg
Subject: Re: population vs. sample -- From: "John C. Nash"
Subject: lee desu test -- From: benchaib@rockfeller1.univ-lyon1.fr (Mehdi BENCHAIB )
Subject: Scheffe's method -- From: Jordi Riu
Subject: Re: Logit & Probit by TSP -- From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Subject: Re: Logit & Probit by TSP -- From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Subject: Re: Sample Size for Estimating Regression Equation -- From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Subject: Re: Sample Size for Estimating Regression Equation -- From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Subject: request for BMDP assistance -- From: James Mccracken
Subject: Re: Coin-flips and enumeration -- From: rbcrosie@apgea.army.mil (Ronald B. Crosier)
Subject: Re: Power? -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: What do we mean by "The Null Hypothesis"? -- From: tgee@superior.carleton.ca (Travis Gee)
Subject: Re: Query: smoothing-spline software -- From: Rodger Whitlock
Subject: Re: Power? -- From: tgee@superior.carleton.ca (Travis Gee)
Subject: M.S. biostatistician opening -- From: fharrell@virginia.edu
Subject: Need help analyzing your data? Central NJ consultant. -- From: mdemilia@eclipse.net (Dr. Michael S. DeMilia)
Subject: Job Opportunity:Statistical Quality Engineer -- From: ambgrp@aol.com
Subject: Re: What do we mean by "The Null Hypothesis"? -- From: rwhite@superior.carleton.ca (Robert White)
Subject: Re: Dropping n.s. dummy variables in logistic regression -- From: nichols@spss.com (David Nichols)
Subject: Re: Logit & Probit by TSP -- From: clint@leland.Stanford.EDU (Clint Cummins)
Subject: Re: SPSS and GLM: HELP!!! -- From: lthompso@s.psych.uiuc.edu (Laura Thompson)
Subject: Repeated measures analysis -- From: Peter Baade
Subject: Re: population vs sample -- From: lucz@ix.netcom.com
Subject: Re: Coin-flips and enumeration -- From: nakhob@mat.ulaval.ca (Renaud Langis)
Subject: Re: growth, decline, steady state (roughly), or just outright fluctuation -- From: nakhob@mat.ulaval.ca (Renaud Langis)
Subject: Re: excel add-ins *** are there any? -- From: nakhob@mat.ulaval.ca (Renaud Langis)
Subject: Re: normality test ? -- From: axelrod@statwiz.com (Michael Axelrod)
Subject: Re: population vs sample -- From: axelrod@statwiz.com (Michael Axelrod)
Subject: help with sas programiing/ problems -- From: doug
Subject: Re: Power? -- From: Chauncey

Articles

Subject: Re: Thanks ! re: my question about calculating the chi-square function
From: Mark Myatt
Date: Tue, 3 Dec 1996 09:43:09 +0000
Rafael Santos  writes:
>
>Many many thanks for all those who answered my question. Since I am
>unable to get the books some recommended, I used Excel to create a
>table of values in the range I want and inserted it into my
>program. It worked fine.
Here is something in BASIC that tabulates chi-square p-values at 1,
5, and 15 d.f. (k1 = degrees of freedom). The code is a little old, so
the variable names are rather terse. You may be able to make use of it.
        OPEN "chi2.dat" FOR OUTPUT AS #1
        FOR x2 = .1 TO 40 STEP .1
        PRINT #1, USING "##.#"; x2;
        k1 = 1
        GOSUB 1000
        PRINT #1, USING "#.####"; TAB(10); p#;
        k1 = 5
        GOSUB 1000
        PRINT #1, USING "#.####"; TAB(20); p#;
        k1 = 15
        GOSUB 1000
        PRINT #1, USING "#.####"; TAB(30); p#
        NEXT x2
        CLOSE #1
        END
        ' Subroutine 1000: set p# = upper-tail probability for chi-square
        ' value x2 with k1 degrees of freedom (series expansion)
1000    x3 = .5 * x2
        k3 = .5 * k1
        x = k3 + 1
        GOSUB 2000
        s = 0
        a = EXP(k3 * LOG(x3) - l - x3)
        IF a = 0 THEN GOTO 1050
        t = 1
        s = 1
        g = k3
1030    g = g + 1
        t = t * x3 / g
        s = s + t
        IF (t / s) > .000001 THEN GOTO 1030
1050    p# = 1 - (s * a)
        RETURN
        ' Subroutine 2000: set l = LOG(GAMMA(x)) (Lanczos approximation)
2000    c1 = 76.18009173#
        c2 = -86.50532033#
        c3 = 24.01409822#
        c4 = -1.231739516#
        c5 = .00120858#
        c6 = -.000005364#
        c7 = 2.50662875#
        x1 = x - 1
        w = x1 + 5.5
        w = (x1 + .5) * LOG(w) - w
        s = 1 + c1 / (x1 + 1) + c2 / (x1 + 2) + c3 / (x1 + 3)
        s = s + c4 / (x1 + 4) + c5 / (x1 + 5) + c6 / (x1 + 6)
        l = w + LOG(c7 * s)
        RETURN
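For readers without a BASIC interpreter handy, here is a rough editorial translation of the same idea into Python (not Mark's original; the function name is mine, and `math.lgamma` stands in for the subroutine at 2000):

```python
import math

def chi2_upper_tail(x2, df, tol=1e-12):
    """Upper-tail probability P(X > x2) for a chi-square variate with
    df degrees of freedom, via the series expansion of the regularized
    lower incomplete gamma function (as in the BASIC listing above)."""
    a = 0.5 * df          # gamma shape parameter (k3 in the BASIC code)
    x = 0.5 * x2          # gamma argument (x3 in the BASIC code)
    if x2 <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma P(a, x):
    # P(a, x) = x^a e^{-x} / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while True:
        n += 1
        term *= x / (a + n)
        total += term
        if term < total * tol:
            break
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower

# Spot check near the familiar 5% critical value for 1 d.f.
print(round(chi2_upper_tail(3.841, 1), 4))
```

For half-integer shapes the result can be cross-checked against the error function, since P(1/2, x) = erf(sqrt(x)).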
-- 
Mark Myatt
Return to Top
Subject: Re: Analyzing missing values
From: Mark Myatt
Date: Tue, 3 Dec 1996 09:44:55 +0000
Antony Unwin  writes:
>Ed Torpy of SPSS asked:
>
>>Are any of the major statistical packages exceptionally strong at
>>analyzing patterns of missing data and imputing new values.
>
>Our software MANET (Missings Are Now Equally Treated) does not offer any
>methods of imputation, but extends all interactive graphics tools to
>include missing values.  This is very effective for analysing patterns of
>missing data.
How does this differ from giving missing values a "code" (e.g. 999) and
looking at them with traditional tools?
-- 
Mark Myatt
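One thing the "code it 999" approach handles fine with plain tools is tabulating which *combinations* of variables are missing together. A minimal sketch in Python (the data and variable names are invented purely to illustrate):

```python
from collections import Counter

MISSING = 999  # the conventional missing-value code

records = [  # hypothetical survey rows: (age, income, weight)
    (34, 999, 70), (51, 999, 999), (29, 42000, 65),
    (44, 999, 80), (38, 36000, 999),
]
names = ("age", "income", "weight")

# Map each record to the tuple of variables missing in it, and count
# how often each missingness pattern occurs.
patterns = Counter(
    tuple(n for n, v in zip(names, row) if v == MISSING)
    for row in records
)
for pattern, count in patterns.most_common():
    print(pattern or ("complete",), count)
```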
Return to Top
Subject: Re: Are the two regression lines too many?
From: Hans-Peter Piepho
Date: Tue, 3 Dec 1996 13:34:05 +0100
>In article <9612020859.AA19921@fserv.wiz.uni-kassel.de>,
>Hans-Peter Piepho   wrote:
>>>In article ,
>>>Mike  wrote:
>>>>Andrew Kukla started a thread:
>
>                        ................
>
>>If you assume that X and Y are bivariate normal, and you want to fit a line
>>to the x,y scatter plot to describe the relationship (without need to
>>predict either y from x or x from y), a principal component analysis PCA
>>seems quite appropriate. Of course you would do the PCA on the
>>variance-covariance matrix (unstandardized data), not on the correlation
>>matrix (normalized data).
>
>I agree that one should not do it on the correlation matrix, for which
>the principal component line is the 45 degree line (positive correlation)
>or the -45 degree line (negative correlation).  But the principal
>component line depends on scaling, and apart from the direction coming
>from the sign of the correlation, nothing specific can be said.
>--
If x and y are bivariate normal, the following can be said: We can draw a
confidence region centered at the mean of x and y. This will be an ellipse.
The first principal component coincides with the "long" axis of this
ellipse (see Johnson and Wichern, Applied Multivariate Statistical Analysis).
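A small numerical illustration of that point (the covariance numbers are made up; for a 2x2 symmetric matrix the eigendecomposition has a closed form, so no linear-algebra library is needed):

```python
import math

# Population covariance of a hypothetical bivariate normal:
# var(x) = 2.0, var(y) = 1.0, cov(x, y) = 1.2.
a, b, c = 2.0, 1.2, 1.0   # covariance matrix S = [[a, b], [b, c]]

# Eigenvalues of the 2x2 symmetric covariance matrix.
half_trace = (a + c) / 2.0
root = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
lam1 = half_trace + root   # largest eigenvalue: variance along the long axis
lam2 = half_trace - root

# The eigenvector for lam1 satisfies (a - lam1) * v1 + b * v2 = 0,
# so the slope v2/v1 of the ellipse's long axis is:
slope = (lam1 - a) / b
print("long-axis slope:", slope, " eigenvalues:", lam1, lam2)
```

Doing the same computation on the correlation matrix instead would force the slope to +/-1, which is exactly the earlier poster's objection.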
Hans-Peter Piepho
_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde  WWW:   http://www.wiz.uni-kassel.de/fts/
Universitaet Kassel            Mail:  piepho@wiz.uni-kassel.de
Steinstrasse 19                Fax:   +49 5542 98 1230
37213 Witzenhausen, Germany    Phone: +49 5542 98 1248
Return to Top
Subject: SPSS and GLM: HELP!!!
From: Pawel Michalak
Date: Tue, 3 Dec 1996 15:09:00 +0100
Dear Statisticians,
I need urgent help with ANOVA and ANCOVA in the SPSS package.
It mainly concerns sums of squares (SS). The literature refers to
so-called "SAS type" SS, from Type I to Type IV.
What are their equivalents in SPSS? SPSS offers a Regression
Approach, a Hierarchical Approach and an Experimental Approach. How
can one relate SPSS's classification to the SAS types?
Thanks in advance for any help.
Pawel Michalak
============================================================================
Home WWW Page: http://www.cyf-kr.edu.pl/~uemichal
Info: finger pawel@haldane.pop.bio.aau.dk & uemichal@kinga.cyf-kr.edu.pl
Return to Top
Subject: Re: Q: correlation in two dimensions
From: hamer@rci.rutgers.edu (Robert Hamer)
Date: 3 Dec 1996 09:38:35 -0500
brendy@cs.brandeis.edu writes:
>Does anybody know a formula for computing a correlation between
>two dimensional (or n-dimensional) vectors? 
The standard Pearson product moment correlation is the
cosine of the angle between two n-dimensional vectors, assuming the
values have been standardized appropriately.
I leave assumptions and appropriateness of whatever you are
trying to do to you.
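Concretely, with "standardized appropriately" meaning mean-centered, Robert's identity can be checked in a few lines of Python (an editorial sketch, not part of the original post):

```python
import math

def pearson_r(x, y):
    """Pearson correlation computed as the cosine of the angle
    between the two mean-centered vectors."""
    n = len(x)
    xc = [xi - sum(x) / n for xi in x]   # center x
    yc = [yi - sum(y) / n for yi in y]   # center y
    dot = sum(a * b for a, b in zip(xc, yc))
    norm = math.sqrt(sum(a * a for a in xc)) * math.sqrt(sum(b * b for b in yc))
    return dot / norm

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear pair
```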
-- 
--(Signature)      Robert M. Hamer hamer@rci.rutgers.edu 908 235 4218
  Do not send me unsolicited email advertisements.  I have never and
  will never buy.  I will complain to your postmaster.
  "Mit der Dummheit kaempfen Goetter selbst vergebens" -- Schiller
Return to Top
Subject: how to analyze weighted data?
From: parilis@rci.rutgers.edu (Gary Parilis)
Date: 3 Dec 1996 09:54:07 -0500
I've been asked to analyze some data, and I need some guidance on how
to go about it.
The data are from a survey in which respondents are asked to allocate
a total 100 points to a series of items, something like:
=================
Please indicate the importance you place on each of the following
criteria for purchasing a particular car by allocating a total of 100
points.  The sum of your responses should equal 100.
1. gas mileage ___
2. dependability ___
3. price ___
4. comfort ___
=================
(Forgive the poor psychometrics of this question.  It's just an
example to illustrate the kind of question I'm looking to address.)
I'll need to come up with a way to summarize the data, as well as to
analyze it with data from other types of questions and demographic
data.  It will probably be primarily used for segmentation.
It has been suggested to me that a cluster analysis may be the way to
go.  Any other ideas?
Also, would there be any value in simply looking at the means of the
four criteria?
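On the last question: criterion means are easy to get, but note that because every respondent's allocations are constrained to sum to 100, the means must sum to 100 as well, and the variables are negatively correlated by construction (a toy sketch with invented responses):

```python
rows = [  # hypothetical respondents: (mileage, dependability, price, comfort)
    (30, 40, 20, 10),
    (10, 50, 30, 10),
    (25, 25, 25, 25),
    (40, 20, 30, 10),
]
n = len(rows)
# Mean points allocated to each criterion across respondents.
means = [sum(col) / n for col in zip(*rows)]
print("criterion means:", means)
print("means sum to:", sum(means))  # always 100 under the constraint
```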
Thanks for your assistance.
(I prefer responses via e-mail, which I'll be glad to summarize for the
group if there is interest.)
- Gary
parilis@rci.rutgers.edu
-- 
--------------
         Gary M. Parilis             parilis@rci.rutgers.edu
          **************************************************
          *  Note:  I'm no longer employed at Rutgers, so  *
          *   this account may self-destruct at any time.  *
          **************************************************
"Like a steam locomotive rolling down the track..."
Return to Top
Subject: (no subject given)
From: Jan Winchell
Date: Tue, 3 Dec 1996 10:39:27 EST
// JOB ECHO=NO
DATABASE SEARCH DD=RULES CPULIM=99:00
//RULES DD *
SEARCH STATA IN STAT-L
INDEX
/*
Return to Top
Subject: Re: What do we mean by "The Null Hypothesis"?
From: Clay Helberg
Date: Tue, 03 Dec 1996 09:34:41 -0600
Matt Beckwith wrote:
> 
> In medical school I was taught that, in a randomized prospective trial
> of a drug compared to placebo, the null hypothesis is the hypothesis
> that there is no difference in efficacy between the drug and placebo.
> 
> From this, I inferred that the term "null" in the phrase "null
> hypothesis" referred to the fact that we are hypothesizing NO
> difference (i.e. null difference).
> 
> In a statistics book which I'm currently reading, it speaks of the
> null hypothesis as if it were simply one's initial hypothesis.  This
> would seem to imply that null is simply a subscript denoting the first
> of a series of things--starting one's enumeration from zero rather
> than one, as it were.
> 
> Which of these interpretations of the term "null" in the phrase "null
> hypothesis" is correct?
> 
> Thanks.
As I understand it, the second of these (the subscript explanation) is
technically correct. However, as you observed, the null hypothesis often
specifies an absence of effect or difference. This seems to be the
default, especially in the social sciences (but also in medicine).
However, theoretically any point value can be specified as the null
hypothesis. So, you could test the hypothesis that the effect of a drug
is exactly twice the effect of the placebo (i.e. Ho: odds ratio = 2).
If you're interested in some of the philosophical underpinnings of the
null hypothesis and hypothesis testing in general, there is an
interesting thread on the subject right now in sci.stat.edu (AKA
EDSTAT-L, for those who prefer listservers to newsgroups).
						--Clay
--
Clay Helberg         | Internet: helberg@execpc.com
Publications Dept.   | WWW: http://www.execpc.com/~helberg/
SPSS, Inc.           | Speaking only for myself....
Return to Top
Subject: Re: population vs. sample
From: "John C. Nash"
Date: Tue, 3 Dec 1996 10:41:34 -0500
At one time in my career I worked for the same family of
government organizations as the initiator of this thread.
The issue is that of 1500 potential respondents, only some
600-700 have replied. Or perhaps we should remove "only" and
be surprised so many have taken the time to respond? As others
have pointed out, we need to know *why* they do or do not respond
in order to get some measure of the reliability of the data for
inferences.
This comment is to note that in a similar situation -- agriculture,
one of the most highly politicized sectors of society -- one group
of respondents organized a seminar on how to answer a questionnaire.
The results were used in a policy that gave the group affected a
very important income boost! So if the questions can be construed
as possibly affecting income or ability to make a living, maybe the
data should be catalogued under fiction.
While surveys often use a small calibration subsample before the
full effort, perhaps it is worthwhile considering a calibration
after the fact. I'm not a survey statistician, but would be
interested in knowing of any experiences in such "post survey
calibration", since in the business/admin. arena one gets a lot of
this "rescue" work. It's not good statistics, but one would like to
know how bad the situation really is when unscrupulous types start
to use the data anyway.
JN
John C. Nash, Professor of Management, Faculty of Administration,
University of Ottawa, 136 Jean-Jacques Lussier Private,
P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada
email: jcnash@uottawa.ca, voice mail: 613 562 5800 X 4796
fax 613 562 5164,  Web URL = http://macnash.admin.uottawa.ca
Return to Top
Subject: lee desu test
From: benchaib@rockfeller1.univ-lyon1.fr (Mehdi BENCHAIB )
Date: Tue, 3 Dec 1996 17:18:18
 Unistat (statistical software) uses the Lee-Desu test for survival
curves (Kaplan-Meier). I don't know this test.
 Is it like the log-rank test, or something else? If somebody knows it
or has a reference, thank you for your help.
Mehdi BENCHAIB
Laboratoire de Cytologie Analytique
8 avenue Rockefeller
69373 LYON CEDEX 08
E-mail : benchaib@rockefeller1.univ-lyon1.fr
Phone: 78 77 70 00 ext. 43 23
Return to Top
Subject: Scheffe's method
From: Jordi Riu
Date: Tue, 3 Dec 1996 17:33:30 +0100
I'm trying to construct a joint confidence region by bringing
all the data in the k different data sets together. This region has an
elliptical shape. I'm aware that there is a problem with the values
of the confidence coefficients, and that this problem has to do with
Scheffe's method. I know this method provides simultaneous confidence
statements for all linear combinations of a set of parameters.
        Could anyone give further explanations and/or some mathematical
expressions or references?
Thanks in advance.
 ------------------------------------------------------------------
 Jordi Riu Rusell                     tel.:   34-(9)77-558187
 Departament de Quimica               fax.:   34-(9)77-559563
 Univ. Rovira i Virgili de Tarragona  e-mail: rusell@quimica.urv.es
 Pl. Imperial Tarraco, 1
 43005-Tarragona    Catalonia - Spain
Return to Top
Subject: Re: Logit & Probit by TSP
From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Date: 3 Dec 1996 10:01:06 -0800
In article <32A3F2B3.1560@students.wisc.edu>,
Tatsuo Ochiai   wrote:
>I am wondering what the algorithms and the convergence criteria TSP uses
>for Logit and Probit model.
>
>
>Thanks in advance.
>
>
>Tatsuo Ochiai
>tochiai@students.wisc.edu
See the "Method" section under Probit and under Logit in the TSP
Reference Manual.
regards,
Don Cram
-- 
doncram@gsb-ecu.stanford.edu
http://www-leland.stanford.edu/~doncram
Return to Top
Subject: Re: Logit & Probit by TSP
From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Date: 3 Dec 1996 10:03:27 -0800
In article <32A3F2B3.1560@students.wisc.edu>,
Tatsuo Ochiai   wrote:
>I am wondering what the algorithms and the convergence criteria TSP uses
>for Logit and Probit model.
>
>
>Thanks in advance.
>
>
>Tatsuo Ochiai
>tochiai@students.wisc.edu
See the "Method" sections under Probit and under Logit in the TSP
Reference Manual.
regards,
Don Cram
-- 
doncram@gsb-ecu.stanford.edu
http://www-leland.stanford.edu/~doncram
Return to Top
Subject: Re: Sample Size for Estimating Regression Equation
From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Date: 3 Dec 1996 10:05:20 -0800
In article <57utn6$26n@usenet.srv.cis.pitt.edu>,
Richard F Ulrich  wrote:
>Zubin Dowlaty (nat2zxd@is.ups.com) wrote:
>: How does one go about calculating the desired sample size when one is
>: estimating a linear regression from the data?  
>: 	Should I estimate(guess) the variability of the coefficents and then
>: plug these variance estimators into the standard sample size formula for
>: determining the desired sample for a population mean(with a predefined
>: tolerable width setting)??
>
>If this is with one variable, then the usual measure of effect size is
>the r or R-squared, and the problem is in any book on power analysis.
>
>If you are proposing multiple regression, then the effect size is
>still R-squared, but the considerations may become more complicated.
>Cohen's book on "Statistical Power Analysis for the Behavioral 
>Sciences"  has much more detailed tables and discussion in the 
>Second Edition (1988)  than it had earlier.
>
>
>Rich Ulrich, biostatistician              wpilib+@pitt.edu
>Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
>
See the very readable explanation in Sande Milton's "A Sample Size
Formula for Multiple Regression Studies" (Public Opinion Quarterly,
1986 V50 p112-118).  The simple result is that to expect a t-stat t on
coefficient B_j in a regression on k variables,
      n = k + 1 + (t^2 (1-R^2))/( delta r_j^2 )
is the sample size required, where R^2 is the expected R^2 of the
regression and delta r_j^2 is the expected increase in R^2 due to the 
addition of the jth variable.
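A back-of-the-envelope calculator for Milton's formula (the function name and example numbers are mine, not from the paper):

```python
import math

def milton_n(t, k, r2_full, delta_r2_j):
    """Sample size needed to expect a t-statistic of t on coefficient
    B_j in a k-variable regression, where r2_full is the expected R^2
    of the full model and delta_r2_j is the expected increase in R^2
    from adding variable j (Milton, Public Opinion Quarterly, 1986)."""
    return math.ceil(k + 1 + (t ** 2) * (1.0 - r2_full) / delta_r2_j)

# e.g. expect t = 2 on one of k = 3 regressors, full-model R^2 = 0.5,
# with variable j contributing 0.25 to R^2:
print(milton_n(t=2.0, k=3, r2_full=0.5, delta_r2_j=0.25))
```

As the formula suggests, a variable with a small expected contribution to R^2 (small delta_r2_j) drives the required n up quickly.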
regards,
Don Cram
-- 
doncram@gsb-ecu.stanford.edu
http://www-leland.stanford.edu/~doncram
Return to Top
Subject: Re: Sample Size for Estimating Regression Equation
From: doncram@leland.Stanford.EDU (Donald Peter Cram)
Date: 3 Dec 1996 09:57:01 -0800
In article <57utn6$26n@usenet.srv.cis.pitt.edu>,
Richard F Ulrich  wrote:
>Zubin Dowlaty (nat2zxd@is.ups.com) wrote:
>: How does one go about calculating the desired sample size when one is
>: estimating a linear regression from the data?  
>: 	Should I estimate(guess) the variability of the coefficents and then
>: plug these variance estimators into the standard sample size formula for
>: determining the desired sample for a population mean(with a predefined
>: tolerable width setting)??
>
>If this is with one variable, then the usual measure of effect size is
>the r or R-squared, and the problem is in any book on power analysis.
>
>If you are proposing multiple regression, then the effect size is
>still R-squared, but the considerations may become more complicated.
>Cohen's book on "Statistical Power Analysis for the Behavioral 
>Sciences"  has much more detailed tables and discussion in the 
>Second Edition (1988)  than it had earlier.
>
>
>Rich Ulrich, biostatistician              wpilib+@pitt.edu
>Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
>
A straightforward, readable explanation is Sande Milton's "A Sample
Size Formula for Multiple Regression Studies" (Public Opinion
Quarterly, 1986 V50 p112-118).  The simple result is that to expect a
t-stat t on coefficient B_j in a regression on k variables,
      n = k + 1 + (t^2 (1-R^2))/( delta r_j^2 )
is the sample size required, where R^2 is the expected R^2 of the
regression and delta r_j^2 is the expected increase in R^2 due to the 
addition of the jth variable.
regards
Don Cram
-- 
doncram@gsb-ecu.stanford.edu
http://www-leland.stanford.edu/~doncram
Return to Top
Subject: request for BMDP assistance
From: James Mccracken
Date: Tue, 3 Dec 1996 11:51:46 -0600
I am trying to use a multiway table from
program 4F in BMDP to prepare an input
table (using TYPE=DATA) to enter data into
the polychotomous regression program, PR,
in BMDP.  This seems to work for all other
BMDP programs except PR.
I would gratefully accept any help in solving
this, so that I can stop timing out on my
runs, since the input data file exceeds
100,000 in record size.
Thanks a lot.
Jim McCracken
jmmcrac@mailhost.tulane.tcs.edu
Return to Top
Subject: Re: Coin-flips and enumeration
From: rbcrosie@apgea.army.mil (Ronald B. Crosier)
Date: Tue, 3 Dec 96 17:27:47 GMT
In article <57vdnv$fe9@Radon.Stanford.EDU>,
James Nash  wrote:
>
>Two coins are flipped.  It is known that at least one of them was a head.
>What is the probability that both of them are heads ?
>
>You've probably heard a variation of this question, and the answer seems
>to depend highly on semantics.  The probablity is either 1/2 or 1/3, ...
It's easy to make vague and ambiguous statements.  Just because a problem
is numeric doesn't make it clear or meaningful.  Here is an old one:
How many fairies can dance on the head of a pin?  Here are two questions
that help separate the meaningful stuff from the nonsense:
        What do you mean?         How do you know?
Scientists use operational definitions and require a description of
experimental methods sufficient to allow others to repeat their 
experiments.  So when the problem says, "It is known ...", ask "How do
you know?"
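In that spirit, the two readings of "It is known that at least one was a head" can be given operational definitions by direct enumeration (an editorial sketch; the phrasing of the two conditioning events is mine):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product("HT", repeat=2))  # the four equally likely flips

def p_both_heads_given(event):
    """P(both heads | event), by enumerating the sample space."""
    kept = [o for o in outcomes if event(o)]
    hits = [o for o in kept if o == ("H", "H")]
    return Fraction(len(hits), len(kept))

# Reading 1: "at least one of the two coins is a head."
print(p_both_heads_given(lambda o: "H" in o))       # 1/3
# Reading 2: "a particular coin (say the first) is a head."
print(p_both_heads_given(lambda o: o[0] == "H"))    # 1/2
```

The arithmetic is trivial; the point is that once "How do you know?" is answered with an explicit conditioning event, the ambiguity disappears.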
--
Ronald Crosier    E-mail: 
Disclaimer: My opinions are just that---mine, and opinions.
P.S. Anybody have an operational definition of "representative sample"?
Return to Top
Subject: Re: Power?
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 3 Dec 1996 19:11:26 GMT
Matt Beckwith (beckwith@pop.southeast.net) wrote:
: If I understand what's gone before:
: (1) Alpha is the probability that you have rejected that which is
: true.
Well, not exactly.  More like, "that you WOULD reject the Null if 
it were true."   IF you were looking at one test, and one test only.
(Also, you are assuming the assumptions of the test were met, 
including, no important extrinsic factors that have not been accounted
for.)
: (2) Beta is the probability that you have not rejected that which is
: false.
Not exactly, either.  There is a curve, rather than a single value,
unless you make some statement about what part of the curve to look
at:  The expected rate of false negatives is a decreasing function 
of how big an effect you have in mind.
: (3) Power is 1 minus Beta.
: Then does it follow that:
: (4) Power is the probability that what you've rejected is in fact
: false.
No, *definitely*  not.  We sure would like to compute the probability
that the null hypothesis is false.  But you have to stick in some
strong assumptions to get that far, such as, prior probabilities
for the answers.  
For some insight into the difficulty of constructing useful 
probability statements, one interesting source is Stigler's
"The History of Statistics".
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: What do we mean by "The Null Hypothesis"?
From: tgee@superior.carleton.ca (Travis Gee)
Date: 3 Dec 96 18:03:12 GMT
In <32aba285.12133608@news.southeast.net> beckwith@pop.southeast.net (Matt Beckwith) writes:
>In medical school I was taught that, in a randomized prospective trial
>of a drug compared to placebo, the null hypothesis is the hypothesis
>that there is no difference in efficacy between the drug and placebo.
>From this, I inferred that the term "null" in the phrase "null
>hypothesis" referred to the fact that we are hypothesizing NO
>difference (i.e. null difference).
Not a bad inference. In the classical hypothesis testing model, the
essential feature of the null hypothesis (in the example you describe)
is that there *is* no difference, i.e., the two groups have been
randomly drawn from the same population. The implication of this
is that any difference you observe between the two groups (proportion
exhibiting characteristic X,  mean score on Y, or whatever) is going
to be due solely to sampling variability.  Your test statistic (t, p,
U, or whatever) will only exceed the value corresponding to the
(1-alpha) percentile of the distribution (alpha x 100)% of the time
under this "null" hypothesis.  It is "null" because it states that the
difference is exactly nil.
>In a statistics book which I'm currently reading, it speaks of the
>null hypothesis as if it were simply one's initial hypothesis.  This
>would seem to imply that null is simply a subscript denoting the first
>of a series of things--starting one's enumeration from zero rather
>than one, as it were.
>Which of these interpretations of the term "null" in the phrase "null
>hypothesis" is correct?
There's a certain truth to both of them.  The second one seems
different, because in the sense that the difference between the groups
*is* zero, the null is almost *never* going to be exactly true.  Any
of an infinite number of alternatives is as plausible as the "point"
null of "exactly no difference," because on any continuous probability
density function, the probability of getting *precisely* (any value at
all, including zero) is vanishingly small. As an example, consider
that if 2.5 billion people are male and 2.5 billion people are female,
the probability that their scores on *anything* add up to *precisely*
the same number is around 0.  If this were an exhaustive study of the
world's entire population, then the null hypothesis that the two
groups have the same value for the parameter would certainly be false.
And even if by some miracle of sampling it were true, it would be
false within moments as more males and females arrive and depart!
The essential truth, as I see it, is that sampling variation will
account for a range of differences between random samples in a large
proportion of the cases, if only sampling variation is at work.  If
something else is at work that we can identify (e.g., group
membership) and it contributes to the variation in a systematic way,
then (say, for a t-test) guessing the group mean for any given
individual will reduce error more than it will increase it (on
average).  The null hypothesis, phrased as "guessing the group mean
will tell us nothing because *any* group assignment will be equivalent
to a random assignment" should not be preferred if it is false. In
that sense, the null hypothesis is really that "unknown factors are
involved in producing the observed variation in the scores."  If one
factor can be identified in such a way as to improve prediction in a
systematic way, then the "unknown factors" part is false.  At least
one is now known, and so this Ho, as a literal statement, is quite
plausibly false. The means differ, but by more than we would expect if
we just randomly tossed people into two groups.  By how much more?  By
a factor of t[critical]!  Probability density functions just allow us
to specify how unlikely an observed difference is in comparison to the
amount of difference we expect "by chance."
All this said, I find it helpful to think of the null hypothesis as
specifying a range of values that the difference could take if unknown
factors are at work in producing the variation.  Sometimes we identify
factors which reduce error enough to say that the factor's effect is
"statistically significant."  Some subset of these statistically
significant differences are large enough to call "meaningful in the
real world."  But that's another discussion...  :>
Hope this helps,
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
Travis Gee () tgee@superior.carleton.ca ()
           ()                           ()        ()()()()
           ()                           ()              ()
           ()                           ()()()()()()()()()
"In science, the more we know the more extensive the 
contact with nescience."  -Spencer
Return to Top
Subject: Re: Query: smoothing-spline software
From: Rodger Whitlock
Date: 3 Dec 96 09:56:03 PST
Ed Hughes  wrote:
>Is there any (preferably free) software available to 
>compute smoothing splines for data on points in 2 and 3 
>dimensions?  I'd prefer that it allow irregular data 
>points, but if it required points to be on a regular
>grid, I can live with it.
I've used Gaussian and binomial smoothing with excellent results in 2 
dimensions.
----
Rodger Whitlock
Return to Top
Subject: Re: Power?
From: tgee@superior.carleton.ca (Travis Gee)
Date: 3 Dec 96 18:39:29 GMT
In <32aa9f72.11345835@news.southeast.net> beckwith@pop.southeast.net (Matt Beckwith) writes:
>If I understand what's gone before:
>(1) Alpha is the probability that you have rejected that which is
>true.
Not exactly. In classic hypothesis testing, alpha is the probability that you
will falsely reject the "null hypothesis" when it is true.
>(2) Beta is the probability that you have not rejected that which is
>false.
No, it's the probability that you'll fail to reject when a specified
alternative is true.
>(3) Power is 1 minus Beta.
Yes.
>Then does it follow that:
>(4) Power is the probability that what you've rejected is in fact
>false.
No, power is the probability that you will reject the "null," given
that the effect is of the given size, in the given direction, and that
you're using a (one or two) tailed test with a given alpha and N.  In
short, it is p(reject|alternative true, parameters). You're saying
that it's p(alternative true|reject, parameters.)  They're not the
same thing. See a paper by Ronald Carver in the Harvard Educational
Review, about 1977.  He asks the question, "What is the probability
that someone is dead, given that he's been hanged by the neck?"  The
rhetorical answer is "About .99."  He then poses the question, what is
the probability that someone was hanged by the neck, given that he's
dead?"  Probably about .00001 in modern times.  p(D|H) is ***NOT***
the same thing as p(H|D).  In like manner, the probability of a
Hypothesis given a Difference is not the same thing as the probability
of a Difference given a Hypothesis, and the probability that some
alternative is true, given you've rejected is not the same as the
probability that you'll reject, given that some alternative is true.
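Carver's hanging example reduces to a two-way table of counts; the numbers below are invented purely to make the asymmetry concrete:

```python
# Hypothetical population counts (purely illustrative numbers):
hanged_dead = 99              # hanged and dead
hanged_alive = 1              # hanged, survived
not_hanged_dead = 1_000_000   # dead from other causes

# The two conditional probabilities use different denominators:
p_dead_given_hanged = hanged_dead / (hanged_dead + hanged_alive)
p_hanged_given_dead = hanged_dead / (hanged_dead + not_hanged_dead)

print(p_dead_given_hanged)               # about .99, as in Carver
print(round(p_hanged_given_dead, 5))     # tiny, as in Carver
```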
Hope this clears it up for you,
cheers,
((((((((((((((((((((((((((((((((((((((((((((((((((((((((((
Travis Gee () tgee@superior.carleton.ca ()
           ()                           ()        ()()()()
           ()                           ()              ()
           ()                           ()()()()()()()()()
"In science, the more we know the more extensive the 
contact with nescience."  -Spencer
Return to Top
Subject: M.S. biostatistician opening
From: fharrell@virginia.edu
Date: Tue, 3 Dec 1996 20:07:14 GMT
A growing biostatistics program at the University of Virginia
(in Charlottesville) is seeking a person with an M.S. in
biostatistics to work on health services research projects.
The applicant must be proficient in modern high level statistical
languages such as S-PLUS and in database management.  Command of
categorical and nonparametric analysis and multivariable
modeling as well as excellent communication skills are also
required.
Resumes can be sent to
  Frank E Harrell Jr
  Professor of Biostatistics and Statistics
  Director, Division of Biostatistics and Epidemiology
  Department of Health Evaluation Sciences
  School of Medicine
  University of Virginia
  Box 600
  Charlottesville VA 22908
fharrell@virginia.edu
The University of Virginia is an equal opportunity employer.
Subject: Need help analyzing your data? Central NJ consultant.
From: mdemilia@eclipse.net (Dr. Michael S. DeMilia)
Date: Tue, 03 Dec 1996 16:36:38 -0500
Need help analyzing your data?
Recent Rutgers grad (plant biology & statistics/biometry) available
for part-time statistical/data analysis consulting.
SAS got you down?  Statistics a nightmare?  Don't know how to set up that
experiment?  Need a complex Excel spreadsheet?  Not quite sure how to
graph it?  Have an unmanageable database?  Trouble processing your weather
data?  Clueless about that C/C++ code?  Putting up a world wide web page?
I can help you figure it out or do it for you.
Very reasonable rates (dependent upon complexity and time requirements).
Special discounts for students (a brief consultation can put you on track!).
-- 
Dr. Michael S. DeMilia
4 Carriage Way
Belle Mead NJ 08502
(908) 359-7225
mdemilia@eclipse.net
http://www.eclipse.net/~mdemilia/
Subject: Job Opportunity:Statistical Quality Engineer
From: ambgrp@aol.com
Date: Tue, 03 Dec 1996 15:11:04 -0600
Here's a job opportunity.  If you are interested and qualified, please see contact information below.  If you know of someone who might be interested and qualified, we appreciate the referral!  Thank you.
Position Description: Statistical Quality Engineer
Will evaluate, develop, train, and support statistical methods including
SPC, classical and Taguchi design of experiments, correlation/regression
analysis, probability theory, hypothesis testing, significance tests,
acceptance sampling plans, Weibull analysis, and applied reliability
methods throughout 19 OE plants and the corporate office.
Position Requirements:
Undergraduate degree in Statistics or Engineering with significant
concentration in Statistics.  Advanced degree
preferred.
Requires 3-5 years of industrial and process improvement experience.
Superior presentation and classroom teaching experience required.  This
position interacts with mid- and upper-level management to provide
statistical support, training, and updates regarding statistical method
implementation and applications.  Substantial interaction with technical
personnel from all divisions and departments is required, as are
substantial PC and statistical software skills.
This is a newly created position and reports to the Quality Assurance
Manager.
Salary range:
Depending on experience, $45,000 to $59,300
Company Information:  
Modine Manufacturing, located in Racine, Wisconsin, is an independent
worldwide leader in heat transfer technology, serving the vehicular,
industrial, commercial, and building HVAC markets.
Contact:
Andy Lane
The Ambrose Group
Milwaukee, WI
1-800-925-8244
ambgrp@aol.com
414-273-8250 (fax)
Subject: Re: What do we mean by "The Null Hypothesis"?
From: rwhite@superior.carleton.ca (Robert White)
Date: 3 Dec 96 20:03:37 GMT
In  tgee@superior.carleton.ca (Travis Gee) writes:
>that sense, the null hypothesis is really that "unknown factors are
>involved in producing the observed variation in the scores."  If one
>factor can be identified in such a way as to improve prediction in a
>systematic way, then the "unknown factors" part is false.  At least
>one is now known, and so this Ho, as a literal statement, is quite
>plausibly false.
[...]
>"statistically significant."  Some subset of these statistically
>significant differences are large enough to call "meaningful in the
>real world."  But that's another discussion...  :>
As I understand what I have learned about the null, it is a test
designed to confirm that no relationship exists. Scientists first
assume that there is no statistical significance and then attempt to
prove that none exists. It is the act of setting out to confirm
no significance that delineates the importance of the test for the
null. Our objective is always to set out with no preconceived bias
and to confirm the test of the null. If we cannot maintain the null
as an empirical statement, we have _no_ recourse but to reject the
null. And this is the whole purpose of the test. The null is there to
be rejected or accepted based upon a decision rule. Our task is to
maintain the null in the face of all odds working against it. If
it cannot be maintained, we have either done our job or made a
mistake.
-- 
   ----------------------------------------- Carleton University ----------
               Robert G. White               Dept. of Psychology   
                                             Ottawa, Ontario. CANADA
   INTERNET ADDRESS ----- rwhite@ccs.carleton.ca ------------------- E-MAIL
   ------------------------------------------------------------------------
Subject: Re: Dropping n.s. dummy variables in logistic regression
From: nichols@spss.com (David Nichols)
Date: 3 Dec 1996 22:28:17 GMT
In article <1996112417101026123@pstn29.extern.kun.nl>,
John Hendrickx  wrote:
>Shu-Fai Cheung  wrote:
>
>> The following issue was posted several days ago by others, but I
>> saw no response to it.  I think it is an interesting issue and would like
>> to raise it again for discussion:
>> 
>> Suppose researcher A wants to test a theory asserting that whether Y
>> occurs is influenced by X1 but not by X2.  A logistic regression analysis
>> is conducted, and the coefficient of X2 is found to be non-significant.
>> Assuming X1 and X2 are not correlated, researcher A accepts a final model
>> that drops X2 and includes only X1 as the predictor.
>> 
>> This sounds reasonable.
>> 
>> Now another case.  Suppose researcher A has a study with three different
>> groups.  The theory being tested asserts that Gp1 and Gp2 differ in the
>> probability of Y's occurrence, while Gp1 and Gp2 on average do not differ
>> from Gp3 on that probability.  (Assume that Gp1, Gp2, and Gp3 represent
>> three different experimental treatments.)  Dummy variables are created and
>> then logistic regression is conducted:
>
>In general creating a parsimonious model by reducing the number of
>parameters is a good thing. The model is easier to interpret and the
>parameters are more robust. The reduced model must be substantively
>meaningful, though, which would be the case for this problem. A number
>of reduced models have been designed for square tables such as father to
>son occupational mobility. See the chapter on loglinear models in the
>SPSS advanced statistics manual for examples.
>
>The reduced model should also be designed a priori, as is the case here,
>rather than on the basis of the results. Designing it after the fact can
>lead to the problem of 'capitalizing on chance', where your reduced model
>is customized to fit the sample data but has no relationship to the
>processes in the population.
>> 
>>         D1     D2
>> Gp1  -0.50  -0.33
>> Gp2   0.50  -0.33
>> Gp3   0.00   0.66
>> 
>> Suppose D2 is non-significant.  Is the researcher justified to drop D2
>> in the final model and include D1 as the only predictor?
>> 
>These dummies use the repeated contrast (I think) so D1 indicates the
>difference between Gp1 and Gp2 and D2 indicates the difference between
>Gp2 and Gp3. Dropping D2 will therefore not test your hypothesis
>correctly: it doesn't take the relationship between Gp1 and Gp3 into
>consideration. To test the hypothesis above you would have to create a
>special contrast like this:
>
>  /contrast(GP)=special(1     -1   0  /* comparison of GP1 and GP2
>                        -.5  -.5   1) /* GP3 vs the mean of GP1 and GP2
>
>The preferred way of testing your hypothesis would be to run LOGISTIC
>with both parameters for GP and with GP(1) only and use a likelihood
>ratio test for the difference between the two models. The Wald statistic
>for the significance of GP(2) can be unreliable (too small), especially
>if the parameter has a large absolute value.
>
>> (Some may think that logistic regression is not the only method, and
>> certainly not the simplest, in this case.  I still chose it because I
>> think it makes the problem easier to present.)
>> 
>These methods apply in principle to any model with categorical
>independent variables and linear predictors, including loglinear, ANOVA,
>logistic, and Cox regression models. If you have a dichotomous dependent
>variable, then logistic regression is the way to go.
>
>I wrote a set of SAS macros for creating a design matrix with different
>types of contrasts. This could then be used in PROC GENMOD, the SAS
>procedure for Generalized Linear Models (which includes the above models
>except for Cox regression). Anyone interested in these macros can find
>them at .
>
>> Thanks for any opinion.
>
>John Hendrickx
>Department of Sociology, University of Nijmegen, The Netherlands.
If the contrasts are orthogonal, then the design or basis matrix is (at
least up to a scaling constant) just the transpose of the contrast matrix.
Thus, the codings given originally (which are for DIFFERENCE, or the
reverse of HELMERT contrasts) will compare groups 1 and 2, and then
compare their average with the third.
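The transpose-and-rescale relationship can be checked by hand for the three-group difference codings quoted earlier (a plain-Python sketch; signs match up to the direction chosen for each contrast):

```python
# Difference (reverse-Helmert) contrasts for three groups, written as rows:
c1 = [1.0, -1.0, 0.0]      # Gp1 vs Gp2
c2 = [-0.5, -0.5, 1.0]     # Gp3 vs the mean of Gp1 and Gp2

assert sum(a * b for a, b in zip(c1, c2)) == 0.0   # contrasts are orthogonal

# For orthogonal contrasts, the basis (design) column for a contrast c is
# c divided by its squared length -- the transpose of the contrast matrix
# up to a scaling constant.
def basis_column(c):
    ss = sum(v * v for v in c)
    return [v / ss for v in c]

d1 = basis_column(c1)   # [0.5, -0.5, 0.0]
d2 = basis_column(c2)   # [-1/3, -1/3, 2/3], i.e. the -.33/.66 codings
```

Up to sign, these columns reproduce the D1/D2 dummy codings in the original post.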
--
-----------------------------------------------------------------------------
David Nichols             Senior Support Statistician              SPSS, Inc.
Phone: (312) 329-3684     Internet:  nichols@spss.com     Fax: (312) 329-3668
-----------------------------------------------------------------------------
Subject: Re: Logit & Probit by TSP
From: clint@leland.Stanford.EDU (Clint Cummins)
Date: 3 Dec 1996 15:35:58 -0800
>Tatsuo Ochiai   wrote:
>>I am wondering what the algorithms and the convergence criteria TSP uses
>>for Logit and Probit model.
Donald Peter Cram  wrote:
>See the "Method" sections under Probit and under Logit in the TSP
>Reference Manual.
    That handles the algorithm (Newton-Raphson, using analytic second
derivatives; a pretty standard method).  The convergence criterion is
described in Section 10.1 of the TSP User's Guide.  The default is that
the relative change in the parameters (in the final iteration) is less
than .01 or .001 (controlled by the TOL option; see Section 10.7 or
NONLINEAR in the Reference Manual).
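As a rough illustration of that stopping rule (a sketch only, not TSP's code), here is Newton-Raphson for the simplest possible logit model, an intercept-only fit, stopping when the relative parameter change drops below a TOL-style threshold:

```python
import math

# Newton-Raphson MLE for an intercept-only logit model, with a stopping
# rule on the relative parameter change (TOL here merely mimics the role
# of the TSP option mentioned above).
y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]       # 3 successes in 10 trials
TOL = 1e-8
beta = 0.0
for _ in range(50):
    p = 1.0 / (1.0 + math.exp(-beta))    # fitted success probability
    grad = len(y) * p - sum(y)           # gradient of the negative loglik
    hess = len(y) * p * (1.0 - p)        # second derivative (positive)
    step = grad / hess
    beta -= step
    if abs(step) <= TOL * max(1.0, abs(beta)):
        break

# Closed form for comparison: log(p_hat / (1 - p_hat)) with p_hat = mean(y).
```

With these data the iterations settle on log(3/7) in a handful of steps, as Newton's method should.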
Clint Cummins
(TSP tech support)
Subject: Re: SPSS and GLM: HELP!!!
From: lthompso@s.psych.uiuc.edu (Laura Thompson)
Date: 4 Dec 1996 00:22:24 GMT
Pawel Michalak  writes:
>Dear Statisticians,
>I need urgent help with ANOVA and ANCOVA in the SPSS package.
>It mainly concerns sums of squares (SS). There are so-called
>"SAS type SS" in the literature, from I to IV.
>What are their equivalents in SPSS? In SPSS there are the Regression
>Approach, the Hierarchical Approach, and the Experimental Approach. How
>can one relate SPSS's classification to the SAS types?
>Thanks in advance for any help.
>Pawel Michalak
If it's still the same as it has been in the past, type III SS are
equivalent to method=unique and type I are equivalent to
method=sequential.  I do recall a method=experimental, but I don't
remember what that does.
Those, I think, are the most typically used.  Type II SS adjust each
effect for all other effects that do not contain it.  I can't remember
exactly what type IV are, but they're like type III, except adjusted for
designs with missing cells or the like.
>============================================================================
>Home WWW Page: http://www.cyf-kr.edu.pl/~uemichal
>Info: finger pawel@haldane.pop.bio.aau.dk & uemichal@kinga.cyf-kr.edu.pl
Subject: Repeated measures analysis
From: Peter Baade
Date: Tue, 3 Dec 1996 21:32:52 -0500
Hi all.
I have a question relating to repeated measures analysis.
Consider a variable A, which is measured over four points in time (A1, A2,
A3, A4).
To assess the effect of time on the variable A is a straightforward repeated
measures analysis problem.
However, I have another variable B that is also measured over the same four
points in time (B1, B2, B3, B4). This variable B may or may not be related
to variable A.
Hence, I want to test whether A is related to B, and whether the
"relationship" between A and B changes over the four time periods.
That is:    let Ri be the relationship between Ai and Bi  (i=1, .., 4)
I want to test          Ho: R1=R2=R3=R4
This initially appeared to be a straightforward problem (in theory), but I
am having problems working out the way to actually conduct the analysis (I
envisaged using SAS  - PROC CATMOD).
My email address is baade@spider.herston.uq.oz.au.
If more information is required to answer this question, please email me and
I will give additional details.
Many thanks in anticipation.
Peter Baade
Subject: Re: population vs sample
From: lucz@ix.netcom.com
Date: Tue, 03 Dec 1996 22:26:33 -0500
Lucie wrote:
>
> Hi!
>
> Somebody in my organization did a survey and sent a questionnaire to all
> the subjects of a particular population (1500 farmers).  He sent 1500
> questionnaires and received 611.  Can we treat this subset like a
> simple random sample of the population of the 1500 persons? Can we
> generalize the results to all this population?
>
> Thank you!
> Lucie Dugas
> Québec, Canada
> ldugas@agr.gouv.qc.ca
This can be called a haphazard sample. The 611 farmers may have been the
ones in favor of the topic of your survey, the most educated ones, or
simply a "typical" group of farmers. You cannot draw any valid
statistical inference from the data you collected from them.
If your targeted sample was the 1500, you may have to go after those who
did not respond to increase your response rate.  Your current response
rate, at about 41%, is far too low.
In practice, you have to plan and design before selecting a sample.
To select a simple random sample from your population of N=1,500, one
method would consist of first listing the farmers from 1 to 1,500.
Then, if you want a simple random sample of, say, 611 farmers, you
should use a table of random numbers to select it before mailing the
questionnaires.  In this fashion, everyone on the list has the same
probability of being selected, and since you selected your sample of 611
in a random fashion, you can draw valid inferences from it. Your second
step is to mail the questionnaire to the 611 farmers in the sample. It
has been shown in the survey literature that mail surveys tend to have
low response rates, so one should try to combine them with face-to-face
or phone interviews to bring the response rate to an acceptable level.
You should make every effort to bring non-response to a minimum and try
to understand why a certain fraction chose not to participate.
Luke
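The listing-and-random-numbers procedure described above is easy to sketch in code (a hypothetical illustration; Python's `random.sample` stands in for the printed table of random numbers):

```python
import random

# List the 1,500 farmers, then draw 611 of them at random so that every
# farmer on the list has the same chance of selection.
random.seed(1996)
frame = list(range(1, 1501))          # farmers numbered 1 to 1,500
sample = random.sample(frame, 611)    # simple random sample, no repeats
```

The questionnaires would then be mailed only to the 611 numbers drawn.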
Subject: Re: Coin-flips and enumeration
From: nakhob@mat.ulaval.ca (Renaud Langis)
Date: Wed, 04 Dec 1996 03:11:35 GMT
On 2 Dec 1996 20:21:19 GMT, jnash@Xenon.Stanford.EDU (James Nash) wrote:
>I don't know if this is the proper forum for this question, and I apologize
>if this is a rather elementary question.  Here it goes:
>
>Two coins are flipped.  It is known that at least one of them was a head.
>What is the probability that both of them are heads ?
>
>You've probably heard a variation of this question, and the answer seems
>to depend highly on semantics.  The probability is either 1/2 or 1/3, but
>I'm rather confused about how to enumerate the cases.  I did the following:
>
>Coin A		Coin B
>head		head
>head		tail
>tail		head
>tail		tail
>
>It seems there are four cases, and one is ruled out by the given information.
>Since only one case has two heads, the probability is 1/3.  It appears that
>the problem's trap is ignoring one of the coins, since its value is
>already known.  By proceeding in this way, you answer 1/2, but you ignore
>the permutation of A:head, B:tail with A:tail, B:head.
>
>But, now I'm confused.  By naming the coins "A" and "B," (or whatever else
>you name them), aren't you making an arbitrary decision of which coin gets
>which name ?  Is it any more arbitrary to assume that coin "A" IS a head ?
>
>The problem I'm having is that I naturally approach the problem differently
>because it deals with probability, and I'm wondering if that is correct.
>Since this is a probability issue, I naturally enumerate the four possible
>cases, and arrive at 1/3 for a solution.  But, if this were part of a
>mathematical proof, I wouldn't hesitate to assign the two coins variables
>and in doing so choose which was the known quantity.  Since the existence
>of the head is given, I'd select THAT coin and give it a name, without
>loss of generality.
>
>I appreciate very much any time taken to explain this to me.  In
>particular, is there any terminology analogous to "permutation vs.
>combination" for dealing with probability, or is ordering always imposed ?
>
I think you should consider the two coins identical, so A:head B:tail is the same
as A:tail B:head. The answer should be 1/2. The difference between permutation
and combination applies only when there is a possible ordering. If the two coins
are identical, then there is no possible ordering.
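A quick Monte Carlo run can arbitrate between the two enumerations (a sketch assuming two fair, independent coins):

```python
import random

# Flip two fair coins, discard trials with no head, and record how often
# both coins show heads among the trials kept.
random.seed(42)
trials = 200_000
both = kept = 0
for _ in range(trials):
    a = random.random() < 0.5          # True means heads
    b = random.random() < 0.5
    if a or b:
        kept += 1
        both += 1 if (a and b) else 0
freq = both / kept
```

The conditional frequency settles near 1/3, the value obtained by counting HH, HT, and TH as equally likely cases: even for physically identical coins, the mixed outcome occurs twice as often as two heads.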
R
Subject: Re: growth, decline, steady state (roughly), or just outright fluctuation
From: nakhob@mat.ulaval.ca (Renaud Langis)
Date: Wed, 04 Dec 1996 03:05:00 GMT
On 2 Dec 1996 09:25:42 GMT, "Håkon Finne"  wrote:
>I have a large number of data sets, each of which contains a time series of consecutive annual
>observations, with a maximum of ten years for each set. There is a lot of fluctuation in the data.
>I need an algorithm that will section the data (according to the values of a particular variable)
>into periods characterized by growth, decline, steady state (roughly), or just outright
>fluctuation.
>Each period should last at least two or three years and the characterization should agree
>fairly well with subjective judgment when looking at a graph of the data. As I see it, one problem
>lies in determining inflection points that define the beginning/end of each period.
>If possible, please also give hints to how the algorithm could be implemented in SPSS!
>Thanx. (And yes; this is an econometric problem.)
>Mail, please, to Hakon.Finne@ifim.sintef.no
>
>
Well, you could simply compute a (heavy) smoothing of your data, then compute
the slope of the new curve at each point. Set threshold values that tell whether
the curve is in a growth, steady, or decline state. I think these values should
be a function of the width of the smoothing function. Another way of doing it is
to count the number of consecutive positive (negative) values of the slope. If
you have more than a certain number of consecutive positive (negative) values,
then the series is growing (declining). Otherwise the series is steady.
This does not ensure, though, that the sub-series will be at least 2 or 3 years
long. That depends on the data.
I don't know how to implement this in SPSS, but I don't think it would be hard:
just use a moving average as your smoother. SPSS Trend can do this (I think).
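The moving-average-plus-slope idea sketched above might look like this (hypothetical `window` and `flat` threshold, chosen purely for illustration):

```python
# Moving-average smoothing plus slope-sign labelling of an annual series.
def classify(series, window=3, flat=0.05):
    n = len(series)
    # centered moving average; the window shrinks at the series ends
    smooth = []
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        smooth.append(sum(series[lo:hi]) / (hi - lo))
    # label each year-to-year step by the slope of the smoothed curve
    labels = []
    for prev, cur in zip(smooth, smooth[1:]):
        slope = cur - prev
        if slope > flat:
            labels.append("growth")
        elif slope < -flat:
            labels.append("decline")
        else:
            labels.append("steady")
    return labels

# a short rise-then-fall series of annual observations
labels = classify([1.0, 1.2, 1.5, 1.9, 1.9, 1.8, 1.4, 1.0])
```

Consecutive runs of the same label would then be merged into periods, with short runs reclassified as fluctuation.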
R
Subject: Re: excel add-ins *** are there any?
From: nakhob@mat.ulaval.ca (Renaud Langis)
Date: Wed, 04 Dec 1996 02:46:55 GMT
On Sun, 01 Dec 1996 19:52:49 -0500, Glenn Fasnacht  wrote:
>I'm wondering if anyone knows of Excel add-ins that perform statistical 
>analysis beyond those provided by Microsoft. Specifically, I'm looking 
>for two add-ins: one that allows 3D histograms, and one that will 
>produce normal probability plots.
The normal probability plot should be quite easy to program. I also know of an
Excel macro that performs some statistical analysis. It's called xlSTAT; I don't
remember the URL, but it should be easy to find with any search engine.
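For what it's worth, the "easy to program" recipe amounts to sorting the data and pairing each ordered value with a normal quantile at a plotting position (a sketch; Blom's positions are one common choice among several):

```python
import statistics

# Coordinates for a normal probability plot: each ordered value is paired
# with the standard-normal quantile at its Blom plotting position
# (i - 0.375) / (n + 0.25).
def normal_plot_points(data):
    x = sorted(data)
    n = len(x)
    nd = statistics.NormalDist()
    return [(nd.inv_cdf((i - 0.375) / (n + 0.25)), xi)
            for i, xi in enumerate(x, start=1)]

# Near-normal data should fall close to a straight line when plotted.
points = normal_plot_points([4.1, 5.0, 3.7, 4.6, 5.3, 4.4, 4.9])
```

In a spreadsheet, the same quantiles come from the inverse normal CDF function applied to the plotting positions.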
R
Subject: Re: normality test ?
From: axelrod@statwiz.com (Michael Axelrod)
Date: 4 Dec 1996 05:58:36 GMT
The best test (in the sense of being powerful against a broad range of
alternative distributions) is the Shapiro-Wilk test.  Remember that after you
accept the data as being normal and go on to do other procedures--say
estimation--your calculations of estimation uncertainty must account for the
fact that you tested the data first.
Michael Axelrod
In article
, Julio
Cesar Voltolini  wrote:
> Dear friends,
> 
> I am a biologist and we are collecting mammals at different strata of the 
> Brazilian Rainforests.
> 
> I would like to do some statistical tests, but I need to know if my data have 
> a normal distribution. I am starting to use ESTATISTICA and SYSTAT and I would 
> like to do the tests in these packages. May I test for normality in EXCEL too?
> 
> Thank you for any help !!
> 
>                                 Voltolini
Subject: Re: population vs sample
From: axelrod@statwiz.com (Michael Axelrod)
Date: 4 Dec 1996 06:29:55 GMT
In general you cannot. The following is illustrative of the problem.
In 1936 the Literary Digest magazine mailed questionnaires to people
asking about their presidential preferences. The people were picked at
random from telephone books. The prediction was strongly in favor of
Landon, who lost.  Many people believe that the problem was a result of
using telephone listings to select people for the survey. Their reasoning
assumes Republicans were richer and therefore more likely to have
telephones (remember this is 1936).  This was not the cause. Actually, the
telephone lists did provide a valid random sample of voters (Democrats
were just about as likely to have telephones).  The problem was response
bias: people who were inclined to vote for Landon had a greater
probability of answering the questionnaire, which required them to mail
it back.
Another example.
Several years ago, I read about a professor who mailed 5,000
questionnaires to female students asking about rape, and on the basis of a
few hundred responses declared that something like 25% of female
students are raped. This result was greatly at variance with census data
on this matter. I mailed the magazine an analysis of the effects of
response bias and of the degree to which it would have distorted the study,
but it was not printed.  The article said members of Congress took the
study seriously and were preparing legislation.
The design of surveys can be very technical and much of the statistical
literature deals with theory and not practical aspects of actually doing
survey design.  Consult the book by Cochran on sampling to get some idea
of how to test and design for response bias.
Michael Axelrod
In article <32A30DA4.3B13@agr.gouv.qc.ca>, Lucie  wrote:
> Hi!
> 
> Somebody in my organization did a survey and sent a questionnaire to all 
> the subjects of a particular population (1500 farmers).  He sent 1500 
> questionnaires and received 611.  Can we treated this subset like a 
> simple random sample of the population of the 1500 persons? Can we 
> generalize the results to all this population?
> 
> Thank you!
> Lucie Dugas
> Québec, Canada
> ldugas@agr.gouv.qc.ca
Subject: help with SAS programming / problems
From: doug
Date: Mon, 02 Dec 1996 23:51:18 -0800
In dire need of help with SAS programming problems.  Any help will be
greatly appreciated.
1. Trying to use the SAS internal functions TINV, UNIFORM, and REPEAT to
construct a sample from a t distribution, with the sample size and df as
input, and to summarize the sample with PROC MEANS and PROC CHART.
2. Repeat problem 1 without the use of TINV, UNIFORM, REPEAT, or RANNOR,
and summarize the same as previous.
3. Trying to create a simulation that generates observations from an
exponential distribution with a given mean (mu). Summarize the same as first.
4. Wanting to write a simulation that illustrates the central limit
theorem, using the exponential distribution with mean 5 as an example.
5. Use a simulation to determine the minimum sample size required in the
large-sample 90% confidence interval for the mean of an exponential
distribution. Must use enough simulation replications to ensure that the
estimate of the coverage probability is within .006 of the truth with
95% confidence.
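Item 5's replication count can be worked out before simulating, using the usual normal-approximation bound (a sketch; 1.96 is the 95% z value, and the coverage probability is taken near its nominal 0.90):

```python
import math

# To estimate a coverage probability p (near the nominal 0.90) to within
# e = 0.006 with 95% confidence, solve z^2 * p * (1 - p) / n <= e^2 for
# the number of replications n.
z, p, e = 1.96, 0.90, 0.006
n = round(z * z * p * (1 - p) / (e * e))   # exact arithmetic gives 9,604
```

So on the order of ten thousand replications are needed for each candidate sample size.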
Any help with these will be greatly appreciated, as would any information
that may lead me to another source.  Thanks.  Send replies to lufswtch@tcac.com
sincerely, 
Mr. K. Green
Subject: Re: Power?
From: Chauncey
Date: Tue, 03 Dec 1996 23:47:36 -0800
Matt Beckwith wrote:
> 
> If I understand what's gone before:
> 
> (1) Alpha is the probability that you have rejected that which is
> true.
> 
Hmmm, perhaps we are saying essentially the same thing, but I have thought of
alpha as the criterion we establish (usually .05, depending on the research
setting) for the probability, computed assuming chance alone is operating, of
seeing a difference as large as the one observed.  In other words, if we
reject the null at p<.05, then a difference this large would arise by chance
alone less than 5% of the time, and with that level of assurance we attribute
the difference to a systematic treatment or population effect rather than to
a "by chance" difference in the sampling.
make sense?
