

Newsgroup sci.stat.consult 21501

Directory

Subject: Does multiple regression tell you probabilities? -- From: beckwith@pop.southeast.net (Matt Beckwith)
Subject: Wall Street Quant. Position ==> Repost -- From: David Rothman
Subject: Freeware version of Survo -- From: seppomus@cc.Helsinki.FI (Seppo Mustonen)
Subject: Re: WEB courses and Walmart -- From: mrstats@aol.com
Subject: Re: Multiplicative vs Additive Data -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: I need tables, graphics, stats and data about megacities -- From: aulatesi
Subject: Re: Adjusting parameters with poor reliability -- From: Dennis Roberts
Subject: Re: RSQ=90% with only one dimension!! -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: cluster analysis - HELP PLEASE HELP -- From: "Rickard D. Robbins"
Subject: Alpha inflation with 2 manovas?? -- From: jalberts@imap1.asu.edu
Subject: Re: Dropping n.s. dummy variables in logistic regression -- From: fharrell@virginia.edu
Subject: Drug dosage -- From: Fancher Wolfe
Subject: Re: Repeated measures analysis -- From: Ed Cook
Subject: Re: Baseball study -- From: Sean Lahman
Subject: Re: Taxis & lotteries -- From: T.Moore@massey.ac.nz (Terry Moore)
Subject: Re: F-distribution -- From: nakhob@mat.ulaval.ca (Renaud Langis)
Subject: Re: Search for Paradise -- From: nakhob@mat.ulaval.ca (Renaud Langis)
Subject: Re: Adjusting parameters with poor reliability -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Baseball study -- From: Paige Miller
Subject: Fixed or random? -- From: "Brian L. Bingham"
Subject: Graphical Analysis of Oblique Factor Analysis -- From: C S
Subject: Re: Need ROBUST TIME-SERIES Techniques -- From: "DAVID P. REILLY"
Subject: Heteroscedasticity and degrees of freedom -- From: barnett@agsm.unsw.edu.au (Glen Barnett)
Subject: Thanks For The Help! -- From: Daniel Parker
Subject: Thanks For The Help! -- From: Daniel Parker
Subject: Thanks For The Help! -- From: Daniel Parker
Subject: Thanks For The Help! -- From: Daniel Parker
Subject: Thanks For The Help! -- From: Daniel Parker
Subject: Sorry for repeats, browser glitched.... -- From: Daniel Parker
Subject: Re: Is there a test for H0:Pearson-Rho=1? -- From: Hans-Peter Piepho
Subject: Re: Fixed or random? -- From: Hans-Peter Piepho

Articles

Subject: Does multiple regression tell you probabilities?
From: beckwith@pop.southeast.net (Matt Beckwith)
Date: Mon, 09 Dec 1996 12:43:19 GMT
I'm going to be performing a study which investigates hypothesized
correlations between certain variables and an outcome.  The outcome is
binary, so I intend to use logistic regression.  The purpose of the
first phase of the study is to come up with a predictive index for the
outcome, using those variables which turn out to be predictive.  The
second stage of the study will then be to test the index to see
whether it is correlated with the outcome.
I was planning to design the predictive index itself based on the
correlation coefficients determined in the first phase of the study.
For example, let's say the regression equation turns out to be y=.1a +
.2b + .3c + .4d + .5e.  Having previously determined that any
correlation below, say, 2.47 could be due to chance, I reject
variables a and b.  The predictive index then becomes something like
I=(3/12)c + (4/12)d + (5/12)e (12 being the sum of 3, 4 and 5),  with
threshold values of "I" being associated with certain probabilities of
the outcome.  For example, an "I" of .4 might be associated with a
probability of 40% for the outcome.
Can such probabilities be estimated from the first stage of my
study?
Should I use the relative correlations (r squared) instead of the
correlations (r) for my index coefficients?  I suspect not, since I
want to come up with a high percentage result if only one of the
variables is highly predictive.  For example, let's say that variable
"e" is so highly predictive that, if its value is above a certain
threshold, the outcome is almost certain.  But let's say the same can
be said of variable "d".  If I use the relative correlations, the
index coefficients of these two variables will be deflated so much
that, in a situation where "e" is very high but "d" low, the outcome
may not be predicted.  On the other hand, if I use the correlation
coefficients for the index coefficients, either "d" or "e" being above
a certain value will result in a nearly 100% outcome prediction.
Thanks.
Matt Beckwith
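A side note on the mechanics: a fitted logistic regression already returns outcome probabilities directly, so a separate hand-built index may not be needed. A minimal stdlib-only sketch (the coefficients below are invented for illustration, not from any actual fit):

```python
import math

def predicted_probability(intercept, coefs, values):
    """Probability of the outcome from a fitted logistic model:
    p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    linear = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-linear))

# Invented coefficients for predictors c, d, e (not a real fit):
p = predicted_probability(-1.0, [0.3, 0.4, 0.5], [1.0, 2.0, 0.5])
```

Thresholding p itself (rather than a renormalized sum of coefficients) keeps the index on the probability scale the study ultimately cares about.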
Return to Top
Subject: Wall Street Quant. Position ==> Repost
From: David Rothman
Date: Mon, 09 Dec 1996 08:29:53 +0100
This is a second posting for anyone who might have missed the post of
last month.
The trading arm of a major investment firm is seeking a quantitative
specialist for its New York based Analytical Equity Trading Group to
work with its senior professionals in the on-going development of
sophisticated statistical/econometric trading models and strategies. 
QUALIFICATIONS:
The successful candidate will have in-depth knowledge of financial
economics,  time series econometrics, stochastic processes and the
requisite skills necessary to design and implement strategies in a
sophisticated computer environment. Comfort in dealing with
probabilistic notions such as random walks, Brownian motion and
martingale theory, combined with econometric ideas such as stationarity,
cointegration, error-correction models and ARCH/GARCH is essential.
This position would be ideal for someone with Wall Street experience in
statistical arbitrage (i.e., pairs trading, basket trading, swaps),
and/or academic training near or at the Ph.D. level.
CONTACT:
E-mail: nyrtd@ny.ubs.com
Please reply via email with either a resume or a short informal
description of yourself.  Please include a day & evening phone number.  
We are an Equal Opportunity employer.
Return to Top
Subject: Freeware version of Survo
From: seppomus@cc.Helsinki.FI (Seppo Mustonen)
Date: 9 Dec 1996 13:48:42 GMT
A full-scale version of Survo working until the 1st of May, 1997
is now freely available from
   http://www.helsinki.fi/survo/
SURVO 84C (Survo) is a general environment for statistical computing and
related areas. It is an integrated software system having a unique
editorial interface of its own.
Main activities of Survo are:
-  Statistical analysis and computing (standard methods +
       special features in multivariate analysis,
             e.g. graphical rotation in factor analysis,
                  stepwise Wilks' lambda in cluster analysis,
       randomized tests in various situations,
             e.g. for contingency tables in different types of
                  experiments,
       in nonlinear regression, automatic detection of parameters and
             symbolic computation of derivatives of model function,
       etc.)
-  Text processing,
-  Graphics (also user-defined types of statistical graphs),
-  Desktop publishing,
-  Editorial computing and computing in the touch mode,
-  Matrix interpreter (integrated with statistical functions),
-  General management of large numerical and textual data bases,
-  Making of expert applications and teaching programs.
The center of activities in Survo is an edit field that at all times
is partially visible on the screen. The edit field is maintained by the
Survo Editor.
The user works in Survo by typing both free text and commands in the
edit field. When commands are activated, their results will appear
in the same edit field. The results are also saved in files for
subsequent processing.
Warning! Survo goes far beyond the old-fashioned command-oriented, or
mouse-driven user interfaces.
Experienced Survo users can extend Survo activities by using Survo's
macro language to write "sucros". Many of the basic functions
of Survo have themselves been programmed as sucros.
General job management in Survo is based on (hierarchical) menus
generated automatically according to the needs of the user.
Survo is written in C. It is a collection of DOS programs, but the user
sees it as an integrated environment. Many of the restrictions and
defects of DOS have been relieved. Survo is an open system. Anyone
can extend its capabilities either by the sucro language or in C.
The programming tools are freely available.
Survo works on any current PC and also under multitasking environments
like Windows NT, Windows 95, OS/2, and Linux. For desktop publishing and
demanding report-generating applications a PostScript printer is
essential. Survo is at its best on 486 and Pentium PCs, but it also
works on less powerful machines.
To illustrate one of the smart features of the editorial approach of
Survo think about the following tiny but revealing example. Assume that
we are writing text with a WORD PROCESSOR as follows:
Population statistics of Finland (31st October 1996):
The number of males is 2499415 and the number of females 2630612.
Thus the total population is ...
The question is: how do you proceed in order to calculate and write the
total number of inhabitants at the end of this statement as quickly as
possible? Most people (although sitting at a PC) still use pocket
calculators or separate programs - really inconvenient!
Anyhow, if it takes 15-60 seconds (as it does in typical statistical
systems, editors and word processors), it is worthwhile to study the
capabilities of Survo where it can be done in less than 5 seconds
(by means of touch mode computing).
More information from http://www.helsinki.fi/survo/
Seppo Mustonen                    Seppo.Mustonen@Helsinki.Fi
Professor
Department of Statistics
P.O.Box 54
00014 University of Helsinki
Finland
Return to Top
Subject: Re: WEB courses and Walmart
From: mrstats@aol.com
Date: 9 Dec 1996 15:32:56 GMT
I think you're overlooking one factor; namely the "brand name"--in other
words, not all colleges are created equal.  CMU courses compete with Penn
State or other schools not just on price, but on reputation, perceived
quality, prestige, etc.
Return to Top
Subject: Re: Multiplicative vs Additive Data
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 9 Dec 1996 15:37:32 GMT
Saleem Nicola (nicola@phy.ucsf.edu) wrote:
<< asking about using a log-transform on biological data. See below>>
When looking at biological variables like growth of cell populations
or chemical concentrations, the 'natural'  metric is often log-normal,
or close enough to it.  If you 'naturally'  do talk about doubling
periods, or half-lives, then taking the log will give you a straight
line when you plot 'quantity' against time.  Biological activity that 
is investigated with concentrations considered in multiples is a
probable candidate for log-transform of the concentration.
Any time the largest data-value is much bigger than the smallest, and
there is a natural zero, Tukey has written that transformations 
should be  *considered*.  There are some bad instructors out there
who misinform,  believing as they do that being countable meets the
relevant standard for being 'equal interval'.  -- That is not so.
Hoaglin, Mosteller and Tukey(ed.) : "Understanding Robust and 
Exploratory Data Analysis"  -- has a chapter on transformation.
Tukey's "EDA" has one, too.  DJ Finney, "Statistical Method in
Biological Assay"  has a larger discussion about 'bounded' growth,
for which the logit (P/(1-P)) or other symmetrical transformations
are indicated.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
======================original note==========================
: Hello,
: I'd appreciate some help on the following question. Suppose one measures
: a variable in control and then in an experimental condition. One
: way to test whether the variable is different in these two conditions
: is to perform a paired t-test (in which for each experiment, the
: control and experimental values are subtracted, and the mean of these
: differences is compared with their variance). This approach works very
: well if the data is additive: that is, if one expects the same absolute
: magnitude of change in the variable each time one does the experiment.
: However, suppose one expects that the experimental effect is
: multiplicative, not additive. For example, one might expect that
: a particular concentration of an antibiotic will kill off around half
: the total number of bacterial cells in a dish, no matter how many cells
: the dish contains to start with -- 10, 100, 1000, or a billion. If the 
: number of cells in the control condition varies quite a bit, the 
: variance in the differences between control and experimental conditions
: will be enormous, and the paired t-test is of little use.
: There are several ways to take care of this problem; I suspect 
: that not all of them are correct. One is to divide each experimental
: value by the control value instead of to subtract them. Another
: is to subtract the logarithm of the experimental value from the log of
: the control value, and then do the test on the resulting values. (Of
: course the difference between two logs is the same as the log of one
: value divided by the other [ie, log(E)-log(C) = log(E/C)].) I have been 
: told that the latter method (taking the log transform) is the correct 
: one, but I'm not sure about this, and I'd like an explanation for why it 
: is acceptable to use log values. If anyone could provide one, or at
: least point me towards a textbook or other reference that discusses this 
: in detail, I would really appreciate it.
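The log-transform recommendation can be sketched directly: take differences of logs (equivalently, log-ratios) and run an ordinary paired t test on them. A minimal stdlib-only sketch with invented cell counts for the antibiotic example:

```python
import math
from statistics import mean, stdev

def paired_t_on_logs(control, treated):
    """Paired t statistic computed on log-transformed values.

    Equivalent to testing whether the mean log-ratio log(E/C)
    differs from zero, i.e. a multiplicative effect."""
    diffs = [math.log(e) - math.log(c) for c, e in zip(control, treated)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # t statistic and degrees of freedom

# Invented counts: the antibiotic kills roughly half the cells
# regardless of how many the dish starts with.
control = [10.0, 100.0, 1000.0, 1e6]
treated = [5.2, 48.0, 510.0, 4.9e5]
t, df = paired_t_on_logs(control, treated)
```

On the raw scale these differences range from about 5 to 500,000, so the paired t test is useless; on the log scale they are nearly constant and the test has power.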
Return to Top
Subject: I need tables, graphics, stats and data about megacities
From: aulatesi
Date: Mon, 09 Dec 1996 16:34:02 -0800
For a study I need tables, graphics, stats and data about megacities. If
you have something, please send it to me along with an indication of
the source where you found it.
paco@freenet.hut.fi
Return to Top
Subject: Re: Adjusting parameters with poor reliability
From: Dennis Roberts
Date: Sun, 8 Dec 1996 23:37:27 -0500
one can estimate what the correlation might be IF the predictor measures are
made MORE reliable .. but, the question is: if you do this ... are you
merely fooling yourself? unless there is some real reason why you know
your data are NOT as reliable as they WILL be ... then i don't think
correction for attenuation is a good idea ...
At 09:16 PM 12/8/96 -0500, you wrote:
>I have a number of variables (categorical and continuous) that have shown
>some evidence of poor test-retest reliability.
>
>I heard that there is a procedure called "attenuation of parameter
>estimates" due to poor reliability, however I am unsure what this is, or if
>there are any references that I might be able to begin with.
>
>I am aware that in PROC CALIS (SAS) it is possible to include the
>reliability of the variable in the model. However, this limits my analysis
>boundaries, as I was intending to use repeated measures.
>
>If this makes sense to anybody, then I would welcome any comments or
>suggestions that they might have.
>
>Thanks in advance,
>
>Peter Baade
>
>
===========================
 Dennis Roberts, Professor EdPsy             !!! GO NITTANY LIONS !!!
 208 Cedar, Penn State, University Park, PA 16802 AC 814-863-2401
 WEB (personal) http://www2.ed.psu.edu/espse/staff/droberts/drober~1.htm
Return to Top
Subject: Re: RSQ=90% with only one dimension!!
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 9 Dec 1996 15:51:49 GMT
Yuichi Watanabe (yuichi@HAWAII.EDU) wrote:
: I have collected data with a 101-item questionnaire on motivation etc.
: from 1000 students. All the responses were 1-5 Likert-type scale
: (Strongly disagree - Strongly agree). 52 items were on motivation.
: I have run an MDS based on a correlation matrix of 52 items, converting
: the correlation coefficients into 1+r to obtain a similarity matrix with
: all positive numbers. An MDS with n=1 accounted for 90% of variance, with
: n=2 92%. All the negative items, mainly Anxiety, were on the left and all
: the confidence items, such as Expectancy of success, were on the right
: extreme in the first dimension. With a factor analysis with the same 52
: items, 8 factors accounted for only 50% of variance. Did I do something
: wrong?
: I am confident that the data input was done correctly. The same
: correlation matrix yielded the same factor solutions. I have run MDS both
: with SAS and Systat and got almost identical results.
: Could someone explain why I got such different results between factor
: analysis and MDS?
 -- Two items with a big negative correlation would be VERY FAR APART
or NOT AT ALL SIMILAR according to the data you put in your MDS.
Factor analysis, on the other hand, would put them on the same factor
with opposite loadings.
I do not know how you want to enter your correlations, but adding
1.0  is probably a very poor solution, because it HAS to give you
the results that you see.
Accounting for 50% of the variance of that kind of attitude-item is
probably accounting for all the 'reliable'  variance, so that is
a decent sounding result.  I do not know what is comparable for
MDS;  MDS has not been useful the couple of times that I have tried
it (That was also, like yours, on rating-scale data).
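To make the 1+r point concrete, here is a sketch contrasting two ways of turning a correlation into an MDS dissimilarity. The absolute-value version is one common alternative (my suggestion, not from the thread) that, like factor analysis, treats strongly negative items as related:

```python
def dissim_shift(r):
    """The criticized scheme: similarity = 1 + r, so dissimilarity
    is forced to grow as r goes negative."""
    return 2.0 - (1.0 + r)   # maximum similarity is 2 when r = 1

def dissim_abs(r):
    """Alternative: use |r| as similarity, so strongly negative
    items count as 'close', mimicking how factor analysis loads
    them on one factor with opposite signs."""
    return 1.0 - abs(r)

# A reverse-scored anxiety item vs. a confidence item, r = -0.9:
shift = dissim_shift(-0.9)   # large: forced to opposite ends
absd = dissim_abs(-0.9)      # small: treated as related
```

Under the 1+r scheme, any set of items with a strong positive/negative split must produce exactly the one-dimensional left/right picture Yuichi observed.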
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: cluster analysis - HELP PLEASE HELP
From: "Rickard D. Robbins"
Date: Mon, 09 Dec 1996 11:51:55 -0600
Hello all,
	OK, here is the premise of my problem.  I had 2 different sets of
subjects make ratings on the same set of stimuli.  I took those ratings
and used them as distance measures to do a cluster analysis for each
group.  I now have 2 different sets of clusters containing some
overlapping items.  How do I compare these 2 cluster analyses (let's say
one yielded 4 clusters and the other yielded 3)?  Is there any
quantifiable measure of overlap between 2 cluster analyses?  Does anyone
even know where to start?
		Thanks for any type of help.
			Rickard D. Robbins
P.S.: email me personally, if you don't mind.
email:rrobbins@colab.brooks.af.mil
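One standard quantitative answer to this question is the Rand index: the fraction of stimulus pairs on which the two clusterings agree (grouped together in both, or apart in both). A stdlib-only sketch with invented cluster labels (4 clusters vs. 3, as in the post):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree:
    same cluster in both solutions, or different in both."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Invented labels for 8 stimuli: group 1 found 4 clusters,
# group 2 found 3.
ri = rand_index([0, 0, 1, 1, 2, 2, 3, 3], [0, 0, 1, 1, 1, 2, 2, 2])
```

A value of 1 means perfect agreement; chance-corrected variants (the adjusted Rand index) exist when a baseline is needed.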
Return to Top
Subject: Alpha inflation with 2 manovas??
From: jalberts@imap1.asu.edu
Date: 9 Dec 1996 18:03:40 GMT
I know that alpha inflation occurs when doing more than a-1 contrasts on 
the same set of data, where a=the number of groups.
Does alpha inflation occur in the following scenario:
If I do a MANOVA with three dv's and then later decide that I want to do 
another MANOVA with three different variables (from the same data set but 
they are not the same variables from test one) does performing the second
MANOVA constitute alpha inflation? Or is there some other problem with
doing 2 separate MANOVAs?
any insights would be appreciated.
TIA
Jay
============================================================================
Jay Alberts
jay.alberts@asu.edu
Return to Top
Subject: Re: Dropping n.s. dummy variables in logistic regression
From: fharrell@virginia.edu
Date: Mon, 9 Dec 1996 17:34:13 GMT
When parsimony is achieved by examining relationships with the response
variable, all statistical inference can be invalidated.  In particular,
standard errors and P-values are both too small.  See, for example, the
two references below.  The Grambsch and O'Brien article has a nice
demonstration of the fact that if you fit a model with age and
age-squared, test the age^2 term for significance and drop it, the one
degree of freedom test for age needs to be judged against almost a
2 d.f. critical value.  In other words, dropping the age^2 term can do
nothing but hurt the power of the test of association between age and
Y if one in fact preserves the type I error.
@article{gra91,
   author = "Grambsch, P. M. and {O'Brien}, P. C.",
   journal = SM,
   pages = "697-709",
   title = "The effects of transformations and preliminary tests for non-linearity in regression",
   volume = "10",
   year = "1991"
}
@Article{cha95mod,
  author = 		 {Chatfield, C.},
  title = 		 {Model uncertainty, data mining and statistical
                  inference (with discussion)},
  journal = 	 JRSSA,
  year = 		 1995,
  volume =		 158,
  pages =		 {419-466},
  annote =		 {bias by selecting model because it fits the data
                  well; bias in standard errors;P. 420: ... need for a
                  better balance in the literature and in statistical
                  teaching between {\em techniques} and problem
                  solving {\em strategies}.  P. 421: It is `well
                  known' to be `logically unsound and practically
                  misleading' (Zhang, 1992) to make inferences as if a
                  model is known to be true when it has, in fact, been
                  selected from the {\em same} data to be used for
                  estimation purposes.  However, although
                  statisticians may admit this privately (Breiman
                  (1992) calls it a `quiet scandal'), they (we)
                  continue to ignore the difficulties because it is
                  not clear what else could or should be done.
                  P. 421: Estimation errors for regression
                  coefficients are usually smaller than errors from
                  failing to take into account model specification.
                  P. 422: Statisticians must stop pretending that
                  model uncertainty does not exist and begin to find
                  ways of coping with it.  P. 426: It is indeed
                  strange that we often admit model uncertainty by
                  searching for a best model but then ignore this
                  uncertainty by making inferences and predictions as
                  if certain that the best fitting model is actually
                  true.  P. 427: The analyst needs to assess the model
                  selection {\em process} and not just the best
                  fitting model.  P. 432: The use of subset selection
                  methods is well known to introduce alarming biases.
                  P. 433: ... the AIC can be highly biased in
                  data-driven model selection situations.  P. 434:
                  Prediction intervals will generally be too narrow.
                  In the discussion, Jamal R. M. Ameen states that a
                  model should be (a) satisfactory in performance
                  relative to the stated objective, (b) logically
                  sound, (c) representative, (d) questionable and
                  subject to on--line interrogation, (e) able to
                  accommodate external or expert information and (f)
                  able to convey information.}
}
Return to Top
Subject: Drug dosage
From: Fancher Wolfe
Date: Mon, 9 Dec 1996 14:23:40 -0600
Please recommend a source for a novice to learn about modeling dose levels
and exploring the effect of different ingestion patterns.  I have been
exploring difference equations but my math is weak.  Hope that this is an
appropriate question for this list. Thanks.
Fancher E. Wolfe, Professor
Mathematics and Statistics
Metropolitan State University
730 Hennepin Ave.
Minneapolis, MN 55403-1897
fwolfe@msus1.msus.edu
612-341-7256
Return to Top
Subject: Re: Repeated measures analysis
From: Ed Cook
Date: Mon, 9 Dec 1996 14:31:00 CDT
I might conceptualize it as a question of whether the "effect" of
B on A varies with T (time):  in other words, is there a B x T
interaction.  Assuming that B is continuous and T is categorical,
I'd analyze with the Lorch and Myers repeated measures regression
technique (J. Experimental Psychology: Learning, Memory, and Cognition,
1990).  Their technique is straightforward for anyone familiar with
OLS regression.  If I was more of a statistician I might consider something
like hierarchical linear modelling.  Of course, the more conceptual
questions raised by Rich Ulrich also deserve attention--consideration
of those might resolve issues about whether and how to standardize
prior to applying the L&M technique. Also, if B is categorical, then the
L&M approach might not be applicable.
Ed Cook, Assoc Prof of Psychology, Univ of Alabama at Birmingham, USA
Return to Top
Subject: Re: Baseball study
From: Sean Lahman
Date: Mon, 09 Dec 1996 13:32:07 -0500
Peter Flom wrote:
> 
> Sean Lahman wrote about studying the effects of integration and
> expansion  on baseball.
> 
> He asked if he was "missing anything"
> 
> Well, one thing that seems to me to be missing is that batting
> averages are NOT an absolute measure of ability....they are affected
> by lots of things, but the main one I can think of here is PITCHING
> and fielding ability.
> 
You're right, there is no absolute measure of hitting ability.  But I
would expect that by using several different statistical measures, you
would be able to see the effects of outside forces.  There are obviously
many other factors that affect hitting performance, and I suspect
multi-variable regression analysis might help to identify and quantify
them.
> If integration improved the quality of play (as seems intuitively
> likely to me as well as to Sean) wouldn't it improve all aspects of
> play?
But I'm attempting to draw a distinction between quality of play and
statistical performance.  Specifically, "does comparable statistical
performance imply comparable quality of play?" 
My theory was something like this (as restated by someone else).  If
ability to hit is distributed normally among the population from which
major league players are drawn, expanding that population while keeping
the number of players the same will result in a higher percentage of
players being drawn from the extreme right end of the normal curve. 
Increasing the number of players drawn from a stable population will
have the opposite effect.  If we accept this premise, we would expect
the MEANS of standard measures of performance like BA, OBP and SLG to be
unaffected (since the "average" batter should improve at the same rate
as the average pitcher).  We would, however, expect expansion to
increase the standard deviations, and integration to have the opposite
effect.
Because the data does not conform to that model, it suggests to me that
the other effects (night baseball, artificial turf, livelier ball,
changes in style of play, etc.) are more significant in affecting player
statistics.
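Sean's selection argument is easy to simulate: draw hitting ability for a large candidate population, keep the best k, and compare spreads. A rough stdlib-only sketch (population and roster sizes are invented for illustration):

```python
import random
from statistics import stdev

def selected_sd(pool_size, n_players, seed=42):
    """SD of ability among the top n_players selected from a
    normally distributed population of pool_size candidates."""
    rng = random.Random(seed)
    abilities = sorted(rng.gauss(0, 1) for _ in range(pool_size))
    return stdev(abilities[-n_players:])

sd_before = selected_sd(10_000, 400)   # pre-expansion roster spots
sd_after = selected_sd(10_000, 600)    # expansion adds roster spots
# Adding players from a stable population lowers the selection
# cutoff, so the spread of the selected group should widen.
```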
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sean Lahman - lahmans@vivanet.com
Sean Lahman's Baseball Archive
http://www.vivanet.com/~lahmans/baseball.html
Return to Top
Subject: Re: Taxis & lotteries
From: T.Moore@massey.ac.nz (Terry Moore)
Date: 9 Dec 1996 21:00:31 GMT
In article <199611300000.AAA24894@wildnet.co.uk>, John Whittington
 wrote:
> Of course.  My point is that, in the real world, one normally will not know
> for certain that the serial numbers do start at 1 - not the least because
> there are plenty of examples in which this is not the case.
> 
> Is it not best to try to model on the basis of all the available
> information, with the minimum of guessing about the nature of the model?  It
> would seem (to me) reasonable to use a model which assumed no more than that
> the serial numbers were consecutive (usually the case, even when the start
> is not 1), with both 'start' and 'finish' numbers as parameters of the model
> to be estimated.  I don't imagine that such a model would be all that
> difficult to deal with - intuitively, one might expect it would be based,
> inter alia, on both xmin and xmax in the sample.
Indeed you are right. However, if we observe xmax = 1287 and
xmin = 16, we realise that the starting number was probably 1.
Who would start with a small number other than 1? Of course this
is a Bayesian type of argument. What is more, the estimate would
be no more than 16 different from that given by something like
(xmax-xmin) + bias correction.
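Terry's two-parameter model has a tidy closed form: with n observations of consecutive serial numbers from [a, b], the gaps between order statistics average (xmax - xmin)/(n - 1), which gives unbiased-style estimates of both endpoints. A sketch with invented sightings (only xmin = 16 and xmax = 1287 come from the post):

```python
def endpoint_estimates(sample):
    """Estimate the first and last serial numbers when both are
    unknown, by extending the observed range one average gap at
    each end (a two-sided 'German tank' style estimator)."""
    n = len(sample)
    lo, hi = min(sample), max(sample)
    gap = (hi - lo) / (n - 1)  # mean spacing between order statistics
    return lo - gap, hi + gap

# Invented sightings with xmin = 16 and xmax = 1287:
start, finish = endpoint_estimates([16, 205, 440, 698, 951, 1150, 1287])
# The estimated start lands at or below 1, supporting the guess
# that numbering really begins at 1.
```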
Terry Moore, Statistics Department, Massey University, New Zealand.
Imagine a person with a gift of ridicule [He might say] First that a
negative quantity has no logarithm; secondly that a negative quantity has
no square root; thirdly that the first non-existent is to the second as the
circumference of a circle is to the diameter. Augustus de Morgan
Return to Top
Subject: Re: F-distribution
From: nakhob@mat.ulaval.ca (Renaud Langis)
Date: Mon, 09 Dec 1996 22:34:06 GMT
On Sun, 8 Dec 1996 14:24:00 -0500, Dennis Roberts  wrote:
>most stat packages will read off any F value you want ... for any percentile
>point ... certainly Minitab can
>
>At 09:48 AM 12/4/96 GMT, you wrote:
>>Hi all,
>>
>>For some error analysis I need to use Tables
>>of the F-distribution, both the F_n;m;0.025
>>tables and the F_n;m;0.05 tables.
>>There are enough books in which I can find them,
>>but since I need them on a computer I wondered
>>whether instead of typing in all those numbers,
>>there is some WWW site where I could retrieve them.
>>
>>Machiel.
>
Excel has it.
R
Return to Top
Subject: Re: Search for Paradise
From: nakhob@mat.ulaval.ca (Renaud Langis)
Date: Mon, 09 Dec 1996 22:36:51 GMT
On Sun, 8 Dec 1996 05:34:41 -0500, ChrisM11@AOL.COM wrote:
>---------------------
>Forwarded message:
>Subj:    Search for Paradise
>Date:    96-12-08 03:41:25 EST
>From:    ChrisM 11
>To:      ChrisM 11
>
>       This is to inform you about the new adult game that VCS Magazine rated
>"The best game of '96" and gave an "Outstanding ****" (4 stars).  "The Search
>for Paradise is no doubt one of the greatest XXX Adult games available."  The
>first game where it is as much fun as it is a turn on.  Travel the world to
>every continent, every country you can think of, and meet some of the most
>beautiful women in existence. These women will treat you like a king and obey
>your every command.  Any sexual wish you can think of, these women know it
>all.  There is a different paradise for every guy out there, and this game
>will have them all.  This game uses real models, digital video, and digital
>sound
>to make it as realistic as possible. You will feel like you're in the same
>room as the girl you're talking to.  ---  Required: 386 or better, 4 meg ram
>or better, Windows 3.1 or higher (Win95 is fine), sound card is optional,
>CD-Rom is optional.  Game is given either CD-rom, or compressed 3.5"
>diskettes.) - $19.95.
>
>    The last adult game we are going to inform you about is the newly
>released "Club Celebrity X".  Imagine being in a club with some very
>beautiful, well known, ACTUAL celebrities that with skill, will be making you
>breakfast in bed the next day.  These girls you have seen on television,
>magazines, and billboard ads, and now they are on your computer, begging
>for action. Each girl you will recognize and you won't believe your eyes when
>you got them in your own bedroom. This game is hot, and once you start
>playing, you won't be able to stop.   ---  Required: 386 or better, 4 meg ram
>or better, Windows 3.1 or higher (Win95 is fine), sound card is optional,
>CD-Rom is optional. Game is given either CD-rom, or compressed 3.5"
>diskettes.) - $19.95.
>
>Software arrives in a plain, unmarked, brown package.  Delivery takes no
>longer than 7 to 8 working days.  Both your email address, and mailing
>address are NOT added to any mailing lists whatsoever.  Once you are mailed
>this email, your name is deleted from all lists to ensure you are not mailed
>again.
>
>Each game is $19.95, but for a limited time, you can get both "The Search for
>Paradise" and "Club Celebrity X" for just $29.95.   Shipping and handling is
>$2.00 for each game ordered.  There are no additional charges or fees.
>
>Please make checks or money orders out to: Chris Mark
>
I'll surely buy it with probability zero.
R
Return to Top
Subject: Re: Adjusting parameters with poor reliability
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 9 Dec 1996 22:49:40 GMT
Peter Baade (baade@SPIDER.HERSTON.UQ.OZ.AU) wrote:
: I have a number of variables (categorical and continuous) that have shown
: some evidence of poor test-retest reliability.
: I heard that there is a procedure called "attenuation of parameter
: estimates" due to poor reliability, however I am unsure what this is, or if
: there are any references that I might be able to begin with.
 -- This is mainly hocus-pocus, which serves to mislead the unwary.
It is well-intended, to serve a particular theoretical purpose,
in a few particular times and places of application;
but 'correction for attenuation', when I have seen it a couple of
times, has gone along with ignoring  *any*  appropriate tests of
significance.  And claiming that two variables 'seem' to be (or *are*)
correlated perfectly, or thereabouts, is a bit misleading when
the accompanying test of significance might not reach (even) the
5% test level.
The idea is this:  If variable A has reliability of .5, and
variable B has reliability of only .6 (estimated, perhaps, from
the data on hand),  then the observed correlation of .55  between
A and B would be 'corrected' to be a correlation of 1.00.
I find the procedure more believable when the original 
correlations are in the vicinity of .9.  For instance, I have
read some similar arguments concerning the 'content'  of two
IQ tests.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
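Ulrich's arithmetic can be checked directly with the classical correction-for-attenuation formula, r_corrected = r_observed / sqrt(rel_A * rel_B). A minimal Python sketch using the numbers from his example:

```python
import math

# Classical correction for attenuation:
#   r_corrected = r_observed / sqrt(rel_A * rel_B)
# Numbers taken from the example in the post.
r_obs = 0.55    # observed correlation between A and B
rel_a = 0.5     # test-retest reliability of A
rel_b = 0.6     # test-retest reliability of B

r_corrected = r_obs / math.sqrt(rel_a * rel_b)
print(round(r_corrected, 2))   # 1.0
```

Note that nothing in the formula stops the "corrected" value from exceeding 1, which is part of why the procedure can mislead the unwary.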
Return to Top
Subject: Re: Baseball study
From: Paige Miller
Date: Mon, 09 Dec 1996 15:33:13 -0500
Sean Lahman wrote:
> 
> Peter Flom wrote:
> >
> > Sean Lahman wrote about studying the effects of integration and
> > expansion  on baseball.
> >
> > He asked if he was "missing anything"
> >
> > Well, one thing that seems to me to be missing is that batting
> > averages are NOT an absolute measure of ability....they are affected
> > by lots of things, but the main one I can think of here is PITCHING
> > and fielding ability.
> 
> You're right, there is no absolute measure of hitting ability.  But I
> would expect that by using several different statistical measures, you
> would be able to see the effects of outside forces.  There are obviously
> many other factors that affect hitting performance, and I suspect
> multivariable regression analysis might help to identify and quantify
> them.
Are you hypothesizing that variable X2 might show the effect of outside
force A, and therefore you could infer from X2 how outside force A
affected variable X1? If so, I'm skeptical that you can do this from the
data.
You are dealing with highly collinear variables here. Multiple
regression will enable you to predict with highly-collinear variables;
it will not enable you to separate the effects of the variables.
Basically, the problem is in the data, not in the statistical method.
-- 
+---------------------------------+------------------------------+
| Paige Miller, Eastman Kodak Co. | "Let's play some basketball" |
| PaigeM@kodak.com                | Michael Jordan in Space Jam  |
+---------------------------------+------------------------------+
| The opinions expressed herein do not necessarily reflect the   |
| views of the Eastman Kodak Company.                            |
+----------------------------------------------------------------+
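Paige's point can be demonstrated with simulated data (a hypothetical Python/NumPy sketch, not the baseball data): two nearly collinear predictors give excellent predictions while the split of the effect between them remains essentially arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)

# Ordinary least squares with both predictors
X = np.column_stack([x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

# Prediction is excellent ...
print(np.corrcoef(y, yhat)[0, 1])   # very close to 1
# ... but the individual coefficients are unstable: beta need not be
# anywhere near the "true" (2.0, 0.0) split between x1 and x2.
print(beta, beta.sum())             # only the SUM is well determined (~2)
```

Only the combined effect (the sum of the two coefficients) is estimated precisely; the data cannot say how to apportion it between the two collinear variables, which is exactly the problem with inferring separate "outside force" effects.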
Return to Top
Subject: Fixed or random?
From: "Brian L. Bingham"
Date: Mon, 09 Dec 1996 14:48:31 -0800
We have an ongoing debate in our lab about nested factors and whether
they should be fixed or random.  There is no consensus among authors on
the subject.  Some say that nested factors are always random while
others state that it is possible (though unlikely) that a nested factor
will be fixed.
Zar (1996) pg. 308 gives an example of a nested analysis where the
nested factor (drug source) is random.  
          Drug 1       Drug 2         Drug 3
Source:   A    Q      D     B        L     S 
1.  What is the basis for considering source as a random factor?  Is
nesting alone sufficient?  Are there not valid reasons to make sources a
fixed factor (e.g., if sources are not randomly chosen)?
2.  The F test for the nested factor will tell us whether there is
variation among drug sources.  Suppose that was significant and we were
specifically interested in comparing sources within each drug. Is it
appropriate to make contrasts between sources within drugs (-1 1 0 0 0
0; 0 0 -1 1 0 0; 0 0 0 0 1 -1)?
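For question 2, the three contrast vectors can at least be applied mechanically to the six source means. A sketch with invented cell means (the A, Q, D, B, L, S ordering and the contrast rows are from the post; the means are hypothetical, and proper t tests would still require MS(within) and its degrees of freedom):

```python
import numpy as np

# Source means in the order A, Q (Drug 1), D, B (Drug 2), L, S (Drug 3);
# the values are invented purely for illustration.
means = np.array([4.1, 3.8, 5.0, 5.6, 2.9, 3.0])

# The three within-drug contrasts from the post
C = np.array([
    [-1, 1, 0, 0, 0, 0],    # Q - A (within Drug 1)
    [0, 0, -1, 1, 0, 0],    # B - D (within Drug 2)
    [0, 0, 0, 0, 1, -1],    # L - S (within Drug 3)
])
estimates = C @ means
print(estimates)   # one estimate per within-drug comparison
```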
Return to Top
Subject: Graphical Analysis of Oblique Factor Analysis
From: C S
Date: Mon, 09 Dec 1996 21:17:26 -0500
If I do an oblique factor analysis, and get two factors, each 
representing a different proportion, what is the best way to represent 
this graphically?
	a.  How long do I make the two oblique axes?
		(Do I make them proportional to the eigenvalues?)
	b.  What is the angle between the two oblique axes?
Thanx
Chuck
Return to Top
Subject: Re: Need ROBUST TIME-SERIES Techniques
From: "DAVID P. REILLY"
Date: Mon, 09 Dec 1996 18:55:49 -0500
Erik H Williams wrote:
I am just becoming familiar with newsgroups and saw your posting. I
don't know all the etiquette, but I do know time series. What you are
referring to is called intervention detection or outlier detection.
There are many assumptions behind time series models, and when you talk
about robust procedures there are a variety of extensions to time series
modeling which allow one to proceed, always cautiously, when some of the
assumptions are not met.
One assumption that is nearly always violated by real-world data is that
the mean of the errors is invariant and does not differ statistically
significantly from zero at all points in time. This led directly to
outlier detection.
Another standard assumption, often violated but eminently treatable, is
that the variance of the errors is constant. I have treated this by
extending time series models to GENERALIZED LEAST SQUARES, bootstrapping
the diagonal elements of the variance-covariance matrix of the residuals.
Another possible violation, again treatable, is the assumption that the
model parameters are invariant over time. I have treated this in our
commercial package called AUTOBOX. References and downloadables are
available at
      http://darkstar.icdc.com/~autobox
If you wish to discuss pulses, seasonal pulses, level shifts, or time
trends as characteristics of a robust time series, please call me at
215-675-0652.   DAVE REILLY
Return to Top
Subject: Heteroscedasticity and degrees of freedom
From: barnett@agsm.unsw.edu.au (Glen Barnett)
Date: 10 Dec 1996 03:42:20 GMT
Can anyone suggest references for the loss of degrees of freedom
in a regression situation under heteroscedasticity?
Alternatively, the equivalent effect in unbalanced 
one way ANOVA may be of help.
The simpler of the situations I'm in essentially has a model of a set of
parallel lines, for which I'm interested in finding the p-values of the
parameters representing the differences in height. The smaller sample
sizes generally have the smaller variances. If the two-sample t is any
guide, this indicates the effect of d.f. should be small, but I'd
like to see what is out there on this problem.
Most of what I've been able to find out there so far either just falls
back on asymptotic normality, or pretends that the degrees of freedom
don't change.
Glen
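As a point of reference for the two-sample-t analogy Glen mentions, the Welch-Satterthwaite approximation shows how unequal variances pull the effective degrees of freedom below the pooled n1 + n2 - 2 (the numbers here are hypothetical):

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite approximate degrees of freedom for a
    two-sample comparison with unequal variances."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical numbers: the smaller sample has the smaller variance,
# as in the post.  Pooled df would be 10 + 30 - 2 = 38.
print(welch_df(1.0, 10, 4.0, 30))   # roughly 31.6
```

With equal variances and equal sample sizes the formula returns the pooled df exactly; in this hypothetical the unequal variances cost about 6 of the 38 pooled df, consistent with the post's expectation that the effect is modest.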
Return to Top
Subject: Thanks For The Help!
From: Daniel Parker
Date: Tue, 10 Dec 1996 00:51:03 -0500
Just want to say thanks to all the people who responded to my plea for
help on basic techniques and problem areas in formulating and
specifying models, as well as techniques for exploratory analysis and
visual interpretation. Now that I have a bunch of responses, I will post
a summary of them in the next day or two so that it may be of use to
others.
Again, thanks for all the suggestions!
Daniel Parker
Return to Top
Subject: Sorry for repeats, browser glitched....
From: Daniel Parker
Date: Tue, 10 Dec 1996 01:07:12 -0500
Sorry for the repeats,
my browser glitched and appeared not to be sending anything. But then 
I noticed the flood of messages.
Daniel Parker
Return to Top
Subject: Re: Is there a test for H0:Pearson-Rho=1?
From: Hans-Peter Piepho
Date: Tue, 10 Dec 1996 08:17:44 +0100
>Is there a test for H0:Pearson-Rho=1?
>
>I found tests for Rho=0 and Rho=Rho0 with Rho0<1. I can't find one for
>testing Rho=1.
>
>Any suggestion?
>
reject whenever your sample r is < 1.
_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde  WWW:   http://www.wiz.uni-kassel.de/fts/
Universitaet Kassel            Mail:  piepho@wiz.uni-kassel.de
Steinstrasse 19                Fax:   +49 5542 98 1230
37213 Witzenhausen, Germany    Phone: +49 5542 98 1248
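Piepho's one-line test is tongue-in-cheek but exact: if Rho is exactly 1, y is an exact linear function of x, so the sample r is exactly 1 (up to floating point), and observing any r < 1 warrants rejection. A quick numerical check in Python:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
y = 3.0 * x + 2.0            # Rho = 1: y is an exact linear function of x
r = np.corrcoef(x, y)[0, 1]
print(r)                     # 1.0 up to floating-point rounding
```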
Return to Top
Subject: Re: Fixed or random?
From: Hans-Peter Piepho
Date: Tue, 10 Dec 1996 08:34:17 +0100
>We have an ongoing debate in our lab about nested factors and whether
>they should be fixed or random.  There is no consensus among authors on
>the subject.  Some say that nested factors are always random while
>others state that it is possible (though unlikely) that a nested factor
>will be fixed.
>
>Zar (1996) pg. 308 gives an example of a nested analysis where the
>nested factor (drug source) is random.
>
>          Drug 1       Drug 2         Drug 3
>Source:   A    Q      D     B        L     S
>
>
>1.  What is the basis for considering source as a random factor?  Is
>nesting alone sufficient?  Are there not valid reasons to make sources a
>fixed factor (e.g., if sources are not randomly chosen)?
>
>2.  The F test for the nested factor will tell us whether there is
>variation among drug sources.  Suppose that was significant and we were
>specifically interested in comparing sources within each drug. Is it
>appropriate to make contrasts between sources within drugs (-1 1 0 0 0
>0; 0 0 -1 1 0 0; 0 0 0 0 1 -1)?
>
>
1. The random assumption is a stronger one than the fixed assumption.
2. I would consider a factor random if the levels included in the study
can be regarded as a RANDOM sample from a population (as you indicate in
1.). In the example, there may be a population of possible sources for each
drug. If the levels used in the study are a RANDOM sample from that
population, the nested factor can be regarded as random.
3. If the nested factor is random, the standard errors for drugs will be
larger, and the inference space will be broader, i.e. you can draw
inferences with respect to the whole population of sources. When the nested
factor is fixed, inferences for drugs are restricted to the levels of the
nested factor you have investigated.
4. Since you are interested in contrasts among levels of the nested factor,
it may be that you have purposefully selected the levels, so the factor is
not random. If the factor is really random, you could compare BLUPs rather
than BLUEs.
Hans-Peter
_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde  WWW:   http://www.wiz.uni-kassel.de/fts/
Universitaet Kassel            Mail:  piepho@wiz.uni-kassel.de
Steinstrasse 19                Fax:   +49 5542 98 1230
37213 Witzenhausen, Germany    Phone: +49 5542 98 1248
Return to Top

Downloaded by WWW Programs
Byron Palmer