

Newsgroup sci.stat.math 12139

Directory

Subject: Re: Good Technical Books? -- From: billmcc
Subject: Minitab manual/book? -- From: Tse-Sung Wu
Subject: Re: clustering program 4 Mac -- From: bwallet@nswc.navy.mil (Brad Wallet)
Subject: ANNOUNCE: Conference in Auckland, New Zealand -- From: dscott@stat.auckland.ac.nz (David Scott)
Subject: Algorithm for Moments -- From: arte@panix.com (Arthur Ellen)
Subject: Re: how to convert a uniform variable to a normal one -- From: "Robert E Sawyer"
Subject: Re: multicollinearity -- From: thompson@charm.net (T. Scott Thompson)
Subject: Re: how to convert a uniform variable to a normal one -- From: Daniel Nordlund
Subject: Re: QUESTION: extension of binomial coefficient to real values -- From: thompson@charm.net (T. Scott Thompson)
Subject: A funny story plus a probability problem I would like to know the answer to... -- From: "Less Wright"
Subject: Definition of "Exponential Order" -- From: Tatsuo Ochiai

Articles

Subject: Re: Good Technical Books?
From: billmcc
Date: Wed, 11 Dec 1996 12:42:57 -0800
Christian Campbell wrote:
> 
> I am a buyer of technical books at Brown University.  So, I thought I'd go
> to the people who read these books to find out which books are "must
> have's!"  If you have any suggestions, please e-mail me.  I am
> particularly interested in recent non-computer titles, but I also stock a
> number of technical classics.
> 
> Thank you,
 I collect what I call bibles from the various areas I have worked in;
it is not a big list:
Handbook of Steel Construction
Fluid Dynamic Drag (and Lift) by Hoerner
Analysis and Design of Flight Vehicle Structures, by Bruhn
Marks' Standard Handbook for Mechanical Engineers
Formulas for Stress and Strain, Roark & Young
Precision Machine Design by Slocum
Low-Speed Aerodynamics, Katz and Plotkin
The Finite Element Method by Zienkiewicz & Taylor
Machine Design by Shigley (Rothbart is pretty good as well)
Bill McEachern
Return to Top
Subject: Minitab manual/book?
From: Tse-Sung Wu
Date: Wed, 11 Dec 1996 16:33:31 -0500
Hi,
can someone recommend a book or manual for Minitab, Release 11 for
Windows 95?  In particular, I'm looking for info on how to interpret Minitab
output of a multinomial logistic regression, and while my school has it
on its machines, we don't seem to have any up-to-date manual around.
I'd appreciate direct email or responses at least cc'd to me, as I don't
get to this neck of the woods much.
TIA!
Tse-Sung
(please note crossposting)
_________________________________________________________________
.s.o.l.i.c.i.t.a.t.i.o.n, .j.u.n.k.  .m.a.i.l. .u.n.w.e.l.c.o.m.e
Tse-Sung Wu......................................tsesung+@CMU.EDU
Engineering & Public Policy............Carnegie Mellon University
BH-129...................................Pittsburgh, PA 15213 USA   
voice: +1 412-268-3005.......................fax: +1 412-268-3757 
www.epp.cmu.edu/~tw1u/wu.html.............www.ce.cmu.edu:8000/GDI 
Return to Top
Subject: Re: clustering program 4 Mac
From: bwallet@nswc.navy.mil (Brad Wallet)
Date: Wed, 11 Dec 1996 20:53:27 GMT
In article <32AE9C28.197B@geomar.de>, Hildegard Westphal  writes:
|> Hello everybody!
|> 
|> I would like to ask you about a problem concerning clustering of 
|> large data sets (1000 elements/30 attributes). I am looking for a Mac 
|> program that not only can handle such large data sets and apply 
|> different clustering algorithms (single linkage, average 
|> linkage..../Chi-2..../R-mode, Q-mode), but also can plot dendrograms. 
|> Does anyone have experiences with such programs and can give me a 
|> hint?
Hint one:  Don't forget about the curse of dimensionality.  There is
no possible way you can do clustering in anything approaching 30-space
without major dimensionality reduction.
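To see the effect, watch how pairwise distances concentrate as the
dimension grows.  A small, self-contained C sketch (made-up uniform
data, 1000 points as in your case, nothing Mac-specific):

    /* The nearest and farthest neighbours of a point become nearly
       equidistant as the dimension grows, which is what cripples
       naive clustering in 30-space. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    #define NPTS 1000
    #define MAXD 30

    static double pts[NPTS][MAXD];

    int main(void)
    {
        int dims[] = { 2, 5, 30 };
        int j, i, k;
        srand(1);
        for (j = 0; j < 3; j++) {
            int d = dims[j];
            double dmin = 1e300, dmax = 0.0;
            for (i = 0; i < NPTS; i++)
                for (k = 0; k < d; k++)
                    pts[i][k] = (double)rand() / RAND_MAX;
            for (i = 1; i < NPTS; i++) {
                double s = 0.0;
                for (k = 0; k < d; k++) {
                    double t = pts[0][k] - pts[i][k];
                    s += t * t;
                }
                s = sqrt(s);
                if (s < dmin) dmin = s;
                if (s > dmax) dmax = s;
            }
            printf("d = %2d:  nearest %.3f  farthest %.3f  ratio %.3f\n",
                   d, dmin, dmax, dmin / dmax);
        }
        return 0;
    }

The nearest/farthest ratio climbs toward 1 as d grows, so "cluster"
membership becomes ill-defined unless you reduce the dimension first.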
Brad
Return to Top
Subject: ANNOUNCE: Conference in Auckland, New Zealand
From: dscott@stat.auckland.ac.nz (David Scott)
Date: 11 Dec 1996 23:26:47 GMT
New Zealand Statistical Association
48th Annual Conference
University of Auckland
Wednesday, July 9 -- Friday, July 11, 1997
Themes of the Conference are Bayesian Statistics including Markov Chain
Monte Carlo, and Statistical Ecology.
It is expected that there will also be sessions on Official Statistics,
Biostatistics, Statistical Theory, and Statistical Education.
Contributed papers in any area of statistics will, however, be accepted for
the conference program.
Keynote speakers who have accepted invitations to speak at the Conference
are Peter Hall (ANU), Luke Tierney (Minnesota), Steve Buckland (St Andrews),
Keith Worsley (McGill), and Richard Huggins (La Trobe).
Peter Hall's talk will be shared with the joint meeting of the
Australian Mathematical Society and the New Zealand Mathematics Colloquium,
which is being held in Auckland from July 7 to July 11.
Steve Buckland is to present a Workshop on Line Transect and Distance
Sampling for Estimation of Wildlife Populations on the morning
of July 11. The Workshop and the sessions on Statistical Ecology are intended
to be interdisciplinary, bringing together researchers from Biology,
Ecology and Statistics.
Accommodation has been reserved for participants in the student residence
Grafton Hall which is close to the University.
The deadline for submission of abstracts is May 23, 1997.
For further details concerning the Conference, or to register your interest,
there is a link on the home page of the Statistics Department at the
University of Auckland (http://www.stat.auckland.ac.nz/).
Alternatively, contact
Associate Professor David J Scott,
Department of Statistics,
Tamaki Campus,
The University of Auckland,
PB 92019, Auckland,
New Zealand
Phone: +64 9 373 7599 Fax: +64 9 373 7177
Email: d.scott@auckland.ac.nz or dscott@scitec.auckland.ac.nz
Return to Top
Subject: Algorithm for Moments
From: arte@panix.com (Arthur Ellen)
Date: 11 Dec 1996 19:35:58 -0500
Can someone post an algorithm for the four moments in either BASIC
or Pascal, with a brief explanation?
tia
art
arte@panix.com
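One standard two-pass scheme, sketched below in C (it translates to
BASIC or Pascal line for line): take one pass for the mean, then a
second pass accumulating the second through fourth central moments,
from which variance, skewness, and kurtosis follow.

    #include <stdio.h>
    #include <math.h>

    /* Two-pass computation of the first four moments of x[0..n-1]. */
    static void moments(const double *x, int n,
                        double *mean, double *var, double *skew, double *kurt)
    {
        double s = 0.0, m2 = 0.0, m3 = 0.0, m4 = 0.0;
        int i;
        for (i = 0; i < n; i++)
            s += x[i];
        *mean = s / n;
        for (i = 0; i < n; i++) {
            double d = x[i] - *mean;
            m2 += d * d;
            m3 += d * d * d;
            m4 += d * d * d * d;
        }
        m2 /= n; m3 /= n; m4 /= n;        /* central moments (divisor n) */
        *var  = m2;
        *skew = m3 / pow(m2, 1.5);        /* 0 for a symmetric sample    */
        *kurt = m4 / (m2 * m2);           /* 3 for normal data           */
    }

    int main(void)
    {
        double x[] = { 1.0, 2.0, 2.0, 3.0, 4.0, 7.0 };
        double mean, var, skew, kurt;
        moments(x, 6, &mean, &var, &skew, &kurt);
        printf("mean %.4f  var %.4f  skew %.4f  kurt %.4f\n",
               mean, var, skew, kurt);
        return 0;
    }

(Divide m2 by n-1 instead of n if you want the unbiased sample
variance.)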
Return to Top
Subject: Re: how to convert a uniform variable to a normal one
From: "Robert E Sawyer"
Date: 12 Dec 1996 02:16:04 GMT
The most direct method is this:
Let F be the cumulative distribution function of N(0,1),
let G be the inverse of F,
let U be a uniform[0,1] random variable,
and define X = G(U).
(Approximations and algorithms for G are in many standard references.)
Then pr(X <= x) = pr(G(U) <= x) = pr(U <= F(x)) = F(x), so X ~ N(0,1).
Babak Fakhamzadeh wrote in article <58mqdb$au9@goliat.eik.bme.hu>...
| Hi,
| 
| Can anyone tell me an easy conversion scheme from variables picked
| from a uniform distribution to variables picked from a normal one?
| 
| Thanx
| 
| Babak
| 
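For a concrete version of the recipe above, here is a minimal C
sketch.  G is approximated by the rational formula of Abramowitz &
Stegun 26.2.23 (absolute error below about 4.5e-4); any better
published approximation can be dropped in its place.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Approximate G, the inverse of the standard normal CDF
       (Abramowitz & Stegun 26.2.23).  Requires 0 < p < 1. */
    static double norm_inv(double p)
    {
        double q = (p < 0.5) ? p : 1.0 - p;   /* tail area in (0, 0.5] */
        double t = sqrt(-2.0 * log(q));
        double z = t - (2.515517 + t * (0.802853 + t * 0.010328)) /
                       (1.0 + t * (1.432788 + t * (0.189269 + t * 0.001308)));
        return (p < 0.5) ? -z : z;
    }

    int main(void)
    {
        int i;
        srand(1);
        for (i = 0; i < 5; i++) {
            double u = (rand() + 0.5) / (RAND_MAX + 1.0);  /* U on (0,1) */
            printf("U = %.4f  ->  X = G(U) = %+.4f\n", u, norm_inv(u));
        }
        return 0;
    }

The polar (Box-Muller) method posted elsewhere in this thread avoids
the approximation error entirely, at the cost of producing normal
deviates in pairs.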
Return to Top
Subject: Re: multicollinearity
From: thompson@charm.net (T. Scott Thompson)
Date: 11 Dec 1996 09:05:35 GMT
jim bouldin (jrbouldin@ucdavis.edu) wrote:
> Thank you Scott, and thanks to the others who replied to my original
> question.  The comments have been most helpful and understandable.  I DO
> want to sort out the separate effects of the ind variables for a world
> that may be very different from the one the data came from. 
> Specifically, it has to do with the fact that even though temperature
> and precipitation may be highly negatively correlated now (as one goes
> up in elevation in the mountains), under various global warming
> scenarios, they may not (i.e. probably will not) have the same
> relationship in the future.  The scientific value of my dissertation
> will be increased greatly if I can offer some estimates of the change in
> growth rates under various combinations of changes in the two variables
> in the future, along with estimates based on assumptions of the current
> relationships holding true.  (I understand that I still must confine
> reasonable predictions to the range of values for the ind variables in
> the data set).
You seem to be missing the point.  Suppose precip and temp are exactly
collinear and there are no other explanatory variables.  Then the data
all lie on some line in (temp,precip) space.  Even if there were no
regression error at all in your data, you would not be able to separate
the effects of precip and temp.  To do that you need to have at least
some variation in one of the indep variables that isn't perfectly
explained by the other.
Think of it this way.  The estimated regression surface can be thought
of as a plane in (precip,temp,z) space, where z is the dependent
variable.  If there were no errors and not much collinearity, and if
the world were really linear, then all of your measured data would lie
on this plane.  You could physically place a sheet of metal on top of
the data points, then measure the slope of this sheet in the precip
and temp directions in order to get the regression coefficients.  
Now suppose that you have perfect collinearity.  Then your attempt to
place the sheet of metal on top of the data will fail.  You may have
more than three points of support, but since they are all on a line,
there is no unique way to rest the sheet on top of the data.  You
don't have three _independent_ points of support.  The metal sheet can
be tilted one way or the other, pivoting around the line that holds
all of your data.  As it pivots, the measured slopes in the two
directions of interest will change.  You can pin down the tilt by
adding artificial data points away from the line.  This is essentially
what ridge regression and related methods do.  However, the measured
slopes are determined by the artificial points you choose, and not by
the data.
The situation in which there is a lot of collinearity, but not perfect
collinearity is similar.  Now all of your data are close to being on a
line, but not exactly on a line.  Any measurement errors will cause
the resting angle of the piece of metal to pivot dramatically,
essentially because you don't have any data away from the line.  Your
estimate will be very sensitive to small changes in the measurement
errors of the dependent variable, hence highly variable.  You can
reduce this error by adding an artificial point away from the data
(e.g. ridge regression), but the measured slopes are again not
primarily determined by the data in that case.
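A tiny numerical illustration of that last point, with made-up
cross-products for a nearly collinear (temp, precip) sample: adding a
ridge penalty lambda to the diagonal of X'X always gives a solvable
system, but the fitted slopes move substantially as lambda varies, so
it is the penalty rather than the data that pins them down.

    #include <stdio.h>

    int main(void)
    {
        /* demeaned cross-products; corr(t,p) = -10/sqrt(101) = -0.995 */
        double Stt = 10.0, Spp = 10.1, Stp = -10.0;
        double Stz = 4.0,  Spz = -3.9;
        double lambdas[] = { 0.0, 0.01, 0.1, 1.0 };
        int i;
        for (i = 0; i < 4; i++) {
            double l = lambdas[i];
            double det = (Stt + l) * (Spp + l) - Stp * Stp;
            double b_t = ((Spp + l) * Stz - Stp * Spz) / det;
            double b_p = ((Stt + l) * Spz - Stp * Stz) / det;
            printf("lambda = %5.2f:  b_temp = %8.4f  b_precip = %8.4f\n",
                   l, b_t, b_p);
        }
        return 0;
    }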
> Let me summarize my understanding now and pose another question or two. 
> Zar's idea (section 19.6 of the '96 edition of Biostatistics) of
> removing the correlation betwween two ind variables by doing a
> regression of the residuals that result from two linear regressions (in
> my case temp vs precip and temp vs growth rate, to obtain  the
> relationship of precip vs growth rate with the effects of temperature
> "removed"), will NOT provide an estimate of the effect of changing
> precipitation on growth rate, independent of temperature changes.
No.  This works, but no better than using the original regression,
except possibly through reducing numerical roundoff errors in your
computer.  If you could do exact calculations then this would yield
exactly the same estimate of the effect of precip on growth as you get
when regressing growth on precip and temp at the same time.  Consider
that if you have perfect collinearity then there are no residuals when
you regress temp vs. precip, or more precisely, the residuals are all
zero.  A bunch of zeros aren't going to help you predict growth rates.
>  The
> reason for this is that part of precipitation's effect on growth is
> unrecoverably hidden in precip's correlation with temp.  The residuals
> explain only that part which is NOT correlated with temperature, not the
> full effect of changes in precip.
The residuals in the regression of precip on temp are necessarily
uncorrelated with temp.  However, the residuals in the regression of
growth on temp are likewise uncorrelated with temp.  So there is
less variation in growth for the precip residuals to explain.
The mathematics of linear regression tell us that the slope of this
residual/residual regression is exactly the same as the slope on
precipitation in the original multivariate regression model.
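For anyone who wants to check this numerically, here is a
self-contained C sketch with fabricated temp/precip/growth data (all
coefficients and noise levels are arbitrary).  It fits the
two-regressor model by the normal equations on demeaned data, then
recovers the precip slope again from the residual/residual regression;
the two agree up to rounding.

    #include <stdio.h>
    #include <stdlib.h>

    #define N 200

    static double unif(void) { return (double)rand() / RAND_MAX; }

    int main(void)
    {
        double t[N], p[N], z[N];
        double mt = 0, mp = 0, mz = 0;
        double Stt = 0, Spp = 0, Stp = 0, Stz = 0, Spz = 0;
        double Srr = 0, Srz = 0;
        double det, b_t, b_p;
        int i;

        srand(42);
        for (i = 0; i < N; i++) {
            double u = unif();                       /* "elevation" */
            t[i] = 30.0 - 20.0 * u + (unif() - 0.5); /* temp falls with u  */
            p[i] = 50.0 + 40.0 * u + (unif() - 0.5); /* precip rises: nearly
                                                        collinear with temp */
            z[i] = 2.0 + 0.05 * t[i] + 0.03 * p[i] + 0.1 * (unif() - 0.5);
            mt += t[i]; mp += p[i]; mz += z[i];
        }
        mt /= N; mp /= N; mz /= N;

        for (i = 0; i < N; i++) {
            double dt = t[i] - mt, dp = p[i] - mp, dz = z[i] - mz;
            Stt += dt * dt; Spp += dp * dp; Stp += dt * dp;
            Stz += dt * dz; Spz += dp * dz;
        }

        /* multiple regression slopes from the 2x2 normal equations */
        det = Stt * Spp - Stp * Stp;
        b_t = (Spp * Stz - Stp * Spz) / det;
        b_p = (Stt * Spz - Stp * Stz) / det;

        /* residual/residual version: strip temp out of precip, then
           regress growth on what is left of precip */
        for (i = 0; i < N; i++) {
            double dt = t[i] - mt, dp = p[i] - mp, dz = z[i] - mz;
            double r = dp - (Stp / Stt) * dt;   /* precip residual */
            Srr += r * r; Srz += r * dz;
        }

        printf("multiple regression:  b_temp = %.6f  b_precip = %.6f\n",
               b_t, b_p);
        printf("residual regression:  b_precip = %.6f\n", Srz / Srr);
        return 0;
    }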
> So, question one.  Would a solution be an analysis of covariance by
> turning one of the two continuous ind variables into a categorical one
> and using it as a covariate?  Within each category the correlation
> between the two ind vars should be greatly reduced, right?  
No.  If all of the data are close to being on a line, then this is
true for any subset as well.  In fact because we are now allowed to
vary the line across groups, the within group collinearity will tend
to be more severe.  
> So I could
> produce an estimate of the independent effects of each ind variable on
> the dep variable, for each category.
No.  If anything the problem is worse, since you have fewer data
points within each group, and at least as much collinearity.
>  (However, maybe the correlation
> with the dependent variable will also be reduced, defeating the
> purpose.  Nevertheless, isn't my situation exactly what ANCOVA is
> designed for?)
> Question two.  I know little about ridge regression but it seems to be
> designed for this kind of problem, sacrificing accuracy for precision in
> the estimates of coefficients.  Is it worth trying?
That depends on your loss function in making the estimates.  Are you
willing to live with numbers that have low variability, not because
the data pin them down well, but because you have imposed an
artificial and somewhat arbitrary constraint on the model?  I'm not
being facetious.  There is an element of this in every model
specification decision.  Thinking hard about which assumptions you are
willing to live with and which are unacceptable is important.
>   Someone also
> mentioned principal components analysis, but I fail to see how different
> linear combinations of the ind variables will allow the types of
> predictions I'm looking for.
PCA is roughly akin to the residual/residual method that you mentioned
before.  While it might help you solve numerical accuracy issues, it
doesn't solve the fundamental problem.  It doesn't help at all if you
can do exact arithmetic, since it must then produce the same answers
as the original regression.
A final point: You say that you realize that you shouldn't make
forecasts that involve varying a regressor outside its observed range.
I can't think of any argument supporting this view that doesn't also
tell you that you shouldn't make forecasts that vary a pair of
regressors outside the observed range for the pair.  Extrapolation is
extrapolation, whether the individual variables, considered one at a
time, have reasonable values or not.
--
T. Scott Thompson            email:  thompson@charm.net
Severna Park, Maryland       phone:  (410) 431-5027
Return to Top
Subject: Re: how to convert a uniform variable to a normal one
From: Daniel Nordlund
Date: Wed, 11 Dec 1996 16:59:53 -0800
Daniel Nordlund wrote:
> 
> Babak Fakhamzadeh wrote:
> >
> > Hi,
> >
> > Can anyone tell me an easy conversion scheme from  variables, picked from a uniform distribution, to variables
> picked from a normal one?
> >
> > Thanx
> >
> > Babak
> 
> Below is sample C code to generate two random normal deviates
> from two uniform random deviates.  Uniform random deviates
> are first generated by some function, ranf(), in pairs.
> Compute x1 and x2.
> 
> In the DO-WHILE loop, keep generating uniform deviates until
> the point (x1,x2) falls inside the unit circle,
> i.e. w = x1*x1+x2*x2 < 1.
> 
> Then, compute y1 and y2, which are a pair of normal random
> deviates, distributed as N(0,1).
> 
>          /* needs <math.h>; ranf() returns a uniform deviate on [0,1) */
>          float x1, x2, w, y1, y2;
> 
>          do {
>                  x1 = 2.0 * ranf() - 1.0;  /* x1,x2 uniform on (-1,1) */
>                  x2 = 2.0 * ranf() - 1.0;
>                  w = x1 * x1 + x2 * x2;
>          } while ( w >= 1.0 || w == 0.0 ); /* stay strictly inside the
>                                               unit circle; w == 0 would
>                                               make log(w) blow up */
> 
>          w = sqrt( (-2.0 * log( w ) ) / w );
>          y1 = x1 * w;                      /* y1,y2 are independent N(0,1) */
>          y2 = x2 * w;
> 
I should have mentioned that this technique is the polar 
coordinate form of the Box-Muller transform.  There are a 
number of useful references listed at
http://taygeta.com/random/gaussian.html
Dan
Return to Top
Subject: Re: QUESTION: extension of binomial coefficient to real values
From: thompson@charm.net (T. Scott Thompson)
Date: 11 Dec 1996 09:20:49 GMT
David SQUIRE (squire@cui.unige.ch) wrote:
> I got very little response to this in sci.math, so I thought I'd try it here:
> In article <588plu$s8b@uni2f.unige.ch>, squire@cui.unige.ch (David SQUIRE) writes:
> >Dear all,
> >
> >I have a problem which I feel sure someone must have addressed before. I have
> >a statistic which I compute on a matrix A of observations. This requires that
> >I compute "a_{ij} choose 2" for each matrix element. For any real observation
> >matrix, all the elements are integers, so this presents no problem.
> >
> >I also want to compute an expected value for this statistic. I know how to
> >compute the expected value of each matrix element, but these are then no
> >longer integers. Is there a sensible way to compute "a_{ij} choose 2" when
> >a_{ij} is not an integer? (I suspect that gamma functions may be the answer).
> >
> >The expected a_{ij} are guaranteed to be rational, so at the moment I am doing the
> >calculation for the integer values obtained by multiplying by the common denominator,
> >and rescaling the result using the known maximum value for the implied sample
> >size. Does this sound reasonable?
No.  It appears that you are attempting to analyze this statistic at
the expected value of the data.  That will only give the expected
value for the statistic if the statistic is linear in the data.
However, your statistic is clearly nonlinear, and not even well
defined at the expected value of the data.  
Try calculating your statistic for each possible value of the data
_before_ you do any averaging.  That is a more appropriate method, and
it avoids having to define what \pi choose 2 might mean.
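To see the size of the discrepancy, here is a small C sketch with a
hypothetical Binomial(10, 0.3) cell.  It enumerates the distribution
to get E[C(X,2)] exactly and compares it with C(E[X],2), the quantity
obtained by plugging in the expected count.

    #include <stdio.h>
    #include <math.h>

    /* f(a) = "a choose 2" = a(a-1)/2, which extends to real a */
    static double choose2(double a) { return a * (a - 1.0) / 2.0; }

    int main(void)
    {
        int n = 10, k;
        double p = 0.3, e_f = 0.0;
        for (k = 0; k <= n; k++) {
            /* binomial pmf via log-gamma to avoid overflow */
            double logpmf = lgamma(n + 1.0) - lgamma(k + 1.0)
                          - lgamma(n - k + 1.0)
                          + k * log(p) + (n - k) * log(1.0 - p);
            e_f += exp(logpmf) * choose2((double)k);
        }
        printf("E[X choose 2]   = %.4f\n", e_f);            /* 4.05 */
        printf("(E[X]) choose 2 = %.4f\n", choose2(n * p)); /* 3.00 */
        return 0;
    }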
--
T. Scott Thompson            email:  thompson@charm.net
Severna Park, Maryland       phone:  (410) 431-5027
Return to Top
Subject: A funny story plus a probability problem I would like to know the answer to...
From: "Less Wright"
Date: 12 Dec 1996 08:44:08 GMT
Tonight on the way home from dinner I told some of my programming
colleagues a funny story I got in email, but after the laughs it turned
into a discussion of probability theory, followed by a bet on whose answer
was right... please enjoy the short story below if you haven't already heard
it, but if you can help settle our bet based on the resulting problem it
brought up, I would appreciate it!
The story as I recall it: (if you've already heard, please skip to 'the
bet')...
Two college students, supposedly from UVA, were doing quite well in their
chemistry class and ended up having several free days during finals before
their last exam in chemistry.  Given that they only needed to get a D on
their chem final to pass the course, they decided to enjoy the free days
before the exam by partying at a neighboring college...in their ensuing
drunkenness, they ended up oversleeping on the day they were supposed to
return to take their 'easy' chem exam...technically, this meant they got a
0 and were now going to fail the course.  They drove back to campus, and on
the way concocted a story about how they had returned on time, but were
seriously delayed due to a flat tire, and the ensuing towing and repair
time.  
They told their professor the story, and begged for a make up exam - he
consented, provided that they could not leave the classroom during the exam
for any reason until they were done with their exam, and said they could
take the exam the next day...they stayed up all night reviewing all of
their chemistry knowledge.  They met the professor the next day, and he
then placed them into separate rooms on each side of the hall, gave them
their exams, shut the door, and he then sat in a desk in the middle of the
hallway.  The students then opened the exam book to see:
1) Question (worth 100 points):
Which tire?
The bet:
Hopefully you enjoyed the story, but now for the statistical bet which
evolved - what is the probability of both students guessing the correct
tire, given that they hadn't agreed beforehand on which tire had gone flat
in their concocted story?  Everyone but me said 25%, because there are four
tires and only one correct choice.  I disagreed, because my vague
recollection of my probability course says that you are really evaluating
the probability of *two* people choosing the same tire, each of whom
has a 25% chance, so your odds of them both randomly picking the
same tire should be less than 25%....i.e. potential outcomes are 16
different combinations of tires (i.e. Student A chooses Right Front,
Student B could choose RF, LF, RB, LB.  A could also choose Left Front, and
B could again choose RF,LF,RB,LB...)  So is it still a 25% chance they both
pick the same tire, or is it 1/16 probability or ??  Actually though, now
that I have written this out, I am starting to think I am wrong, because A
could pick any given tire, and B could then pick any tire with a 25%
probability of B's pick matching A...anyway, if you know the definitive
answer and could help us resolve this gentlemen's bet, I would greatly
appreciate it....
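For what it's worth, here is a quick C sketch that simulates two
independent picks among four tires:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long trials = 1000000, hits = 0, i;
        srand(7);
        for (i = 0; i < trials; i++)
            if (rand() % 4 == rand() % 4)   /* the two students agree? */
                hits++;
        printf("agreement rate: %.4f\n", (double)hits / trials);
        return 0;
    }

It prints about 0.25: of the 16 equally likely pairs, the 4 matching
ones make P(match) = 4/16 = 1/4.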
Thanks in Advance,
Less
Return to Top
Subject: Definition of "Exponential Order"
From: Tatsuo Ochiai
Date: Thu, 12 Dec 1996 03:01:59 -0800
Could anyone tell me the definition of so-called "exponential order"?
How is this different from "rate of convergence"? 
Thanks in advance.
Tatsuo Ochiai
tochiai@students.wisc.edu
Return to Top
