

Newsgroup sci.stat.math 12399

Directory

Subject: Art Site -- From: sea-hawk@urgentmail.com
Subject: How to estimate this? -- From: li@stat.ohio-state.edu (Shuying Li)
Subject: Re: PI series needed -- From: "D.J. Wilkinson"
Subject: [Q] Principal component analysis: where? -- From: menca@iroe.iroe.fi.cnr.it (Dr Francesco Mencaraglia)
Subject: Re: What is "mode" in stats? -- From: wilsonj@smtplink.ipfw.indiana.edu (Jeff Wilson)
Subject: CFP: 1997 Fall Technical Conference -- From: sasrdt@shewhart.unx.sas.com (Randall D. Tobias)
Subject: Re: time series analysis, delay coordinate embedding, phase (state) space reconstruction -- From: middleto@mcmail.cis.McMaster.CA (Gerard Middleton)
Subject: Re: Iterative proportional fitting, convergence -- From: nichols@spss.com (David Nichols)
Subject: Question about Confidence intervals -- From: Charles H Ouyang
Subject: Re: Question about repeated measures study -- From: mcohen@cpcug.org (Michael Cohen)
Subject: Re: Sport Statistics Study -- From: mats.liljedahl@mbox200.swipnet.se (Mats Liljedahl)
Subject: Re: PI series needed -- From: jpc@a.cs.okstate.edu (John Chandler)
Subject: Re: Sport Statistics Study -- From: uthed@ais.net
Subject: missing explanatory variables -- From: Sijne van der Beek
Subject: Re: [Q] Principal component analysis: where? -- From: ebohlman@netcom.com (Eric Bohlman)
Subject: Re: PI series needed -- From: Yves Moreau
Subject: Re: The Fuzzy Debate -- From: "D.J. Wilkinson"
Subject: Regression and necessary sample size -- From: Peter Dittrich
Subject: Re: Air Pollution -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Test - please respond -- From: "afn27382@afn.org"
Subject: Re: What is "mode" in stats? -- From: "M.GASPARINI"
Subject: Hitting time probability of Brownian motion through deterministic nonlinear barrier -- From: "Pedro Santa-Clara"

Articles

Subject: Art Site
From: sea-hawk@urgentmail.com
Date: Thu, 16 Jan 97 12:45:56 GMT
http://www.geocities.com/Paris/8892/gallery.htm
Return to Top
Subject: How to estimate this?
From: li@stat.ohio-state.edu (Shuying Li)
Date: 15 Jan 1997 23:54:12 -0500
I need to estimate how big a set is. Right now I have no clue.
Please help!
The problem:  Given constants 0< a, b<1, c and large integer N,
How big is 
	{ (m, k) |m+k=N,  m, k>=0,  (m-aN)^2/(aN)+(k-bN)^2/(bN)
Return to Top
Subject: Re: PI series needed
From: "D.J. Wilkinson"
Date: 16 Jan 1997 10:08:25 GMT
Greg Heath (heath@ll.mit.edu) wrote:
> On 13 Jan 1997, Dan Fox wrote:
> > I need the infinite series used to calculate PI. Can anyone help?
> PI  =  4 * arctan(1),
> arctan(x)  =  SUM(k=0,oo){ [(-1)^k] * [x^(2k+1)]/(2k+1) } 
>            =  x - (x^3)/3 + (x^5)/5 - ...
But for x=1, this converges incredibly slowly. However, using the fact 
that arctan(1)=arctan(1/2)+arctan(1/3), and evaluating those two series, 
gives a series which converges much faster.
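A minimal sketch in Python (not part of the original post) comparing the two approaches for the same number of terms. The function names are made up for illustration.

```python
import math

def arctan_series(x, terms):
    """Partial sum of x - x^3/3 + x^5/5 - ... using the given number of terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

def pi_slow(terms):
    # PI = 4 * arctan(1), summed directly
    return 4 * arctan_series(1.0, terms)

def pi_fast(terms):
    # PI = 4 * (arctan(1/2) + arctan(1/3))
    return 4 * (arctan_series(0.5, terms) + arctan_series(1.0 / 3.0, terms))

# Absolute error of each approximation for a few term counts
for n in (5, 10, 20):
    print(n, abs(pi_slow(n) - math.pi), abs(pi_fast(n) - math.pi))
```

The second form converges faster because its terms shrink geometrically (powers of 1/2 and 1/3), whereas at x=1 they shrink only like 1/(2k+1).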
--
Dr Darren Wilkinson   e-mail
WWW
Return to Top
Subject: [Q] Principal component analysis: where?
From: menca@iroe.iroe.fi.cnr.it (Dr Francesco Mencaraglia)
Date: Thu, 16 Jan 97 17:44:59 GMT
The problem I would like to solve is as follows: there are (many) data from
measurements; each set is an array with about 100-500 data (all arrays have 
the same length); what I would like to do is to separate these arrays in
different groups (sorting them) so that in each group all are more or less
alike. It has been suggested to me that this could be done with PCA
(principal component analysis).
Could anyone give me some pointers on the subject and (if there is any) to
some shareware product?
Francesco Mencaraglia - IROE - Via Panciatichi 64 - 50127 Firenze - Italy
menca@iroe.iroe.fi.cnr.it
menca@moloch.iroe.fi.cnr.it
Return to Top
Subject: Re: What is "mode" in stats?
From: wilsonj@smtplink.ipfw.indiana.edu (Jeff Wilson)
Date: Thu, 16 Jan 1997 13:13:27 GMT
jzs@europa.com (Justin) wrote:
>In article , tline@iglou.com (Tom Line) wrote:
>The mode is the value of a variable that occurs most frequently....
>Chicken        No. of eggs
>1              50
>2              73
>3              90
>4              100
>The mode here is 100.
>-- 
>JZS 3=)
No, there is no mode.  The mode is not the largest score, it is the
most frequently occurring score, and in this example all of the scores
have an equal frequency.
In this case:
Chicken			Eggs
1			78
2			60
3			78
4			95
mode = 78
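A small Python sketch of that rule, checked against both egg tables. The helper name `modes` is hypothetical, and returning an empty list when every score is equally frequent is just the convention used in this reply (some texts would instead call every value a mode).

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when every value is equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    # If all distinct values tie, follow the convention above: no mode.
    return [] if len(winners) == len(counts) and len(counts) > 1 else winners

print(modes([50, 73, 90, 100]))  # every score occurs once
print(modes([78, 60, 78, 95]))   # 78 occurs twice
```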
Jeff Wilson					TIP#184
wilsonj@smtplink.ipfw.indiana.edu
Return to Top
Subject: CFP: 1997 Fall Technical Conference
From: sasrdt@shewhart.unx.sas.com (Randall D. Tobias)
Date: Thu, 16 Jan 1997 16:06:11 GMT
                             41st Annual
                      Fall Technical Conference
                                 1997
                           Call for Papers
                "Mining Data for Quality Improvement"
                       Omni Inner Harbor Hotel
                         Baltimore, Maryland
                         October 16-17, 1997
Co-sponsored by:
   American Society for Quality Control
      - Chemical and Process Industries Division
      - Statistics Division
   American Statistical Association
      - Section on Physical and Engineering Sciences
Applied and expository papers are  needed  for  parallel  sessions  in
Statistics, Quality Control, and Tutorial / Case Study.
Detailed submission instructions are available on the Web
   http://www.sas.com/ftc97/
or you can request them from one  of  the  following  members  of  the
program committee:
   Susan L. Albin
   Department of Industrial Engineering
   Rutgers University
   PO Box 909
   Piscataway, NJ  08855-0909
   phone: 908-445-2238
   email: salbin@rci.rutgers.edu
   FAX: 908-445-5467
   Sharon Fronheiser (to whom paper correspondence should be addressed)
   Eastman Kodak Company
   151 Mill Hollow Crossing
   Rochester, NY  14626
   phone: 716-588-2014
   email: sharonf@kodak.com
   FAX: 716-722-4415
   Randy Tobias (to whom electronic correspondence should be addressed)
   SAS Institute Inc.
   SAS Campus
   Cary, NC  27513-2414
   tel: 919-677-8000 x7933
   email: sasrdt@unx.sas.com
   FAX: 919-677-8123
The submission  process will  start on  August 1, 1996 and conclude on
January 17, 1997.  Papers should be  strongly justified by application
to  a problem in  quality  control,  or  the  chemical,  physical,  or
engineering sciences.  The mathematical level of papers may range from
none,  to  that  of  the  Journal of  Quality  Technology, or  that of
Technometrics.
-- 
Randy Tobias          SAS Institute Inc.     sasrdt@unx.sas.com
(919) 677-8000 x7933  SAS Campus Dr.         us024621@interramp.com
(919) 677-8123 (Fax)  Cary, NC   27513-2414
   Faith, faith is an island in the setting sun.
   But proof, yes: proof is the bottom line for everyone.
                                                       -- Paul Simon
Return to Top
Subject: Re: time series analysis, delay coordinate embedding, phase (state) space reconstruction
From: middleto@mcmail.cis.McMaster.CA (Gerard Middleton)
Date: 16 Jan 1997 10:02:50 -0500
The best overall reference for techniques to analyse possibly chaotic 
time series is the book by Abarbanel
Henry D.I. Abarbanel, 1996, Analysis of Observed Chaotic Data.
   Springer-Verlag, 272 p. ISBN 0-387-94523-7
It will not solve all your problems, though, just add to them!!
-- 
Gerry Middleton
Department of Geology, McMaster University
Tel: (905) 525-9140 ext 24187 FAX 522-3141
Return to Top
Subject: Re: Iterative proportional fitting, convergence
From: nichols@spss.com (David Nichols)
Date: 16 Jan 1997 18:51:21 GMT
In article <5bgpup$dq@samba.rahul.net>,
Theodore Sternberg   wrote:
>Has anyone worked out sufficient conditions for convergence of the
>iterative proportional fitting algorithm?  Maybe at least for the special
>case where one starts from a collection of bivariate and univariate
>marginal configurations? 
>
>I've turned Bishop, Fienberg et al upside down, and no dice.  All they say
>is that Darroch (J Roy S Soc B 1962) worked this out, but I've read
>Darroch and he doesn't offer any such theorem.  (The closest he comes is
>something about characterising configurations where ipf will converge in
>one step.)
>
>Necessary conditions are a dime a dozen--familiar adding-up constraints,
>positive-semidefiniteness of the implied correlation matrix, nonnegative
>probabilities--but even all together they don't appear to be sufficient. 
>
>Ted Sternberg
>San Jose, California USA
Check page 186 of Agresti's _Categorical Data Analysis_ for discussion
and references.
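For readers unfamiliar with the algorithm under discussion, here is a minimal Python sketch of iterative proportional fitting for the simplest case, a two-way table fitted to given row and column totals. This is a simplification of the multi-way problem in the question, the function name and stopping rule are illustrative assumptions, and it says nothing about sufficient conditions for convergence.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting for a two-way table (list of rows)."""
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        # Scale each row to match its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [x * target / total for x in table[i]]
        # Scale each column to match its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Column totals now match; stop once the row totals do too.
        gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return table

fit = ipf([[1, 1], [1, 1]], row_targets=[30, 70], col_targets=[40, 60])
print(fit)
```

The row and column targets must have the same grand total; with a uniform seed, as here, the fitted table is the independence table for those margins.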
--
-----------------------------------------------------------------------------
David Nichols             Senior Support Statistician              SPSS, Inc.
Phone: (312) 329-3684     Internet:  nichols@spss.com     Fax: (312) 329-3668
-----------------------------------------------------------------------------
Return to Top
Subject: Question about Confidence intervals
From: Charles H Ouyang
Date: Thu, 16 Jan 1997 17:34:10 -0500
Hi,
   I am writing a program to calculate Critical Area by sampling.
Info about data:
    1. Assume normal distribution.
    2. The total population is known.
    3. The samples are equally weighted and have a min value of 0
       and a known max value.
    The idea is that total population (Pt) is large, and is too expensive
    to extract all, so I only want to extract n samples.
    I have an equation which uses the info 1 above and can calculate the
    confidence interval to get a range in which actual mean is located
    using sample mean and sample variance.
Equation:
    (X - z * s / sqrt(n), X + z * s / sqrt(n))
        where: X - calculated sample mean
               z - a table value assigned based on the desired confidence
                   level.
               s - calculated sample standard deviation
Question:
    Is there any way I can use info 2 and 3 to further narrow my
    interval?
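
For reference, a minimal Python sketch of the interval in the Equation above, using only info 1. It assumes s is the sample standard deviation (which is what s / sqrt(n) requires) and that z is the two-sided normal table value; the function name is made up for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-theory interval X -/+ z * s / sqrt(n) for the population mean."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided table value
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

lo, hi = mean_confidence_interval([1, 2, 3, 4, 5])
print(lo, hi)
```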
Thanks,
Charles O.
Return to Top
Subject: Re: Question about repeated measures study
From: mcohen@cpcug.org (Michael Cohen)
Date: 16 Jan 1997 23:38:31 GMT
Dick Adams (rdadams@access2.digex.net) wrote:
: 
: > Since the two groups differ on age, sex, and rural/urban residence,
: > we would like to adjust for these factors in the analysis.
: 
: Rather than "adjusting" (whatever that means), I would test to 
: see if there is an effect at each measurement for age, gender, 
: or residence.  (BTW: Gender = Female/Male; Sex = Yes/No)
: 
Originally, it was Sex=Female/Male; Gender=Feminine/Masculine.  In the
U.S. Federal Government, Sex is still the accepted term (Office of
Management and Budget rules) for the Female/Male question.  In journals
that use APA style, gender is now the required term. 
At any rate, I would be inclined to include these variables in the
analysis as independent variables.
-- 
Michael P. Cohen                       home phone   202-232-4651
1615 Q Street NW #T-1                  office phone 202-219-1917
Washington, DC 20009-6310              office fax   202-219-2061
mcohen@cpcug.org
Return to Top
Subject: Re: Sport Statistics Study
From: mats.liljedahl@mbox200.swipnet.se (Mats Liljedahl)
Date: Thu, 16 Jan 1997 20:05:47 GMT
One of the best books on math I have got is "The Mathematics of Games"
by John D. Beasley.   Golf and football (soccer I think) are a few of
the subjects covered in that book.
Mats Liljedahl
mats.liljedahl@mbox200.swipnet.se
Return to Top
Subject: Re: PI series needed
From: jpc@a.cs.okstate.edu (John Chandler)
Date: 17 Jan 1997 04:27:24 GMT
In article <5bkump$nij@whitbeck.ncl.ac.uk>,
D.J. Wilkinson  wrote:
>Greg Heath (heath@ll.mit.edu) wrote:
>> On 13 Jan 1997, Dan Fox wrote:
>
>> > I need the infinite series used to calculate PI. Can anyone help?
>
>> PI  =  4 * arctan(1),
>> arctan(x)  =  SUM(k=0,oo){ [(-1)^k] * [x^(2k+1)]/(2k+1) } 
>>            =  x - (x^3)/3 + (x^5)/5 - ...
>
>But for x=1, this converges incredibly slowly. However, using the fact 
>that arctan(1)=arctan(1/2)+arctan(1/3), and evaluating those two series, 
>gives a series which converges much faster.
It's true that (1 - 1/3 + 1/5 - 1/7 + ...) converges slowly,
but you can speed up the convergence greatly using either Euler's
averaging transformation with Mark Townsend's two-thirds rule, 
repeated Aitken delta-squared extrapolation,
or Wynn's epsilon algorithm, to mention only three
of the simplest acceleration algorithms.
Using only twenty terms in the series, the accelerated
approximations to the limit are accurate to more than
ten decimal digits, as I recall.
There are better ways to compute pi than this, it's true, 
but pessimistic pronouncements such as Knopp's statement that
the series above converges too slowly to be of any practical
use are quite incorrect.  Apparently Knopp never met Aitken.
Acceleration methods also yield useful results when applied
to many, and perhaps most, divergent series.
(x - x^2/2 + x^3/3 - x^4/4 + ...) only converges to ln(1+x)
for abs(x)<1, but the diagonals of the epsilon algorithm
tableau converge for all x>-1 and provide very accurate values
of ln(1+x) for x=2 or x=10 or x=100 or...
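A rough Python sketch of one of these, repeated Aitken delta-squared extrapolation, applied to the first twenty partial sums of the series for pi/4. The function names and the choice of four passes are illustrative assumptions; many more passes can end up amplifying rounding noise.

```python
import math

def leibniz_partial_sums(terms):
    """Partial sums of 1 - 1/3 + 1/5 - 1/7 + ... (limit: pi/4)."""
    sums, total = [], 0.0
    for k in range(terms):
        total += (-1) ** k / (2 * k + 1)
        sums.append(total)
    return sums

def aitken(seq):
    """One pass of Aitken delta-squared extrapolation over a sequence."""
    out = []
    for s0, s1, s2 in zip(seq, seq[1:], seq[2:]):
        denom = (s2 - s1) - (s1 - s0)
        # Guard: keep the latest value if the second difference vanishes.
        out.append(s2 if denom == 0 else s2 - (s2 - s1) ** 2 / denom)
    return out

def accelerated_pi(terms=20, passes=4):
    """Apply Aitken's transformation `passes` times and return 4 * last value."""
    seq = leibniz_partial_sums(terms)
    for _ in range(passes):
        if len(seq) < 3:
            break
        seq = aitken(seq)
    return 4 * seq[-1]

print(abs(4 * leibniz_partial_sums(20)[-1] - math.pi))  # error of the plain series
print(abs(accelerated_pi(20) - math.pi))                # error after the Aitken passes
```

Each pass shortens the sequence by two entries, so twenty partial sums allow at most nine passes.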
-- 
John Chandler
jpc@a.cs.okstate.edu
Return to Top
Subject: Re: Sport Statistics Study
From: uthed@ais.net
Date: Thu, 16 Jan 1997 22:35:11 -0500
Richard Scott wrote:
> 1) Is this a feasible area of research?  I have been working on it in a
> basic way given my own limited resources and time for the past few years.
> 2) Following from 1) is that at all useful, employment-wise do you think?
Two words . . . "sports gambling" . . . where do you think the SPREADS
come from? Hold your nose and you'll get rich. Hello Las Vegas . . .
Return to Top
Subject: missing explanatory variables
From: Sijne van der Beek
Date: Fri, 17 Jan 1997 08:02:00 +0100
Dear all,
I'm analyzing a set of mortality data in which many explanatory
 variables are missing. The trait I want to explain is binary. It is
probably safe to discard all observations for which the expl. var. is
missing. However, if I then put several of those in my model, I hardly
have any observations left. So my question is: is it allowed to
proceed as follows: 
I assume the data are missing at random. I have observed distributions
for the expl. var. given trait = 0 and for the expl. var. given trait = 1.
For an observation with a missing expl. var., I sample this expl. var.
from the observed distribution given trait = 0 if the trait is 0 for this
observation, and otherwise from the observed distribution given
trait = 1.
What I hope to gain relative to discarding observations is that
observations for which only one of expl. variables is missing are
still contributing information.
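To make the procedure concrete, here is a minimal Python sketch for a single explanatory variable. The record layout and function name are made up for illustration, and the sketch only shows the mechanics; whether the approach is statistically sound is the question being asked.

```python
import random

def impute_by_trait(records, seed=0):
    """Fill missing x values by sampling observed x values with the same trait.

    records: list of (trait, x) pairs, where x is None when missing.
    """
    rng = random.Random(seed)
    observed = {0: [], 1: []}
    for trait, x in records:
        if x is not None:
            observed[trait].append(x)
    completed = []
    for trait, x in records:
        if x is None:
            # Draw from the empirical distribution of x given this trait value.
            # (Fails if no x was observed for that trait value.)
            x = rng.choice(observed[trait])
        completed.append((trait, x))
    return completed

data = [(0, 1.2), (0, 0.8), (0, None), (1, 3.1), (1, None), (1, 2.7)]
print(impute_by_trait(data))
```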
Can anybody tell me if I am on the correct track?
Thanks, 
Sijne van der Beek.
Sijne.vanderBeek@alg.vf.wau.nl
Return to Top
Subject: Re: [Q] Principal component analysis: where?
From: ebohlman@netcom.com (Eric Bohlman)
Date: Fri, 17 Jan 1997 09:08:53 GMT
Dr Francesco Mencaraglia (menca@iroe.iroe.fi.cnr.it) wrote:
: The problem I would like to solve is as follows: there are (many) data from
: measurements; each set is an array with about 100-500 data (all arrays have 
: the same length); what I would like to do is to separate these arrays in
: different groups (sorting them) so that in each group all are more or less
: alike. It has been suggested to me that this could be done with PCA
: (principal component analysis).
This actually sounds more like a cluster analysis problem; cluster 
analysis deals with forming groups of individuals; factor analysis 
(related to but not the same thing as PCA) deals with forming groups of 
variables.
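As one illustration of grouping individuals (here, the measurement arrays), a bare-bones k-means sketch in Python follows. K-means is only one of many clustering methods and is my choice of example; the function name, the farthest-point starting rule, and the toy data are all assumptions.

```python
import math

def kmeans(rows, k, iters=50):
    """Plain k-means on equal-length numeric rows; returns one label per row."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(math.dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assign every row to its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(r, centers[j]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of the rows assigned to it.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

arrays = [[0, 0, 0], [0.1, 0, 0.1], [5, 5, 5], [5.1, 4.9, 5]]
print(kmeans(arrays, 2))
```

The number of groups k has to be chosen in advance, which is itself a judgment call for data like these.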
Return to Top
Subject: Re: PI series needed
From: Yves Moreau
Date: Fri, 17 Jan 1997 10:19:36 +0100
Have a look at:
http://www.nas.nasa.gov/NAS/RNRreports/dbailey/pi/pi.html
http://www.users.globalnet.co.uk/~nickjh/pi_links.htm
Yves
Return to Top
Subject: Re: The Fuzzy Debate
From: "D.J. Wilkinson"
Date: 17 Jan 1997 09:19:22 GMT
Dogmat (dogmat@aol.com) wrote:
> I build chemical plants, and we talk about WAGs and SWAGs. A WAG is
> a wild-ass-guess. A SWAG is a scientific-wild-ass-guess. (Reassuring isn't
> it?) That is my reality folks.
And the formalism you require is subjective probability. YOUR probability 
for an event is the proportion of a dollar YOU think is a fair price for 
the bet which pays a dollar if the event occurs, and nothing otherwise. 
> Very often, I have little to no
> information, I'm extrapolating an empirical model beyond its known region,
> I get one shot at it, I cannot trust the information I do have, and I
> still have to guarantee that the chemical plant will operate
> satisfactorily and safely.
So you are making one-off decisions based on uncertain information. 
Utility theory can help here.....
> The probability statistics I know doesn't have
> a chance in hell of modelling my situation -- my real-world violates every
> classical assumption in the book. So what do I do? Learn how to SWAG very
> well, and add a little safety margin (actually a lot; although the exact
> number is unknown, generally each new plant is about 20%-25%
> overdesigned.)
I'm glad you add a margin of safety, you stickler, you! :-)
> I have never heard a
> satisfactory explanation of what 30% possibility really means, but I know
> I can provide that value.
I have never heard a satisfactory answer either, but the definition of 
subjective _probability_ I give above does give real meaning to 
statements of uncertainty. 
> Here is a question (not original) for the probability zealots. Say I
> "know" that a value (x) is between 10 and 20. But that is ALL I know. Show
> me a probability distribution that models that (and no more) and I will
> never look at possibilities again.
There is no probabilistic representation of only that information. 
However, constraint propagation is easy - it is discussed in many books 
on computer programming. The trouble is, you are wanting to make 
probabilistic inferences based on constraint statements - you can't. Make 
probabilistic statements if you want probabilistic inferences. In fact, 
in most practical situations, you wouldn't know _for_sure_ that the 
variable is constrained between 10 and 20. You would simply know that it 
rarely ever strays outside that range. You could therefore make a 
statement like E(X)=15, Var(X)=(10/6)^2. It is possible to make 
inferences based on limited specifications such as those (Bayes linear 
methods, http://fourier.dur.ac.uk:8000/stats/bayeslin/).
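One reading of where E(X)=15 and Var(X)=(10/6)^2 come from, which is an assumption and not stated above: the range 10 to 20 is treated as roughly three standard deviations either side of the midpoint. A tiny Python sketch under that assumption (the function name is hypothetical):

```python
def moments_from_range(low, high, sigmas=3.0):
    """Mean and variance if [low, high] is read as mean -/+ `sigmas` standard deviations."""
    mean = (low + high) / 2
    sd = (high - low) / (2 * sigmas)
    return mean, sd ** 2

mean, var = moments_from_range(10, 20)
print(mean, var)
```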
See - "fuzzy logic" is not only the most aptly named subject I know, but 
it isn't needed or helpful, anyway.... :-)
--
Dr Darren Wilkinson   e-mail
WWW
Return to Top
Subject: Regression and necessary sample size
From: Peter Dittrich
Date: Fri, 17 Jan 1997 16:22:03 +0100
Hi:
For several statistical procedures (e.g. calculation of confidence
limits, tests for difference of means etc.) the necessary sample size
can be estimated. Textbooks describe how to do it, software is
available.
But what about regression ( in my case especially nonlinear fit of the
logistic and the Hill equation to describe dose-response relationships)
? How many data points and how many replicates are necessary to
sufficiently describe a fit or to do this with a given certainty ?? Is
there a general answer possible or are there at least practical
recommendations ? Where can I find books or other literature ?
Thank you very much for your help.
Peter
Return to Top
Subject: Re: Air Pollution
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 17 Jan 1997 17:01:28 GMT
joew (joew@bda-inc.com) wrote:
: Air pollution standards are usually given in ppm or ug/m^3 along with a 
: time value (such as annual arithmetic mean or maximum 24 hr concentration 
: etc.)  I am looking for these, generally speaking, in tons of pollutant 
: per year.  
The "ambient concentration", which is what is measured and regulated in
ppm, depends on how much is emitted, divided by how broadly it is
dispersed, and how rapidly it is filtered out or settles out or converts
to some other form (for example, SO2 gas becoming sulfates in small
particles).
So, regulations for the human environment - either the city street, or
the factory workfloor - are in different terms from the regulations
at the smokestack, or what pours from the mouth of a sewage pipe, where
it is proper and necessary to look at the total tonnage that is 
released.
: 	This was the question as it was posed to me.  Make any 
: assumptions.
: Specifically for:
: SO2 -	60 ug/m^3 annual arithmetic mean
: PM10 	50 ug/m^3 annual arithmetic mean
: NOx -	100 ug/m^3 annual arithmetic mean (comprised mainly of NO & NO2)
I think that the kind of assumptions that you have to make are like
this:  total emissions are measured (say) for the Ohio River valley, 
and areas up to the Rocky Mountains;  average ambient level is
what is measured in the central part of the state of New York.  Then
the historical relationship gives you a translation-factor, where
you could argue about the relevant geography of the sources and the area
where you are measuring ambient levels.
But there is no automatic way to convert emissions to ambient levels,
or vice-versa.
Rich Ulrich, biostatistician                wpilib+@pitt.edu
http://www.pitt.edu/~wpilib/index.html   Univ. of Pittsburgh
Return to Top
Subject: Test - please respond
From: "afn27382@afn.org"
Date: Fri, 17 Jan 1997 11:49:19 -0500
Has this message made it to a newsgroup?  If so, let me know.
Return to Top
Subject: Re: What is "mode" in stats?
From: "M.GASPARINI"
Date: Thu, 16 Jan 1997 07:52:43 GMT
Justin wrote:
> for example, a survey was taken to see how many eggs were made from
> chickens in a 1 year period. The chickens were numbered 1,2,3, and 4 for
> easy identification.
> 
> the results were as follows:...
> 
> Chicken        No. of eggs
> 1              50
> 2              73
> 3              90
> 4              100
> 
> The mode here is 100.
Nope, the mode here is 4.
Mauro.
Return to Top
Subject: Hitting time probability of Brownian motion through deterministic nonlinear barrier
From: "Pedro Santa-Clara"
Date: 17 Jan 1997 19:36:52 GMT
Hi everybody,
I want to know the probability that a Brownian motion, $W$, will hit the 
barrier
\[ B(t) = ae^{b(T-t)} + ct + d \]
in the interval $[0,T]$. $W$ starts at zero in time $0$ and $a$, $b$, 
$c$ and $d$ are constants such that, at $0$, the barrier is negative, 
$B(0)<0$.  
My hope would be that this probability could be written as a one 
dimensional integral with respect to time.
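To make the setup concrete, here is a brute-force Monte Carlo sketch in Python. It is not the one-dimensional integral asked for, only a way to check any candidate formula numerically; the function names, grid size, and path count are arbitrary assumptions.

```python
import math
import random

def barrier(t, T, a, b, c, d):
    """B(t) = a * exp(b * (T - t)) + c * t + d."""
    return a * math.exp(b * (T - t)) + c * t + d

def hit_probability(T, a, b, c, d, paths=1000, steps=200, seed=0):
    """Monte Carlo estimate of P(W(t) <= B(t) for some t in [0, T]), with W(0) = 0.

    The path is only checked on a time grid, so crossings between grid
    points are missed and the estimate is biased low.
    """
    rng = random.Random(seed)
    dt = T / steps
    sd = math.sqrt(dt)
    hits = 0
    for _ in range(paths):
        w = 0.0
        for i in range(1, steps + 1):
            w += rng.gauss(0.0, sd)  # Brownian increment over one time step
            if w <= barrier(i * dt, T, a, b, c, d):
                hits += 1
                break
    return hits / paths

# Sanity check with a flat barrier B(t) = -1 (a = c = 0, d = -1), where the
# reflection principle gives 2 * Phi(-1 / sqrt(T)), about 0.317 for T = 1.
print(hit_probability(T=1.0, a=0.0, b=0.0, c=0.0, d=-1.0))
```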
Pedro Santa-Clara
The Anderson School at UCLA
psantacl@anderson.ucla.edu
Return to Top

Downloaded by WWW Programs
Byron Palmer