Newsgroup sci.stat.math 11694

Directory

Subject: What is the expectation of the distance between 2 points in a unit square? -- From: Rick Johns
Subject: estimation of multivariate AR(1) process, code wanted -- From: buerger@buerger.pik-potsdam.de (Dr. G. Buerger)
Subject: Re: What is the difference between chaotic and random? -- From: randrew@teton.UVic.CA (Rex Andrew)
Subject: Re: Pronunciation of LaTeX -- From: crockett@Mickey.stat.unc.edu (Patrick Crockett)
Subject: Re: What is the difference between chaotic and random? -- From: "Robert. Fung"
Subject: Re: Occam's razor & WDB2T [was Decidability question] -- From: maj@waikato.ac.nz (Murray Jorgensen)
Subject: Re: Pronunciation of LaTeX -- From: buhr@stat.wisc.edu (Kevin Buhr)
Subject: Measure of homogenity -- From: "Dr. Torben Wiede"
Subject: Re: Pronunciation of LaTeX -- From: sande@haven.ios.com (Gordon Sande)
Subject: Re: What is the difference between chaotic and random? -- From: aacbrown@aol.com
Subject: Re: What is the difference between chaotic and random? -- From: aroberts@usq.edu.au (Tony Roberts)
Subject: Re: Occam's razor & WDB2T [was Decidability question] -- From: maj@waikato.ac.nz (Murray Jorgensen)
Subject: DATA NETWORKS by DIMITRI BERTSEKAS/ROBERT GALLAGER -- From: dtsmith@hiwaay.net
Subject: Re: Bounds on variances -- From: rwhutch@nr.infi.net
Subject: Re: What is the expectation of the distance between 2 points in a unit square? -- From: rbcrosie@apgea.army.mil (Ronald B. Crosier)
Subject: Re: What is the difference between chaotic and random? -- From: aacbrown@aol.com
Subject: Re: Bonferroni's Method -- From: aacbrown@aol.com
Subject: Re: Correction to: Re: Bounds on variances -- From: aacbrown@aol.com

Articles

Subject: What is the expectation of the distance between 2 points in a unit square?
From: Rick Johns
Date: 13 Nov 1996 16:28:07 GMT
This is a problem passed to me by another.  Although I can figure out 
what the mean distance between two points in a unit square is by 
empirical sampling, I would like to know how to express the expectation 
mathematically.  Can anyone help with this?  I am also interested in 
having a general solution for higher dimensions as well.  Thanks in 
advance.
Subject: estimation of multivariate AR(1) process, code wanted
From: buerger@buerger.pik-potsdam.de (Dr. G. Buerger)
Date: 13 Nov 1996 17:17:59 +0100
Hi,
as the subject says. I need to fit a multivariate
	x(t+1) = S*x(t) + e(t),
with e(t) being as white as possible.
Some theory?
Some code?
	Gerd
Subject: Re: What is the difference between chaotic and random?
From: randrew@teton.UVic.CA (Rex Andrew)
Date: 13 Nov 1996 18:30:24 GMT
I haven't been involved much in chaos, but Aaron's comments on
this thread led me to immediately wonder if a correlation function
of a time series would distinguish a random time series from a 
chaotic one. Surely this would work to distinguish between an 
IID process and one with a distinctive phase space map?
Just wondering.
Rex
----------------------------------------------------------------------
Rex K. Andrew, Science Voyeur			randrew@sirius.UVic.CA
Dept. of Elec. & Comp. Eng.			(604)656-6079 (H)
University of Victoria, BC, CA			(604)363-6798 (FAX)
Subject: Re: Pronunciation of LaTeX
From: crockett@Mickey.stat.unc.edu (Patrick Crockett)
Date: 13 Nov 1996 18:07:11 GMT
Hideo Hirose   wrote:
>In Japan, many researchers pronounce LaTeX as "latef." Is it correct? How do you 
>pronounce TeX and LaTeX actually, especially in the united states?
Most of us pronounce it like the paint: "LAY tecks", though I recall
reading somewhere that Knuth had in mind some arcane pronunciation --
maybe with a sort of glottal X like a Greek chi.
Subject: Re: What is the difference between chaotic and random?
From: "Robert. Fung"
Date: Wed, 13 Nov 1996 15:27:37 -0500
Robert Dodier wrote:
> 
> Hello all,
> It's a beautiful blue day here in Boulder, hope it's the same wherever
> you are.
> 
> Troy Shinbrot wrote:
> >
> > In article <19961108025700.VAA15372@ladder01.news.aol.com>,
> > jksnyder@aol.com wrote:
 > > > Please excuse the elementary nature of the question, I am just learning
 > > > about chaotic systems.  Is there a difference between chaotic systems and
 > > > random systems?  If so, could the difference be measured/quantified by
 > > > plotting the series on normal probability paper?
 > > >
 > > > I generated a chaotic series of 1,000 numbers, between zero and one,
 > > > using the logistic equation and similarly generated a series of 1,000
 > > > numbers, between zero and one, with a random number generator.  After
 > > > ordering both series, they both plotted as straight lines on normal
 > > > probability paper.  Therefore, by this test, chaotic and random number
 > > > series appear to have a normal distribution.
 > 
 > This seems very strange; the invariant measure for ax(1-x) is far from
 > normal. For a=4, the invariant measure is continuously differentiable
 > and thus there is a density function, which looks like a U. For a < 4,
 > the invariant measure has a lot of spikes (I don't recall if the number
 > of spikes is finite, countable, or uncountable), so there is no density.
 > Also, when you say a random number between 0 and 1, I believe you must
 > mean uniformly distributed; again, this is anything but normal. If you
 > make a histogram of the 1000 numbers, what shape do you get?
 > 
 > > > Are there other measures, such as analysis of variance, that could
 > > > distinguish between random and chaotic series?
 > 
 > I'll try to argue that this is, for practical purposes, not a meaningful
 > question. First, let's review what Mr. Shinbrot wrote. By the way, this is
 > indeed a great question, which touches on a fundamental issue.
 > 
 > > A great question.  First a practical answer for this particular problem.
 > > If X(n) denotes the n-th value of your time series, if you plot X(n) vs.
 > > X(n-1), you will get a mess for the random data, because the n-th value of
 > > the time series does not depend on the n-1'st value.  For the logistic
 > > data, you will get a parabola (obviously).
 > >
 > > Thus because of the deterministic nature of chaos, one value depends on
 > > its history, while random data does not.  Vis a vis statistical tests, if
 > > you randomize the ORDER of the logistic data, you will have two data sets,
 > > one logistic and a second randomized-order logistic, both of which are
 > > guaranteed to have EXACTLY the same mean, variance, skew, kurtosis, or
 > > anything else you would care to measure.  It is only the determinism of
 > > the data sets that differ.
 > 
 > I don't think dependence on the past is a suitable way to distinguish
 > random from chaotic processes. I'm sure we'll all agree that a Markov
 > process is a random process, yet such a process may have a very strong
 > dependence on past states.
 > 
 > > The second answer is more pedagogical: the logistic data have dimension at
 > > most 1.  That is, they all lie precisely on the parabola I mentioned, and
 > > one variable is all that is needed to define the state and thus determine
 > > the future state of the system.  The random data are (ideally) infinite
 > > dimensional: an ideal random number generator would require an infinite
 > > number of variables to define the future state.  Practically we all know
 > > that this isn't quite true, but that is the strict answer.
 > 
 > There is nothing in the textbook definition of a random variable that
 > requires that it be generated by an infinite-dimensional problem. In order
 > for all the definitions about expected value, distribution function,
 > density, etc. to work, all that is required is that the generating
 > process have a unique invariant measure, so that time averages over
 > process values equal weighted averages taken over the state space (with
 > the invariant measure doing the weighting). That is, a random variable
 > need not be generated by an infinite-dimensional process; the process need
 > only be ergodic -- this is a much weaker condition.
 > 
 > Incidentally, for ergodic processes the frequentist and measure-theoretic
 > definitions of probability coincide. I don't think the followers of these
 > two schools really differ on any practical point.
 > 
 > So I've pointed out that random processes can be low-dimensional, but
 > I could make the argument a little more convincing by coming up with
 > an example of a deterministic system which has an everyday distribution
 > as its invariant measure. So far I can't think of a low-dimensional system
 > which has an invariant measure which is approximately normal, say.
 > Can anyone name such a system?
 >
    For a small finite-dimensional state space like a single die throw, one
    can devise an infinite number of deterministic machines to throw the die
    with differing levels of randomness while maintaining a flat distribution across
    each of the 6 states, over a projected infinite number of trials.
    If one machine "A" throws 1,2,3,4,5,6,1,2,3,4,5,6,... and
    another, "B", similarly throws 1,2,3,4,4,3,6,5,1,2,5,6,1,2,..., what is the measure
    that says "B" is more random than "A"? Both are deterministic and
    have flat distributions of their states.
    Should a random process be provably non-deterministic?
Subject: Re: Occam's razor & WDB2T [was Decidability question]
From: maj@waikato.ac.nz (Murray Jorgensen)
Date: Thu, 14 Nov 96 16:06:08 GMT
I regret that I do not have the time to respond to this thread in detail. 
I have looked at Geoff Webb's article in
http://www.cs.washington.edu/research/jair/table-of-contents-vol4.html
and it seems to conflict with all my intuition built up as a practising 
statistician.
The subject of 'machine learning' is very closely connected with the 
fitting of statistical models to empirical data. What the ML people have 
contributed is a range of new algorithms and models but the fundamental 
questions remain unchanged. It is widely accepted in the statistical 
community that 'overfitting' of a data set [using a needlessly complex 
model] results in a fitted model closely tuned to that particular data 
set that has poor predictive power. This is not to say that there is not 
additional complexity to be discovered, just that the data set under 
consideration does not contain enough information about possible 
elaborations to the model to make it safe to fit them.
I recommend the book 
Model Selection   by H. Linhart and W. Zucchini
Wiley 1986   ISBN  0-471-83722-9
Murray Jorgensen
In article <32837820.7ACB@postoffice.worldnet.att.net>,
   kenneth paul collins  wrote:
>kenneth paul collins wrote:
>
>> From the view of WDB2T, Occam's Razor can be sharpened a bit. In
>> terms of WDB2T, the more-complex alternative simply does not fit
>> observations well, and it can be rejected solely on that basis.
>> And when one looks, one sees that this is the same point that
>> I've been working to make with respect to the relative utility of
>> conventional Logic and this "new" WDB2T-optimization "Logic" I am
>> proposing.
>
>[For those who didn't read the "Decidability" thread, "WDB2T" is an 
>acronym for "What's Described By the 2nd law of Thermodynamics". 
>WDB2T refers to the Physical Reality that is described by 2nd Thermo, 
>not 2nd Thermo itself.]
>
>Yesterday, I came across a short, unsigned, report in the Nov 96 
>issue of _Discover_ magazine, p34, "Is Occam's Razor Rusty?". The 
>article reports on work done by Geoffrey Webb at Deakin University in 
>Geelong, Australia. In a series of experiments, Webb found that, 
>(quoting from the _Discover_ article) "for 12 of 13 problems analyzed 
>by the computer, the more complex decision-making process gave more 
>accurate results".
>
>The _Discover_ article quotes Webb: "'People are potentially missing 
>out on useful patterns because they're just looking for the simple 
>ones,' says Webb. 'Occam's razor influences and limits what science 
>can do with information.'" 
>
>The article ends without clarifying the point, but my interpretation 
>is that it's (Webb is) saying that Occam's razor =erroneously= 
>"influences and limits what science can do with information", and 
>since this contradicts the position that I've recently taken here in 
>sci.logic with respect to Occam's razor, I wish to explore this 
>matter further.
>
>This msg is an introduction to the new thread. I'll post further 
>discussion later today. ken collins
>_____________________________________________________
>People hate because they fear, and they fear because
>they do not understand, and they do not understand 
>because hating is less work than understanding.
Subject: Re: Pronunciation of LaTeX
From: buhr@stat.wisc.edu (Kevin Buhr)
Date: 13 Nov 1996 15:45:38 -0600
Hideo Hirose  writes:
| 
| In Japan, many researchers pronounce LaTeX as "latef." Is it
| correct? How do you pronounce TeX and LaTeX actually, especially in
| the united states?
Donald E. Knuth spends an entire chapter of the TeXbook on the
pronunciation of "TeX".  Okay, the "chapter" is only one page long,
and he talks about some other stuff, too.  However, it is abundantly
clear that the author of "TeX" wants you to pronounce it "teck" (i.e.,
rhymes with "blech"), not "tecks".  Evidently, "when you say it
correctly to your computer, the terminal may become slightly moist".
If I remember correctly, Leslie Lamport makes it clear that there is
*no* official pronunciation of "LaTeX".  He recommends one of:
	(1)  lah-TECK
	(2)  LAH-teck
	(3)  LAY-teks
However, on any CTAN site, the "CTAN/latex/intro.tex" document will
tell you that LaTeX should be pronounced as one of:
	(1)  Lah-tech
	(2)  Lay-tech
(in each case, rhyming with blech).
So, the correct solution is to avoid pronouncing LaTeX whenever
possible.  And if you must pronounce it, try to mumble.
Kevin 
Subject: Measure of homogenity
From: "Dr. Torben Wiede"
Date: Wed, 13 Nov 1996 16:07:21 +0100
I am looking for measures of homogeneity (or 
distance) for two distributions of nominal-scaled 
variables.
These measures should compare not only the location 
or scale parameters of these distributions but all 
the information in the distributions.
Where can I find some literature about such 
measures?
Torben Wiede
University of Bamberg
Germany
email: torben.wiede@sowi.uni-bamberg.de
Subject: Re: Pronunciation of LaTeX
From: sande@haven.ios.com (Gordon Sande)
Date: Wed, 13 Nov 96 22:27:54 GMT
In Article <56d2of$5ag@newz.oit.unc.edu>, crockett@Mickey.stat.unc.edu
(Patrick Crockett) wrote:
>Hideo Hirose   wrote:
>>In Japan, many researchers pronounce LaTeX as "latef." Is it correct? How do you
>>pronounce TeX and LaTeX actually, especially in the united states?
>
>Most of us pronounce it like the paint: "LAY tecks", though I recall
>reading somewhere that Knuth had in mind some arcane pronunciation --
>maybe with a sort of glottal X like a Greek chi.
Consult page 1 of "The TeXbook" for the definitive statement from the author.
Think of 'technology', to get the basic Greek correct. There is a detour
through Scottish 'loch' before the warning that if you get it right your
computer terminal (sic) may become slightly moist.
Gordon Sande
Subject: Re: What is the difference between chaotic and random?
From: aacbrown@aol.com
Date: 14 Nov 1996 01:35:07 GMT
RVICKSON@MANSCI.uwaterloo.ca (Ray Vickson) in
<56as3k$9ec_001@eng.uwaterloo.ca> writes:
> I have a naive question. If chaotic systems can have rapidly
> diverging orbits just by changing the initial conditions, say out
> in the 50th decimal place, then how do we obtain information
> about chaotic behavior by computation? (i.e., things like regions
> of stability, etc.) Don't we have to be able to compute orbits to
> infinite accuracy in order to study chaotic behavior numerically?
Not necessarily. A system may be chaotic in one representation but stable
in another. For practical prediction purposes, chaos theory is only useful
if we can find such a representation. Otherwise it is only a philosophic
subtlety whether we call a system chaotic or random.
In many statistical methods we begin by smoothing data, eliminating sudden
gaps or outliers, transforming our data into independent, gaussian
variables. If the series is chaotic, rather than random, this is a
mistake. You should concentrate on the sudden gaps and outliers.
Statistics pays most of the attention to the most significant digits,
chaos to the least significant digits.
Aaron C. Brown
New York, NY
Subject: Re: What is the difference between chaotic and random?
From: aroberts@usq.edu.au (Tony Roberts)
Date: Thu, 14 Nov 1996 11:53:34 +1000
In article <3287777C.73D8@colorado.edu>, dodier@colorado.edu wrote:
> I could make the argument a little more convincing by coming up with 
> an example of a deterministic system which has an everyday distribution
> as its invariant measure. So far I can't think of a low-dimensional system
> which has an invariant measure which is approximately normal, say. 
> Can anyone name such a system?
> 
It may not have a name, but it is easy to sketch an algorithm:
Given a number x,
* map it to y in [0,1] using erf (so that a normal distribution of x
corresponds to a uniform distribution of y)
* map z=sin^2(y\pi/2), I recall, so that the uniform distribution in y
corresponds to the invariant distribution of the logistic map at a=4
* iterate once: z'=4z(1-z)
* invert the previous maps z'->y'->x'
These deterministic but chaotic iterates x_n will have a normal distribution.
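The four steps above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the "erf" step is taken to mean the standard normal CDF, the state is carried in z rather than converted back and forth at every step (to limit round-off), and the names x_to_z, z_to_x and chaotic_normal_series, the starting value 0.3 and the clamp eps are all made up for the example.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def x_to_z(x):
    """Normal x -> uniform y (via the normal CDF, i.e. erf) -> logistic z."""
    y = STD_NORMAL.cdf(x)
    return math.sin(y * math.pi / 2) ** 2

def z_to_x(z, eps=1e-12):
    """Invert the two maps: z -> y -> x.

    eps is an assumption of this sketch: it keeps y strictly inside (0, 1)
    so that the inverse normal CDF is always defined.
    """
    y = (2 / math.pi) * math.asin(math.sqrt(min(z, 1.0)))
    y = min(max(y, eps), 1 - eps)
    return STD_NORMAL.inv_cdf(y)

def chaotic_normal_series(x0, n):
    """Return n deterministic iterates x_1 .. x_n starting from x0."""
    z = x_to_z(x0)
    out = []
    for _ in range(n):
        z = 4 * z * (1 - z)      # one step of the logistic map at a=4
        out.append(z_to_x(z))    # report the iterate on the "normal" scale
    return out

series = chaotic_normal_series(0.3, 20000)
mean = sum(series) / len(series)
var = sum((v - mean) ** 2 for v in series) / (len(series) - 1)
print(f"mean = {mean:.3f}, variance = {var:.3f}")
```

If the construction works as described, the printed mean should be close to 0 and the variance close to 1, even though no random number generator is involved.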
                                                Tony
---------------------------------------------------------------------
Professor A.J. Roberts     
Dept of Mathematics & Computing     E-mail: aroberts@usq.edu.au
University of Southern Queensland   Phone:  (076) 312943
Toowoomba, Queensland 4350          Fax:    (076) 312721
AUSTRALIA                           WWW: http://www.sci.usq.edu.au
                                    /pub/MC/staff/robertsa/home.html
---------------------------------------------------------------------
Subject: Re: Occam's razor & WDB2T [was Decidability question]
From: maj@waikato.ac.nz (Murray Jorgensen)
Date: Thu, 14 Nov 96 21:15:11 GMT
In article <56dgil$fcs@netserv.waikato.ac.nz>,
   maj@waikato.ac.nz (Murray Jorgensen) wrote:
>I regret that I do not have the time to respond to this thread in detail. 
> . . .
Apologies, but I forgot to point out that I added the newsgroups
sci.stat.math  and  comp.ai.neural-nets
to a thread which started in sci.logic.
Murray Jorgensen,  Department of Statistics,  U of Waikato, Hamilton, NZ
-----[+64-7-838-4773]---------------------------[maj@waikato.ac.nz]-----
Doubt everything or believe everything: these are two equally convenient
strategies. With either we dispense with the need to think.
                                                       - Henri Poincare'
Subject: DATA NETWORKS by DIMITRI BERTSEKAS/ROBERT GALLAGER
From: dtsmith@hiwaay.net
Date: Thu, 14 Nov 1996 05:58:40 GMT
DATA NETWORKS by DIMITRI BERTSEKAS/ROBERT GALLAGER
I am looking for a book that covers the basic theory of Data Networks
and its application in today's environment. This book, which I have
only skimmed, is my only example of applied mathematics in this field.
I am sure there are many good books out on this subject.
I will endeavor to return the favor to anyone who takes the time to
respond.
DTS
dtsmith@hiwaay.net
Subject: Re: Bounds on variances
From: rwhutch@nr.infi.net
Date: 14 Nov 1996 07:14:31 GMT
In <19961112152500.KAA26511@ladder01.news.aol.com>, aacbrown@aol.com writes:
>ahmed shabbir  in
> asks:
>
>> How can I estimate (by sampling) upper and lower bounds
>> of the variance of a random variable with unknown distribution.
>
>The variance is a parametric statistic. If you know nothing about the
>distribution then you cannot set exact confidence intervals. Therefore you
>have two choices:
>
>(1) Adopt a non-parametric measure of spread, say the interquartile range,
>and set exact confidence intervals for any distribution; or
>
>(2) Make some assumptions about your distribution, say that the tails
>beyond 1% probability are exponential or thinner, then set worst-case
>confidence intervals. Obviously, the stronger the assumptions you are
>willing to make the tighter your confidence bands can be.
>
>
>
>
>Aaron C. Brown
>New York, NY
	Actually, I vaguely remember that the variance CAN be treated as a
non-parametric statistic, via the theory of U-Statistics, and that it is POSSIBLE
thereby to come up with asymptotic and very broadly valid confidence intervals
for the variance. The catch is that the sample sizes that are necessary for such
asymptotic confidence intervals to carry much plausibility are rather on the large
side for most applications. For most fields of application, the result is thus of
theoretical interest only.
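A rough sketch of the U-statistic idea, for readers who want to try it. The sample variance equals the average of the kernel (a - b)^2 / 2 over all pairs of observations, and standard U-statistic theory then suggests the large-sample interval s^2 +/- z * sqrt((m4 - s^4) / n), where m4 is the fourth central sample moment. The function names, the exponential test data, and the choice z = 1.96 are assumptions made for illustration, and the interval is only as trustworthy as the sample size allows.

```python
import math
import random
from itertools import combinations
from statistics import variance

def variance_u_statistic(xs):
    """Average the kernel h(a, b) = (a - b)^2 / 2 over all unordered pairs."""
    pairs = list(combinations(xs, 2))
    return sum((a - b) ** 2 / 2 for a, b in pairs) / len(pairs)

def asymptotic_variance_ci(xs, z=1.96):
    """Large-sample interval for the variance using a plug-in fourth moment."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = variance(xs)
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # max(0, ...) guards against a tiny negative value in degenerate samples
    half_width = z * math.sqrt(max(0.0, m4 - s2 ** 2) / n)
    return s2 - half_width, s2 + half_width

rng = random.Random(1)
sample = [rng.expovariate(1.0) for _ in range(200)]  # illustrative data only
u_value = variance_u_statistic(sample)
low, high = asymptotic_variance_ci(sample)
print(u_value, variance(sample))  # the two should agree up to round-off
print(low, high)
```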
--------------------------------------------------------------
"I would predict that there are far greater mistakes waiting
to be made by someone with your obvious talent for it."
Orac to Vila. [City at the Edge of the World.]
-----------------------------------------------
R.W. Hutchinson. | rwhutch@nr.infi.net
Subject: Re: What is the expectation of the distance between 2 points in a unit square?
From: rbcrosie@apgea.army.mil (Ronald B. Crosier)
Date: Thu, 14 Nov 96 13:44:13 GMT
In article <56csun$85@corn.cso.niu.edu>, Rick Johns   wrote:
>This is a problem passed to me by another.  Although I can figure out 
>what the mean distance between two points in a unit square is by 
>empirical sampling, I would like to know how to express the expectation 
>mathematically.  Can anyone help with this?  I am also interested in 
>having a general solution for higher dimensions as well.  Thanks in 
>advance.
>
I sent Rick Johns an old post by Steve Finch (from sci.math) that
contains the answer he (Rick) wants.  Part of Steve's post follows.
--
Ghosh also provides formulas for the moments of the ... distribution.  In the
special case of a square with unit sides (a=b=1), the mean distance between
the two random points is 
	{sqrt(2) + 2 + 5 log(1+sqrt(2))}/15,
which is approximately 0.52141,  ... .
Formulas for mean distances associated with other convex planar sets are
given in Santalo[2].  As far as I know, no analogous exact formulas exist
for convex sets in higher dimensions - simulation may indeed be the only
approach to determine distributions/moments for, e.g., the cube or the sphere.
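The closed form for the unit square is easy to check numerically. The sketch below evaluates the formula quoted above and compares it with a simple simulation; the function names, the seed, and the number of sampled pairs are arbitrary choices for illustration. The dim parameter lets the same simulation be run for the unit cube or higher-dimensional unit hypercubes.

```python
import math
import random

def mean_distance_closed_form():
    """Mean distance between two random points in the unit square."""
    return (math.sqrt(2) + 2 + 5 * math.log(1 + math.sqrt(2))) / 15

def mean_distance_monte_carlo(n_pairs, dim=2, seed=0):
    """Estimate the mean distance in the unit hypercube by sampling pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        p = [rng.random() for _ in range(dim)]
        q = [rng.random() for _ in range(dim)]
        total += math.dist(p, q)   # Euclidean distance (Python 3.8+)
    return total / n_pairs

exact = mean_distance_closed_form()
estimate = mean_distance_monte_carlo(100000)
print(f"closed form: {exact:.5f}")
print(f"simulation:  {estimate:.5f}")
print(f"unit cube (simulation only): {mean_distance_monte_carlo(100000, dim=3):.5f}")
```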
			References
[1] B. Ghosh, Random distances within a rectangle and between two rectangles,
    Bull. Calcutta Math. Soc., 43 (1951) 17-24.
[2] L. Santalo, Integral Geometry and Geometric Probability, Addison-Wesley,
    1976, p.49.
At least two more recent books on stochastic geometry have recently appeared
but I cannot recall further information about them.   ...  Steve Finch
--
Ronald Crosier    E-mail: 
Disclaimer: My opinions are just that---mine, and opinions.
If you have a good idea, be patient---it will go away.
Subject: Re: What is the difference between chaotic and random?
From: aacbrown@aol.com
Date: 14 Nov 1996 13:56:40 GMT
randrew@teton.UVic.CA (Rex  Andrew) in <56d440$3o82@uvaix3e1.comp.UVic.CA>
writes:
> I haven't been involved much in chaos, but Aaron's
> comments on this thread led me to immediately wonder
> if a correlation function of a time series would distinguish
> a random time series from a chaotic one. Surely this would
> work to distinguish between an IID process and one with a
> distinctive phase space map?
Unfortunately, no. The correlation function is based on linear association
only; chaotic time series will have non-linear dependence.
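A small illustration of this point, using the logistic map x -> 4x(1-x) discussed elsewhere in the thread as the chaotic series. The lag-1 linear correlation of the series is close to zero, yet each value is an exact function of the previous one, which shows up as soon as the previous value is transformed non-linearly. The helper names, the starting value 0.3, and the series length are assumptions of this sketch.

```python
import math

def logistic_series(x0, n):
    """Return n values of the logistic map x -> 4x(1-x), starting at x0."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(4 * xs[-1] * (1 - xs[-1]))
    return xs

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

xs = logistic_series(0.3, 20000)
prev, nxt = xs[:-1], xs[1:]

linear = pearson(prev, nxt)                             # X(n-1) against X(n)
nonlinear = pearson([x * (1 - x) for x in prev], nxt)   # x(1-x) against X(n)
print(f"lag-1 linear correlation:       {linear:.4f}")
print(f"correlation after x(1-x) step:  {nonlinear:.4f}")
```

The first number should be near 0, as it would be for an IID series, while the second should be essentially 1.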
In principle you could get around this by using a more sophisticated
measure of association. But another feature of chaotic systems is that
even a small amount of random noise will have a large effect on your data.
Therefore you are unlikely to get reliable results from this approach.
In some cases the dependence is so complicated, or the system so unstable,
that the series must just be treated as random. However, many chaotic
systems have representations that are reasonably simple and not sensitive
to random noise. If you can find one and figure it out, then you can make
better predictions.
Aaron C. Brown
New York, NY
Subject: Re: Bonferroni's Method
From: aacbrown@aol.com
Date: 14 Nov 1996 14:02:43 GMT
ANGEL MARTINEZ BARAMBIO  in
<3289C215.4072@QUIMICA.URV.ES> writes:
> My question is: 'at least' means that the true joint
> confidence coefficient for the k simultaneous statements
> should be lower than the result of dividing the individual
> statement confidence coefficient by the number k of
> simultaneous statements? If so, how much lower should it be?
The joint confidence must be at least equal to the sum of the individual
confidences minus k-1 (or zero, if that quantity is negative). At most it
can equal the minimum of any of the individual confidences.
If the k statements are independent then the joint confidence is the
product of the individual confidences; this will be between the lower and
upper bounds set above.
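As a worked example of these bounds, here is a short sketch that computes the lower bound, the upper bound, and the value under independence for a list of individual confidences. The function name and the three 95% statements are chosen only for illustration.

```python
import math

def joint_confidence_bounds(confidences):
    """Return (lower bound, upper bound, value if independent)."""
    k = len(confidences)
    lower = max(0.0, sum(confidences) - (k - 1))  # sum minus k-1, floored at 0
    upper = min(confidences)                      # cannot beat the weakest one
    independent = math.prod(confidences)          # product under independence
    return lower, upper, independent

lower, upper, independent = joint_confidence_bounds([0.95, 0.95, 0.95])
print(f"lower = {lower:.4f}, upper = {upper:.4f}, independent = {independent:.4f}")
```

For three statements at 95% each, this gives a joint confidence of at least 0.85 and at most 0.95, with about 0.857 if the statements are independent.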
Aaron C. Brown
New York, NY
Subject: Re: Correction to: Re: Bounds on variances
From: aacbrown@aol.com
Date: 14 Nov 1996 15:13:31 GMT
Ellen Hertz  in <32892767.317C@access.digex.net>
writes:
> If you have a big enough sample, around 30, so that you can
> use  normal approximation, you can say the sum over i of
> (Xi-Xbar)^2  is approximately  sigma^2 times a variable that
> is chi square on n-1 degrees of freedom. Call it Y. Since a
> chi square on k df has mean k and variance 2*k,
> P(n-1-1.96*sqrt(2*(n-1)) <= Y/sigma^2 <=n-1+1.96*sqrt(2*(n-1))
> =~ .95. Then an approximate 95% confidence interval for
> sigma^2 is (Y/(n-1+1.96*sqrt(2*(n-1)), Y/(n-1-1.96*sqrt(2*(n-1)).
I'm afraid you've piled too many asymptotics on top of each other here.
The original question specified "unknown distribution"; for some
distributions this approach will not work even with an infinite sample
size. Even with a well-behaved distribution and a sample size of 1,000,
these intervals will be wildly incorrect. Instead of a 5% chance of being
outside the interval you will have a 0.2% chance with a uniform
distribution and a 32% chance with an exponential.
The statement that a sample size of 30 allows you to use the Normal
approximation should be used at each approximation step. You invoke the
Normal approximation three times so a better guess for the minimum sample
size is 30^3 or 27,000. This is obviously just a rough guess for
well-behaved distributions but it is a better guess than 30.
The biggest error comes in approximating the chi-square distribution by a
Normal. You would do better to approximate it as a Normal-squared. That is
a chi-square with k degrees of freedom (k>30) is approximately half the
square of a Normal deviate with mean (2k-1)^.5 and variance 1.
The 95% confidence interval for the chi-square should be about
0.5*[(2k-1)^.5-2]^2 to 0.5*[(2k-1)^.5+2]^2 or k+1.5-2*(2k-1)^.5 to
k+1.5+2*(2k-1)^.5. This will give considerably better results, although it
will still only work for large samples from well-behaved distributions.
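To see how the two approximations compare, here is a sketch that computes both intervals for a chi-square with k degrees of freedom and turns either one into an interval for sigma^2, in the way the quoted post does. The function names and the value k = 50 are illustrative assumptions; the multipliers 1.96 and 2 are the ones used in the posts. Expanding the Normal-squared form gives k + 1.5 - 2*sqrt(2k-1) for the lower limit and k + 1.5 + 2*sqrt(2k-1) for the upper limit.

```python
import math

def chi2_interval_normal(k, z=1.96):
    """Treat the chi-square as Normal with mean k and variance 2k."""
    half = z * math.sqrt(2 * k)
    return k - half, k + half

def chi2_interval_normal_squared(k, z=2.0):
    """Treat the chi-square as half the square of a Normal(sqrt(2k-1), 1)."""
    root = math.sqrt(2 * k - 1)
    return 0.5 * (root - z) ** 2, 0.5 * (root + z) ** 2

def variance_interval(sum_sq_dev, chi2_low, chi2_high):
    """Convert chi-square limits into limits for sigma^2.

    sum_sq_dev is the sum over i of (Xi - Xbar)^2, called Y in the quoted post.
    """
    return sum_sq_dev / chi2_high, sum_sq_dev / chi2_low

k = 50
print("Normal approximation:        ", chi2_interval_normal(k))
print("Normal-squared approximation:", chi2_interval_normal_squared(k))
```

Neither interval addresses the larger objection in the post, namely that the chi-square step itself assumes roughly Normal data.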
Aaron C. Brown
New York, NY
