Newsgroup sci.stat.math 11438

Directory

Subject: correlated non-normal variables -- From: Jeff@ (Jeff Miller)
Subject: Re: correlated non-normal variables -- From: T.Moore@massey.ac.nz (Terry Moore)
Subject: Re: Good Probability Books? -- From: Ellen Hertz
Subject: Re: Regression Question -- From: Ellen Hertz
Subject: Density of sum of truncated normals? -- From: auld@qed.econ.queensu.ca
Subject: Re: Looking for a better estimator for a simple expectation -- From: "Robert E Sawyer"
Subject: Re: Geometric and harmonic means -- From: "Pavel E. Guarisma"
Subject: Re: probability is relativistic -- From: Christopher McKinstry
Subject: Re: Geometric and harmonic means -- From: "Pavel E. Guarisma"
Subject: Re: Q: How to linearize a nonlinear stochastic ODE? -- From: RVICKSON@MANSCI.uwaterloo.ca (Ray Vickson)
Subject: Re: Looking for a better estimator for a simple expectation -- From: hrubin@b.stat.purdue.edu (Herman Rubin)
Subject: Re: Joint PDF of Order Statistics -- From: hrubin@b.stat.purdue.edu (Herman Rubin)
Subject: Re: Bias and Variance of Eigenvalues of a Covariance Matrix? -- From: hrubin@b.stat.purdue.edu (Herman Rubin)
Subject: Re: answer needed!!! -- From: aacbrown@aol.com (AaCBrown)
Subject: Re: 0.5*infinity -- From: aacbrown@aol.com (AaCBrown)
Subject: Looking for Matlab Bootstrap code -- From: Ashutosh Sabharwal
Subject: Help adding Gaussian Noise -- From: Chris Kirkham
Subject: Re: Help adding Gaussian Noise -- From: Jim Hunter
Subject: Re: Principal components analysis -- From: jonyi@elysium.mit.edu (Jon Rong-Wei Yi)
Subject: recommendations for SPSS alternative needed -- From: masalope@nyx10.cs.du.edu (Mark Salopek)
Subject: Skewed distributions for Don Philp -- From: Roger D Metcalf DDS
Subject: Re: Skewed distributions for Don Philp -- From: kbutler@sfu.ca (Kenneth Butler)
Subject: Re: correlated non-normal variables -- From: Ellen Hertz
Subject: Re: Request for a stats/math word processor -- From: Markus Jantti

Articles

Subject: correlated non-normal variables
From: Jeff@ (Jeff Miller)
Date: 24 Oct 1996 21:05:16 GMT
Several posters have recently asked about generating correlated
random variables where the bivariate distributions are not normal.
This is a problem I have also encountered, so I got to
wondering about it again.  Below I suggest an approach (an
elaboration of Frechet's method) that could be used, and I would
like to ask readers of this group if it seems reasonable.
Assume you want to generate X and Y with specified marginal
distributions and some degree of correlation.  Assume further
that you have very little idea about what underlying mechanisms
cause the correlation, so you just want a general approach
that produces the right marginals and lets you vary the
strength of the correlation systematically across simulations.
I suggest the following procedure to generate positively
correlated variables:
  1. Generate X randomly from its desired marginal distribution.
  2. Look up the percentile of X, P_x, within its marginal distribution.
  3. Form a beta distribution with a mean P_x and
     variance V (V is discussed below).
  4. Draw a random number R from this beta distribution.
  5. Find Y at the percentile R of its marginal distribution.
  6. Take the (X,Y) pair as one bivariate observation.
Comments:
---------
In different simulations, the researcher varies the parameter V
to control the strength of the correlation: As V decreases,
the correlation increases.
To generate negatively correlated variables, use a beta
distribution with a mean of 1-P_x instead of P_x.
Questions:
----------
Would this method really generate the correct marginal
distribution for Y?  (I think so, but am not certain.)
Is it stupid and/or naive to use the beta distribution
to induce a correlation in this fashion?  (I guess the
question here is whether this procedure implements a
reasonable model of correlation.  Perhaps this
question is not answerable in general, but it seems
reasonable to ask it because I'd like a method to be
used in cases where virtually nothing is known about
the bivariate structure.)
Thanks for any comments.
Jeff Miller
Dept of Psychology
Univ of Otago
Dunedin, New Zealand
miller@otago.ac.nz    http://jomserver.otago.ac.nz/
Return to Top
Subject: Re: correlated non-normal variables
From: T.Moore@massey.ac.nz (Terry Moore)
Date: 25 Oct 1996 00:26:42 GMT
In article <54olmc$ipo@celebrian.otago.ac.nz>, Jeff@   (Jeff Miller) wrote:
  Perhaps this
> question is not answerable in general, but it seems
> reasonable to ask it because I'd like a method to be
> used in cases virtually nothing is known about
> the bivariate structure.)
As has been pointed out in this thread, specifying
the marginals and the correlation is not sufficient
to determine the joint distribution. So if virtually
nothing is known, virtually nothing can be done.
You may obtain the right marginals and correlation,
but you will never know if you have the right
distribution.
Terry Moore, Statistics Department, Massey University, New Zealand.
Imagine a person with a gift of ridicule [He might say] First that a
negative quantity has no logarithm; secondly that a negative quantity has
no square root; thirdly that the first non-existent is to the second as the
circumference of a circle is to the diameter. Augustus de Morgan
Return to Top
Subject: Re: Good Probability Books?
From: Ellen Hertz
Date: Thu, 24 Oct 1996 20:44:51 -0400
G. Chan wrote:
> 
> HI!  I'm taking a first year statistics course and we're doing a lot of
> probability and I'm having a hard time learning this material.  So I was
> wondering if anyone can refer me to some really good books that can help
> me a lot or any computer software that will help me with learning
> probability.  Thank you so very much.
> 
> G.Chan
> 
Introductory Probability and Statistical Applications by
Paul L. Meyer has good, clear coverage of the probability 
concepts you would need in a first year statistics course.
Return to Top
Subject: Re: Regression Question
From: Ellen Hertz
Date: Thu, 24 Oct 1996 20:23:57 -0400
Keith Willoughby wrote:
> 
> I have developed a regression model to examine those game
> statistics which are important in determining victory for sports
> teams.
> 
> I have examined three different teams in my model (call them
> teams A,B and C).  These teams play many games during a "season",
> but some of their games will be against each other.  Thus, there
> seems to be a certain "dependency" within the data (ie. if team A
> is playing team B, then if team A wins, team B loses).
> 
> How can I include this dependency factor within my regression
> model?
> 
> Thanks for any and all assistance.
> 
> Regards,
> 
> Keith Willoughby
> kwilloug@acs.ucalgary.ca
You might consider a 1-1 conditional logistic regression model in
which the stratum is the game, the case is the winner and the control 
is the loser. A good reference for this is Hosmer and Lemeshow.
Return to Top
Subject: Density of sum of truncated normals?
From: auld@qed.econ.queensu.ca
Date: 24 Oct 1996 16:24:34 -0500
Consider a random variable 
	x_t = { k + (\sigma)e_t    if e_t > -k/(\sigma)
	      { 0                  otherwise,
where k > 0 is a constant, e_t is an iid standard normal, and \sigma is the
standard deviation.  Each x_t, then, is a normal with truncation point
-k/(\sigma).
I only observe the sum of T such variables, and need to evaluate the
probability
		Pr( \sum_{t=1}^T x_t = X ).
I can write this probability as a T-1 dimensional integral over the
densities of the truncated normals, but this is computationally burdensome
to evaluate, even using a simulation estimator (these probabilities are
contributions to a likelihood function which will have to be evaluated
many times). 
I suspect, given the iid assumption, that there must be a way of
expressing the density of the sum of these variables in a convenient
manner.  I wonder if anyone could point me to a reference which contains
such an expression, or, alternately, has any thoughts about how to quickly
numerically evaluate the probability.  Accuracy is not extremely important
in my application -- an approximation would do. 
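[For what it's worth, the setup is easy to simulate, and with modest accuracy requirements a crude Monte Carlo density estimate may already suffice. A sketch; k, sigma, T, the evaluation point X, and the window width are illustrative choices:]

```python
import numpy as np

rng = np.random.default_rng(1)
k, sigma, T, n = 1.0, 2.0, 5, 200_000   # illustrative parameter values

# x_t = k + sigma*e_t when e_t > -k/sigma, else 0 (note the atom at zero)
e = rng.standard_normal((n, T))
x = np.where(e > -k / sigma, k + sigma * e, 0.0)
s = x.sum(axis=1)                        # n draws of the sum over T periods

# crude density estimate of the sum near a point X via a narrow window
X, h = 5.0, 0.25
density_at_X = np.mean(np.abs(s - X) < h) / (2 * h)
```

[Because each x_t has a point mass at zero, the sum also has atoms (e.g. at 0), which a likelihood would need to treat separately from the continuous part.]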
Thanks, Chris.
-- 
Chris Auld                               Department of Economics
Internet: auld@qed.econ.queensu.ca       Queen's University
Office:   (613)545-6000 x4398            Kingston, ON   K7L 3N6   
Return to Top
Subject: Re: Looking for a better estimator for a simple expectation
From: "Robert E Sawyer"
Date: 25 Oct 1996 01:33:22 GMT
Hein Hundal  wrote in article
<326D15DA.4898@kincyb.com>...
| ...
| ... the question is "Is there a better estimate for the expectation than
| the average?"  
| ...
When asking for "the best" estimator, you should realize
that the discussion will depend on our *criterion* for 
"best"; also, the fact that such a criterion may not be
unique should lead us to suspect that whatever is "best"
according to one may not be so according to another.
I won't address the exact problem you posed, but here's
one that is simpler and makes the point:
Suppose we observe x1, x2,..., xn independent 0/1 random
variables, each having pr(xi=1)=p, pr(xi=0)=q=1-p.
Then the population mean is E(xi)=p, and *may* be estimated
by the sample mean m=(x1+x2+...+xn)/n; however, it
may also be estimated by M=A*m, where 0 < A < 1 is chosen
according to one's criterion for "best", e.g. to minimize
mean-squared error.  (Note that such an A -> 1, i.e. M -> m,
as n -> infinity.)
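[A quick simulation, with illustrative choices of p, n, and A, shows a shrunken estimate M = A*m beating the plain average on mean-squared error when p is small:]

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps, A = 0.1, 20, 100_000, 0.9    # illustrative values

x = rng.binomial(1, p, size=(reps, n))   # reps experiments of n Bernoulli(p)
m = x.mean(axis=1)                       # the plain sample mean
mse_m = np.mean((m - p) ** 2)            # approx p(1-p)/n = 0.0045
mse_M = np.mean((A * m - p) ** 2)        # the shrunken estimate M = A*m
```

[The gain depends on p: shrinking toward 0 helps here because p is small, and the same A would hurt for p near 1 -- which illustrates why no estimator is "best" unconditionally.]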
-- 
Robert E Sawyer (soen@pacbell.net)
Return to Top
Subject: Re: Geometric and harmonic means
From: "Pavel E. Guarisma"
Date: Wed, 23 Oct 1996 23:45:03 -0400
Michael Kamen wrote:
> 
> John,
> 
> I do not know of any specific references, and perhaps I am
> oversimplifying your question, but it seems to me that the
> rationale for using a geometric mean would be that the data was
> from a geometric distribution.  I am not sure what a harmonic
> distribution is so can't comment.  Am I warm?
> 
> Regards,
> Michael
Rather chilly I would say...
The geometric and harmonic means are special means that suit certain
phenomena.
The harmonic mean of nonzero numbers x1,x2,.....,xn is
      n
h=n/(SUM(1/xi))
     i=1
It is mostly used for averaging speeds.
The geometric mean of nonzero numbers x1,x2,.....,xn is
   n
g=PROD(xi)^(1/n)
  i=1
It is mostly used for averaging ratios of numbers.
Reference: "Probability, Statistics, and Queueing Theory with Computer
Science Applications" Arnold Allen, Academic-Press
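The two formulas translate directly into code (using the parenthesization (PROD xi)^(1/n) for the geometric mean). The classic speed example: driving a fixed distance at 30 km/h and returning at 60 km/h averages to the harmonic mean, 40 km/h:

```python
import math

def harmonic_mean(xs):
    # h = n / sum(1/xi), for nonzero xi
    return len(xs) / sum(1.0 / x for x in xs)

def geometric_mean(xs):
    # g = (x1 * x2 * ... * xn)^(1/n)
    return math.prod(xs) ** (1.0 / len(xs))
```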
Hope this helps,
-- 
Visit my ALL NEW homepage: http://www4.ncsu.edu/~peguaris/WWW/
************************************************************************
* Pavel E. Guarisma N.                   Raleigh, N.C.                 *
* Operations Research Graduate Program   e-mail: peguaris@eos.ncsu.edu *
* College of Engineering                 Phone: (919)-512-9471         *
* North Carolina State University                                      *
************************************************************************
Return to Top
Subject: Re: probability is relativistic
From: Christopher McKinstry
Date: Thu, 24 Oct 1996 23:17:08 -0500
Dr Michael Mattes wrote:
> I don't think so. If i would look at a fast moving passenger, i would
> see "his time" passing by very slow. Thats "his" proper time.  On the
> other hand, if he looks at my clock, he would see a very fast running
> clock. Thats "my" proper time. And when he returns (e.g. after a year
> in ** his ** reference system) for me (in ** my ** reference system
> there have passed a lot of more than one year (e.g. some millions
> year, if he was travelling at near light speed).  This means ** I **
> would be able to write much more digits than the moving passenger.
> 
> > Unless you’re moving at the speed of light you can’t generate a random
> > number.
You're exactly right... I inverted the situation in my last statement...
I stand corrected.
Return to Top
Subject: Re: Geometric and harmonic means
From: "Pavel E. Guarisma"
Date: Wed, 23 Oct 1996 23:49:48 -0400
Pavel E. Guarisma wrote:
> 
> The geometric mean of nonzero numbers x1,x2,.....,xn is
>    n
> g=PROD(xi)^(1/n)
>   i=1
> 
Oooops!! The geometric mean should be
    n
g=(PROD(xi))^(1/n)
   i=1
-- 
Visit my ALL NEW homepage: http://www4.ncsu.edu/~peguaris/WWW/
************************************************************************
* Pavel E. Guarisma N.                   Raleigh, N.C.                 *
* Operations Research Graduate Program   e-mail: peguaris@eos.ncsu.edu *
* College of Engineering                 Phone: (919)-512-9471         *
* North Carolina State University                                      *
************************************************************************
Return to Top
Subject: Re: Q: How to linearize a nonlinear stochastic ODE?
From: RVICKSON@MANSCI.uwaterloo.ca (Ray Vickson)
Date: Thu, 24 Oct 1996 20:54:27 GMT
In article <54k02g$r5a@hobyah.cc.uq.oz.au>,
   Rodney Beard  wrote:
>Hi, I have a problem I need to linearize a nonlinear stochastic ODE of Ito
>type?
>Does anyone have any ideas? It's actually a system of coupled ODE's with
>state dependent diffusion. Can I just Taylor expand it?
This strategy may be questionable. A good source (which, despite
its title, is mostly theoretical) is the book _Numerical Solution of 
Stochastic Differential Equations_, by P.E. Kloeden and E. Platen, 
Applications of Mathematics Series, Vol. 23, Springer-Verlag, 1992.
Ray
>
>Rodney Beard
>University of Queensland
>
------------------------------------------------------------------------------
R.G. Vickson
Department of Management Sciences
University of Waterloo
(519) 888-4729
Return to Top
Subject: Re: Looking for a better estimator for a simple expectation
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Date: 25 Oct 1996 09:01:57 -0500
In article <326E38A7.67EC@pobox.org.sg>,
Helene Thygesen   wrote:
>Hein Hundal wrote:
>>    Once again I am seeking some knowledge and/or a reference to confirm
>> a rumor.  I am conducting an experiment with the following possible
>> outcomes: {-11, -9, -7, -5, ..., 9, 11} (odd numbers between -12 and
>> 12.)  I will perform the experiment approximately 5000 times.  
>> During the interview the
>> applicant mentioned that there was a better estimator available for this
>> kind of experiment.  I didn't believe him at the time, but then he
>> mentioned a paper that he had written on the subject, so I took his word
>> for it.  I really wish I had written down the paper's title because a
>> better estimator would be very useful for me.
>> So the questions is "Is there a better estimate for the expectation than
>> the average?"  
>Depends what you mean by "better" and what you know about the
>distribution.
If you know that the probabilities of the 12 outcomes are equal,
the expected value is 0 and the variance is 143/3.  If you do 
not, it is hard to beat the average.
>If you expect many "outliers" a robust estimator such as the median may
>be better than the average.
If you are interested in the expectation, unless you are willing to
believe that the distribution is symmetric about its expectation, this
is very unlikely to be correct.  This is especially the case if you
expect outliers which are still correct observations; the outliers may
be the most important part of the distribution.  Insurance is an industry
which operates this way.
But in the above experiment, if the sample size is odd, and more and 
more often for large samples if it is even, the median will be one of
the 12 possible values.  This is unlikely to be a good estimator of the
expected value.
To summarize, unless there is real information about the distribution,
or there are erroneous outliers, the expected value seems to be as good
as anything.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu	 Phone: (317)494-6054	FAX: (317)494-0558
Return to Top
Subject: Re: Joint PDF of Order Statistics
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Date: 25 Oct 1996 09:10:14 -0500
In article <54leut$j4i@oravannahka.Helsinki.FI>,
Marko Tervio  wrote:
>I need to know the joint PDF of 2 or 3 order statistics.
>The problem is that the covariance matrix is not diagonal.
>The basic question is about a 5-vector of multinormally distributed 
>random variables with means 0, variances 1 and covariances 0.5.
>In the end I'm trying to find out what is the best estimate for the 
>unknown parameter V, when we know the 1st, 4th and 5th biggest
>observation from a 5-vector that we know is drawn from a multinormal
>distribution with means V, variances V/10 and covariances V/20 (so two 
>observations remain unknown). Then how does this estimate change when
>the 3rd biggest observations is revealed to us.
For the case of the normal distribution, with equal variances and
covariances, this can be done numerically.  Writing the model in
terms of an added unobservable variable makes this clear.
So let Z_i be jointly normal with constant mean m, all variances
equal to V, and all covariances equal to C > 0.  Then the Z_i 
can be written as A + X_i, all independent, A being normal with
mean m and variance C, and the X_i having mean 0 and variance V-C.
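[A sketch of the construction, with illustrative numbers; A is the shared unobservable component:]

```python
import numpy as np

rng = np.random.default_rng(4)
m, V, C, k, n = 0.0, 1.0, 0.5, 5, 200_000  # illustrative values, 0 < C < V

A = rng.normal(m, np.sqrt(C), size=(n, 1))        # shared component
X = rng.normal(0.0, np.sqrt(V - C), size=(n, k))  # independent components
Z = A + X   # each Z_i: variance V; each pair (Z_i, Z_j): covariance C
```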
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu	 Phone: (317)494-6054	FAX: (317)494-0558
Return to Top
Subject: Re: Bias and Variance of Eigenvalues of a Covariance Matrix?
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Date: 25 Oct 1996 09:32:55 -0500
In article <54mmba$bld@news.alaska.edu>,
Eric Breitenberger  wrote:
>Hello, all:
>I am using principal components in my work (meteorology) and I have
>some questions regarding the sampling variability of the results.
>I'm currently using formulas for the bias and variance of the 
>eigenvalues which I got from a paper by Lawley (1956, Biometrika 43,
>128-136). These formulas are good to O(1/n^2) for bias and O(1/n^3)
>for variance. 
>My covariance matrices are usually of dimension ~500, with about
>1000 samples, and Lawley's formulas often give me negative variances
>for many eigenvalues. This disturbs me as I am not sure whether I
>am somehow misusing the formula or if there is some other explanation.
>In addition to the problems with the variance, I often find that my
>_leading_ sample eigenvalues are consistently lower than the Lawley 
>formulas would indicate they should be. I have checked my code 
>carefully, including comparisons with some published results, and
>buggy code does not appear to be the problem.
>If anyone is still with me at this point (!) I would really appreciate
>some help with this. Does anyone know of other good references on
>this subject?
I would look in such things as the proceedings of multivariate
conferences for such.
The results are usually for the case of purely random Wishart matrices
with the true covariance matrix being diagonal.  The asymptotics are
based upon the assumption that the sample size is much larger than the
dimension, and there is a feature of ratio as well as difference,
particularly for the intermediate and lower roots.  You probably do
not have this type of data, and the distribution of what you get may
be far off.
If the leading roots are smaller, this can be explained in many ways
from the real model.  I would doubt that real world data is close 
enough to the standard normal assumptions for validity of what you
seem to be trying to do.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu	 Phone: (317)494-6054	FAX: (317)494-0558
Return to Top
Subject: Re: answer needed!!!
From: aacbrown@aol.com (AaCBrown)
Date: 25 Oct 1996 13:16:02 -0400
 jtahara@chat.carleton.ca (James Tahara) in
<54p6n2$mnu@bertrand.ccs.carleton.ca> asks:
> What is the probability of being dealt 2 cards which are an
> Ace; and a King, a Queen, a Jack or a ten?
There are two ways for this to happen: Ace followed by K, Q, J or 10 and
K, Q, J, or 10 followed by Ace.
In the first case your first card must be 1 of 4, your second 1 of 16;
this means there are 4x16 or 64 ways to get it. The second case is another
64 ways for a total of 128 different ways to get your blackjack.
There are 52x51=2,652 different ways to deal two cards. So the probability
of a blackjack is 128/2,652 = 4.83%.
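The count checks out by brute force over all ordered two-card deals (rank 1 standing for the ace, 10-13 for ten through king):

```python
from itertools import permutations

ranks = [r for r in range(1, 14) for _ in range(4)]   # 52 cards, 4 per rank

def is_blackjack(r1, r2):
    # an ace paired with a ten, jack, queen or king, in either order
    return (r1 == 1 and r2 in (10, 11, 12, 13)) or \
           (r2 == 1 and r1 in (10, 11, 12, 13))

deals = list(permutations(range(52), 2))              # 52*51 ordered deals
hits = sum(is_blackjack(ranks[a], ranks[b]) for a, b in deals)
# hits == 128, len(deals) == 2652
```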
Aaron C. Brown
New York, NY
Return to Top
Subject: Re: 0.5*infinity
From: aacbrown@aol.com (AaCBrown)
Date: 25 Oct 1996 13:21:01 -0400
mercurio@flagstaff.princeton.edu (Matthew G. Mercurio) in
<54ltnb$ei3@cnn.Princeton.EDU> writes:
> Okay.  Now a number of guests equal to the number of real
> numbers between 0 and 1 arrive at the hotel.  All of a sudden,
> the inn is full!! Not even 1% of the new guests can be accommodated!
> I guess they just don't make infinite hotels as infinite as they used
> to  :)
It's true that the guests cannot be accommodated. But we can line them up
in the corridor in such a way that there are an infinite number of empty
rooms between any two of the new guests. But there still are not enough
rooms for all of them to have one.
Aaron C. Brown
New York, NY
Return to Top
Subject: Looking for Matlab Bootstrap code
From: Ashutosh Sabharwal
Date: Fri, 25 Oct 1996 14:16:41 -0400
Hi,
I am looking for any general purpose 
resampling code in matlab (public domain
code would be great !!).
Thanks for help in advance.
Cheers
-- 
Saby
Return to Top
Subject: Help adding Gaussian Noise
From: Chris Kirkham
Date: Fri, 25 Oct 1996 12:01:48 +0100
Can anyone explain the mechanism for generating gaussian noise to add to
data? I have read a lot of people quoting "we added gaussian noise to
the inputs" but never an explanation of how this is achieved (i.e. how
to generate it in my software program).
I am sure it is straightforward, and I could probably make an educated
guess as to how it is done, but I would like a definitive answer that
the scientific and mathematical world is happy with.
Thanks in advance for your help. I would prefer if you could email the
answer to me if possible
Regards
-- 
Chris Kirkham
Centre for Neural Computing Applications
Brunel University, Runnymede Campus
Egham, Surrey, TW20 0JZ. UK.
Tel: +44 (0)1784 431341 x270
Fax: +44 (0)1784 472879
mailto:Christopher.Kirkham@brunel.ac.uk
http://www.brunel.ac.uk/research/cnca
Return to Top
Subject: Re: Help adding Gaussian Noise
From: Jim Hunter
Date: Fri, 25 Oct 1996 15:23:33 -0400
Chris Kirkham wrote:
:> 
:> Can anyone explain the mechanism for generating gaussian noise to add to
:> data? I have read a lot of people quoting "we added gaussian noise to
:> the inputs" but never an explanation of how this is achieved (i.e. how
:> to generate it in my software program).
:> 
:> I am sure it is straightforward, and I could probably make an educated
:> guess as to how it is done, but I would like a definitive answer that
:> the scientific and mathematical world is happy with.
:> 
Just like you suspected.
Pick a variance s, generate a sequence of iid Gaussian numbers, N(0,s),
and add them to the data. Unless told otherwise it's generally taken
for granted that the noise is zero-mean and uncorrelated.
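[In numpy, say, that looks like the following; the data and variance are illustrative, and note that numpy's normal() takes the standard deviation, not the variance:]

```python
import numpy as np

rng = np.random.default_rng(5)
data = np.linspace(0.0, 1.0, 1000)       # some clean input data
s = 0.04                                 # chosen noise variance

# zero-mean Gaussian noise with variance s (normal() wants the std dev)
noisy = data + rng.normal(0.0, np.sqrt(s), size=data.shape)
```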
Jim
Return to Top
Subject: Re: Principal components analysis
From: jonyi@elysium.mit.edu (Jon Rong-Wei Yi)
Date: 25 Oct 1996 20:23:31 GMT
In article <54lvj8$8mu@hobbes.cc.uga.edu>,
Michael Covington  wrote:
>Is there a free program or source code available somewhere to do
>principal components analysis?  I am asking on behalf of a colleague
>and am not familiar with principal components analysis myself, but
>I understand it has to do with finding the eigenvectors of a matrix.
given a covariance or correlation matrix, the rotation matrix is
std_dev*eigenmatrix*eigenvalues^0.5
essentially, the std_dev matrix is a diagonal matrix with std deviations
down the diagonal.  the eigenmatrix is a matrix where the columns are
the eigenvectors.  the last term is a diagonal matrix where the entries
are the square roots of the corresponding eigenvalues.
you can use matlab to calculate this rotation matrix.
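[a sketch with an illustrative 2x2 correlation matrix, using numpy in place of matlab; the product loadings @ loadings.T recovers the covariance matrix:]

```python
import numpy as np

R = np.array([[1.0, 0.6],
              [0.6, 1.0]])               # illustrative correlation matrix
sd = np.diag([2.0, 3.0])                 # illustrative std deviations

vals, vecs = np.linalg.eigh(R)           # eigenvalues/eigenvectors of R
loadings = sd @ vecs @ np.diag(np.sqrt(vals))  # std_dev*eigenmatrix*eigenvalues^0.5
# loadings @ loadings.T reconstructs the covariance matrix sd @ R @ sd
```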
jon
Return to Top
Subject: recommendations for SPSS alternative needed
From: masalope@nyx10.cs.du.edu (Mark Salopek)
Date: 25 Oct 1996 16:28:37 -0600
Can anyone recommend a good alternative to SPSS, preferably shareware or
public domain, for the DOS/Windows platform?
Mark
Return to Top
Subject: Skewed distributions for Don Philp
From: Roger D Metcalf DDS
Date: Wed, 23 Oct 1996 21:07:32 -0500
Don Philp (procass@wt.com.au) wrote asking about how to plot
a theoretical distribution given values for mean, var, skewness, 
and kurtosis....I forwarded his message to the Mathematica newsgroup
and received a response from Sherman Reed, who taught me Mathematica
in an electrical engineering class at U Tex Arlington---jeez, I should
have known that he would have an idea how to do this! In the book
"Econometric and Financial Modeling with Mathematica," Hal Varian ed.,
there is a chapter by Robert Korsan on "Decision Analytica: An Example
of Bayesian Inference and Decision Theory Using Mathematica" where Korsan
implements a Mathematica package for plotting Johnson Distributions
(NL Johnson and S Kotz, 1970, _Distributions_in_Statistics:_Continuous
Univariate_Distributions_1_, Wiley). From what I gather this is a sort
of "universal" distribution that takes four arguments (gamma, delta,
xi, lambda) though from messing around with the package they don't seem
to be *exactly* analogous to (mean, var, skewness, kurtosis)...not by a
long shot! Anyway, now will go dig around for the Johnson book....for 
the many here who must be familiar with Johnson distributions, PLEASE
correct me as needed, as this is just a *bit* out of my area of expertise....
--Roger
-- 
--Roger D Metcalf DDS
--metcalf@mem.po.com  or  metcalf@startext.net
Return to Top
Subject: Re: Skewed distributions for Don Philp
From: kbutler@sfu.ca (Kenneth Butler)
Date: 25 Oct 1996 22:19:23 GMT
Roger D Metcalf DDS  writes:
>Korsan
>implements a Mathematica package for plotting Johnson Distributions
>(NL Johnson and S Kotz, 1970, _Distributions_in_Statistics:_Continuous
>Univariate_Distributions_1_, Wiley). From what I gather this is a sort
>of "universal" distribution that takes four arguments (gamma, delta,
>xi, lambda) though from messing around with the package they don't seem
>to be *exactly* analogous to (mean, var, skewness, kurtosis)...not by a
>long shot! 
I don't believe they are; I think they have more to do with deciding 
which shape the distributions are (since Johnson distributions have a 
variety of shapes), and the mean...kurtosis  ("moments", for short) 
depend on the arguments in some non-obvious fashion.
You might also want to think about the Pearson family of distributions: 
like the Johnson family, these can take a wide variety of shapes, and the
four parameters that index them are the first four moments, from the mean
to the kurtosis. Common distributions such as normal, chi-squared, F and
beta are part of the Pearson family.  WP Elderton and (presumably) the
same Johnson wrote a book called "Systems of Frequency Curves" in which
both the Pearson and Johnson families are discussed (predominantly the
former). The book looks old-fashioned now (with pre-computer computational
details), but it's still correct. 
If your sole aim is to plot the density functions of the Pearson family, 
Elderton and Johnson give these for the various different forms and you 
can plug them into Mathematica. Calculation of probabilities is another 
story, however.
Either family will produce you *a* distribution with the desired mean, 
variance etc., but of course there are (infinitely) many other 
distributions with the same first four moments, but different 5th, 
6th,... moments, some of which may look very different to the one you found.
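[For reference, scipy ships the Johnson S_U family as stats.johnsonsu, with shape parameters (a, b) playing the roles of gamma and delta and loc/scale those of xi and lambda; as noted above, the moments depend on them in a non-obvious way. The parameter values here are arbitrary:]

```python
from scipy import stats

# an arbitrary Johnson S_U member; a, b are the gamma/delta shape parameters
d = stats.johnsonsu(a=1.0, b=2.0, loc=0.0, scale=1.0)
mean, var, skew, kurt = d.stats(moments="mvsk")
# with a > 0 the distribution is shifted left of zero
```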
Cheers,
Ken.
--
----------------Boring factual .sig:
Ken Butler, Dept. of Mathematics and Statistics 
Simon Fraser University, Burnaby, B.C., Canada V5A 1S6;
kbutler@sfu.ca; http://www.sfu.ca/~kbutler/
Return to Top
Subject: Re: correlated non-normal variables
From: Ellen Hertz
Date: Fri, 25 Oct 1996 18:38:11 -0400
Jeff Miller wrote:
> 
> Several posters have recently asked about generating correlated
> random variables where the bivariate distributions are not normal.
> This is a problem I have also encountered, so I got to
> wondering about it again.  Below I suggest an approach (an
> elaboration of Frechet's method) that could be used, and I would
> like to ask readers of this group if it seems reasonable.
> 
> Assume you want to generate X and Y with specified marginal
> distributions and some degree of correlation.  Assume further
> that you have very little idea about what underlying mechanisms
> cause the correlation, so you just want a general approach
> that produces the right marginals and lets you vary the
> strength of the correlation systematically across simulations.
> 
> I suggest the following procedure to generate positively
> correlated variables:
> 
>   1. Generate X randomly from its desired marginal distribution.
>   2. Look up the percentile of X, P_x, within its marginal distribution.
>   3. Form a beta distribution with a mean P_x and
>      variance V (V is discussed below).
>   4. Draw a random number R from this beta distribution.
>   5. Find Y at the percentile R of its marginal distribution.
>   6. Take the (X,Y) pair as one bivariate observation.
> 
> Comments:
> ---------
> In different simulations, the researcher varies the parameter V
> to control the strength of the correlation: As V decreases,
> the correlation increases.
> 
> To generate negatively correlated variables, use a beta
> distribution with a mean of 1-P_x instead of P_x.
> 
> Questions:
> ----------
> Would this method really generate the correct marginal
> distribution for Y?  (I think so, but am not certain.)
> 
> Is it stupid and/or naive to use the beta distribution
> to induce a correlation in this fashion?  (I guess the
> question here is whether this procedure implements a
> reasonable model of correlation.  Perhaps this
> question is not answerable in general, but it seems
> reasonable to ask it because I'd like a method to be
> used in cases where virtually nothing is known about
> the bivariate structure.)
> 
> Thanks for any comments.
> 
> Jeff Miller
> Dept of Psychology
> Univ of Otago
> Dunedin, New Zealand
> 
> miller@otago.ac.nz    http://jomserver.otago.ac.nz/
I couldn't prove or disprove that Y has the right cdf. It
would be equivalent to showing that R is uniform on (0,1)
since step 5. is taking G_inverse(R) where G is the
desired cdf for Y. 
However, if instead of a beta distribution we let V = 0,
so that R = F(X) with probability 1, the method would result
in Y = G_inverse(F(X)) where F is the desired cdf for X.
That and the fact that Y = G_inverse(T) would work whenever
T is uniform on (0,1) suggests the following approach,
which may or may not be related to yours:
1. Generate X from the distribution with cdf F.
2. Generate Z from a uniform distribution on (0,1)
independently of X.
3. Let T = F(X) with probability p and Z with 
probability 1-p.
4. Let Y = G_inverse(T).
The resulting joint distribution of X and Y, or even
their correlation coefficient, is not obvious, but we can say that
this will result in the desired marginals. If p = 0 then
X and Y will be independent. If p = 1 and F = G then
Y = X. Also, 1-F(X) can be substituted for F(X) in 3. to
get Y that tends to decrease with X.
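The four steps run directly in code (a sketch, with marginals of my choosing; F and G here are scipy frozen distributions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, p = 100_000, 0.7                       # illustrative mixing probability
F, G = stats.expon(), stats.gamma(2.0)    # illustrative marginals

x = F.rvs(size=n, random_state=rng)       # 1. X ~ F
z = rng.uniform(size=n)                   # 2. Z ~ U(0,1), independent of X
use_fx = rng.uniform(size=n) < p
t = np.where(use_fx, F.cdf(x), z)         # 3. T = F(X) w.p. p, else Z
y = G.ppf(t)                              # 4. Y = G_inverse(T)
```

As claimed, the marginals come out right for any p, and p interpolates between independence and Y being a monotone function of X.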
Very interesting question.
Ellen
Return to Top
Subject: Re: Request for a stats/math word processor
From: Markus Jantti
Date: 25 Oct 1996 19:44:41 -0400
pd@kubism.ku.dk (Peter Dalgaard BSA) writes:
> 
> 
>    4) Tables etc are somewhat better built in WP6.1 than Tex
>       (steels himself for flames....)  [debatable matter of opinion]
>       and perhaps tables also play some role in a stats report?
> 
> No flames from here... They really can be a pain, although the gurus
> have been getting the act quite a bit better together in
> LaTeX2e. (This also reflects on a design problem with TeX: There's no
> easy way to hook in external programs that could work as user-friendly
> front ends. It's kind of stuck in the batch-processing paradigm.) Of
> course, if you're clever enough, you just persuade you statistics
> program to spit out LaTeX code.
> 
It is, of course, debatable what it means to have an easy-to-use
program. It seems to me, however, that by using tools such as perl and
gnuplot combined with a statistical package to create e.g. tables and
figures directly for LaTeX, you end up having a very user-friendly way
of generating and revising documents. I am currently writing a largish
literature review, summarizing lots of numbers and one perl script
produces all the tables and figures I want from a database on
results. 
No typing of numbers and references (which, BTW, are easily picked
using BIBTEX) in WYSIWYG tables.
Moreover, maybe MSWORD and Wordperfect do now do this, but it would
surprise me if they had decent control over floating tables and figures.
So, LaTeX and BIBTEX , combined with aucTeX and Emacs as well as
useful perl scripts are all I wish for to make my texts.
regards,
markus
-- 
Markus Jantti				|	Department of Economics
markusj@econ.lsa.umich.edu		|	University of Michigan
http://www.abo.fi/~mjantti		|	611 Tappan St
					|	Ann Arbor, MI 48109-1220
+1-313-997 0525 (Home/Voice)		|	+1-313-763 2254	(Office/Voice)
					|	+1-313-764 2769	(Office/Fax)
Return to Top

Downloaded by WWW Programs
Byron Palmer