Newsgroup sci.stat.math 11602

Articles

Subject: Re: Meta-Analysis ??
From: caragaki@ucla.edu (Corinne Aragaki)
Date: Wed, 06 Nov 1996 10:33:35 GMT

"Janine M. Mylett M.D."  wrote:
>I am trying to learn about the procedures involved in performing a
>Meta-Analysis. However, no seems to be able to direct me to a good
>source. Does anyone have any ideas??  Any help would be appreciated!!
You could start with:
Greenland S.  Quantitative methods in the review of epidemiologic 
literature.  Epidemiologic Reviews 1987;  9:1-30.

Return to Top

Subject: Bias in bootstrapings
From: reno@pop.bio.aau.dk (Reno Lindberg)
Date: Wed, 06 Nov 1996 12:10:42 +0100

I have studied a population of palm trees in Ecuador. Using a
projection matrix, I found the population growth rate (lambda, the
dominant eigenvalue) to be 0.9629. I then ran a bootstrap analysis on
the population (5000 resamplings, with all parameters) and found the
average growth rate to be 0.9484 (with confidence limits of 0.89685
and 0.98974). There is a bias of 0.0145 between the calculated, or
observed, growth rate (Lambda(o)) and the growth rate found through
bootstrapping (Lambda(b)). Can I say that my population is declining
because my upper limit is under 1.0? Or can't I tell, because adding
the difference between the upper limit and Lambda(b) to Lambda(o)
gives an upper limit of 1.004? How do I incorporate this bias into my
results? And what does the bias tell me about the structure of my
sample and the population?
These are big questions, but all I ask for is some ideas!
reno@pop.bio.aau.dk
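The arithmetic in the post can be written out as a short Python sketch. It only reproduces the poster's numbers and the simple shift he describes; the variable names are assumptions made for illustration, and the sketch is not a recommendation of that shift over other interval methods.

```python
# Figures quoted in the post above.
lambda_obs = 0.9629                # growth rate from the projection matrix
lambda_boot = 0.9484               # mean of the 5000 bootstrap replicates
lower, upper = 0.89685, 0.98974    # bootstrap confidence limits

# Bias as defined in the post: observed value minus bootstrap mean.
bias = lambda_obs - lambda_boot

# The poster's adjustment: shift each limit by the bias.
lower_shifted = lower + bias
upper_shifted = upper + bias

print(f"bias = {bias:.4f}")
print(f"shifted limits = ({lower_shifted:.5f}, {upper_shifted:.5f})")
```

The unshifted upper limit is below 1.0 while the shifted one is just above it, which is exactly the ambiguity the question is about.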

Return to Top

Subject: Re: Bias in bootstrapings
From: Rodney Sparapani
Date: Wed, 06 Nov 1996 09:16:48 -0500

Reno Lindberg wrote:
> 
> I have studied the population of palm trees in Ecuador. Through a
> projection matrix i found the population growth rate (Lambda, Dominant
> eigenvector) to be 0.9629. I made a bootsrap analysis on the
> population(5000 resamplings of the population with all parameters), and
> found the average growthrate to be 0.9484 (with the confidence intervals
> of 0.89685 and 0.98974). There is a bias of 0.0145 between the calculated
> growthrate (or observed (Lambda(o))) and the growthrate found through the
> bootstraping. Can I say that my population is declining because my upper
> limit is under 1.0 or can't I tell because if the difference between the
> upper limit and Lambda(b) added to Lamda(o) gives me an upper limit of
> 1.004. How to implement this bias into my results? And what do the bias
> tell me about the structure of my sample and the population?
> It is big quetions, but what I ask for is just som ideas!
> 
> reno@pop.bio.aau.dk
Take a look at:
Better Bootstrap Confidence Intervals, by Bradley Efron,
JASA March '87, Vol. 82, No. 397, pp. 171-185
Rodney Sparapani

Return to Top

Subject: Re: Moments & Standard Normal
From: aacbrown@aol.com
Date: 6 Nov 1996 14:38:46 GMT

"Jim Weeks"  in <01bbcb53$a129dce0$39968781@titleist>
writes:
> I need to be able to determine the Even Order Moments
> of a Standard Normal Dist..
The kth moment of a standard Normal is:
k! 2^(-k/2) / (k/2)!
for k even (the odd moments are all zero of course). So, for example, the
fourth moment is 4! 2^-2/2! = 24 * .25 / 2 = 3.
Aaron C. Brown
New York, NY
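The formula above is easy to check numerically. The following is a minimal Python sketch; the function names are assumptions chosen for illustration. It compares the factorial formula with the equivalent product 1 x 3 x 5 x ... x (k-1).

```python
from math import factorial

def normal_even_moment(k):
    """k-th moment of a standard Normal: k! 2^(-k/2) / (k/2)! for even k."""
    if k % 2:
        return 0  # odd moments vanish by symmetry
    return factorial(k) // (2 ** (k // 2) * factorial(k // 2))

def odd_product(k):
    """1 * 3 * 5 * ... * (k - 1), the same quantity written as a product."""
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result

for k in (2, 4, 6, 8):
    print(k, normal_even_moment(k), odd_product(k))
```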

Return to Top

Subject: Re: probability of drawing one's own name in a drawing
From: Helene Thygesen
Date: Wed, 06 Nov 1996 12:52:40 +0100

Per Frendahl wrote:
> Assuming 10 guests:
> 
> Prob = 1 - 1 + (1/2!) - (1/3!) + ... + (1/10!) =approx= 1/e
> Further:
> This gives that the probability that AT LEAST ONE OF TEN GUESTS GET THEIR OWN
> PRESENT =approx= 1 - 1/e =approx= 0.63212. (Which is greater than 1/2.)
Since the probability that a particular guest draws his own name is 1/n,
the expected number of guests who do so is 1. The number of guests who
do so is not exactly binomially distributed, since the individual events
are not independent, but for large n it is a fair approximation.
The limit 1/e is p(0) in the Poisson distribution with mean 1.
-- 
Helene Thygesen
Hogelanden WZ 17
3552 AB Utecht
The Netherlands
+31(0)654 655 631
mailto:helene@pobox.org.sg
http://www.pobox.org.sg/~helene
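The series quoted above can be checked against a brute-force count over all permutations. This is a small Python sketch under the assumption that every assignment of names to guests is equally likely; the function names are made up for illustration.

```python
from itertools import permutations
from math import exp, factorial

def p_no_match_series(n):
    """1 - 1 + 1/2! - 1/3! + ... +/- 1/n!: probability nobody draws their own name."""
    return sum((-1) ** k / factorial(k) for k in range(n + 1))

def p_no_match_bruteforce(n):
    """Same probability, by enumerating all n! permutations (small n only)."""
    hits = sum(all(p[i] != i for i in range(n)) for p in permutations(range(n)))
    return hits / factorial(n)

print(p_no_match_series(10), exp(-1))   # series for 10 guests vs. the limit 1/e
print(1 - p_no_match_series(10))        # at least one guest draws their own name
print(p_no_match_bruteforce(6), p_no_match_series(6))
```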

Return to Top

Subject: alpha-stable pdf
From: eek@eng.cam.ac.uk (E.E. Kuruoglu)
Date: 6 Nov 1996 15:43:40 GMT

Hello,
I am looking for a program that can generate alpha-stable 
probability density function. I have a program doing it
via taking the inverse Fourier transform of the characteristic
function. I am interested in other alternative approaches.
Thank you very much for your help,
Ercan Kuruoglu
Signal Processing Laboratory
Cambridge University

Return to Top

Subject: Mantel again
From: Rossi Jean Pierre
Date: Wed, 06 Nov 1996 16:51:01 +0100

Hello everybody
I am trying to learn about the Mantel statistic (Mantel,1967) in the 
field of soil ecology. If the method is useful in the study of linear 
structures, its ability to investigate relationships between matrices in 
the case of strong non-linearity seems to be doubtful (Legendre and 
Fortin,1989). Unfortunately, in soil biology, most of the spatial 
structures encountered are not linear gradients!
Can you help?
*Mantel, N. (1967). The detection of disease clustering and a 
generalized regression approach. Cancer Research, 27, 209-220.
*Legendre, P. and Fortin, M.J. (1989). Spatial pattern and ecological 
analysis. Vegetatio, 80, 107-138.
Jean Pierre ROSSI                   Tel: (33) 1 48 02 55 01   
Laboratoire d'Ecologie              Fax: (33) 1 48 47 36 78         
des Sols Tropicaux                  Email: rossijp@bondy.orstom.fr      
ORSTOM Centre de Bondy          
32 Av. VARAGNAT        
93143 Bondy Cedex FRANCE

Return to Top

Subject: time series
From: jcussens@comlab.ox.ac.uk (James Cussens)
Date: 06 Nov 1996 16:14:06 +0000

Which package, e.g. SPSS, SAS, or S-PLUS, do people think is best for
the analysis of financial time series?
-- 
 James Cussens                          
 Oxford University Computing Laboratory 
 Wolfson Building                       
 Parks Road                             
 Oxford OX1 3QD                         
 UK                                     
 Tel  +44 (0)1865 283520                
 Fax  +44 (0)1865 273839                
 http://www.comlab.ox.ac.uk/oucl/groups/machlearn/james.cussens.html

Return to Top

Subject: Applications of Log-linear models for Data-mining
From: Rainer Deventer
Date: Wed, 06 Nov 1996 18:28:24 -0800

Hello, 
today I spent my time looking in the Internet for papers/examples
about application of Loglinear models in the domain of data-mining
(Knowledge discovery in large databases). 
I am considering whether it makes sense to extend an existing 
data-mining tool with a (semi-)automatic search for the adaptation
of log-linear models. 
So I have the following questions:
1. What is the time complexity as a function of the number of
   variables and their values?
2. How many variables can be used in commercial programs?
3. What is the maximal size of a contingency table which can be 
   analysed in an acceptable time (2 - 3 hours)?
4. Are there examples of successful application in business for 
   the analysis of large databases?
5. What results can be expected? Can log-linear models also be used
   for clustering?
Thank you. 
If I get any responses I will post a summary. 
Kind regards 
Rainer

Return to Top

Subject: test
From: "Wim VDB"
Date: 6 Nov 1996 17:52:26 GMT

-- 
dit is een test

Return to Top

Subject: Re: Help with vector spaces, please.
From: ehren@macalester.edu (David L. O. Ehren)
Date: 6 Nov 1996 19:29:59 GMT

In article , ebohlman@netcom.com (Eric
Bohlman) wrote:
> Robert Gelb (rgelb@engr.csulb.edu) wrote:
> : Help with homework (vector spaces):
> 
> : The problem is to find out whether a given set is a vector space.
> 
> : The problem is as follows:
> : W' is the set of all ordered pairs(x,y) of real numbers that satisfy the
> : equation 2x+3y=1.
> 
> : The answer at the end of the book says, that this set is not a vector
> : space.
> : Can someone explain to me why?  I would consult the book but it provides
> : theoretical examples, however no practical ones.
> 
> For one thing, W' isn't closed under scalar multiplication; if 2x+3y=1, 
> then 2ax+3ay=a, so unless a=1, a(x,y) isn't in W'.
The graph of the equation you have is a line, so it LOOKS like a
one-dimensional vector space.  As was stated, it is not closed under
scalar multiplication, but it also fails to contain the zero vector (0,0) or even
be closed under addition: if you have members (x,y) and (z,w), then 2(x+z)
+ 3(y+w) = 1+1 = 2.
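The three failures mentioned above can be checked on concrete points. A minimal Python sketch follows; the two sample points and the helper name in_W are assumptions picked for illustration.

```python
def in_W(point):
    """True if (x, y) satisfies 2x + 3y = 1."""
    x, y = point
    return 2 * x + 3 * y == 1

u = (2, -1)   # 2*2 + 3*(-1) = 1, so u is in W'
v = (-1, 1)   # 2*(-1) + 3*1 = 1, so v is in W'

total = (u[0] + v[0], u[1] + v[1])   # vector sum u + v
scaled = (3 * u[0], 3 * u[1])        # scalar multiple 3u
zero = (0, 0)                        # the zero vector

print(in_W(u), in_W(v))                       # both points lie on the line
print(in_W(total), in_W(scaled), in_W(zero))  # none of these do
```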

Return to Top

Subject: Re: Random number generation
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 6 Nov 1996 20:24:17 GMT

I posted without too much thought when I wrote,
: What you seem to be requesting is somebody's  "really rotten
: pseudo random number generator";  with even the worst of those
: (among the practical ones), you can only make the future prediction
: as a statistical one, for example, "the 10th number PROBABLY will be
: SMALL..."
  -- but I have not totally changed my mind despite several 
comments by other people.  This message, below, contains extractions
from three other notes.
==========================from latest message> (from Barry Hembree),
> | : I'm looking for a pseudo random number algorithm which enables me to
> | : choose the n:th number in the sequence without needing to generate the
> | : first n-1 numbers (where n can be quite big). Does such a thing exist?
:Myself and Bruce Collings wrote a paper on how to do what you ask for 
:arbitrary length sequence RNGs.  It was published in the Oct 1984 issue 
:of JACM.
===========================
Assuming JACM was the Journal of the Association for Computing 
Machinery, I looked, and failed to find the article in Oct 1984, 
or Oct 1983 or Oct 1985.  Nor did a search of some indexes, which 
included JACM, show any Barry Hembree in 1984 or 1985.
   Hey, I'm making an effort to learn something here, but to what avail?
(I would have just asked for the reference and gone to check again,
but that is going to take me a day or two more, and I thought that it
was time that I wrote some update to my original response...)
Before that, there was the post by Robert E Sawyer (soen@pacbell.net):
"Given the necessary caveats about limitations of PRNGs, the request seems 
a reasonable one in context.  Since (almost?) every PRNG has a (large) finite 
cycle length L, it makes sense to wonder whether, in theory, a generator might 
somehow be designed to allow the user to select among the L rotations of the 
cycle."
I agree that the math-theory I once studied about cycles suggests that
such a thing is conceivable.  But the little that I know about random
number generators suggests that skipping through the cycle should be
tough.  The main step, typically, is to multiply the previous Seed
by a large amount, and to extract some set of bits.  On the one hand,
the "large amount" and set of bits are chosen (I think)  for how
*little* information that they should carry pertaining to the previous
or the next step.  On the other hand, the FAULT of linear congruential
generators is that they DO have a tendency to provide lumpiness at
certain lag-times -- if one is to discuss just THAT type of PRNG.
But in detail, it seems to me that regenerating the finite cycle 
by a different rule has probably NOT been done.  (And, from my
original suspicion:  if it could be done for a sequence, then I think
there are probably OTHER kinds of evidence that the sequence on hand
is not a very good one....)
It seems a whole lot tougher to imagine skipping through the cycle
if the selection/generation  includes a pseudo-random list that
is generated, then selected from at (pseudo) random;  which is 
a refinement that I first read about in one of these Usenet groups.
I can understand the idea of somehow skipping through a cycle, if you
are given the rules for generating it.  But I have to admit that I
really have little idea of what Herman Rubin was getting at, when 
he posted. 
"For the various pseudo-random algorithms in my ken, typically the
n-th element can be generated in time O(log(n)), and except for n
VERY large, it is not likely that better will be possible."
 -- How in heaven's name does one achieve an efficiency on the
order of log(n)?  That would imply that the 1000th-to-come item 
is not much harder to pre-define than the 100th.  Is this a 
statement of theoretical principle (and can we have a hint,
or a reference),  or has someone actually achieved this for some
interesting PRNGs?
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
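For the linear congruential family specifically, one way an O(log(n)) bound can arise is repeated squaring: each step is the map x -> (a*x + c) mod m, and composing that map with itself n times takes only about log2(n) compositions. The sketch below is in Python and is an illustration only, not the method of any paper mentioned in this thread; the function name lcg_jump and the example constants are assumptions.

```python
def lcg_jump(x0, a, c, m, n):
    """Return the state reached after n steps of x -> (a*x + c) mod m.

    Uses binary exponentiation on the affine map, so the cost is
    O(log n) multiplications instead of n.
    """
    acc_a, acc_c = 1, 0            # accumulated map, starts as the identity
    cur_a, cur_c = a % m, c % m    # map for 2^j steps, starts with j = 0
    while n > 0:
        if n & 1:
            # Apply the 2^j-step map on top of the accumulated map.
            acc_a, acc_c = (cur_a * acc_a) % m, (cur_a * acc_c + cur_c) % m
        # Square the current map: 2^j steps become 2^(j+1) steps.
        cur_a, cur_c = (cur_a * cur_a) % m, (cur_a * cur_c + cur_c) % m
        n >>= 1
    return (acc_a * x0 + acc_c) % m

# Illustrative constants only (an assumption, not taken from the thread).
A, C, M = 1103515245, 12345, 2 ** 31
print(lcg_jump(42, A, C, M, 10 ** 18))   # state after 10^18 steps, computed directly
```

This says nothing about generators that are not of this simple form, such as the shuffled variants mentioned above.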

Return to Top

Subject: Re: Mantel again
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 6 Nov 1996 20:40:29 GMT

Rossi Jean Pierre (rossijp@bondy.orstom.fr) wrote:
: I am trying to learn about the Mantel statistic (Mantel,1967) in the 
: field of soil ecology. If the method is useful in the study of linear 
: structures, its ability to investigate relationships between matrices in 
: the case of strong non-linearity seems to be doubtful (Legendre and 
: Fortin,1989). Unfortunately, in soil biology, most of the spatial 
: structures encountered are not linear gradients!
: Can you help?
  -- I just looked at the Mantel article, and its whole PURPOSE is to 
show you how to take into account  'strong non-linearity'.  The
main, particular example is about looking at clusters in time, versus
clusters in space.  He suggests using the RECIPROCALs of distance between
cases, and of how far apart they occurred in time.  He shows an example
where the reciprocals work ENORMOUSLY better than using the ordinary
scales of distance==miles  and  time==days.  Does your other reference
mis-use Mantel to suggest that he is doing something that is just
linear?
: *Mantel, N. (1967). The detection of disease clustering and a 
: generalized regression approach. Cancer Research, 27, 209-220.
: *Legendre, P. and Fortin, M.J. (1989). Spatial pattern and ecological 
: analysis. Vegetatio, 80, 107-138.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh

Return to Top

Subject: Re: CpK for industrial SPC. What is it?
From: cwbern@aol.com
Date: 6 Nov 1996 20:38:34 GMT

Cpk measures how well your process fits within the specification limits.
Cpk = Min (CPL, CPU)
CPU = (USL - Avg.) / (3 x s.d.)
CPL = (Avg. - LSL) / (3 x s.d.)
A Cpk equal to 1 means that the nearer specification limit lies exactly
3 standard deviations from the process average.  The higher the Cpk, the
more capable your process is of conforming to specifications.
Since this method is highly dependent on the assumption of a normal
distribution, you should perform a normality test first.
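The formulas above translate directly into a few lines of Python. This is a minimal sketch: the function name cpk and the sample data are assumptions, and it uses the sample standard deviation, which is a choice the post does not specify.

```python
from statistics import mean, stdev

def cpk(samples, lsl, usl):
    """Cpk = min(CPL, CPU), with Avg. and s.d. estimated from the samples."""
    avg = mean(samples)
    sd = stdev(samples)              # sample standard deviation (assumption)
    cpu = (usl - avg) / (3 * sd)     # room on the upper side, in units of 3 s.d.
    cpl = (avg - lsl) / (3 * sd)     # room on the lower side, in units of 3 s.d.
    return min(cpl, cpu)

measurements = [9, 10, 11]           # made-up data: average 10, s.d. 1
print(cpk(measurements, lsl=7, usl=13))   # limits exactly 3 s.d. either side
print(cpk(measurements, lsl=4, usl=13))   # the nearer limit decides
```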

Return to Top

Subject: Multiple Regression
From: tschmitz@hpu.edu
Date: 6 Nov 1996 21:21:09 GMT

Could someone please explain how to understand this in English?
Particularly the t-stats and F-stat. I know what they are, but I am confused by the output.
Multiple Linear Regression of GDP based on IBTS, V, and JS

    Y  = GDP
    X1 = IBTS
    X2 = V
    X3 = JS

SUMMARY OUTPUT

Regression Statistics
    Multiple R           0.996378
    R Square             0.992769
    Adjusted R Square    0.992504
    Standard Error       0.000122
    Observations         86

ANOVA
                  df    SS            MS              F
    Regression     3    0.000166      0.0000554       3752.701
    Residual      82    0.00000121    0.0000000148
    Total         85    0.000167

    Significance F = 0.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000123

Coefficients
                    Coefficient    Standard Error    t Stat      P-value                           Lower 95%    Upper 95%
    Intercept       -588.06        79.31423          -7.4143     0.0000000000998                   -745.841     -430.279
    X Variable 1    266.7569       16.97887          15.71112    0.0000000000000000000000000188    232.9805     300.5333
    X Variable 2    343.1546       96.82104          3.544215    0.000653                          150.5466     535.7626
    X Variable 3    -20.86         6.245629          -3.33994    0.001263                          -33.2846     -8.43549
-----------------------------------------------------------------------
This article was posted to Usenet via the Posting Service at Deja News:
http://www.dejanews.com/          [Search, Post, and Read Usenet News!]

Return to Top

Subject: Finite Population Correction
From: "Richard Reid"
Date: 7 Nov 1996 02:59:45 GMT

Greetings:
How is the "finite population correction" derived?
Any help would be appreciated.
Sincerely,
Richard Reid

Return to Top

Subject: Re: Random number generation
From: d91johol@isy.liu.se (John Olsson)
Date: 7 Nov 1996 11:52:51 GMT

I got a personal reply from Barry W. Brown, who said that he and some other persons
had made a program package based on (among other papers)
"Implementing a Random Number Package with Splitting Facilities"
Pierre L'Ecuyer and Serge Côté
ACM Transactions on Mathematical Software
Vol. 17, No. 1, March 1991, Pages 98-111
The algorithm described in this paper allows the user to advance the state of the
generator by 2^k values.
Part of the README file that comes with the source:
                                    RANLIB.C
               Library of C Routines for Random Number Generation
                                     README
                            Compiled and Written by:
                                 Barry W. Brown
                                  James Lovato
                     Department of Biomathematics, Box 237
                     The University of Texas, M.D. Anderson Cancer Center
                     1515 Holcombe Boulevard
                     Houston, TX      77030
 This work was supported by grant CA-16672 from the National Cancer Institute.
                       THANKS TO OUR SUPPORTERS
This work  was supported  in part by  grant CA-16672 from the National
Cancer Institute.  We are grateful  to Larry and  Pat McNeil of Corpus
Cristi for their generous support.  Some equipment used in this effort
was provided by IBM as part of a cooperative study agreement; we thank
them.
                          SUMMARY OF RANLIB
The bottom level routines provide 32 virtual random number generators.
Each generator can provide 1,048,576 blocks of numbers, and each block
is of length 1,073,741,824.  Any generator can be set to the beginning
or end  of the current  block or to  its starting value.  Packaging is
provided   so  that  if  these capabilities  are not  needed, a single
generator with period 2.3 X 10^18 is seen.
Using this base, routines are provided that return:
    (1)  Beta random deviates
    (2)  Chi-square random deviates
    (3)  Exponential random deviates
    (4)  F random deviates
    (5)  Gamma random deviates
    (6)  Multivariate normal random deviates (mean and covariance
         matrix specified)
    (7)  Noncentral chi-square random deviates
    (8)  Noncentral F random deviates
    (9)  Univariate normal random deviates
    (10) Random permutations of an integer array
    (11) Real uniform random deviates between specified limits
    (12) Binomial random deviates
    (13) Negative Binomial random deviates
    (14) Multinomial random deviates
    (15) Poisson random deviates
    (16) Integer uniform deviates between specified limits
    (17) Seeds for the random number generator calculated from a
         character string
                 COMMENTS ON THE C VERSION OF RANLIB
The C version was  obtained by converting the original  Fortran RANLIB
to C using PROMULA.FORTRAN  and performing  some hand  crafting of the
result.  Information on PROMULA.FORTRAN can be obtained from
                   PROMULA Development Corporation
                    3620 N. High Street, Suite 301
                         Columbus, Ohio 43214
                            (614) 263-5454
RANLIB.C  was tested  using the xlc  compiler under AIX  3.1 on an IBM
RS/6000.  The code  was  also examined  with lint  on the same system.
The RANLIB test  programs were also  successfully run using   the  gcc
compiler (see below) on a Solbourne.
RANLIB.C can be  obtained from  statlib.  Send mail  whose message  is
'send ranlib.c.shar from general' to statlib@lib.stat.cmu.edu.
RANLIB.C can also be obtained by anonymous ftp  to odin.mda.uth.tmc.edu
(129.106.3.17) where it is available as
                        /pub/unix/ranlib.c.tar.Z
For  obvious   reasons, the   original   RANLIB  (in    Fortran)   has
been renamed to
                        /pub/unix/ranlib.f.tar.Z
on the same machine.

Return to Top

Subject: Re: Multiple Regression
From: jdebord@MicroNet.fr (Jean Debord)
Date: Thu, 07 Nov 1996 18:41:06 GMT

tschmitz@hpu.edu wrote:
>Could someone please explain how to understand this in English?
>Particularyly the t-stats and f-stat. I know what they are, but I am confused by the output.
> [... computer output snipped ...]
t is the ratio of the parameter to its standard error. The t test
compares the value of the parameter with zero. If the associated
probability (P-value) is sufficiently low, you may conclude (with a risk
equal to the probability) that the parameter differs significantly
from zero. Usually one chooses probability values of 0.05 (5%), 0.01
(1%) and 0.001 (0.1%) as the reference values to which the observed
probability is compared. If the probability is > 0.05, the parameter
does not differ significantly from zero : this means that the data show
no clear influence of the corresponding variable, so that you may
wish to discard this variable from the regression.
Your results show that the influence of variables 1 and 2 is
significant at the 0.1% level, and that the influence of variable 3 is
significant at the 1% level. So, you don't need to discard any
variable.
The coefficient of determination r² (R square) is the ratio of the
"explained" sum of squares (SSe) to the total sum of squares (SSt) :
        r² = SSe / SSt
        SSt = Sum [y(i) - m]²
        SSe = Sum [ycalc(i) - m]²
        where :
        m is the average of the y(i) : m = [Sum y(i)] / n
        n the number of observations
        ycalc(i) the estimated y value for the i-th observation.
The adjusted coefficient of determination (r²aj) is such that :
        r²aj = 1 - [(n - 1) / (n - p)] (1 - r²)
        where p is the number of regression parameters
        This coefficient takes into account the number of regression
parameters with respect to the number of observations.
The residual standard error (s) is such that :
        s² = SSr / (n - p)
        s² = residual variance
        SSr = Residual sum of squares = Sum [y(i) - ycalc(i)]²
F is the ratio of the explained variance to the residual variance :
        F = [SSe / (p - 1)] / s²
In the case of linear regression :      SSt = SSe + SSr
(p - 1) and (n - p) are the numbers of degrees of freedom (df)
associated with the explained and residual variance, respectively.
In your case : n = 86, p = 4, p - 1 = 3, n - p = 82, and the F is
significant at the 0.1% level
I hope this helps.
Best regards.
Jean Debord
Faculte de Medecine, Limoges, France
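The formulas above can be checked against the figures in the original output. The Python sketch below is illustrative only and the variable names are assumptions. Because the posted sums of squares are heavily rounded, F is computed from R square instead, using the identity F = [r² / (p - 1)] / [(1 - r²) / (n - p)], which follows from SSe = r² SSt and SSr = (1 - r²) SSt.

```python
n, p = 86, 4                 # observations and regression parameters
r2 = 0.992769                # R Square from the posted output

# (coefficient, standard error) pairs copied from the posted output.
estimates = {
    "Intercept": (-588.06, 79.31423),
    "X1": (266.7569, 16.97887),
    "X2": (343.1546, 96.82104),
    "X3": (-20.86, 6.245629),
}

# t = parameter / standard error
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# Adjusted coefficient of determination.
r2_adj = 1 - (n - 1) / (n - p) * (1 - r2)

# F from R square, with (p - 1) and (n - p) degrees of freedom.
f_stat = (r2 / (p - 1)) / ((1 - r2) / (n - p))

for name, t in t_stats.items():
    print(f"{name}: t = {t:.4f}")
print(f"adjusted R square = {r2_adj:.6f}")
print(f"F = {f_stat:.1f}")
```

The results agree with the t Stat, Adjusted R Square and F entries of the output to within rounding.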

Return to Top

Subject: Wall Street Quant Position
From: David Rothman
Date: Thu, 07 Nov 1996 13:01:10 +0100

The trading arm of a major investment firm is seeking a quantitative
specialist for its New York based Analytical Equity Trading Group to
work with its senior professionals in the on-going development of
sophisticated statistical/econometric trading models and strategies. 
QUALIFICATIONS:
The successful candidate will have in-depth knowledge of financial
economics,  time series econometrics, stochastic processes and the
requisite skills necessary to design and implement strategies in a
sophisticated computer environment. Comfort in dealing with
Probabilistic notions such as Random Walk, Brownian Motion and
Martingale Theory, combined with Econometric ideas such as Stationarity,
Cointegration, Error-Correction Models and Arch/Garch is essential.
This position would be ideal for someone with prior experience in a
related field, and/or academic training near or at the Ph.D. level.
CONTACT:
E-mail: nyrtd@ny.ubs.com
Please reply via email with either a resume or a short informal
description of yourself.  Please include a day & evening phone number.  
We are an Equal Opportunity employer.

Return to Top

Subject: Confounding variables in regression
From: Dan Kehler <005769k@ace.acadiau.ca>
Date: Thu, 07 Nov 1996 15:29:58 -0800

We've been struggling with this question for a while:
Given the common situation of wanting to know the effect of a variable
X2 on Y, independent of the effect of X1 on Y, when X1 and X2 are
correlated, which of the following two methods is more appropriate?
method 1:  
model 1 = Y ~ X1
model 2 = residuals(model 1) ~ X2 , test for significance of X2
Method 2:  
model 1 = Y ~ X1 + X2, test for significance of X2.
This is what I think is going on:
These two methods are quite different, and the difference between them 
depends on the degree of correlation between X1 and X2.  In the first
method, the two parameters, b1 and b2, are estimated separately, such that
the entire variability in Y shared between X1 and X2 is allocated to the
parameter estimate for X1.  In the second method the two parameter 
estimates compete for the variability in Y shared between X1 and X2.
Which method is preferable and why?
We'd be very grateful for some insight or a useful reference.
Thanks,
Dan Kehler
Acadia University
Wolfville, NS, CANADA
B0P IX0
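The difference between the two methods can be seen on simulated data. The Python sketch below is a hedged illustration: the data-generating numbers and all names are assumptions, and it compares only the estimated X2 slopes, not the significance tests. It also includes a third variant, in which X2 is residualized on X1 as well, because that variant is known to reproduce the multiple-regression slope exactly (the Frisch-Waugh-Lovell result).

```python
import random

random.seed(1)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]        # correlated with x1
y = [1.0 + 2.0 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def centered(v):
    m = sum(v) / len(v)
    return [t - m for t in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slope(v, x):
    """Slope of the simple regression v ~ x (with intercept)."""
    vc, xc = centered(v), centered(x)
    return dot(xc, vc) / dot(xc, xc)

def residuals(v, x):
    """Residuals of the simple regression v ~ x (with intercept)."""
    vc, xc = centered(v), centered(x)
    b = dot(xc, vc) / dot(xc, xc)
    return [a - b * c for a, c in zip(vc, xc)]

yc, x1c, x2c = centered(y), centered(x1), centered(x2)
s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
s1y, s2y = dot(x1c, yc), dot(x2c, yc)

# Method 2: X2 slope in Y ~ X1 + X2 (closed form for two predictors).
b2_method2 = (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)

# Method 1: residuals(Y ~ X1) regressed on the raw X2.
b2_method1 = slope(residuals(y, x1), x2)

# Variant: residuals(Y ~ X1) regressed on residuals(X2 ~ X1).
b2_fwl = slope(residuals(y, x1), residuals(x2, x1))

r12_sq = s12 ** 2 / (s11 * s22)      # squared correlation of X1 and X2
print(b2_method2, b2_method1, b2_fwl, r12_sq)
```

In this sketch the method 1 slope equals the method 2 slope multiplied by (1 - r12_sq), so the two agree only when X1 and X2 are uncorrelated.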

Return to Top

Subject: Q: pmf of cos .... (need help)
From: wxu@cs.utexas.edu (Wei Xu)
Date: 7 Nov 1996 14:13:16 -0600

Hi netters,
I have the following question and need your help.
Let R and D be two n-dimensional random vectors:
R = [r_1,...,r_n] and D = [d_1,...,d_n].
Their jumps or increments (i.e., r_{i+1} - r_i and d_{i+1} - d_i for all i )
are iid (independent identically distributed) random variables, 
say, zero-mean Laplacian variables.
Let m1 and m2 be averaged values of R and D:
	m1 = (r_1 + ...+r_n)/n,    
	m2 = (d_1 + ... + d_n)/n,
Define two new vectors
	R1 = (r_1 - m1, ..., r_n - m1)
        D1 = (d_1 - m2, ..., d_n - m2).
Now, I want to know the pmf of the following variable Y:
Y = cos(angle between R1 and D1) = <R1,D1> / sqrt(<R1,R1> <D1,D1>)
where <x,y> means the dot product of two vectors x and y.
Does anybody know how or where( which books or papers) 
I can find the answer?
Please give me an email if you can help.
Thank you very much!
Wei Xu

Return to Top

Subject: pseudo random sequence using (ax+b) mod c
From: Richard Lay
Date: Thu, 07 Nov 1996 10:52:20 -0800

Hi all.
I have recently heard of a (rather old) method for generating pseudo
random sequences of numbers using the relation:
x(new) = (ax+b)mod c
How does one pick a and b to make sure that the x's go through all c
numbers before repeating?
Rich
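The standard criterion for this is the Hull-Dobell theorem: the sequence visits all c values before repeating exactly when (1) b and c have no common factor, (2) a - 1 is divisible by every prime factor of c, and (3) a - 1 is divisible by 4 whenever c is. The Python sketch below checks those conditions and, for small c, compares them with a brute-force cycle count; the function names are assumptions chosen for illustration.

```python
from math import gcd

def prime_factors(n):
    """Set of prime factors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def has_full_period(a, b, c):
    """Hull-Dobell conditions for x(new) = (a*x + b) mod c."""
    if gcd(b, c) != 1:
        return False
    if any((a - 1) % p != 0 for p in prime_factors(c)):
        return False
    if c % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True

def cycle_length(a, b, c, x=0):
    """Length of the cycle eventually reached from x, by brute force."""
    seen = {}
    step = 0
    while x not in seen:
        seen[x] = step
        x = (a * x + b) % c
        step += 1
    return step - seen[x]

print(has_full_period(5, 3, 16), cycle_length(5, 3, 16))
print(has_full_period(3, 1, 16), cycle_length(3, 1, 16))
```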

Return to Top

Subject: Help finding reference
From: bwallet@nswc.navy.mil (Brad Wallet)
Date: Thu, 7 Nov 1996 19:03:13 GMT

I'm trying to find the authors of a paper from the JSM in
1995.  There was a paper presented there by S. Chatterjee
and M. Laudato entitled "Genetic algorithms and their 
statistical applications."  Can anyone help me find them?
Brad

Return to Top

Subject: case deletion formulas
From: Jan Deleeuw
Date: 07 Nov 1996 10:53:52 -0800

I have added the little file with the basic case deletion formulas to
http://www.stat.ucla.edu/textbook/formulas/
If you have the Adobe plugin installed you can read the pdf version
in Netscape. You can download the LaTeX version for cutting and pasting.
Nothing original, just for reference.
-- 
Jan de Leeuw; UCLA Department of Statistics; UCLA Statistical Consulting
US mail: 8118 Math Sciences, 405 Hilgard Ave, Los Angeles, CA 90095-1554
phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw@stat.ucla.edu  
                 www: http://www.stat.ucla.edu/~deleeuw

Return to Top

Subject: Re: linear regression with errors
From: Jim Hunter
Date: Thu, 07 Nov 1996 14:43:39 -0500

M.A. Cremonini wrote:
> 
> Hi,
> I must solve a problem like this:
> 
>                 Ax=B
> 
> A is my data matrix ans B is a column vector with results.
> B is formed by numbers which can span a certain range i.e. all
> the numbers within
> the range can be solutions of the corresponding equation:
> 
>         |a11  a12| |x1|     |B1 +/- e1|
>         |a21  a22| |  |  =  |B2 +/- e2|
>         |a31  a32| |x2|     |B3 +/- e3|
>         |a41  a42| |  |     |B4 +/- e4|
> 
> For example:
> 
> if A11*x1+A12*x2 = Q and Q is within the range (B1-e1 - B1+e1) the result
> is the one searched for.
> 
> Is there a 'one shot' way to solve this system (I mean just using matrix
> algebra and without using a minimization procedure)?
> Thank you for your help.
> Mauro
> 
In a sense there's a "one shot" method. You can restate the problem
in a standard minimax form as follows:
Let E be a diagonal matrix of the tolerances e1, e2, ...
Then the problem is:
given A and B, find x and d so that
        Ax = B + Ed    
with |d(i)| <= 1.
Assuming each e(i) > 0, we then need to find some x so that
        || Fx - C ||oo  <= 1.
Here F = E^-1*A, C = E^-1*B, and ||*|| is the infinity or
maximum norm. So, what you want to do is to find
    min || Fx - C ||oo.
     x 
Your original problem has a solution iff the vector x' that
minimizes this has a scaled residual R = Fx' - C so that
     || R ||oo <= 1.
You should be able to find software to handle this on the
net somewhere, possibly try netlib. Look for functions concerned
with "minimax", "min-max", or Remez exchange. 
Note: If you're a Matlab user, the Matlab remez function won't
be of use. That's a specialized version of the Remez algorithm
for designing digital filters.
----
Jim
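The scaling step above can be sketched in a few lines of Python. The sketch only evaluates || Fx - C ||oo for a given candidate x, which is the acceptance test; it does not carry out the minimax minimization itself. The function name and the small example system are assumptions made up for illustration.

```python
def scaled_residual_norm(A, B, e, x):
    """Return || Fx - C ||oo with F = E^-1 A and C = E^-1 B.

    A is a list of rows, B the right-hand sides, e the tolerances
    and x a candidate solution.
    """
    worst = 0.0
    for row, b_i, e_i in zip(A, B, e):
        if e_i <= 0:
            raise ValueError("each tolerance e(i) must be positive")
        predicted = sum(a_ij * x_j for a_ij, x_j in zip(row, x))
        worst = max(worst, abs(predicted - b_i) / e_i)
    return worst

# Made-up 4 x 2 system with tolerances.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
B = [1.0, 2.0, 3.2, -0.9]
e = [0.1, 0.1, 0.5, 0.5]

inside = scaled_residual_norm(A, B, e, [1.0, 2.0])
outside = scaled_residual_norm(A, B, e, [1.3, 2.0])
print(inside <= 1, outside <= 1)
```

A candidate x satisfies every tolerance exactly when the returned value is at most 1.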

Return to Top

Subject: Re: Random number generation
From: William Snyder
Date: Thu, 07 Nov 1996 15:55:34 -0800

Hi,
I found another reference on this. Have a look at:
http://www.mcs.anl.gov/dbpp/text/node119.html
It is an on-line book on parallel algorithms. This page talks about
distributed random number generation.
-Will
-- 
Dr. William C. Snyder   will@icess.ucsb.edu
Imaging Scientist       http://www.icess.ucsb.edu/~will/will.html  
Institute for Computational Earth Systems Science
University of California Santa Barbara

Return to Top

Subject: Earthquake Page
From: Jan de Leeuw
Date: Thu, 07 Nov 1996 16:04:57 -0800

I updated the Northridge Earthquake page at
http://www.stat.ucla.edu/cases/northridge/
This has about 3000 aftershocks, with location and magnitude, 
a nice dataset. It also has the Caltech and USGS bulletins
of January 1994, and quite a bit of background info.
I replaced the ps.gz files with gifs that display directly in the
browser, and I made the page look current. I will update it as
soon as we have an earthquake with magnitude larger than 6.6.   
--- Jan

Return to Top

Subject: Re: Meta-Analysis ??
From: orourke@utstat.toronto.edu (Keith O'Rourke)
Date: Thu, 7 Nov 1996 15:24:19 GMT

 wrote:
<>I am trying to learn about the procedures involved in performing a
<>Meta-Analysis. However, no seems to be able to direct me to a good
<>source. Does anyone have any ideas??  Any help would be appreciated!!

Return to Top
Downloaded by WWW Programs
Byron Palmer
