Stan Baldwin (sabald01@pop.uky.edu) wrote:

> I need a little help. I am not a statistician!
> I'm running a 2-way ANOVA and I want to run post hocs. I know that I can
> run post hocs like Bonferroni or Scheffe, but if I understand things
> correctly they assume that I'll be running all possible pairwise
> comparisons. That's not what I want to do. Suppose my design has 100
> possible comparisons but I only want to make 25 of them. Then
> Bonferroni and Scheffe are overkill and I lose power. Is there a post
> hoc that allows me to tell it how many comparisons of means I want to
> make and then have it compute significance?
> Another question: In a 2-way ANOVA, if I get no interaction but do get
> significant F scores for my groups, can I still run a post hoc comparing
> individual means? Does anyone know of a good, clear article on this
> subject?
> Last question: What post hoc can I run if my ANOVA design is a repeated
> measures design, i.e. I have both dependent and independent observations?
> If I want to compare two dependent means for possible differences, what
> post hoc can I use?
> Many thanks for any help!!!!
> Stan
> email sabald01@pop.uky.edu

Since you seem to know which subsets of the levels of the factors and interactions are of interest, have you considered the use of contrasts (orthogonal or otherwise) incorporated in the anova?

Leonard Lefkovitch
Hi everyone, I need some help. One of my friends is working on a statistical problem and has run into the following difficulty. Can anyone give me a hint on how to solve it? The problem is: solve for x in terms of t, p and a in the following expression:

   1 - p = [ 1 - f(x,t) ] / [ 1 - f(0,t) ]

where

   f(x,t) = ( 1 / PI ) * g(x,t)
   g(x,t) = sum_{m=1, inf}{ h(m) * h2(m,a,x,t) }
   h(m) = (-1)**(3m-3) / [ (2m-1)! (2m-1) ]
   h2(m,a,x,t) = GAMMA((2m-1)/a + 1) * (x+t)**(2m-1)

GAMMA is the gamma function, i.e. GAMMA(a) = integral_{0,inf}{ x**(a-1) exp(-x) dx }, and PI = 3.14...

Please send me e-mail at fafst2@pitt.edu
Hein Hundal, in <3266C77F.2EC4@kincyb.com>, writes:

> Often the books use regression on half the data set and
> use the second half of the data set to test the model. . . .
> Is this a standard technique? . . . I don't know what data
> mining is, but it sounds like something I might like to learn about.

Data mining is a disparaging term for running lots of analyses until you get the answer you want. For example, if you have 10 independent variables there are 1,024 regressions you could run by including or excluding variables. Plus, each variable can be transformed in many different ways. If you keep searching you will find models that, just by chance, pass all the normal statistical significance tests. But that does not mean they are reliable.

The validation procedure you describe is a good idea in many cases. Among other things, it guards against data mining. If you have plenty of data, your results will be much more reliable if you fit on one data set and validate on another. However, we often do not have enough data to do this.

Aaron C. Brown
New York, NY
Richard F Ulrich wrote:

> When similar polls have been conducted numerous times, the best
> judgement of "How volatile is the race?" - is probably based on how
> polls have varied in the past. They do vary about as the Standard
> Errors would suggest, I am pretty sure, for one agency, though
> there CAN be bigger differences between agencies.

I ask about how much volatility can be expected when a 4% "margin of error" is advertised because of some results I got from a simulation of the tracking poll done by Reuters/Zogby. It seems to indicate that wide swings would be seen even if the underlying preferences don't change at all.

Simulated Reuters tracking poll:
- 300 people sampled each day
- 3-day running average reported (total sample 900)
- 3.3% margin of error (95%) reported
- assume that support for each candidate is constant: Clinton 50%, Dole 38%, Perot 6%.
P = Perot, D = Dole, C = Clinton, x = other or undecided

   % preference (likely voters)                  Clinton lead (%)
   0    5   10   30   35   40   45   50   55   60    5   10   15   20
   |----|----|   |----|----|----|----|----|----|    |----|----|----|
 3 | Px |        | D C |       | o |
 4 | P x |       | D C |       | o |
 5 | P x |       | D C |       | o |
 6 | Px |        | D C |       | o |
 7 | Px |        | D C |       | o |
 8 | Px |        | D C |       | o |
 9 | P x|        | D C |       | o |
10 | P x|        | D C |       | o |
11 | P x|        | D C |       | o |
12 | P x|        | D C |       | o |
13 | P x |       | D C |       | o |
14 | Px |        | D C |       | o |
15 | Px |        | D C |       | o |
16 | x P |       | D C |       | o |
17 | Px |        | D C |       | o |
18 | Px |        | D C |       | o |
19 | xP |        | D C |       | o |
20 | xP |        | D C |       | o |
21 | x P |       | D C |       | o |
22 | Px |        | D C |       | o |
23 | P x |       | D C |       | o |
24 | P x |       | D C |       | o |
25 | P x |       | D C |       | o|
26 | Px |        | D C |       | o |
27 | Px |        | D C |       | o |
28 | P x |       | D C |       | o |
29 | Px |        | D C |       | o |
   |----|----|   |----|----|----|----|----|----|    |----|----|----|
   0    5   10   30   35   40   45   50   55   60    5   10   15   20

So it can be seen that any amount of bogus "news" can be generated from such a poll, even if nothing is really changing except for inherent statistical gyrations. Every time the poll moves a smidgen, Zogby is out touting the shift as being due to this or that external event.

Joe
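Joe's simulation is easy to reproduce. Here is a minimal sketch (my own code, not Zogby's actual methodology): multinomial draws of 300 respondents per day under fixed preferences, with the 3-day rolling Perot share reported, plus the advertised margin of error for the 900-person base.

```python
import numpy as np

rng = np.random.default_rng(1)

# True, constant preferences: Clinton 50%, Dole 38%, Perot 6%, other 6%.
p = [0.50, 0.38, 0.06, 0.06]
days, per_day = 29, 300

# Daily samples of 300 likely voters each.
daily = rng.multinomial(per_day, p, size=days)      # shape (29, 4)

# 3-day rolling totals -> reported shares on a 900-person base.
rolling = daily[:-2] + daily[1:-1] + daily[2:]      # shape (27, 4)
shares = rolling / (3 * per_day)

perot = shares[:, 2] * 100
print("Perot 3-day average ranges from %.1f%% to %.1f%%"
      % (perot.min(), perot.max()))

# Advertised 95% margin of error for a share near 50% on n = 900:
moe = 1.96 * (0.25 / 900) ** 0.5                    # about 0.033, i.e. 3.3%
```

Even with preferences pinned at exactly 6%, the reported Perot number wanders by a couple of points, which is Joe's point.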
Hein Hundal wrote:

[snip]
> I am also trying to find references for two other subjects: data
> mining, and factor analysis. I don't know what data mining is, but it
> sounds like something I might like to learn about.
[snip]

Data mining, as mentioned by another poster in this thread, is a term used pejoratively by statisticians, but more positively by computer scientists; for them, data mining means finding information in databases too large for human comprehension. I don't belong to the data mining camp, but I know they exist. You might try the home page of the Knowledge Discovery and Data Mining Foundation, http://www.kdd.org/, to see what literature you can find.

Hope this helps,
Robert Dodier
In article <54ahpv$sh5@newsgate.dircon.co.uk>, Danny Alexander wrote:

>Does anybody have or know where I can find some code for the maximum
>likelihood estimation of the parameters of a multivariate t model of
>some data I have. Preferably including the code for MLE of the degrees
>of freedom too.
>If not, any good references for the algorithms for doing this?

I do not know of any published algorithms; there may be some. But I would suggest that the logarithm of the density be written as

   C - .5*ln(det \Sigma) - (ln(1 + .5*h*(x-m)'\Sigma^{-1}(x-m)))/h + \phi(t),

where \phi involves the logarithm of the \Gamma function of .5*|1/t +- p|, p being the dimension of the space, and other expressions, so that it behaves well at t=0. If the Beta distribution is to be excluded, t must be positive, but the maximum may occur at t=0. Not doing it this way can involve numerical instabilities. Newton's method, gradient methods, or other standard numerical procedures will work, starting from a good estimator.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu   Phone: (317)494-6054   FAX: (317)494-0558
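For the known-degrees-of-freedom case, the standard EM iteration for the multivariate t is short to code: the E-step downweights each point by (df + p)/(df + Mahalanobis^2). This is a generic textbook sketch, not Rubin's parameterization above, and the data here are synthetic:

```python
import numpy as np

def t_mle_fixed_df(X, df, iters=200):
    """EM estimates of location and scatter for a multivariate t
    with known degrees of freedom df."""
    n, p = X.shape
    mu, S = X.mean(axis=0), np.cov(X.T)
    for _ in range(iters):
        d = X - mu
        m2 = np.einsum('ij,jk,ik->i', d, np.linalg.inv(S), d)  # Mahalanobis^2
        w = (df + p) / (df + m2)          # E-step: outliers get small weights
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        d = X - mu
        S = (w[:, None] * d).T @ d / n    # M-step scatter update
    return mu, S

# Synthetic t_5 data with known location (1, -2) and identity scatter.
rng = np.random.default_rng(0)
n, p, df = 4000, 2, 5.0
g = rng.standard_normal((n, p))
X = np.array([1.0, -2.0]) + g * np.sqrt(df / rng.chisquare(df, n))[:, None]
mu_hat, S_hat = t_mle_fixed_df(X, df)
print("location estimate:", mu_hat.round(3))
```

Estimating the degrees of freedom as well requires an outer one-dimensional maximization of the profile likelihood over df, where the reparameterization Rubin describes helps with stability near df = infinity.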
It has been suggested that I use the Weibull distribution to model the distribution of fibre lengths in a paper sheet.
1) What is the nature of the distribution, i.e. where does it come from?
2) What stat program would be best to fit it?
Thanks in advance for your cooperation
J. Hamel
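On the software question: any package with maximum-likelihood fitting of standard distributions will do. As one illustration, scipy can fit the Weibull shape and scale by ML; the sample below is synthetic, drawn from a known Weibull purely to show the round trip (the shape 1.8 and scale 2.5 are made up, not real fibre data):

```python
import numpy as np
from scipy import stats

# Hypothetical fibre-length sample (mm), drawn from a known Weibull.
rng = np.random.default_rng(0)
lengths = stats.weibull_min.rvs(1.8, scale=2.5, size=5000, random_state=rng)

# Lengths are non-negative, so pin the location at zero and fit
# shape and scale by maximum likelihood.
c, loc, scale = stats.weibull_min.fit(lengths, floc=0)
print(f"shape = {c:.2f}, scale = {scale:.2f}")
```

As for where the distribution comes from: the Weibull arises naturally as an extreme-value (weakest-link) law, which is one reason it is popular for fibre strength and length data.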
Richard Timmer wrote:

> So, I am asking whether someone has developed a solution, knows where I can
> look, or can provide helpful tips for determining the S.D. of "a" and "b" for
> a fit to the following equation:
>
> y = a*(1-(EXP(-1*(b*x)))) (or in general terms, y = f(x) in which
> f(x) contains two parameters a & b)

Here's a reference: "Error analysis for parameters determined in nonlinear least-squares fits" by Keith H. Burrell, American Journal of Physics, v. 58 (2), Feb. 1990, pp. 160-4.

Abstract: "This article includes a calculation of the error in parameters derived from the least-squares method of fitting nonlinear models to experimental data. The formula reduces to the well-known result for the case of a linear least-squares fit. It differs, however, from a method for calculating the error that is often employed for the nonlinear case. The difference between the current result and that method's is illustrated with examples from least-squares fits to spectroscopic data."
--
Terry Anderson   tga@math.appstate.edu
Math Sciences Dept., Appalachian State University
Boone, NC 28608 USA   (704) 262 - 2357
http://www.mathsci.appstate.edu/u/math/tga/
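In practice, the covariance matrix returned by a nonlinear least-squares routine gives exactly the parameter standard deviations being asked about: the square roots of its diagonal. A sketch for the posted model with synthetic data (the "true" values a = 3, b = 1.2 and the noise level are made up for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a * (1.0 - np.exp(-b * x))

# Synthetic data from known parameters, plus Gaussian noise.
rng = np.random.default_rng(2)
x = np.linspace(0.1, 5, 40)
y = f(x, 3.0, 1.2) + rng.normal(0, 0.05, x.size)

popt, pcov = curve_fit(f, x, y, p0=[1.0, 1.0])
perr = np.sqrt(np.diag(pcov))    # standard deviations of a and b
print("a = %.3f +/- %.3f,  b = %.3f +/- %.3f"
      % (popt[0], perr[0], popt[1], perr[1]))
```

This is the linearized (asymptotic) error estimate; Burrell's article discusses when it differs from other commonly used recipes.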
Hein Hundal wrote:
>
> Often the books use regression on
> half the data set and use the second half of the data set to test the
> model, then they reverse the halves. (I.e. use the second half for the
> regression and testing the result against the first half.) Is this a
> standard technique? I have also seen the data set divided up into
> thirds. Can anyone recommend references for getting the most
> information out of a set of data?
>
> I am also trying to find references for two other subjects: data
> mining,

1. The technique of splitting the data into 2 halves, fitting to one and testing on the other, is called 2-fold cross-validation. Yes, cross-validation is a standard technique. 5- and 10-fold cross-validation (where you fit to 80% and 90% of the data) are also common, as is fitting to all observations but one and then re-doing this as many times as you have observations, leaving out a different observation each time. See 'A Leisurely Look at the Bootstrap, the Jackknife and Cross-Validation' by Bradley Efron and Gail Gong, The American Statistician, Feb 1983, Vol 37, No 1.

2. 'Data mining is the process of automatically extracting non-obvious, hidden knowledge from a database.' A good reference is 'Knowledge Discovery in Databases', edited by Gregory Piatetsky-Shapiro and William J Frawley, 1991, The AAAI Press. ISBN 0-262-66070-9.

Blaise F Egan
Data Mining Group
BT Labs
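The 2-fold procedure in point 1 can be sketched in a few lines. This toy example (my own, with a simple linear model and made-up data) fits on one half, scores on the other, swaps, and averages the two test errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on x, plus unit-variance noise.
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, n)

# 2-fold CV: fit on one half, score on the other, then swap.
idx = rng.permutation(n)
halves = [idx[: n // 2], idx[n // 2:]]
mse = []
for fit_ix, test_ix in [(halves[0], halves[1]), (halves[1], halves[0])]:
    b, a = np.polyfit(x[fit_ix], y[fit_ix], 1)    # slope, intercept
    pred = a + b * x[test_ix]
    mse.append(np.mean((y[test_ix] - pred) ** 2))

cv_mse = float(np.mean(mse))
print(f"2-fold CV mean squared error: {cv_mse:.2f}")
```

Because the model is correct here, the CV error should land near the true noise variance of 1.0; a badly overfit model would score much worse on the held-out half than on the fitting half.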
CALL FOR PAPERS

The Second International Symposium on Intelligent Data Analysis (IDA-97)
Birkbeck College, University of London
4th-6th August 1997

In Cooperation with AAAI, ACM SIGART, BCS SGES, IEEE SMC, and SSAISB
[ http://web.dcs.bbk.ac.uk/ida97.html ]

Objective
=========
For many years the intersection of computing and data analysis contained menu-based statistics packages and not much else. Recently, statisticians have embraced computing, computer scientists are using statistical theories and methods, and researchers in all corners are inventing algorithms to find structure in vast online datasets. Data analysts now have access to tools for exploratory data analysis, decision tree induction, causal induction, function finding, constructing customised reference distributions, and visualisation. There are prototype intelligent assistants to advise on matters of design and analysis. There are tools for traditional, relatively small samples and for enormous datasets.

The focus of IDA-97 will be "Reasoning About Data". We are interested in intelligent systems that reason about how to analyze data, perhaps as human analysts do. Analysts often bring exogenous knowledge about data to bear when they decide how to analyze it; they use intermediate results to decide how to proceed; they reason about how much analysis the data will actually support; they consider which methods will be most informative; they decide which aspects of a model are most uncertain and focus attention there; they sometimes have the luxury of collecting more data, and plan to do so efficiently. In short, there is a strategic aspect to data analysis, beyond the tactical choice of this or that test, visualisation or variable.
Topics
======
The following topics are of particular interest to IDA-97:

* APPLICATIONS & TOOLS
  - analysis of different kinds of data (e.g., censored, temporal, etc.)
  - applications (e.g., commerce, engineering, finance, legal, manufacturing, medicine, public policy, science)
  - assistants, intelligent agents for data analysis
  - evaluation of IDA systems
  - human-computer interaction in IDA
  - IDA systems and tools
  - information extraction, information retrieval

* THEORY & GENERAL PRINCIPLES
  - analysis of IDA algorithms
  - bias
  - classification
  - clustering
  - data cleaning
  - data pre-processing
  - experiment design
  - model specification, selection, estimation
  - reasoning under uncertainty
  - search
  - statistical strategy
  - uncertainty and noise in data

* ALGORITHMS & TECHNIQUES
  - Bayesian inference and influence diagrams
  - bootstrap and randomization
  - causal modeling
  - data mining
  - decision analysis
  - exploratory data analysis
  - fuzzy, neural and evolutionary approaches
  - knowledge-based analysis
  - machine learning
  - statistical pattern recognition
  - visualization

Submissions
===========
Participants who wish to present a paper are requested to submit a manuscript not exceeding 10 single-spaced pages. We strongly encourage that the manuscript be formatted following Springer's "Advice to Authors for the Preparation of Contributions to LNCS Proceedings", which can be found on the IDA-97 web page. This submission format is identical to the one for the final camera-ready copy of accepted papers. In addition, we request a separate page detailing the paper title, authors' names, postal and email addresses, and phone and fax numbers. Email submissions in Postscript form are encouraged. Otherwise, five hard copies of the manuscript should be submitted.
Submissions should be sent to the IDA-97 Program Chairs:

Central, North and South America:
Paul Cohen
Department of Computer Science
Lederle Graduate Research Center
University of Massachusetts, Amherst
Amherst, MA 01003-4610, USA
cohen@cs.umass.edu

Elsewhere:
Xiaohui Liu
Department of Computer Science
Birkbeck College, University of London
Malet Street
London WC1E 7HX, UK
hui@dcs.bbk.ac.uk

IMPORTANT DATES
February 1st, 1997   Submission of papers
April 15th, 1997     Notification of acceptance
May 15th, 1997       Final camera-ready paper

Review
======
All submissions will be reviewed on the basis of relevance, originality, significance, soundness and clarity. At least two referees will review each submission independently. Results of the review will be sent to the first author via email, unless requested otherwise.

Publications
============
Papers which are accepted and presented at the conference will appear in the IDA-97 proceedings, to be published by Springer-Verlag in its Lecture Notes in Computer Science series. Authors of the best papers will be invited to extend their papers for further review for a special issue of "Intelligent Data Analysis: An International Journal".

IDA-97 Organisation
===================
General Chair: Xiaohui Liu
Program Chairs: Paul Cohen, Xiaohui Liu
Steering Comm. Chair: Paul Cohen, University of Massachusetts, USA
Exhibition Chair: Richard Weber, MIT GmbH, Aachen, Germany
Finance Chair: Sylvie Jami, Birkbeck College, UK
Local Arrangements Chair: Trevor Fenner, Birkbeck College, UK
Public. and Proc.
Chair: Michael Berthold, University of Karlsruhe, Germany
Sponsorship Chair: Mihaela Ulieru, Simon Fraser University, Canada

Steering Committee
Michael Berthold      University of Karlsruhe, Germany
Fazel Famili          National Research Council, Canada
Doug Fisher           Vanderbilt University, USA
Alex Gammerman        Royal Holloway London, UK
David Hand            Open University, UK
Wenling Hsu           AT&T Consumer Lab, USA
Xiaohui Liu           Birkbeck College, UK
Daryl Pregibon        AT&T Research, USA
Evangelos Simoudis    IBM Almaden Research, USA

Program Committee
Eric Backer           Delft University of Technology, The Netherlands
Riccardo Bellazzi     University of Pavia, Italy
Michael Berthold      University of Karlsruhe, Germany
Carla Brodley         Purdue University, USA
Gongxian Cheng        Birkbeck College, UK
Fazel Famili          National Research Council, Canada
Julian Faraway        University of Michigan, USA
Thomas Feuring        WWU Muenster, Germany
Alex Gammerman        Royal Holloway London, UK
David Hand            The Open University, UK
Rainer Holve          Forwiss Erlangen, Germany
Wenling Hsu           AT&T Research, USA
Larry Hunter          National Library of Medicine, USA
David Jensen          University of Massachusetts, USA
Frank Klawonn         University of Braunschweig, Germany
David Lubinsky        University of Witwatersrand, South Africa
Ramon Lopez de Mantaras  Artificial Intelligence Research Institute, Spain
Sylvia Miksch         Stanford University, USA
Rob Milne             Intelligent Applications Ltd, UK
Gholamreza Nakhaeizadeh  Daimler-Benz Forschung und Technik, Germany
Claire Nedellec       Universite Paris-Sud, France
Erkki Oja             Helsinki University of Technology, Finland
Henri Prade           University Paul Sabatier, France
Daryl Pregibon        AT&T Research, USA
Peter Ross            University of Edinburgh, UK
Steven Roth           Carnegie Mellon University, USA
Lorenza Saitta        University of Torino, Italy
Peter Selfridge       AT&T Research, USA
Rosaria Silipo        University of Florence, Italy
Evangelos Simoudis    IBM Almaden Research, USA
Derek Sleeman         University of Aberdeen, UK
Paul Snow             Delphi, USA
Rob St. Amant         North Carolina State University, USA
Lionel Tarassenko     Oxford University, UK
John Taylor           King's College London, UK
Loren Terveen         AT&T Research, USA
Hans-Juergen Zimmermann  RWTH Aachen, Germany

Enquiries
=========
Detailed information regarding IDA-97 can be found on the World Wide Web server of the Department of Computer Science at Birkbeck College, London: http://web.dcs.bbk.ac.uk/ida97.html

Apart from presentation of research papers, IDA-97 also welcomes demonstrations of software and publications related to intelligent data analysis, and welcomes those organisations who may wish to partly sponsor the conference. Relevant enquiries may be sent to the appropriate chairs, whose details can be found in the above-mentioned IDA-97 web page, or to:

IDA-97 Administrator
Department of Computer Science
Birkbeck College
Malet Street
London WC1E 7HX, UK
E-mail: ida97-enquiry@dcs.bbk.ac.uk
Tel: (+44) 171 631 6722
Fax: (+44) 171 631 6727

There is also a moderated IDA-97 discussion list. To subscribe, send the word "subscribe" in the message body to: ida97-request@dcs.bbk.ac.uk
Math/Probability book for sale:

Daniel W. Stroock, "Probability Theory: An Analytic View," Cambridge University Press, 1993.

(From the preface: This book is intended for graduate students who have a good undergraduate introduction to probability theory, a reasonably sophisticated introduction to modern analysis, and who now want to learn what these two topics have to say about each other...)

Excellent condition, hardcover (with dust jacket): $29 shipped. Has my name written inside but is otherwise in new condition. If you think the price is too high, make an offer.

Bert Hochwald
hochwald@research.bell-labs.com
Does anyone have any code available for maximum likelihood estimation of the parameters of multivariate t distributions from sample data?

Dan
In the 1986 SIGAPL Conf. Proceedings, Groner and Cook published "Arithmetic of statistical distributions". In it they describe a suite of APL programs that uses "Prony's method" to generate a representation of a set of distributions that they can then manipulate arithmetically. I.e., they can say something like Normal(1,3) + Normal(1.5,4) and get back the result Normal(2.5,5): the sum of a normal distribution with mean=1 and stddev=3 with another, independent normal distribution with mean=1.5 and stddev=4 is a normal distribution with mean=2.5 and stddev=5.

I would very much like to do this sort of thing for LOGNORMAL distributions (with a known covariance). The article says that the authors would next work on multivariate distributions and lognormals, but I've been unable to locate any other work by them. They cite a 1981 article by Hellerman as reference #5, but provide only 4 references.

Unfortunately the article is a little vague as to what Prony's method is, and a web search seems to indicate that Prony's method has something to do with signal processing. Might anybody point me in the right direction (considering that I'm no EE signal processing gearhead)?

TIA
Regards
Tony
Hi,

Can anyone give me pointers on how to simulate populations from a bivariate normal distribution with some given correlation? Help with simulation of other bivariate distributions (e.g. exponential) is also appreciated.
--
Thanks
-Dinkar
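For the bivariate normal case, the usual recipe is to multiply independent standard normals by a Cholesky factor of the target correlation matrix. A minimal sketch (rho = 0.7 chosen arbitrarily):

```python
import numpy as np

def bivariate_normal(n, rho, rng):
    """n pairs from a standard bivariate normal with correlation rho."""
    z = rng.standard_normal((n, 2))                        # independent N(0,1)
    L = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    return z @ L.T                                         # correlated pairs

rng = np.random.default_rng(0)
xy = bivariate_normal(100_000, 0.7, rng)
r = np.corrcoef(xy.T)[0, 1]
print(f"sample correlation: {r:.3f}")
```

For other marginals, such as correlated exponentials, one common approach is to push correlated normals through the normal CDF and then through the target inverse CDF; the induced correlation is then close to, but not exactly, the one given to the normals.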
I need help in generating correlated random variables for a simulation program that I am preparing. All I've been able to find deals with normal variates, and what I need is a general routine for variables that may not be normal. Can someone help me with this? I'm not a mathematician and some of the stuff I've come across is hard to digest.

Thank you for your help

Adelino
Tony Corso wrote:

> Unfortunately the article is a little vague as to what Prony's method is, and
> a web search seems to indicate that Prony's method has something to do with
> signal processing.

Prony's method is a way of modelling sampled data as a linear combination of exponentials. Original reference:

de Prony, Baron (Gaspard Riche), [a very, very long title in French], J. Ec. Polytech., vol. 1, cahier 22, pp. 24-76, 1795.

If your library doesn't have that, try Digital Spectral Analysis with Applications, S. Lawrence Marple, Prentice-Hall, 1987, or just about any other text with "Spectrum Analysis" or "Spectrum Estimation" in the title.

> might anybody point me in the right direction, (considering that I'm no EE
> signal processing gearhead)?

I should say not. Gearheads are M.E.s
--
Jim
CWBern wrote:

> I know there are many methods for dealing with outliers. My question is
> for the pharmaceutical or med device people. What is the method (is there
> a standard?) that is acceptable to the FDA when an outlier is throwing off
> the calculations? Specifically, two extreme outliers are causing the
> normality assumption to be violated. This is in regards to a validation
> study. If the outliers are not included, the data becomes normal, and
> everything works out nicely.
> How can I justify not including these outliers in the calculations? (My
> customer does not want to change his analysis test, so please don't tell
> me of some fantastic non-parametric test.) Please refer to any written
> standards or industry-wide accepted techniques.
>
> thanks.

Well, it depends which part of the FDA is interested and may, in fact, depend on whom you are working with. However, there is a guideline to use Grubbs' method for identifying outliers. From my experience, it is highly unlikely that any well-behaved physical process will generate an outlier extreme enough to fail Grubbs' test. We have usually found that these "outliers" have either been misrecorded or misread from the physical instruments themselves. In that case, Grubbs' method is very helpful in identifying these values.

Frank Grubbs, Procedures for Detecting Outlying Observations in Samples, Technometrics, Vol. 11, No. 1, February, 1969, pp. 1-21

Rodney
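For reference, Grubbs' statistic and its usual two-sided critical value are short to compute. A sketch with made-up data containing one suspect point, assuming the single-outlier, two-sided form of the test at alpha = 0.05:

```python
import numpy as np
from scipy import stats

def grubbs_statistic(x):
    x = np.asarray(x, float)
    return np.max(np.abs(x - x.mean())) / x.std(ddof=1)

def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value for Grubbs' test (single outlier)."""
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    return (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))

# Hypothetical measurements; the last value looks misrecorded.
x = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1, 10.0, 14.0]
G, Gcrit = grubbs_statistic(x), grubbs_critical(len(x))
print(f"G = {G:.2f}, critical value = {Gcrit:.2f}, outlier flagged: {G > Gcrit}")
```

Note the test assumes the bulk of the data is normal, so it fits Rodney's use case (flagging transcription errors) rather than justifying routine deletion of awkward points.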
How can I recover the random sequence y from the following?

   z_t = y_t + x_t

where z is known, and all we know about y and x is:
1) their respective variances,
2) y is serially uncorrelated,
3) x is serially correlated,
4) x is independent of y.

Might as well assume everything is normally distributed.

***************
Ted Sternberg
San Jose, California, USA
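Under normality, the best estimate is the linear (Wiener/signal-extraction) one: y_hat = Cov(y) [Cov(y) + Cov(x)]^{-1} z. That requires knowing x's autocovariance, not just its variance; the sketch below assumes, for illustration only, that x is an AR(1) process, with all parameter values made up:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulate: y white noise, x an AR(1) process (serially correlated).
n, sy, sx, phi = 400, 1.0, 1.0, 0.9
y = rng.normal(0, sy, n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(0, sx * np.sqrt(1 - phi**2))
z = y + x

# Signal extraction: y_hat = C_y (C_y + C_x)^{-1} z, with
# C_y = sy^2 * I and C_x[i,j] = sx^2 * phi^{|i-j|}.
i = np.arange(n)
Cx = sx**2 * phi ** np.abs(i[:, None] - i[None, :])
Cy = sy**2 * np.eye(n)
y_hat = Cy @ np.linalg.solve(Cy + Cx, z)

mse_raw = np.mean((z - y) ** 2)      # error from using z itself as the estimate
mse_hat = np.mean((y_hat - y) ** 2)
print(f"MSE using z: {mse_raw:.2f}, signal-extraction estimate: {mse_hat:.2f}")
```

The serial correlation in x is what makes y partly recoverable: smooth stretches of z are mostly x, and the estimator exploits that.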
I work with date-related data which is *very* noisy and may have no observations on one date and multiple observations on another date. I have found that an effective strategy for extracting trend information is to group the data by uniform periods (usually weeks), and then use a binomial smoothing function covering an odd number of time periods to compute a running average. This is, I believe, the discrete equivalent of using a Gaussian smoothing function.

Each point in the smoothed data is effectively a weighted arithmetic mean of single observations. By making sure that the smoothing function is normalized to a sum of 1.00, one can also derive an effective number of observations contributing to each point. The total number of observations contributing to the smoothed results comes to slightly less than the actual number input, because the smoothing process effectively smears observations near the ends beyond the actual date range observed.

The problem is that I am not certain how to calculate the standard error and (say) 95% C.I. for the smoothed (weighted mean) observations. I have worked through the calculations, arguing by analogy with the ordinary calculations, and one ends up having non-integral degrees of freedom for each point. The results of this calculation look moderately convincing, but I would be *much* happier if I had an appropriate reference, or advice from someone knowledgeable about this. None of the (relatively elementary) references I have at my disposal touch on calculating the statistics of smoothed data nor of weighted means.

Replies by email welcome.
----
Rodger Whitlock
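For what it's worth, the non-integral "effective n" described above matches the usual effective-sample-size formula for a weighted mean of equal-variance observations: n_eff = 1 / sum(w_i^2) when the weights sum to one, with the standard error of the weighted mean scaling as sqrt(sum(w_i^2)). A sketch with binomial weights (the series values are invented):

```python
import numpy as np

def binomial_weights(k):
    """Row k of Pascal's triangle, normalized to sum to 1 --
    the discrete analogue of a Gaussian smoothing kernel."""
    w = np.array([1.0])
    for _ in range(k):
        w = np.convolve(w, [1.0, 1.0])
    return w / w.sum()

w = binomial_weights(4)               # 5-point kernel: 1 4 6 4 1, over 16
series = np.array([3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0])
smoothed = np.convolve(series, w, mode='valid')

# Effective number of observations behind each smoothed point:
n_eff = 1.0 / np.sum(w**2)            # = 256/70, about 3.66 for this kernel
print("weights:", w, " effective n per point:", round(n_eff, 2))
```

The non-integral degrees of freedom the poster arrives at are in the same spirit as Satterthwaite-type approximations for weighted combinations of variances.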
Consider an X-Y graph (#1) in which both X and Y are REAL variables that can vary from -infinity to +infinity. The area of this graph is therefore infinity*infinity (...?...).

Now, restrict Y to be greater than, or equal to, zero. Is the area of this new graph (#2) (1/2)*infinity*infinity?

What if Y is restricted to the range (1,10)? What is the area of this third graph? (10-1)*infinity? What is the ratio of the area of this plane divided by the area of the first plane? Is it: 9*infinity/(infinity*infinity) = 9/infinity?

What is the area of graph #3 divided by graph #2? Is it: 9*infinity/(0.5*infinity*infinity) = 18/infinity?

If all of this is true, then does it follow that (18/infinity) is twice as large as (9/infinity)?

A reference would be appreciated.
Have you ever seen the following theorem before:

"A gambler takes repeated binomial gambles. At each step he/she invests a fraction alpha of their current wealth. Researchers at Bell Labs have proved that the optimal strategy in this situation is to set alpha equal to p - q."

Is this correct? Who proved this and where is it published? Thanks for any info.
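This is the Kelly criterion (J. L. Kelly, Jr., "A New Interpretation of Information Rate", Bell System Technical Journal, 1956). For even-money binomial gambles with win probability p > 1/2 and q = 1 - p, maximizing the expected log-growth g(alpha) = p*ln(1+alpha) + q*ln(1-alpha) gives alpha* = p - q (set the derivative p/(1+alpha) - q/(1-alpha) to zero). A quick numerical check, with p = 0.6 chosen arbitrarily:

```python
import numpy as np

# Expected log-growth per even-money bet when staking fraction alpha:
#   g(alpha) = p*log(1 + alpha) + q*log(1 - alpha),  q = 1 - p.
p, q = 0.6, 0.4
alphas = np.linspace(0.0, 0.99, 10_000)
g = p * np.log(1 + alphas) + q * np.log(1 - alphas)
best = alphas[np.argmax(g)]
print(f"optimal fraction ~ {best:.3f},  p - q = {p - q:.3f}")
```

Note the optimality is for long-run growth rate of wealth (expected log-wealth), not expected wealth after one bet.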
Hi,

Could someone please suggest how to tackle the following problem?

Q: The receiver of a digital communication system is designed to operate with a BER (bit error rate) of 10^(-10). The receiver must decide whether the observed signal is greater than the decision level, and hence a digital one (or mark) is present, or whether the observed signal is less than the decision level, and hence a digital zero (or space) is present. To operate at a BER of x, the receiver can only make a mistake a fraction x of the time. If the receiver has an rms noise level of 12 mV, and the noise is caused by a multitude of factors, find the decision level that just permits operation at the specified BER.

Thanks in advance
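A sketch of the usual textbook approach: "caused by a multitude of factors" is the cue to take the noise as Gaussian (central limit theorem). Then the probability that noise alone exceeds the decision level D is the Gaussian tail Q(D/sigma), so D = sigma * Q^{-1}(BER), measured from the mean level of a zero. (A full receiver design would place the threshold relative to both signal levels; this computes only the noise margin asked for.)

```python
from scipy.stats import norm

ber = 1e-10          # target bit error rate
sigma = 12e-3        # rms noise, volts

# Error probability is Q(D/sigma); invert the Gaussian tail function.
q_inv = norm.isf(ber)            # Q^{-1}(1e-10), roughly 6.36
D = sigma * q_inv
print(f"Q^-1(1e-10) = {q_inv:.2f}, decision level = {D*1000:.1f} mV")
```

So the level works out to roughly 76 mV above the zero level, i.e. about 6.4 noise standard deviations.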
I was wondering whether the solution to the following problem is known.

If f is uniformly distributed between fmin and fmax, and phi is uniformly distributed between -pi and +pi, is the distribution of

   cos(2*pi*f*t + phi)

known? The random variables f and phi are assumed independent. The variable t is deterministic (e.g. time).

Any info gratefully received.

Ben Rickman
- Is the Logistic distribution known to be Polya of some order r (2 <= r <= infinity)?
- Related question: does anyone know a "reasonable" expression for the characteristic function of the Logistic?
- Thanks

Ray
------------------------------------------------------------------------------
R.G. Vickson
Department of Management Sciences
University of Waterloo
(519) 888-4729
Hi! I'm taking a first-year statistics course in which we're doing a lot of probability, and I'm having a hard time learning this material. So I was wondering if anyone can refer me to some really good books, or any computer software, that would help me with learning probability. Thank you so very much.

G. Chan
In article <845940391.28083.0@ciscr40.demon.co.uk>, cr40@cityscape.co.uk (Ben Rickman) writes:

>I was wondering whether the solution to the following problem is known
>
>If f is uniformly distributed between fmin and fmax and phi is
>uniformly distributed between -pi and +pi is the distribution of
>
> cos(2*pi*f*t+phi)
>
>known?

Correction. I made an error in my previous post. Don't flame too hard! The error was:

   u(-pi,pi] + 2*fmid*t*u(-pi,pi] = (1 + 2*fmid*t)*u(-pi,pi]

It looked good when working it out, but you can't do that! You need to do a convolution between u(-pi,pi] and 2*fmid*t*u(-pi,pi] to get fx(x), then substitute ...

Barry
In article <845940391.28083.0@ciscr40.demon.co.uk>, cr40@cityscape.co.uk (Ben Rickman) writes:

>I was wondering whether the solution to the following problem is known
>
>If f is uniformly distributed between fmin and fmax and phi is
>uniformly distributed between -pi and +pi is the distribution of
>
> cos(2*pi*f*t+phi)
>
>known?

Yes. For the general case of y = cos(x), the solution is in most textbooks as a function of the distribution of x, i.e. fx(x). The problem is to determine the distribution of x.

   u(-pi,pi] + 2*fmid*t*u(-pi,pi] = (1 + 2*fmid*t)*u(-pi,pi]

1) find fmid
2) the problem reduces to finding the distribution of g(t)*u(-pi,pi], and inserting it into the solution to y = cos(x).
3) the solution will be a sum over all values of x which produce a given solution y. I.e., for t=0, there are two values of x that will give the same solution y (plot cos(x) and you'll see why).

Hope this helps.

Barry Vanhoff
bvanhof@eecs.wsu.edu
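A shortcut worth noting: since phi is uniform over a full period and independent of f, the total phase 2*pi*f*t + phi, taken mod 2*pi, is itself uniform for any fixed t. The cosine of a uniform angle follows the arcsine law on [-1,1], with CDF F(y) = 1/2 + arcsin(y)/pi, regardless of fmin and fmax. A Monte Carlo check (my own sketch, with arbitrary fmin, fmax, t):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
f = rng.uniform(2.0, 5.0, n)             # any fmin, fmax will do
phi = rng.uniform(-np.pi, np.pi, n)
t = 0.37                                 # arbitrary fixed time

y = np.cos(2 * np.pi * f * t + phi)

# Compare the empirical CDF with the arcsine law F(y) = 1/2 + arcsin(y)/pi.
ys = np.linspace(-0.95, 0.95, 20)
emp = np.array([(y <= v).mean() for v in ys])
theo = 0.5 + np.arcsin(ys) / np.pi
print("max CDF discrepancy:", np.abs(emp - theo).max())
```

The uniform-phase argument is why the answer does not depend on the distribution of f at all.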
Special Issue on RANDOM SETS in the Journal of PATTERN RECOGNITION

edited by
Ilya Molchanov (University of Glasgow)
Edward Dougherty (Texas A&M University)

Random sets have played a role in image processing and pattern recognition since the seminal text, RANDOM SETS AND INTEGRAL GEOMETRY, by George Matheron (1975). At first there were only a few researchers using random set theory, but recently the number has begun to grow. Two conferences devoted to random sets occurred in 1996, one at the University of Minnesota and another at the Ecole des Mines in Fontainebleau.

The purpose of the present special issue is to provide a forum for current research in both the theory and application of random sets, and to give a sampling of current trends to a wide audience. Potential topical areas include random-set theory, spatial statistics, image analysis, random geometry, texture analysis, random-set modeling for pattern recognition, filtering in the context of random sets, stochastic mathematical morphology, coverage processes, point processes, random measures, set statistics, and applications in all of the aforementioned areas.

The final date for manuscript submission is November 1, 1997. All manuscripts will be peer reviewed for acceptance. For consideration, please submit four (4) copies of the complete manuscript to:

Dr. Ilya Molchanov
University of Glasgow
Department of Statistics
Glasgow G12 8QW
Scotland, U.K.

------------------------------------------------------------------------
I. Molchanov, Department of Statistics, University of Glasgow
Glasgow G12 8QW, Scotland, U.K.
e-mail: ilya@stats.gla.ac.uk
Ph.: +44 141 339 8855 ext 2116
Fax: +44 141 330 4814
http://www.stats.gla.ac.uk/~ilya/
------------------------------------------------------------------------
I'm working on a random number generator that is to be used in a simulation (it is to be coupled with a structural analysis program) and have run into a problem that I haven't been able to solve or find a reference for: the generation of correlated random variables. So far I've come across an algorithm to generate correlated normal variates, but what I need is a general procedure applicable regardless of the variates' particular distributions... does such a beast exist? Where can I find it?

Your help is most appreciated.

Adelino
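One general-purpose recipe (often called NORTA, "NORmal To Anything", or a Gaussian copula construction) builds directly on the correlated-normal algorithm already in hand: generate correlated normals, convert them to correlated uniforms with the normal CDF, then push each column through the desired inverse CDF. A sketch, with the caveat that the specified correlation applies to the underlying normals and is only approximately inherited by the transformed outputs (the exponential/gamma marginals and the 0.8 value are arbitrary examples):

```python
import numpy as np
from scipy import stats

def norta(n, corr_z, marginals, rng):
    """Correlated non-normal variates via the NORTA construction.
    corr_z is the correlation matrix of the underlying normals;
    marginals is a list of frozen scipy distributions, one per column."""
    L = np.linalg.cholesky(corr_z)
    z = rng.standard_normal((n, len(marginals))) @ L.T   # correlated normals
    u = stats.norm.cdf(z)                                # correlated uniforms
    return np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])

rng = np.random.default_rng(0)
corr_z = np.array([[1.0, 0.8], [0.8, 1.0]])
x = norta(50_000, corr_z, [stats.expon(), stats.gamma(2.0)], rng)
r = np.corrcoef(x.T)[0, 1]
print(f"achieved correlation: {r:.3f}")
```

If an exact target correlation for the outputs is needed, the usual trick is to search numerically for the normal correlation that induces it.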
******************************************************************
**********              Questionnaire                  ***********
* Your experience with interactive graphical statistics software *
******************************************************************

The purpose of this questionnaire is to gather information about the status of interactive statistical graphics in contemporary practice. These are some of the questions that we would like to address using your responses: How are statisticians and other data analysts using interactive, dynamic statistical graphics? Which methods, if any, have come into wide use? Which methods have yet to demonstrate their usefulness? What new methods or extensions to existing methods are frustrating analysts by their absence?

The results of this survey are to appear in a special issue of Computational Statistics. We also invite the submission of papers about applications of interactive statistical graphics.

Please send your responses, questions and other comments to statvis@bellcore.com. Paper mail can be sent to:

Deborah Swayne
Bellcore
445 South Street MCC-1A316B
Morristown, NJ 07960-6438 USA

1: Describe your data. (If you would like to comment on more than one data problem, describe as many sets of data as you like. Label your examples however you like, and then you can refer back to them when you comment on the methods.) If your data are proprietary, please say as much about them as you find appropriate.
   What is the subject of the data?
   What is the structure of the data? How many cases, variables?
   What kind of variables -- nominal, ordinal, metric-continuous, metric-discrete?
   Are there missing values?

2: What interactive graphics software did you use, or try to use? On what computing platform(s) -- a UNIX workstation, a PC running MS Windows, PC running OS/2, PC running LINUX, other?
   DataDesk
   Lisp Stat interactive graphics
   S Plus interactive graphics
   SAS Insight or JMP
   XGobi
   XploRe
   Voyager
   other ...

3: For the following methods: Which did you try? Did you find it reasonably easy to use? Was it useful? If not, why not? If so, what did it tell you about your data? (Be as terse or as verbose as you like.)
   Interactive methods
      Brushing, linked brushing
         Between like plots: boxplot, scatterplot, barchart ...
         Between unlike plots: boxplot to scatterplot, scatterplot to barchart ...
      Scaling
      Identification, linked identification
      Subsetting of the data set
      Viewing multiple plots in rapid succession
   One-variable plots
   Two-variable plots
   Higher-d plots
      Parallel coordinate plots
      Rotation
      Grand tour
         Manipulation of the grand tour direction
      Interactive projection pursuit
         Which indices?
   Did you print out many plots?

4: What methods or software do you expect to use in the future? Are there any that you plan never to use again? (This is a good place for appreciative testimonials or expressions of frustration or disgust.)

5: What would you have liked to do that you couldn't find a way to do?

6: If you could publish an account of your experiences with interactive graphical software for data analysis, would you like to do so?

7: About you:
   Where do you work? (check all that apply)
      in industry
      for a university
      for yourself
      other
   What is your field of work or study?
      Agriculture, Biology, Chemistry, Economics, Engineering, Statistics, other
   Did you receive this questionnaire on the s-news mailing list? on usenet (which group(s))? by direct email? other?

8: Any other comments?

Thank you very much for your participation. We hope that gaining some answers to these questions will help guide future research and development of graphics for data analysis. Again, send email to statvis@bellcore.com, and paper mail to Deborah Swayne at the address given above.

Deborah Swayne                    Sigbert Klinke
Bellcore                          Humboldt-University of Berlin
Ben Rickman writes:

>I was wondering whether the solution to the following problem is known.
>
>If f is uniformly distributed between fmin and fmax and phi is
>uniformly distributed between -pi and +pi, is the distribution of
>
>   cos(2*pi*f*t + phi)
>
>known?
>
>The random variables f and phi are assumed independent. The variable t
>is deterministic (e.g. time).

Assuming that you are picking one time and measuring this function, the arbitrary frequency does not matter: you are equally likely to pick any given phase of the cosine. Given y = cos(x) with x uniformly distributed over a full period, the density of y is obtained from the derivative of the inverse function:

   p(y) = 1 / (pi * sqrt(1 - y^2))  for -1 < y < 1,  0 otherwise

(the arcsine distribution). If you are picking a bunch of times, each measurement will be distributed this way, but the measurements will not be independent (not iid).

-Bill
billt@leland.stanford.edu
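The arcsine form of this distribution is easy to check by simulation; a minimal sketch (the sample size and the choice of checkpoints are mine, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One sample of cos(2*pi*f*t + phi): for a single fixed t, adding the
# uniform(-pi, pi) phase phi makes the whole argument uniform modulo 2*pi,
# whatever f is, so we can sample the phase x directly.
x = rng.uniform(-np.pi, np.pi, n)
y = np.cos(x)

# Compare the empirical CDF with the arcsine CDF
# F(y) = 1 - arccos(y)/pi, the integral of 1/(pi*sqrt(1-y^2)).
for q in (-0.5, 0.0, 0.5):
    print(q, np.mean(y <= q), 1 - np.arccos(q) / np.pi)
```

The two columns agree to within simulation noise, confirming that the normalizing constant 1/pi belongs in the density.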
In article <326D24F5.41C67EA6@bechtel.Colordao.edu>, DE ALMEIDA ADELINO wrote:

>[I] have run into a problem that I haven't been able to solve or to find a
>reference that might help me solve it: the generation of correlated
>random variables.
>
>So far I've come across an algorithm to generate correlated normal
>variates, but what I need is a general procedure that is applicable
>regardless of the variates' particular distribution... does such a
>beast exist? Where can I find it?

No such beast exists, because the problem is not well defined. In general, two random variables with given distributions can have a given degree of correlation in many ways. The normal distribution is an exceptional case, in which specifying the marginal normal distributions and the degree of correlation suffices to determine the joint distribution.

To see this, consider the simplest case: generating pairs of values for X and Y in which both X and Y have the uniform(-1,1) distribution, with zero correlation between X and Y. One way of achieving this is to make X and Y independent, generating uniformly from the square in the (x,y) plane with corners at (-1,-1) and (1,1). Another, very different, way is to generate uniformly from the union of two line segments, one going from (-1,-1) to (1,1) and the other from (-1,1) to (1,-1).

   Radford Neal

----------------------------------------------------------------------------
Radford M. Neal                                       radford@cs.utoronto.ca
Dept. of Statistics and Dept. of Computer Science radford@utstat.utoronto.ca
University of Toronto                     http://www.cs.utoronto.ca/~radford
----------------------------------------------------------------------------
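The two constructions described above can be written out directly; a sketch in Python (sample size and variable names are my own choices), showing that both give uniform(-1,1) marginals and near-zero correlation while having completely different joint behaviour:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Construction 1: X, Y independent, uniform on the square [-1,1] x [-1,1].
x1 = rng.uniform(-1, 1, n)
y1 = rng.uniform(-1, 1, n)

# Construction 2: uniform on the union of the two diagonals of that square.
# Take a point on the diagonal y = x, then flip the sign of y half the time;
# by symmetry the marginals are still uniform(-1,1).
x2 = rng.uniform(-1, 1, n)
y2 = x2 * rng.choice([-1.0, 1.0], n)

# Both pairs have sample correlation close to zero ...
print(np.corrcoef(x1, y1)[0, 1])
print(np.corrcoef(x2, y2)[0, 1])

# ... but the joint distributions differ completely:
# under construction 2, |Y| always equals |X| exactly.
print(np.max(np.abs(np.abs(y2) - np.abs(x2))))  # 0.0
```

So matching marginals plus a correlation value does not pin down a joint distribution, which is exactly why no fully general generator can exist.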
Hi,

I have a problem: I need to linearize a nonlinear stochastic ODE of Ito type. Does anyone have any ideas? It's actually a system of coupled ODEs with state-dependent diffusion. Can I just Taylor expand it?

Rodney Beard
University of Queensland
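One common quick answer is a local (pathwise) Taylor linearization of the drift and diffusion functions about a reference point; note this is only a first-order approximation and ignores the distinctions that a more careful "statistical linearization" respects for Ito equations. A sketch with made-up example functions (the drift and diffusion below are hypothetical, not from the post):

```python
import numpy as np

def linearize(f, x0, h=1e-6):
    """First-order Taylor approximation of f about x0 (central-difference slope)."""
    slope = (f(x0 + h) - f(x0 - h)) / (2.0 * h)
    return lambda x: f(x0) + slope * (x - x0)

# Hypothetical scalar Ito SDE  dX = a(X) dt + b(X) dW  (example functions only).
a = lambda x: -x ** 3                      # nonlinear drift
b = lambda x: 0.5 * np.sqrt(1.0 + x ** 2)  # state-dependent diffusion

x0 = 1.0                  # reference point for the expansion
a_lin = linearize(a, x0)
b_lin = linearize(b, x0)

# dX = a_lin(X) dt + b_lin(X) dW is now a linear SDE; it agrees with the
# original at x0 and is a first-order approximation nearby.
print(a_lin(1.0))   # equals a(1.0) = -1.0
print(a_lin(1.1))   # about -1 + (-3)(0.1) = -1.3
```

For a coupled system the same idea uses the Jacobian of the drift and diffusion vectors instead of scalar slopes.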
I am looking for a treatment of the problem of curve fitting in the case where there is NO INDEPENDENT variable. For example, you have a sensor system that measures space and time (all subject to errors) and you need to fit a curve to the data.

I initially dealt with this problem for the case of fitting lines. As usual, the solution is derived by minimizing the sum of the squared distances of the data from a putative line. Points on the line are indexed by a free parameter. In linear regression this "parameter" is the independent variable and is KNOWN. Here there is no independent variable, so the solution has an additional initial step in which the free parameter must, in effect, be determined by finding the line through each data point that is perpendicular to the putatively fitted line. So the error components are much messier than for regression calculations. The 2D case is simple and the result is the so-called "standard deviation line" (in between the x-on-y and y-on-x regression lines). The 3D case yields horrendous equations, and it is not clear to me that they boil down to the s.d. line.

But now I want to fit cubic splines - just calculating the components of the error function (squared distance from curve to data point) would require finding the appropriate root of a sixth-degree polynomial. That method is going nowhere! There is a simpler way, right?

Cheers,
Mark

p.s. Please communicate by e-mail. Thanks in advance for any help.
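For the straight-line case, minimizing the sum of squared perpendicular distances is the classic total-least-squares / principal-axis problem, and it has a closed form via the eigenvectors of the data covariance matrix, with no per-point root finding. A minimal 2-D sketch (function and variable names are mine):

```python
import numpy as np

def orthogonal_line_fit(pts):
    """Fit a line minimizing squared perpendicular distances to the points.
    Returns (centroid, unit direction vector): the first principal axis."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    # Direction = eigenvector of the covariance matrix with largest eigenvalue.
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, np.argmax(eigvals)]
    return centroid, direction

# Points lying exactly on y = 2x + 1 are recovered exactly.
pts = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
c, d = orthogonal_line_fit(pts)
print(c)            # the centroid, (0.5, 2.0)
print(d[1] / d[0])  # the slope, 2.0 up to floating-point rounding
```

For splines the same criterion has to be handled iteratively, alternating foot-point projection onto the current curve with refitting; this is the approach taken by orthogonal-distance-regression codes such as ODRPACK.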
Bill Shipley and Lyne Labrecque wrote:
>
> Consider an X-Y graph (#1) in which both X and Y are REAL variables that can vary from
> -infinity to +infinity. The area of this graph is therefore infinity*infinity
> (...?...). Now, restrict Y to be greater than, or equal to, zero. Is the area of this
> new graph (#2) 0.5*infinity*infinity? What if Y is restricted to the range (1,10)? What
> is the area of this third graph? (10-1)*infinity? What is the ratio of the area of this
> plane divided by the area of the first plane? Is it:
> 9*infinity/(infinity*infinity) = 9/infinity?
> What is the area of graph #3 / graph #2? Is it:
> 9*infinity/(0.5*infinity*infinity) = 18/infinity?
> If all of this is true, then does it follow that (18/infinity) is twice as large as
> (9/infinity)?
> A reference would be appreciated.

Any finite multiple of infinity is still infinity, so expressions like 9/infinity and 18/infinity are not ordinary numbers: taken at face value they are indeterminate forms, and comparing them gives no defined result. To get a meaningful answer you must specify a limiting process, and different processes can give different answers.

Michael Woodall
Mathematics Teacher, Montreal
--
Woody!
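One way to make these comparisons precise - my own framing, not from the thread - is to replace infinity by a finite cutoff n and take limits:

```latex
\frac{9n}{n\cdot n}=\frac{9}{n}\to 0,\qquad
\frac{9n}{\tfrac12\,n\cdot n}=\frac{18}{n}\to 0,\qquad
\text{yet}\quad \frac{18/n}{9/n}=2\ \text{for every finite }n.
```

Both ratios vanish in the limit, while their quotient is 2 at every finite stage; which of these facts counts as "the" answer depends on the limiting process chosen, which is why the bare symbols 9/infinity and 18/infinity are left undefined.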
A student from a country in the Caribbean has applied for admission to the undergraduate program here at the U. of Windsor. He has submitted a document indicating that he has received a Royal Statistical Society (London) Higher Certificate, which apparently can be obtained by passing a set of exams. The topics are fairly standard - statistical analysis, inference, nonparametrics, etc. Presumably these are at a lower undergraduate level.

Can anyone give a more precise indication of the level of this certificate? Is it considered equivalent to a course or courses at any university? Or, can anyone give me a contact so I can find out (preferably an e-mail contact)?

Thanks,
--
Myron Hlynka
Dept. of Math. & Stat.
University of Windsor
Windsor, Ontario, Canada
Subject: Faculty Positions Announcement

The Department of Forestry, National Taiwan University, is seeking applicants with solid academic training for two positions at the lecturer (assistant professor) level in the following two fields:
(1) Environmental Monitoring and Planning
(2) Resources Inventory

MINIMUM QUALIFICATIONS:
1. Ph.D., and at least one degree (BS, MS, Ph.D.) obtained from a forest-resources-related program.
2. Fluency in spoken and written Chinese.

APPLICATIONS:
Applications should include: (1) a curriculum vitae, (2) transcripts (undergraduate and graduate), (3) a copy of the applicant's Ph.D. dissertation, (4) copies of published research papers, (5) two letters of recommendation, and (6) an indication of which field the applicant is applying for. The closing date is December 12, 1996.

CONTACT:
Professor Ya-Nan Wang (Chairwoman)
Department of Forestry, National Taiwan University
#1, Section 4, Roosevelt Rd., Taipei, Taiwan 106, R.O.C.
Phone: +886-2-3633352, Fax: +886-2-3654520
E-mail: m627@ccms.ntu.edu.tw
Hi everybody,

I have a problem and I was wondering if anybody could help me. I have three variables (a, b, c), and I know the variances and covariances between them (i.e. var(a), var(b), var(c), cov(a,b), cov(a,c), cov(b,c)). I need to create a new variable d = b + c, and by the theory of error propagation I can find var(d), but I also need cov(a,d) and I don't know how to find it. Any help or reference will be appreciated.

--
Jordi Riu Rusell                 tel.: 34-(9)-77-558187
Departament de Quimica           fax.: 34-(9)-77-559563
Universitat Rovira i Virgili     e-mail: rusell@quimica.urv.es
Pl. Imp. Tarraco, 1
43005-Tarragona
Catalonia - Spain
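Covariance is bilinear, so cov(a, d) = cov(a, b+c) = cov(a,b) + cov(a,c), using only quantities already in hand. A quick numerical check (the data values below are arbitrary, made up for illustration):

```python
import numpy as np

# Arbitrary made-up observations of a, b, c.
a = np.array([1.0, 2.0, 4.0, 7.0])
b = np.array([0.5, 1.5, 1.0, 3.0])
c = np.array([2.0, 0.0, 3.0, 1.0])
d = b + c

def cov(x, y):
    """Sample covariance (n-1 denominator)."""
    return np.cov(x, y)[0, 1]

print(cov(a, d))                  # equal (up to rounding) to the sum below
print(cov(a, b) + cov(a, c))
```

The same identity extends to any linear combination: cov(a, u*b + v*c) = u*cov(a,b) + v*cov(a,c).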
Hello all,

Once again I am seeking some knowledge and/or a reference to confirm a rumor. I am conducting an experiment with the following possible outcomes: {-11, -9, -7, -5, ..., 9, 11} (the odd numbers between -12 and 12). I will perform the experiment approximately 5000 times. The result of any one experiment does not affect the results of any later experiments. The goal is to find the expectation. I am rather certain the expectation is between -0.2 and +0.2, and the standard deviation is about 3.5.

I have always been under the impression that the best estimator of the expectation for this type of experiment is the average of a sample. With 5000 trials I would expect the average to be accurate to within 2 * 3.5 / sqrt(5000) <= 0.1, 95% of the time.

About a month ago I sat in on a job interview. During the interview the applicant mentioned that there was a better estimator available for this kind of experiment. I didn't believe him at the time, but then he mentioned a paper that he had written on the subject, so I took his word for it. I really wish I had written down the paper's title, because a better estimator would be very useful for me.

So the question is: "Is there a better estimator of the expectation than the average?" If so, a reference would be appreciated. As you might guess from my post, I don't have a background in statistics, but I can read mathematical papers, so a technical reference is fine.

Thanks for any help.

Hein Hundal
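The back-of-envelope bound above checks out numerically, and for i.i.d. data the sample mean is the natural nonparametric choice; alternatives such as trimmed means or shrinkage estimators can achieve lower mean squared error for particular distribution shapes, usually at the cost of bias. A quick check of the arithmetic plus a seeded simulation (the distribution below is a made-up example on the stated support, not the poster's actual experiment):

```python
import numpy as np

rng = np.random.default_rng(42)

# The poster's two-sigma bound: 2 * 3.5 / sqrt(5000).
bound = 2 * 3.5 / np.sqrt(5000)
print(bound)  # just under 0.1

# A made-up distribution on the odd numbers -11..11 with mean 0 and
# standard deviation near 3.5, used only to watch the sample mean behave.
support = np.arange(-11, 12, 2)
probs = np.exp(-(support / 5.0) ** 2)
probs /= probs.sum()

# Repeat the 5000-trial experiment 200 times and record how often the
# sample mean lands inside the bound (close to 95% if the sd is near 3.5).
means = [rng.choice(support, size=5000, p=probs).mean() for _ in range(200)]
print(np.mean(np.abs(np.array(means)) <= bound))
```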