

Newsgroup sci.stat.consult 21260

Directory

Subject: need advice/software for incomplete unbalanced block design -- From: serross@sci.shizuoka.ac.jp (Robert M. Ross)
Subject: Re: Question about GG distribution -- From: Greg Heath
Subject: Hourly rate for consulting in stats -- From: "Michael A. Kadar"
Subject: USA-NJ-Northern-Statistical Consultant -- From: priresumes@trilogycnslt.com (Employment)
Subject: Wanted - Old Software - Bass Statistical System -- From: voss2@beast.Trenton.EDU (Ann Voss)
Subject: Non-paired t as conservative? -- From: "Michael Kleiman (SOC)"
Subject: Re: CART software for survival data? -- From: thomas lumley
Subject: How to calculate average -- From: JUNJIA@morst.govt.nz
Subject: Lognormal PDF, mean and standard deviation -- From: hrkman@msn.com (H. K.)
Subject: Re: Are the two regression lines too many? -- From: hrubin@b.stat.purdue.edu (Herman Rubin)
Subject: Re: Non-paired t as conservative? -- From: hrubin@b.stat.purdue.edu (Herman Rubin)
Subject: Markov chains -- From: Brian Bull
Subject: Thanks ! re: my question about calculating the chi-square function -- From: Rafael Santos
Subject: Re: Analyzing nominal and ordinal data... -- From: bellour@upso.ucl.ac.be (F. Bellour)
Subject: Re: advice needed on generating simulated data sets -- From: Jeremy Miles

Articles

Subject: need advice/software for incomplete unbalanced block design
From: serross@sci.shizuoka.ac.jp (Robert M. Ross)
Date: 28 Nov 1996 08:02:15 GMT
I have data that I wish to analyze using an unbalanced incomplete block
design, to test for differences among treatments.  I would greatly
appreciate information on what software exists for this sort of analysis. 
Briefly:
There are 5 blocks and 5 treatments. There are no replicates for most of
the cells. 
I would like to use a nonparametric technique if possible.  This might
include the Durbin Test or some sort of permutation/resampling technique. 
Thanks very much for any advice you can share. 
Rob Ross
Institute of Life and Earth Sciences
Shizuoka University
836 Oya
Shizuoka 422
JAPAN 
serross@sci.shizuoka.ac.jp
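A permutation approach along the lines asked about above can be sketched in a few lines; the data below are hypothetical and numpy is assumed. Shuffling treatment labels *within* each block respects the blocking structure even when the design is incomplete and unbalanced:

```python
# Minimal within-block permutation test sketch for an unbalanced,
# incomplete block design (hypothetical data; numpy assumed).
import numpy as np

rng = np.random.default_rng(0)

# (block, treatment, response) -- incomplete and unbalanced on purpose
data = [(1, 'A', 4.1), (1, 'B', 5.0), (1, 'C', 6.2),
        (2, 'A', 3.9), (2, 'C', 6.0), (2, 'D', 7.1),
        (3, 'B', 5.2), (3, 'D', 7.3), (3, 'E', 8.0),
        (4, 'A', 4.0), (4, 'E', 8.2),
        (5, 'C', 6.1), (5, 'D', 7.0), (5, 'E', 7.9)]

blocks = np.array([d[0] for d in data])
treats = np.array([d[1] for d in data])
y      = np.array([d[2] for d in data])

# Center responses within blocks to remove block effects.
yc = y.copy()
for b in np.unique(blocks):
    m = blocks == b
    yc[m] -= yc[m].mean()

def between_treatment_ss(labels, resp):
    """Between-treatment sum of squares of the block-centered data."""
    return sum((labels == t).sum() * resp[labels == t].mean() ** 2
               for t in np.unique(labels))

observed = between_treatment_ss(treats, yc)

# Reference distribution: permute treatment labels within each block.
n_perm, count = 2000, 0
for _ in range(n_perm):
    perm = treats.copy()
    for b in np.unique(blocks):
        m = blocks == b
        perm[m] = rng.permutation(perm[m])
    if between_treatment_ss(perm, yc) >= observed:
        count += 1

p_value = (count + 1) / (n_perm + 1)
print(f"permutation p-value: {p_value:.3f}")
```

This is only a sketch of the resampling idea, not the Durbin test itself; the test statistic and the within-block centering are choices that a real analysis would want to justify.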
Return to Top
Subject: Re: Question about GG distribution
From: Greg Heath
Date: Thu, 28 Nov 1996 07:41:47 -0500
On Sat, 23 Nov 1996, Igor Kozintsev wrote:
> (sorry for this latex code)
If you are really sorry, why not provide an ASCII translation?
Greg
Gregory E. Heath     heath@ll.mit.edu      The views expressed here are
M.I.T. Lincoln Lab   (617) 981-2815        not necessarily shared by 
Lexington, MA        (617) 981-0908(FAX)   M.I.T./LL or its sponsors
02173-9185, USA
Return to Top
Subject: Hourly rate for consulting in stats
From: "Michael A. Kadar"
Date: Thu, 28 Nov 1996 10:32:29 -0800
Greetings to all (and Happy Thanksgiving!),
I hope this isn't too much of a distraction from the technical
issues addressed in this forum, but I need some information about
consulting rates.
I've recently joined an organization that provides consulting
services in engineering and process improvement. Up until my
hire, the organization was somewhat lacking in formal statistical
methods. My role has been to provide internal and external
consulting and training in applied statistics. The external 
customer is invariably an automotive manufacturer or one of its 
higher level suppliers.
Can anyone tell me what an appropriate hourly rate might be for
these services?  It seems to me that our rate is a bit low, but I 
really don't know. I'd be interested in hearing as many opinions 
on this as possible.
Thanks in advance,
M.
Return to Top
Subject: USA-NJ-Northern-Statistical Consultant
From: priresumes@trilogycnslt.com (Employment)
Date: 28 Nov 1996 08:23:59 -0800
Responsibilities Include:  Measurement and analysis on integrated drug,
medical, and self report data, and patient profile databases to assess the
effectiveness of clinical programs (including Seniors', disease management
and DUR).  The incumbent will coordinate across departments ensuring that
established outcomes and analytic goals are met.   The areas to be managed
include data management, statistical analysis and project reporting.  The
consultant should be able to execute statistical programming and generate
standard and ad-hoc reports.   Qualifications:  Must have significant
experience programming SAS, a Master's Degree in statistics, or a related
mathematical / epidemiological/ computer field.  It is preferred that the
incumbent have 3-5 years post-graduate  work experience and proficiency
with data.  A detail-oriented and organized individual who can work
independently and on multiple projects simultaneously is preferred.     
Ref-3652
Our regional offices located in Waukegan, IL, Durham NC, Palo Alto, CA,
and Princeton, NJ provide off-site and on-site services to clients in over
30 states. Our Clinical Trial Management group assists Research and
Development organizations in the monitoring and quality assurance of
clinical trial studies. Our System Professionals function in all aspects of
the application development life cycle, manage data, and develop reports.
In Research Statistics, we design statistical experiments and evaluate the
results for the areas of Life Science, Finance, Marketing, Economics, and
Engineering.
Please reply to: priresumes@trilogycnslt.com
Trilogy Consulting Corporation
101 Carnegie Center, Suite 211
Princeton, NJ  08540
http://www.trilogycnslt.com/TCC_Home
fax: 609.520.0730
Return to Top
Subject: Wanted - Old Software - Bass Statistical System
From: voss2@beast.Trenton.EDU (Ann Voss)
Date: 28 Nov 1996 18:58:30 GMT
The Bass Statistical System was developed as a SAS clone in the 1980s. I 
am interested in acquiring a (legitimate) copy of this, preferably both
programs and manuals.
Henry Voss, 11 Llanfair Lane, Ewing NJ 08618-1011, phone (609) 882-2612
Return to Top
Subject: Non-paired t as conservative?
From: "Michael Kleiman (SOC)"
Date: Thu, 28 Nov 1996 13:11:11 -0500
My question relates to using the t-test for the difference between means
for uncorrelated data when one actually has correlated data, as a way of
being conservative.  I realize this is iffy on methodological grounds but
am trying to understand the issue statistically and would appreciate help. 
An examination of the formula for the standard error of the difference
between 2 means shows that when data are correlated, the size of the
standard error is reduced.  How much it is reduced depends on the size of
the correlation coefficient.  Since the standard error of the difference
between means is the denominator of the t-test, it follows that a
researcher who performs the test using the formula for uncorrelated data
when he actually has correlated data is using an unnecessarily stringent
test.  My question is:  *how* unnecessarily stringent; that is, given a
particular set of data, how do I calculate how extra stringent I've been?
Here's an example (you'll recognize it from my previous posting): a runner
tests shoes A and shoes B using correct methodology, does a t-test for the
difference between means using the formula for uncorrelated data, and
concludes there is a statistically significant difference between the
shoes at p=.05.  According to Downie and Heath 1970, p. 73, in the section
headed "Differences Between Means--Correlated Data", "We say that data are
correlated when they consist of 2 sets of measurements on the same
individuals..."  Thus the runner is in a correlated-t situation, and the
standard error (the denominator of the t-test) calculated under the
uncorrelated t-test formula will be reduced depending on the size of the
correlation.  Is this correlation 1, since it's the same runner?  I don't
think that's right, because I think maybe then the standard error becomes
0!  If it's not 1, then how do I calculate it from my data?  I believe I
have the correct formula for quantifying "extra stringentness"; my problem
is I don't know how to get the correlation coefficient required by that
formula.  When data are "correlated" in a correlated t-test scenario, how
does one quantify *how* correlated they are?  Any help much appreciated.   
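On the question of quantifying the correlation: in the paired setting it is simply the sample Pearson correlation between the two sets of measurements on the same individuals. A minimal sketch (hypothetical running times, numpy assumed) of the two standard-error formulas and the resulting "stringency ratio":

```python
# How correlated are paired data?  The r in the correlated-data standard
# error is the sample Pearson correlation of the two measurement sets.
# Times below are hypothetical; numpy assumed.
import numpy as np

shoes_a = np.array([300.1, 312.4, 295.8, 305.2, 310.0, 298.7])  # times in shoe A
shoes_b = np.array([298.0, 309.9, 294.1, 302.8, 307.5, 297.0])  # same courses, shoe B
n = len(shoes_a)

r = np.corrcoef(shoes_a, shoes_b)[0, 1]          # sample correlation, not 1
s_a, s_b = shoes_a.std(ddof=1), shoes_b.std(ddof=1)

se_uncorrelated = np.sqrt(s_a**2 / n + s_b**2 / n)
se_correlated   = np.sqrt(s_a**2 / n + s_b**2 / n - 2 * r * s_a * s_b / n)

print(f"r = {r:.3f}")
print(f"SE (uncorrelated formula): {se_uncorrelated:.3f}")
print(f"SE (correlated formula):   {se_correlated:.3f}")
# This ratio quantifies how much stricter the uncorrelated-data test was.
print(f"stringency ratio: {se_uncorrelated / se_correlated:.2f}")
```

Note that r is 1 only if the two sets of times are perfectly linearly related; being the same runner makes r large, not equal to 1, so the standard error shrinks but does not vanish.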
Return to Top
Subject: Re: CART software for survival data?
From: thomas lumley
Date: Thu, 28 Nov 1996 10:51:47 -0800
On Wed, 27 Nov 1996, Ronan M Conroy wrote:
> Does anyone know of software that performs tree-structured survival
> analysis or logistic regression?
There is code in the statlib S archive (called survcart) for survival 
analysis. S-PLUS also does binary tree-structured regression. 
thomas lumley
UW biostatistics
Return to Top
Subject: How to calculate average
From: JUNJIA@morst.govt.nz
Date: Fri, 29 Nov 1996 10:18:17 -0800
I have a data set containing 20 countries' data from 1980 to 1990. I would
like to calculate the average value across the 20 countries in each year
from 1980 to 1990.
My problem is that in some years, several countries' data are missing.
When I do the calculation, the averages for those years do not seem
consistent with those for years without missing data. In particular, if
an important country's data are missing, the average is affected a
lot. I have tried using the existing data to estimate the missing values
in some years, but the time series are too short for that to be convincing.
Any help and suggestions will be greatly appreciated.
Junjia
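A small sketch of the issue described above (hypothetical numbers, numpy assumed): averaging over whichever countries happen to report shifts the mean when a large country drops out, and restricting to a balanced panel of fully observed countries is one common workaround:

```python
# Available-case means vs balanced-panel means (hypothetical data).
import numpy as np

# rows = 4 countries, cols = 3 years; NaN marks a missing value
x = np.array([[100.0, 102.0, np.nan],   # large country missing in year 3
              [ 10.0,  11.0,  12.0],
              [ 12.0,  12.5,  13.0],
              [  9.0,   9.5,  10.0]])

available_case = np.nanmean(x, axis=0)               # mean over reporting countries
balanced = x[~np.isnan(x).any(axis=1)].mean(axis=0)  # countries with full data only

print("available-case means:", available_case)       # year 3 drops sharply
print("balanced-panel means:", balanced)             # comparable across years
```

The balanced-panel series sacrifices information but is comparable across years; imputing the missing values keeps all countries but, as noted above, is hard to defend with such short series.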
Return to Top
Subject: Lognormal PDF, mean and standard deviation
From: hrkman@msn.com (H. K.)
Date: 28 Nov 96 23:59:18 -0800
Are the following two equations correct?	
Expected Value = exp (mu+.5*sigma^2)
Std. Dev.= (Expected Value ^2 *(exp(sigma^2)-1))^.5
I generated 2000 pseudo-random numbers from a lognormal distribution 
with parameters mu=2 and sigma=3, and noticed that the above two formulae 
give values much larger than the simple average and standard deviation of 
the actual sample. Isn't the sample average a point estimate of the 
expected value of the lognormal pdf, and likewise the sample standard 
deviation that of the lognormal pdf?
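For reference, the standard lognormal moment formulas are E = exp(mu + sigma^2/2) and SD = E*sqrt(exp(sigma^2) - 1). The sketch below (numpy assumed) computes both and draws a sample; with sigma = 3 the distribution is so heavy-tailed that the sample mean of 2000 draws typically falls far short of the theoretical mean, even though the sample average is indeed a consistent estimator:

```python
# Theoretical vs sample lognormal moments (numpy assumed).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 2000

theoretical_mean = np.exp(mu + sigma**2 / 2)                    # ~ exp(6.5)
theoretical_sd = theoretical_mean * np.sqrt(np.exp(sigma**2) - 1)

sample = rng.lognormal(mean=mu, sigma=sigma, size=n)
print(f"theoretical mean: {theoretical_mean:.1f}  sample mean: {sample.mean():.1f}")
print(f"theoretical SD:   {theoretical_sd:.1f}  sample SD:   {sample.std(ddof=1):.1f}")
# The sample mean is dominated by rare enormous values that a sample of
# 2000 usually never sees, so it sits well below the theoretical mean.
```

So the discrepancy observed in the post is expected: with sigma = 3 the convergence of the sample moments to the theoretical ones is extremely slow.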
Return to Top
Subject: Re: Are the two regression lines too many?
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Date: 28 Nov 1996 21:20:29 -0500
In article ,
Mike  wrote:
>Andrew Kukla started a thread:
>> >>The formula for the regression line of Y on X is Y=b*X+a, where
>> >>"b" is a slop and "a" is an intercept parameter. However, if I
>> >>calculate the regression of X on Y (X=c*Y+d), then the regression
>> >>line will differ from the previous substantially. I understand
>> >>that different direction of deviations causes a different result
>> >>of calculation of the least squares.
>> >>1: If this is a mathematical phenomenon then shouldn't we use
>> >>some other formula that could "average" these two regression
>> >>lines and have only one?
I suggest that you decide what you want it for, and then act accordingly.
If you want to predict Y given X, the regression of Y on X is appropriate.
If you want to predict X given Y, use the regression of X on Y.
If you have a structural model, and the deviation from the line is due
to an "error" which is uncorrelated with X, use the regression of Y on X.
If you have a structural model, and both variables are subject to error,
consult someone who understands the problem.  NO simple regression procedure
of any kind is appropriate.  If the systematic part is normal, the problem
is not identified, and there is no way of finding what should be wanted,
unless one has additional information.
>With a response from Hans-Peter Peipho
>> >You could do a principal component analysis and fit the data cloud by the
>> >first component.
>The principal component idea is a good one, in that it minimizes sum of
>squared residuals perpendicular to its major axis, and is the same "line"
>without needing to decide which variable is "X" or "independent" and which
>is "Y" or "dependent".
This is an atrocious idea.  It is dependent on the scales, and normalizing
the variances introduces errors all over the place.  If the appropriate
problem is being considered, selection or scaling becomes irrelevant, but
this is not the case for principal components.
Do not try to find religious mantras.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu	 Phone: (317)494-6054	FAX: (317)494-0558
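The asymmetry between the two regressions can be seen numerically in a quick simulation (numpy assumed): the slope of Y on X is r*sy/sx while the slope of X on Y is r*sx/sy, so the two lines coincide in the X-Y plane only when |r| = 1:

```python
# Why the two least-squares lines differ (simulated data; numpy assumed).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=1.5, size=500)   # true slope 2, plus noise

r = np.corrcoef(x, y)[0, 1]
b_yx = r * y.std() / x.std()       # slope of the regression of Y on X
b_xy = r * x.std() / y.std()       # slope of the regression of X on Y

print(f"r = {r:.3f}")
print(f"slope of Y on X:                   {b_yx:.3f}")
print(f"slope of X on Y, plotted in X-Y:   {1 / b_xy:.3f}")
# The two lines agree only when r^2 = 1; otherwise the X-on-Y line is
# always the steeper of the two in the X-Y plane.
```

Each line is optimal for its own prediction problem, which is exactly Rubin's point: neither, and no average of the two, answers both questions at once.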
Return to Top
Subject: Re: Non-paired t as conservative?
From: hrubin@b.stat.purdue.edu (Herman Rubin)
Date: 28 Nov 1996 21:26:10 -0500
In article ,
Michael Kleiman (SOC)  wrote:
>My question relates to using the t-test for the difference between means
>for uncorrelated data when one actually has correlated data, as a way of
>being conservative. 
This may or may not be conservative; if the correlation is negative, it
works the other way.
Also, using paired data eliminates the problem of unequal variances,
and is generally less sensitive to normality.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
hrubin@stat.purdue.edu	 Phone: (317)494-6054	FAX: (317)494-0558
Return to Top
Subject: Markov chains
From: Brian Bull
Date: Fri, 29 Nov 1996 15:17:04 +1200
It's said that in a Markov chain
P(X_n | X_{n-1}, X_{n-2}, ...) = P(X_n | X_{n-1})
Is this _always_ the case?  Can one use Markov methods
when it does not apply?
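The displayed equality is the definition of the Markov property, i.e. a modelling assumption rather than something that always holds; Markov methods are only justified when it at least approximately applies. A small empirical check on a simulated two-state chain (numpy assumed), comparing transition frequencies with and without conditioning on an extra lag:

```python
# Empirical check of the Markov property on a simulated two-state chain.
import numpy as np

rng = np.random.default_rng(3)
P = np.array([[0.9, 0.1],   # P[i, j] = P(next state = j | current state = i)
              [0.4, 0.6]])

n = 100_000
x = np.empty(n, dtype=int)
x[0] = 0
for t in range(1, n):
    x[t] = 1 if rng.random() < P[x[t - 1], 1] else 0

# P(X_t = 1 | X_{t-1} = 1), with and without also conditioning on X_{t-2} = 0
m1 = x[1:-1] == 1
p_one_lag = x[2:][m1].mean()
m2 = m1 & (x[:-2] == 0)
p_two_lags = x[2:][m2].mean()

print(f"P(X_t=1 | X_t-1=1):          {p_one_lag:.3f}")
print(f"P(X_t=1 | X_t-1=1, X_t-2=0): {p_two_lags:.3f}")
```

For data generated this way the two estimates agree up to sampling noise; for real data, a gap between such estimates is evidence against a first-order Markov model.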
Return to Top
Subject: Thanks ! re: my question about calculating the chi-square function
From: Rafael Santos
Date: 26 Nov 1996 12:36:09 +0900
Many many thanks for all those who answered my question. Since I am
unable to get the books some recommended, I used Excel to create a
table of values in the range I want and inserted it into my
program. It worked fine.
thanks again,
Rafael.
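As an alternative to a precomputed table, the chi-square CDF with k degrees of freedom is the regularized lower incomplete gamma function P(k/2, x/2), which a short series evaluates directly using only the standard library (a sketch; for very large x one would compute the complement instead to avoid slow convergence):

```python
# Chi-square CDF from scratch via the lower incomplete gamma series.
import math

def chi2_cdf(x, k):
    """Chi-square CDF with k degrees of freedom, via the series for the
    regularized lower incomplete gamma function P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum_n z^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-12:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Sanity checks against the familiar 5% critical values
print(chi2_cdf(3.841, 1))   # close to 0.95
print(chi2_cdf(5.991, 2))   # close to 0.95
```

This avoids both the table lookup and any external library, at the cost of having to mind convergence in the extreme tails.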
Return to Top
Subject: Re: Analyzing nominal and ordinal data...
From: bellour@upso.ucl.ac.be (F. Bellour)
Date: Fri, 29 Nov 1996 13:29:35 +0100
In article <572cut$bg5@usenet.srv.cis.pitt.edu>, wpilib+@pitt.edu (Richard
F Ulrich) wrote:
> 1) Q: Is there a  *reason*  for ever preferring log-linear evaluation 
> of a contingency table over the Pearson test? or a reference?
> 
> 2) A: Siegel does provide rank-order tests, etc. 
> 
> 3) Q: Since most cluster algorithms assume a meaningful DISTANCE
> metric, rather than just ORDINAL variables, is Lorr(1983) special? and
> how does this address the original question?  (If the intention here
> was to advise, `if you have a bunch of related variables, then you 
> should try to create a composite score'... then I like the intention.)
> 
Dear Mr. Ulrich,
   I am glad to read your comments because you might correct my thinking,
which may well be faulty. So, I am going to answer your two questions,
but not in a defensive tone. I would just like to have a constructive
discussion on the topic.
(1) As I have read it many times, log-linear analyses are a means of
grasping effects over all the cells of a model and spotting those cells
which are responsible for the effects. When you have, let's say, a 3 by 4
table, the classical tests only allow you either to test the discrepancy
between observed and expected for the whole table at once, or to test
whether two cells are different from each other and then test this again
for two other cells, and so on until you have done the whole table. But
doing so many tests increases the error rate without controlling for it.
That's why log-linear models are more adequate.
What do you think? Is this a good reason for using log-linear models?
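The error-rate inflation mentioned in point (1) can be made concrete: with independent tests each run at alpha = .05, the chance of at least one false positive grows quickly with the number of comparisons (a minimal illustration, assuming independence):

```python
# Familywise error rate of k uncorrected tests at alpha = 0.05,
# assuming the tests are independent.
alpha = 0.05
for k in (1, 3, 6, 12):             # number of uncorrected pairwise tests
    fwer = 1 - (1 - alpha) ** k     # P(at least one false positive)
    print(f"{k:2d} tests: familywise error rate = {fwer:.3f}")
```

With a dozen cell-by-cell comparisons the familywise rate approaches one in two, which is the controlling-for-error argument for fitting a single log-linear model instead.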
(2) I mentioned Lorr as a quick reference. He distinguishes within
distance measures (a) metric measures which are suitable for interval
scales and (b) non metric measures which are suitable for ordinal scale.
My advice was that the choice of the distance measure depends on the type
of scale, as Lorr explains. Afterwards, the data can obviously undergo a
cluster analysis in order to reduce the information they provide. What do
you think? Is there any problem with such advice?
I hope you don't mind helping me to get a better understanding. Thank you
for your previous comments anyway.
Best thoughts,
F.Bellour
-- 
F.Bellour
PhD Student
U.C.L. Belgium
E-mail: bellour@upso.ucl.ac.be
Phone office: 00-32-10-478640
Return to Top
Subject: Re: advice needed on generating simulated data sets
From: Jeremy Miles
Date: Fri, 29 Nov 1996 12:02:38 +0000
David Nichols wrote:
> 
> In article <56tft4$3oba@news.doit.wisc.edu>,
> Dr Mike  wrote:
> >One of our grad students wishes to generate some randomly selected data
> >arranged like those for a 1-way repeated-measures design.  She would
> >like to
> >be able to specify
> >(i) mean and standard deviation for the normally-distributed population
> >for each variable and (ii) the correlations between the variables
> >and then have some software package do the generation of some samples.
> >We have software that conveniently generates individual-variable data
> >sets
> >assuming a normal distribution but not any that allows for generation
> >of
> >dependent samples.
> >
> >Does anyone know of a package that they would recommend for generating
> >such multivariate samples?
> >
> >Any email responses will be greatly appreciated.
> >
> >- Mike Hogan < mehogan@facstaff.wisc.edu >
> >       Sr. Database Admin, CWC Project
> >       U of Wisconsin Psych Dept, Madison, WI
> >
> 
> You can do this with just about any software package. To do all aspects
> of it, you'd need one that could do a Cholesky decomposition of a matrix
> for you. Postmultiplying a data matrix created as pseudo-random normal
> deviates by the upper triangular Cholesky decomposition of the correlation
> matrix desired will produce the desired intercorrelations (if the variables
> are perfectly uncorrelated to start with) or will produce correlations
> as if the variables were created from a population with the desired
> intercorrelations.
> 
Mark Shevlin and I have written a paper, presented at the Computers
in Psychology '96 conference in York, UK, which presents the SPSS code to
do this.  The friendly people at the CTI Psychology Centre have put it on
the web at:
http://www.york.ac.uk/inst/ctipsych/web/CiP96CD/MILES/XHTML/PAPER.HTM
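The Cholesky recipe quoted above can be sketched in Python (numpy assumed; the means, SDs, and correlation matrix below are hypothetical). numpy's `cholesky` returns the lower-triangular factor L with R = L L^T, so postmultiplying by L.T plays the role of the upper-triangular factor mentioned in the post:

```python
# Generating correlated multivariate normal samples via Cholesky.
import numpy as np

rng = np.random.default_rng(4)
n = 5000
means = np.array([10.0, 20.0, 30.0])
sds   = np.array([ 2.0,  3.0,  4.0])
R = np.array([[1.0, 0.5, 0.3],      # target correlation matrix
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

L = np.linalg.cholesky(R)           # R = L @ L.T, L lower triangular
Z = rng.standard_normal((n, 3))     # independent standard normals
X = Z @ L.T * sds + means           # correlated sample with desired moments

print("sample correlations:\n", np.round(np.corrcoef(X, rowvar=False), 2))
```

As the quoted post notes, the sample correlations match the target only in expectation; orthogonalizing the columns of Z first would reproduce them exactly.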
Return to Top
