The signs of eigenvector elements can be inverted without changing validity. A x = ev * x defines the eigenvalue ev and eigenvector x. Clearly A (-x) = ev * (-x) is also a solution. Different algorithms make different choices. Trickier is the case where there are multiple eigenvectors for a single eigenvalue. For a symmetric matrix A, we can then form an infinite number of sets of equivalent eigenvector solutions by appropriate linear combinations of the initial eigenvectors (we want to maintain orthogonality). A consulting firm once balked at paying me modest consulting fees for a problem such as this and flailed around for 2 weeks (several programmers at work!) when converting a program from a Univac to an IBM 360. To their horror they finally learned that the program was working fine on both machines, but the rounding in the arithmetic was different enough to change the output eigenvectors considerably. One of the programmers did the learning by simply asking me if there was a problem, and I took pity on him, since the suits were starting to get nasty. JN John C. Nash, Professor of Management, Faculty of Administration, University of Ottawa, 136 Jean-Jacques Lussier Private, P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada email: jcnash@uottawa.ca, voice mail: 613 562 5800 X 4796 fax 613 562 5164, Web URL = http://macnash.admin.uottawa.ca
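A tiny illustration of the sign ambiguity described above (my own example, not Prof. Nash's): for the symmetric matrix A = [[2, 1], [1, 2]], the eigenvalue 3 has eigenvector x = (1, 1)/sqrt(2), and -x satisfies the same equation, so two libraries (or two machines with different rounding) can legitimately return eigenvectors of opposite sign.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]                     # symmetric 2x2 matrix
ev = 3.0                             # one of its eigenvalues
x = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

neg_x = [-c for c in x]
# A x = ev * x holds for x and equally for -x
assert all(abs(a - ev * b) < 1e-12 for a, b in zip(matvec(A, x), x))
assert all(abs(a - ev * b) < 1e-12 for a, b in zip(matvec(A, neg_x), neg_x))
```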
While on the subject, does anyone know of a polynomial approximation for a two-dimensional normal integral? Also, for an example of multidimensional normal integration by Gaussian quadrature, with apparently suitable accuracy, see the article by Bock & Gibbons in one of the latest Biometrics issues. -- John Uebersax Flagstaff, AZ 71302.2362@compuserve.com
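I don't have a polynomial approximation to offer, but as a sanity check for whatever method you end up using, the bivariate standard normal orthant probability has a closed form, P(X < 0, Y < 0) = 1/4 + arcsin(rho)/(2*pi). A brute-force Simpson-quadrature sketch (all choices of grid and truncation point are mine and purely illustrative):

```python
import math

def bvn_quadrant(rho, lo=-8.0, hi=0.0, n=200):
    """Simpson quadrature of the bivariate standard normal density
    with correlation rho over the square [lo, hi] x [lo, hi]."""
    det = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    h = (hi - lo) / n

    def w(i):                        # Simpson weights: 1, 4, 2, 4, ..., 1
        if i == 0 or i == n:
            return 1.0
        return 4.0 if i % 2 else 2.0

    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        for j in range(n + 1):
            y = lo + j * h
            q = (x * x - 2.0 * rho * x * y + y * y) / det
            total += w(i) * w(j) * math.exp(-0.5 * q)
    return norm * total * (h / 3.0) ** 2

rho = 0.5
exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
assert abs(bvn_quadrant(rho) - exact) < 1e-4
```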
"DESIGN DATABASES AND DRIVE MICROSOFT ACCESS" Last year I taught a night school class which dealt with database design and the use of Microsoft Access. The students were mainly business people wanting to make practical use of either Access 2.0 or Access for Windows 95. Some students wanted to be able to manage hobbies or sports organisations. Their first problem was to design their database; then they wanted it in action as quickly as possible. Existing textbooks were expensive, did not deal with design, and contained more than the basic essential information. The notes I wrote as the courses proceeded have been compiled into a learning guide for those who want to build a database quickly. Should you be interested in a copy of the learning guide "Design Databases and Drive Microsoft Access", the cost is US $29.95, which includes postage and packaging. Just send a cheque or credit card number, expiry date and name. Do not forget your name and address for postal delivery. My address is: Robert Shaw, 49 Sea Vista Drive, Pukerua Bay, Porirua City, NEW ZEALAND. Comments on the usefulness of the book are most welcome. ___ Blue Wave/QWK v2.20 [NR]
Thank you for your fast answer. I would like to ask what would be the best software for data entry, cleaning the errors, and coding, for a not-too-large survey. Right now I have SPSS 7.5. I can do the data entry with this software, but what should I use for cleaning the errors and for the rest? Best regards, Mirjana Stojanovic
Given the stochastic process Y(k) = Y(k-1) + d + w(k) (1) where d is a constant gain and w(k) is a normal process with mean m and variance v. The objective is to determine the distribution of the stochastic variable X, defined as X = {k for which Y(k) >= t and Y(k-i)
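The message is cut off, but X looks like a first-passage (hitting) time for the threshold t. Under that reading, which is my assumption (X = smallest k with Y(k) >= t, starting from Y(0) = 0), its distribution is easy to explore by Monte Carlo; all parameter values below are illustrative:

```python
import random

def first_passage(d=0.5, m=0.0, v=1.0, t=10.0, max_k=100_000):
    """Smallest k with Y(k) >= t for Y(k) = Y(k-1) + d + w(k),
    w(k) ~ Normal(m, v), Y(0) = 0 (assumed initial condition)."""
    y, k = 0.0, 0
    while y < t and k < max_k:
        k += 1
        y += d + random.gauss(m, v ** 0.5)
    return k

random.seed(1)
samples = [first_passage() for _ in range(2000)]
mean_X = sum(samples) / len(samples)
# With per-step drift d + m = 0.5, Wald's identity suggests E[X] near t/(d+m) = 20
```

An empirical histogram of `samples` then approximates the distribution asked about; for positive drift it is the discrete analogue of an inverse-Gaussian shape.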
Subject: Re: STAT-L Digest - 15 Jan 1997 to 16 Jan 1997
From: Franz-Josef Mueter
Date: Fri, 17 Jan 1997 15:15:03 -0900
> ANNE KNOX wrote: > > I am using an abundance cover scale (Braun-Blanquet) and have taken the > midpoints of the percent cover classes (3, 15.5, 38, 63, 88). I was trying to > examine the effect of several independent variables > on cover. A regression didn't work, however, since the residuals were not even > close to being normally distributed (an effect of the discrete ratios I was > told). The best advice that I have received so > far has been to either use Spearman's rank correlation coefficient or bootstrap > analysis. Is that correct, and does anyone have any additional advice for me? The arcsine transformation is a useful transformation for percentages. Percentages follow a binomial distribution and can be made nearly normal if the square root of each proportion is transformed to its arcsine (or inverse sine, sin^-1): p' = arcsin(sqrt(p)). The transformation is not very good at the extreme ends of the data (near 0 and 100%). A discussion of the arcsine transformation can be found in: Zar, J.H. 1984. "Biostatistical Analysis". Prentice Hall.
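The transformation above is one line of code. A quick sketch applying it to the cover-class midpoints quoted in the question (the midpoints are from the post; the function name is mine):

```python
import math

def arcsine_transform(p):
    """p' = arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

midpoints_pct = [3, 15.5, 38, 63, 88]                # Braun-Blanquet midpoints
transformed = [arcsine_transform(p / 100) for p in midpoints_pct]
# p = 0.5 maps to pi/4; the transform is monotone, so ordering is preserved
assert abs(arcsine_transform(0.5) - math.pi / 4) < 1e-12
```

The transformed values (in radians) can then be used as the response in an ordinary regression, with the caveat noted above about values near 0 and 100%.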
Subject: matrixing and bracketing
From: carl-gustav.johansson@ferring.se
Date: Fri, 17 Jan 1997 14:02:46 -0800
Hello, I would like to have information about matrixing and bracketing in stability studies for pharmaceuticals. Does anyone know how to make a layout design, and how to proceed when values are outside the spec limits? Sincerely, Carl-Gustav Johansson
Subject: POLYCHORIC MATRIX
From: Robert Flynn Corwyn
Date: Fri, 17 Jan 1997 22:09:12 -0600
Could someone please explain how to use a polychoric matrix in CALIS? The programming language is what we need. Thanks, Robert Flynn Corwyn
Subject: Re: Combining Neural/Fuzzy Models with Statistical Models
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Date: Sat, 18 Jan 1997 00:57:53 GMT
In article <32DA5910.2AA2@colorado.edu>, Robert Dodier writes: |> Andrew Gray wrote: |> |> > I'm working on combining neural networks and fuzzy logic models |> > with statistical techniques (regression and data reduction) for |> > software metrics (for example, predicting development time based on |> > the type and size of system). While there has been a lot of work on |> > neural-fuzzy, neural-genetic, fuzzy-genetic, etc. type systems I've |> > only ever found a small number of researchers using AI/statistical |> > techniques (presumably at least partially an indication of how few AI |> > researchers follow the statistical side of things, and vice versa). |> |> At the risk of reviving age-old threads about fuzzy logic vs. |> probability, let me advise you not to bother with fuzzy logic. |> There are two parts to fuzzy logic, one defensible and the other |> not. The ``fuzzy'' part is one solution to the problem of |> representing uncertain knowledge -- this is the defensible part. |> The ``logic'' part is an attempt to reason with uncertain knowledge |> -- this is an indefensible hack. The trouble with fuzzy logic as usually presented is that you can't reason about two uncertain propositions without knowing how the uncertainties are related--it's like trying to work with marginal probability distributions without knowing the joint distribution. But some fuzzy logicians have noticed this problem and developed fuzzy logics involving "correlations" between fuzzy propositions. See the post included below. |> ... As discussed at length in |> Warren Sarle's neural networks FAQ (sorry, I don't have pointer) |> it's useful to consider neural networks as extensions of or variations |> on conventional regression and classification schemes. 
ftp://ftp.sas.com/pub/neural/FAQ.html ______________________________________________________________________ From: William Siler Newsgroups: comp.ai.fuzzy Subject: New Fuzzy Logic Date: Fri, 20 Sep 96 20:25:09 -0500 Organization: Delphi (info@delphi.com email, 800-695-4005 voice) Lines: 107 Message-ID: Reposting article removed by rogue canceller. DIGEST OF BUCKLEY, JJ AND SILER, W: A NEW T-NORM. SUBMITTED TO FUZZY SETS AND SYSTEMS, 1996. Fuzzy systems theory has been criticized for not obeying all the laws of classical set theory and classical logic. The t-norm and t-conorm here presented obey all the laws of the corresponding classical theory. A somewhat similar theory has been proposed by Thomas (1994), except that he does not claim that the distributive property is maintained. We first propose a source of fuzziness. We suppose that the truth value > 0 and < 1 of a fuzzy logical statement A is drawn from a number of underlying (probably implicit) correlated random variables whose values alpha[i] are binary, i.e. 0 or 1 with a Bernoulli distribution, and that the truth value of A is a simple average of these binary values. (George Klir (1994) proposed a similar process where the random values are binary opinions of experts as to truth or falsehood of a statement.) If this is so, then a = truth(A) = sum(alpha[i]) / n b = truth(B) = sum(beta[i]) / n r = correlation coefficient(alpha[i], beta[i]) aANDb = truth value(A AND B) aORb = truth value(A OR B) sa = standard deviation(alpha) = sqrt(p(alpha)*(1 - p(alpha))) sb = standard deviation(beta) = sqrt(p(beta)*(1 - p(beta))) aANDb = a*b + r*sa*sb aORb = a + b - a*b - r*sa*sb rmax = (min(a,b) - a*b) / (sa*sb) for r = rmax, min(a,b) = a*b + rmax*sa*sb rmin = (max(a+b-1, 0) - a*b) / (sa*sb) for r = rmin, max(a+b-1, 0) = a*b + rmin*sa*sb Proofs of the following theorems are in the appendices of our paper. Theorem 1: 1. rmax = ru, ru <= 1 2. rmin = rl, rl >= -1 3. rl <= r <= ru Theorem 2: 1. 
aANDb = a*b + r*sa*sb = a*b + cov(a,b) 2. aORb = a + b - a*b - r*sa*sb = a + b - a*b - cov(a,b) Theorem 3: 1. If r = ru, aANDb = min(a,b) and aORb = max(a,b) 2. If r = 0, aANDb = a*b and aORb = a+b-ab 3. If r = rl, aANDb = max(a+b-1, 0) and aORb = min(a+b, 1) We now suppose that this basic process is inaccessible to us, but that we do have a history of a number of instances of the truths of statement A and statement B. Now, given a value of r, the correlation coefficient between a, the truth values of A, and b, the truth values of B, the t-norm and t-conorm appropriate to this history, T (t-norm) and C (t-conorm) are defined for [a, b] on S, a restricted subset of [0,1]x[0,1]. Theorem 4. (The 5 parts of this theorem define the subset S of [0,1]x[0,1] possible for r = 1, 0 < r < 1, r = 0, -1 < r < 0 and r = -1. Given a value of r, it may be that not all (a,b) combinations are possible; e.g. a = .25, b = .75 is not possible for r = 1 in the binary process described above.) Theorem 5. 1. (Shows that for 0 < r < 1 and (a,b) in S, ab < T(a,b) <= min(a,b).) 2. (Shows that for -1 < r < 0 and (a,b) in S, max(a+b-1, 0) <= T(a,b) < ab.) 3. (Shows that for -1 <= r <= 1 and (a,b) in S, max(a+b-1, 0) <= T(a,b) <= min(a,b) and max(a,b) <= C(a,b) <= min(a+b, 1).) Theorem 6. Shows that T is a t-norm and C is a t-conorm on S. Theorem 7. 1. A AND A = A, r is 1. 2. A AND 0 = 0, any r. 3. A AND X = A, any r. 4. A AND NOT-A = 0, r is -1. 5. A OR A = A, r is 1. 6. A OR X = X, any r. 7. A OR 0 = A, any r. 8. A OR NOT-A = X, r is -1. 9. NOT-(A AND B) = NOT-A OR NOT-B, any r. 10. NOT-(A OR B) = NOT-A AND NOT-B, any r. 11. A OR (A AND B) = A, any appropriate r. 12. A AND (A OR B) = A, any appropriate r. 13. A AND (B OR C) = (A AND B) OR (A AND C), any appropriate r. 14. A OR (B AND C) = (A OR B) AND (A OR C), any appropriate r. References: Klir, GJ (1994). Multivalued logics versus modal logics: alternate frameworks for uncertainty modelling. 
In: Advances in Fuzzy Theory and Technology, Vol II: 3-47. Duke University Press, Durham, NC. Thomas, SF (1994). Fuzzy Logic and Probability. ACG Press, Wichita, KS. ______________________________________________________________________ -- Warren S. Sarle SAS Institute Inc. The opinions expressed here saswss@unx.sas.com SAS Campus Drive are mine and not necessarily (919) 677-8000 Cary, NC 27513, USA those of SAS Institute. *** Do not send me unsolicited commercial or political email! ***
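The correlation-adjusted connectives in the digest above are easy to check numerically. A small sketch (function names are mine; the formulas are those quoted): r = 0 recovers the product t-norm and r = rmax recovers min(a, b), as Theorem 3 states.

```python
import math

def t_norm(a, b, r):
    """aANDb = a*b + r*sa*sb, per Buckley & Siler as quoted above."""
    sa = math.sqrt(a * (1.0 - a))
    sb = math.sqrt(b * (1.0 - b))
    return a * b + r * sa * sb

def r_max(a, b):
    """Largest admissible correlation: (min(a,b) - a*b) / (sa*sb)."""
    sa = math.sqrt(a * (1.0 - a))
    sb = math.sqrt(b * (1.0 - b))
    return (min(a, b) - a * b) / (sa * sb)

a, b = 0.6, 0.3
assert abs(t_norm(a, b, 0.0) - a * b) < 1e-12            # r = 0 -> product
assert abs(t_norm(a, b, r_max(a, b)) - min(a, b)) < 1e-12  # r = rmax -> min
```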
Subject: CART Workshop in Chicago
From: Patrick Fleury
Date: Fri, 17 Jan 1997 21:04:58 -0800
***Final Notice*** The Chicago Chapter of the American Statistical Association is happy to announce that it will present a half-day workshop on CART (Classification and Regression Trees) conducted by Dr. Dan Steinberg, Salford Systems. CART methodology concerns the use of tree-structured algorithms to classify data into discrete classes. The terminology was invented by Breiman et al. in the early 1980s. The technique has found uses in both medical and market research statistics. For example, one tree-structured classifier uses blood pressure, age and sinus tachycardia to classify heart patients as either high risk or not. Another might use age-related variables and other demographics to decide who should appear on a mailing list. There will be other examples from both fields discussed in the workshop. Dr. Steinberg is President of Salford Systems of San Diego, California. He holds a Ph.D. degree in Economics from Harvard and has held positions at both the University of California at San Diego and San Diego State. He is the leader of the team that ported the original version of CART to the PC. He is well known for his previous work on statistical methods in economics and especially for his work on logistic regression. Time: 1:00 PM - 5:00 PM Registration: 12:30 PM - 1:00 PM Date: January 31, 1997 Place: The University of Illinois at Chicago, College of Nursing, Third Floor Lounge, 845 South Damen Avenue Admission: Members of the Chicago Chapter ASA: $80.00 Non-Members of the Chicago Chapter: $92.00 Student Members: $40.00 Student Non-Members: $46.00 (The difference between Student and Regular admission is subsidized by the Lucile Derrick Fund) Advance registration is encouraged because there will be handouts and we would like to have an idea of how many to make. Registrations at the door will be accepted as space permits. We regret that we cannot accept credit cards. Payment is to be by cash or check only. Payment may be sent in early or paid at registration. 
Please make checks payable to "ASA-Chicago". Directions: The UIC School of Nursing is just north of the corner of Damen and Taylor in Chicago. To get to the School of Nursing, take the Eisenhower Expressway (either east or west) and get off at Damen. Proceed to 845 South Damen, three blocks south of the expressway. Parking is available in parking structure D1 at 1100 South Wood. The parking structure is at the corner of Taylor and Wood, two blocks east of Damen. For more information please e-mail pfleury@mcs.com or send mail to: CART Workshop, opNUMERICS, Suite 4A, 151 N. Kenilworth, Oak Park, IL 60301
Subject: Correlations involving ratios
From: chris@agri.upm.edu.my
Date: Fri, 17 Jan 1997 21:41:06 -0600
I have another "puzzle" that's been on my mind. It concerns correlating a ratio with another variable. To illustrate my point, consider this simple example (by the way, the correlations are actual, not hypothetical). Let's have two independent variables: soil carbon (C) and soil nitrogen (N). I'm interested in their linear relationship with another soil property, soil stability (S). Correlations (r) with S are as follows: r between C & S = .63** r between N & S = .00 r between C:N & S = .51* Here, C:N is a ratio obtained by dividing C by N. As shown, the r between C:N and S is 0.51*. Something peculiar happens when I start to play around with the data. Now, suppose I were to change the scale of C by adding the value 10 to every value in the variable C; that is, tC = C + 10. Using this new variable tC, I now form a new ratio between carbon and nitrogen, that is tC:N. Correlations with S are now as follows: r between C & S = .63** r between tC & S = .63** (ok) r between N & S = .00 r between C:N & S = .51* r between tC:N & S = .07 (!!!) As shown, changing the scale of carbon did not affect its r with S -- correctly so. If you were to correlate tC with C you would obtain a perfect 1.00 relationship. In simple regression, changing the scale of a variable only shifts the line upwards or downwards - the slope remains equal. However, the r between tC:N and S is clearly different from the r between C:N and S! If I were to use tC to form the ratio between carbon and nitrogen, my interpretation of the results would be different - I would conclude that the carbon:nitrogen ratio is insignificantly related to S. But had I used C, I would instead conclude that the carbon:nitrogen ratio has a significant relationship! Why is this so? What's happening here? Note: this peculiarity also happens with other data, so there is no data entry or calculation error here. Anyone? P.S. By the way, in soil science using such ratios is quite common. 
This is why this is quite a shocking finding.
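The effect is real and easy to reproduce with toy data (my own simulated values, not the poster's): correlation is invariant under adding a constant to C, but C/N is a nonlinear function of C, so the shifted ratio (C+10)/N is dominated by the 10/N term, which carries no information about S, and its correlation with S typically collapses.

```python
import random

def corr(x, y):
    """Pearson correlation, plain Python."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)
C = [random.uniform(1, 5) for _ in range(200)]     # "carbon"
N = [random.uniform(0.1, 1) for _ in range(200)]   # "nitrogen", independent of C
S = [c + random.gauss(0, 1) for c in C]            # S related to C only

# Shifting C leaves corr(C, S) exactly unchanged...
assert abs(corr(C, S) - corr([c + 10 for c in C], S)) < 1e-12

# ...but the two ratio correlations generally differ, as in the post:
r_ratio   = corr([c / n for c, n in zip(C, N)], S)
r_shifted = corr([(c + 10) / n for c, n in zip(C, N)], S)
```

The moral matches the poster's observation: a ratio is not scale-shift invariant in its numerator, so conclusions drawn from corr(C:N, S) depend on where the zero of the C scale sits.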
Subject: Re: Underdispersion causes
From: Bob Wheeler
Date: Fri, 17 Jan 1997 21:36:14 -0500
Benjamin Chan wrote: > > I'm curious as to possible causes for underdispersion in generalized > linear models. I'm fitting a Poisson model to some count data and the > model I'm fitting looks underdispersed. > > -- > > +-------------------------------------------------+ > | Benjamin Chan, M.S., Assistant Statistician I | > | UC Davis Medical Center, Primary Care Center | > | 2221 Stockton Blvd., Room 3107 | > | Sacramento CA 95817 | > | Voice = 916-734-7004; Fax = 916-734-2732 | > +-------------------------------------------------+ Dependence among the observations is a common problem. Bob Wheeler, ECHIP, Inc.
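A sketch of the usual diagnostic behind "looks underdispersed" (my own illustration, not from the thread): the Pearson dispersion statistic, Pearson chi-square over residual degrees of freedom, which should be near 1 for a well-specified Poisson model; values well below 1 suggest underdispersion, e.g. from the dependence Bob Wheeler mentions.

```python
def dispersion(counts, fitted, n_params):
    """Pearson chi-square / residual df for a Poisson fit."""
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return pearson / (len(counts) - n_params)

# Illustrative data: counts far less variable than Poisson would predict
counts = [2, 3, 2, 3, 3, 2, 3, 2]
fitted = [2.5] * 8                   # intercept-only fit: mean of the counts
phi = dispersion(counts, fitted, 1)  # well below 1 here -> underdispersion
```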
Subject: Re: 4 dim normal integral
From: orourke@utstat.toronto.edu (Keith O'Rourke)
Date: Tue, 14 Jan 1997 18:19:13 GMT
Any recent text on numerical integration should suffice. My favourite is Sloan IH, Joe S. Lattice Methods for Multiple Integration. Oxford: Clarendon Press, 1994, but you may wish to look for available software for expedience. I believe Splus has a "sub-group adaptive" multiple quadrature routine that may handle your 4-dimensional problem. Keith O'Rourke The Toronto Hosp.
Subject: Re: Introduction to non-parametric stats.
From: "Vassili P. Leonov"
Date: Sat, 18 Jan 97 15:31:30 +0700
> Date: Tue, 14 Jan 1997 16:05:06 -0700 > From: DAN.HUSTON@ASU.EDU > Subject: introduction to non-parametric stats > > Hello Folks- > > I am looking for an introductory book on non-parametric stats. Any > suggestions? > > Thanks- > > Dan Huston Phone (602) 965-2420 > Measurement, Statistics and Methodological Studies Fax (602) 965-0300 > Psychology in Education dan.huston@asu.edu > 325 Payne Hall http://seamonkey.ed.asu.edu/~huston > Arizona State University > Tempe, AZ 85287-0611 I recommend the following good books: 1. Runyon, R.P. (1977). Nonparametric Statistics: A Contemporary Approach. Reading, Massachusetts: Addison-Wesley. 2. Hollander, M., Wolfe, D.A. (1973). Nonparametric Statistical Methods. New York: Wiley. 3. Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San Francisco: Holden-Day. 4. Noether, G.E. (1967). Elements of Nonparametric Statistics. New York: Wiley. 5. Walsh, J.E. (1962). Handbook of Nonparametric Statistics, vol. 1. Princeton, N.J.: Van Nostrand. 6. Handbook of Applicable Mathematics (Chief Editor: Walter Ledermann), Volume VI: Statistics, Parts A & B. Chichester: Wiley-Interscience (1984). I wish you success! --- Centre of Applied Statistics Stat-Point Vassili P. Leonov E-mail: point@statleo.tomsk.su
Subject: Re: Getting means and ... proc GLM
From: "Vassili P. Leonov"
Date: Sat, 18 Jan 97 15:29:09 +0700
Date: Tue, 7 Jan 1997 15:54:04 EST From: Tim Benner Subject: Getting means and sd from proc GLM Tim Benner wrote: > I do a lot of within and between subject analysis using the GLM procedure > and I'd like to have SAS print the means and standard deviations in the GLM > output. I'd like to get a mean and sd for each of my dependent variables. > Is there a way of getting proc GLM to print these out, without me having to > do a separate proc means? > Tim Benner > Dept of Kinesiology > Penn State University To get means and comparisons among them from PROC GLM, you can use the MEANS statement with the SCHEFFE and CLDIFF options. An example SAS program for this case:
PROC GLM DATA = A1;
  CLASS B;
  MODEL X1-X5 = B;
  MEANS X1-X5 / SCHEFFE CLDIFF;
RUN;
I wish you success in studying the SAS statistical package! --- Centre of Applied Statistics Stat-Point Vassili P. Leonov E-mail: point@statleo.tomsk.su
Subject: Performance Index for Covering Designs, Wheels
From: bm373592@muenchen.org (Uenal Mutlu)
Date: Sat, 18 Jan 1997 19:37:35 GMT
(I've moved this thread to a meaningful subject and also included sci.stat.consult) Here are some relevant excerpts from the previous postings. The task is twofold: creating wheels (covering designs) which cover as many of the possible cases as possible in as few blocks as possible (ie. a combinatorial optimization problem), and a formula which allows the comparison of designs with different nbr of blocks etc. (ie. a practical performance index for such designs): ##### >>A few years ago, I wrote a program to improve partial wheels. I have >>ran this program overnight with the improved wheel and it has improved >>it further to 44.45% And it's not done yet... >> >I think 2 conditions have to be fulfilled to find an optimal solution for >that problem: >a) all tickets should be as different as possible (any two tickets should >share as less as possible numbers) >b) each number should appear as often as any other number > >Here is a solution based on these conditions: >1 9 17 25 33 41 >2 10 18 26 34 42 >3 11 19 27 35 43 >4 12 20 28 36 44 >5 13 21 29 37 45 >6 14 22 30 38 46 >7 15 23 31 39 47 >8 16 24 32 40 48 >1 16 23 30 37 44 >2 9 24 31 38 45 >3 10 17 32 39 46 >4 11 18 25 40 47 >5 12 19 26 33 48 >6 13 20 27 34 41 >7 14 21 28 35 42 >8 15 22 29 36 43 >1 15 21 27 38 48 >2 16 22 28 39 41 >3 9 23 29 40 42 >4 10 24 30 33 43 >5 11 17 31 34 44 >6 12 18 32 35 45 >7 13 19 25 36 46 >8 14 20 26 37 47 >1 14 19 32 34 49 >2 15 20 25 35 49 >3 16 21 26 36 49 >4 9 22 27 37 49 >5 10 23 28 38 49 >6 11 24 29 39 49 >7 12 17 30 40 49 >8 13 18 31 33 49 > >Using the first 27 tickets your percentage for at least a 3 win is >44.383% >Using all 32 tickets percentage is at 50.83% > >I would be interested to see the 44.45% wheel. 
##### >>Using the first 27 tickets your percentage for at least a 3 win is >>44.383% Using all 32 tickets percentage is at 50.83% > >We should use a general formula for comparing of such wheels with >differing nbr of tickets: > > WCD = Percent / Tickets > >where > - WCD = "Wheel Covering Degree". The higher, the better. > - Percent is in the range 0.0 to 100.0 for minWin (ie. for > "at least x-win") > - Tickets is the nbr of single tickets the wheel consists of > >Warning: > it can be used only for the same type of wheels (ie. k must > be equal), and percent should be "for at least minWin" > >Example for minWin = 3+ (ie. for at least 3): > > Percent Tickets -> WCD > --------------------------- > 44.383 27 1.64381 > 50.83 32 1.58844 > 70.0 54 1.29630 > 100.0 168 0.59524 > >The higher the WCD, the better the wheel. It simply says >"each ticket covers WCD points of the Percent value" ##### >I had a similar discussion on this with Normand and currently in 1st >place is John Rawson with 44.43945% (27 lines - 230,160 combs covered >per line avg = 1.6459%). > >But if one comes up with 28 lines would it be fair to expect them to >have at least 1.6459% per line ? ##### >>But if one comes up with 28 lines would it be fair to expect them to >>have at least 1.6459% per line ? > >The WCD formula already covers this case. You're mixing up the >different kinds of percent values and your calculation doesn't >take into account the nbr of tickets, but the WCD formula does. > >(Your 1.6459% is simply the result of 230160 / 13983816 * 100 >ie. simple percent calculation, but the actual number of tickets >isn't included anywhere, so this won't help in case of differing >nbr of tickets) > >Cf. 
for example the first 2 entries in the table above, then you >should see that the formula indeed allows the comparison of your >example of 28 tickets too, but one needs both the winPercent (or the >number of winning combinations to get the winPercent) and the nTickets >(here 28 and 27) values of both wheels for the WCD formula. > >And, your percent value (1.6459%) has nothing to do with the WCD value >(ie. they are very different things; the WCD is NOT a percent value, >it simply is a number). Both can't be compared together. Apple and Orange :-) > >Anyone know of a better formula for comparing such wheels of similar >type but with different nbr of tickets? ##### >>I think 2 conditions have to be fulfilled to find an optimal solution for >>that problem: >>a) all tickets should be as different as possible (any two tickets should >>share as less as possible numbers) >>b) each number should appear as often as any other number > >Right. But, it's not that simple. If we keep adding tickets, at one >point it will become possible to cover more combinations by having >more redundancy between tickets. As an example of this, when the 174 >ticket cover was discovered for the Lotto 6/49 it had 18 tickets that >had 5 identical numbers, yet we couldn't see how to improve it further >with tickets that had less redundancy. > >Is 27 tickets below or above that point? > >Also, if you randomly generate several sets of 27 tickets that have no >duplicating pairs of numbers you will notice that they do not cover >the same number of combinations! So, what is the best cover possible >using only such tickets? > >This may or may not be the cover we are looking for. ##### >>Also, if you randomly generate several sets of 27 tickets that have no >>duplicating pairs of numbers you will notice that they do not cover >>the same number of combinations! So, what is the best cover possible >>using only such tickets? >> >>This may or may not be the cover we are looking for. 
> >It is the right direction to go I would say. Of course the best is, >and always has been, designing wheels which have the highest number >of winning combinations in as few tickets as possible. If we take this >as the only major criteria then we end up with this kind of wheels, >which IMHO are much better, and also cheaper, for the player than the >other 100% guaranteed wheels. And, IMHO exactly here should be the >true difference between wheels and covering designs. > >But the designing of such wheels isn't easy either (this may be because >it's a relatively new field for me; no simple combine formula and so on... :-) >But it's a good challenge for wheel designers, group and design theorists. > >So, for this type of wheels we need a different way to measure their >strength. In an other posting I introduced the WCD formula, which seems >suitable for this task; ie. > > WCD = winPercent / nTickets > >where > winPercent = NbrOfWinningCombinations / C(v,k) * 100 > nTickets = nbr of tickets the wheel has > >(One also should always state, besides the other usual params of the >wheel, the minWin all these values are for. Ie. "for 3 or more winning >numbers" etc.) > >The WCD is a number only (ie. no percent value). The higher the WCD, >the better the wheel. It should be used for comparison of 2 or more >wheels of same type with possibly differing number of tickets. > >(cf. also the other postings on this which also include some examples) ##### >|> The WCD is a number only (ie. no percent value). The higher the WCD, >|> the better the wheel. It should be used for comparison of 2 or more >|> wheels of same type with possibly differing number of tickets. >|> >|> (cf. also the other postings on this which also include some examples) >|> > > I'm not sure it's that easy to compare wheels of differing number of >tickets by this method. Let's take a very simple example to make this >clear. Consider a 6/14 lottery with the winning criteria being a >3+ match. 
> The number of combinations is 3003 and the best wheels I can think of >are, for 1 to 4 tickets: > > Number of tickets = 1: Numbers 1, 2, 3, 4, 5, 6 > Coverage is 1589, therefore WCD = 1589/3003 * 100/1 = 52.91 % > > Number of tickets = 2: Numbers 1, 2, 3, 4, 5, 6 and 7, 8, 9, 10, 11, 12 > Coverage is 2778, therefore WCD = 2778/3003 * 100/2 = 46.25 % > > Number of tickets = 3: Numbers 1, 2, 3, 4, 5, 6 and 7, 8, 9, 10, 11, 12 and 1, 2, 3, 4, 13, 14 > Coverage is 2988, therefore WCD = 2988/3003 * 100/3 = 33.17 % > > Number of tickets = 4: Numbers 1, 2, 3, 4, 5, 6 and 7, 8, 9, 10, 11, 12 and 1, 2, 3, 4, 13, 14 and 5, 6, 7, 8, 13, 14 > Coverage is 3003, therefore WCD = 3003/3003 * 100/4 = 25.00 % > > Which would you say is the best wheel, and how does your >judgement compare to the WCD rating? > > The point I'm trying to make is that the WCD is always going to go >down with increasing number of tickets, because you can't avoid >coverage redundancy as you introduce more tickets. The same applies >to the various wheels in 6/49. Consider again a 'wheel' of 1 ticket. >Its coverage is 260524 out of 13983816, giving a WCD of 1.863%. This may be a typo: it should be 260624; it is the maxCover value, ie. the nbr of combinations 1 block ideally should cover. I'm currently working on a new method, but the relations for the given examples remain similar, I fear. The advantage of the new formula is that it gives a true p-value (ie. a percent value if multiplied by 100). >is to be compared with the best current 27 ticket wheel with a >coverage of about 44.5%, ie a WCD of 1.648%. Then the 168 wheel has >a WCD of just 100/168 or 0.595%. Again, does the WCD give a good >guide to which is the best wheel? Essentially the WCD will _always_ >give the best result for a wheel of 1 ticket, and will favour wheels >of lower number of tickets. 
Now I'm not a great expert or user of >wheels, but I'm fairly sure that most people who are keen on wheels >do not consider a wheel of one ticket to be a good wheel! > > So what I'm really trying to say is that I don't think the WCD >is a good means to compare two wheels of different numbers of tickets. >However, I can't say I've got any better suggestion. ##### >Hi Stephen and all the others, > >I've moved the discussion about the Performance Index for Wheels (WCD etc.) >to a new subject called "Performance Index for Covering Designs, Wheels". >The first posting under the new subject contains IMHO all the relevant >excerpts from the previous postings. Ie. it's exactly this posting :-) (BTW, the old subject was "Dimitris Challenge" in r.g.l.)
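Two of the figures quoted in this thread can be checked in a few lines. This sketch (my own code; the numbers are the thread's) computes the WCD table and the single-ticket 6/14 coverage for a 3+ match by brute force over all C(14,6) = 3003 draws:

```python
from itertools import combinations

def wcd(percent, tickets):
    """Wheel Covering Degree as defined above: coverage percent per ticket."""
    return percent / tickets

# (percent covered, nbr of tickets) pairs from the thread's table
table = [(44.383, 27), (50.83, 32), (70.0, 54), (100.0, 168)]
wcds = [round(wcd(p, t), 5) for p, t in table]

# 6/14 example: coverage of the single ticket {1..6}, counted exhaustively
ticket = set(range(1, 7))
coverage = sum(1 for draw in combinations(range(1, 15), 6)
               if len(ticket & set(draw)) >= 3)
# coverage matches the 1589 stated in the thread
```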
Subject: Performance Index for Covering Designs, Wheels
From: bm373592@muenchen.org (Uenal Mutlu)
Date: Sat, 18 Jan 1997 19:37:35 GMT
(I've moved this thread to a meaningful subject and also included sci.stat.consult) Here are some relevant excerpts from the previous postings. The task is twofold: creating wheels (covering designs) which cover the most of the possible cases in less blocks as possible (ie. a combinatorial optimization problem), and a formula which allows the comparison of designs with different nbr of blocks etc. (ie. a practical performance index for such designs): ##### >>A few years ago, I wrote a program to improve partial wheels. I have >>ran this program overnight with the improved wheel and it has improved >>it further to 44.45% And it's not done yet... >> >I think 2 conditions have to be fulfilled to find an optimal solution for >that problem: >a) all tickets should be as different as possible (any two tickets should >share as less as possible numbers) >b) each number should appear as often as any other number > >Here is a solution based on these conditions: >1 9 17 25 33 41 >2 10 18 26 34 42 >3 11 19 27 35 43 >4 12 20 28 36 44 >5 13 21 29 37 45 >6 14 22 30 38 46 >7 15 23 31 39 47 >8 16 24 32 40 48 >1 16 23 30 37 44 >2 9 24 31 38 45 >3 10 17 32 39 46 >4 11 18 25 40 47 >5 12 19 26 33 48 >6 13 20 27 34 41 >7 14 21 28 35 42 >8 15 22 29 36 43 >1 15 21 27 38 48 >2 16 22 28 39 41 >3 9 23 29 40 42 >4 10 24 30 33 43 >5 11 17 31 34 44 >6 12 18 32 35 45 >7 13 19 25 36 46 >8 14 20 26 37 47 >1 14 19 32 34 49 >2 15 20 25 35 49 >3 16 21 26 36 49 >4 9 22 27 37 49 >5 10 23 28 38 49 >6 11 24 29 39 49 >7 12 17 30 40 49 >8 13 18 31 33 49 > >Using the first 27 tickets your percentage for at least a 3 win is >44.383% >Using all 32 tickets percentage is at 50.83% > >I would be interested to see the 44.45% wheel. 
#####
>>Using the first 27 tickets your percentage for at least a 3 win is
>>44.383%. Using all 32 tickets the percentage is 50.83%.
>
>We should use a general formula for comparing such wheels with
>differing nbrs of tickets:
>
>    WCD = Percent / Tickets
>
>where
>  - WCD = "Wheel Covering Degree". The higher, the better.
>  - Percent is in the range 0.0 to 100.0 for minWin (ie. for
>    "at least x-win")
>  - Tickets is the nbr of single tickets the wheel consists of
>
>Warning: it can be used only for the same type of wheels (ie. k must
>be equal), and Percent should be "for at least minWin".
>
>Example for minWin = 3+ (ie. for at least 3):
>
>    Percent   Tickets  ->  WCD
>    ---------------------------
>     44.383      27       1.64381
>     50.83       32       1.58844
>     70.0        54       1.29630
>    100.0       168       0.59524
>
>The higher the WCD, the better the wheel. It simply says
>"each ticket covers WCD points of the Percent value".

#####
>I had a similar discussion on this with Normand, and currently in 1st
>place is John Rawson with 44.43945% (27 lines; 230,160 combs covered
>per line avg = 1.6459%).
>
>But if one comes up with 28 lines, would it be fair to expect them to
>have at least 1.6459% per line?

#####
>>But if one comes up with 28 lines, would it be fair to expect them to
>>have at least 1.6459% per line?
>
>The WCD formula already covers this case. You're mixing up the
>different kinds of percent values, and your calculation doesn't
>take into account the nbr of tickets, but the WCD formula does.
>
>(Your 1.6459% is simply the result of 230160 / 13983816 * 100,
>ie. a simple percent calculation, but the actual number of tickets
>isn't included anywhere, so this won't help in case of differing
>nbrs of tickets.)
>
>Cf. for example the first 2 entries in the table above; then you
>should see that the formula indeed allows the comparison of your
>example of 28 tickets too, but one needs both the winPercent (or the
>number of winning combinations, to get the winPercent) and the nTickets
>(here 28 and 27) values of both wheels for the WCD formula.
>
>And, your percent value (1.6459%) has nothing to do with the WCD value
>(ie. they are very different things; the WCD is NOT a percent value,
>it simply is a number). The two can't be compared. Apple and Orange :-)
>
>Anyone know of a better formula for comparing such wheels of similar
>type but with different nbrs of tickets?

#####
>>I think 2 conditions have to be fulfilled to find an optimal solution for
>>that problem:
>>a) all tickets should be as different as possible (any two tickets should
>>share as few numbers as possible)
>>b) each number should appear as often as any other number
>
>Right. But it's not that simple. If we keep adding tickets, at one
>point it will become possible to cover more combinations by having
>more redundancy between tickets. As an example of this, when the 174
>ticket cover was discovered for the Lotto 6/49, it had 18 tickets that
>had 5 identical numbers, yet we couldn't see how to improve it further
>with tickets that had less redundancy.
>
>Is 27 tickets below or above that point?
>
>Also, if you randomly generate several sets of 27 tickets that have no
>duplicating pairs of numbers, you will notice that they do not cover
>the same number of combinations! So, what is the best cover possible
>using only such tickets?
>
>This may or may not be the cover we are looking for.

#####
>>Also, if you randomly generate several sets of 27 tickets that have no
>>duplicating pairs of numbers, you will notice that they do not cover
>>the same number of combinations! So, what is the best cover possible
>>using only such tickets?
>>
>>This may or may not be the cover we are looking for.
>
>It is the right direction to go, I would say. Of course the best is,
>and always has been, designing wheels which have the highest number
>of winning combinations in as few tickets as possible. If we take this
>as the only major criterion, then we end up with this kind of wheel,
>which IMHO is much better, and also cheaper, for the player than the
>other 100% guaranteed wheels. And IMHO exactly here lies the
>true difference between wheels and covering designs.
>
>But the designing of such wheels isn't easy either (maybe because
>it's a relatively new field for me; no simple combine formula and so on... :-)
>But it's a good challenge for wheel designers, group and design theorists.
>
>So, for this type of wheel we need a different way to measure its
>strength. In another posting I introduced the WCD formula, which seems
>suitable for this task; ie.
>
>    WCD = winPercent / nTickets
>
>where
>    winPercent = NbrOfWinningCombinations / C(v,k) * 100
>    nTickets   = nbr of tickets the wheel has
>
>(One should also always state, besides the other usual params of the
>wheel, the minWin all these values are for. Ie. "for 3 or more winning
>numbers" etc.)
>
>The WCD is a number only (ie. not a percent value). The higher the WCD,
>the better the wheel. It should be used for comparison of 2 or more
>wheels of the same type with possibly differing numbers of tickets.
>
>(cf. also the other postings on this, which include some examples)

#####
>|> The WCD is a number only (ie. not a percent value). The higher the WCD,
>|> the better the wheel. It should be used for comparison of 2 or more
>|> wheels of the same type with possibly differing numbers of tickets.
>|>
>|> (cf. also the other postings on this, which include some examples)
>|>
>
>I'm not sure it's that easy to compare wheels of differing numbers of
>tickets by this method. Let's take a very simple example to make this
>clear. Consider a 6/14 lottery with the winning criterion being a
>3+ match.
>
>The number of combinations is 3003, and the best wheels I can think of
>are, for 1 to 4 tickets:
>
>  Number of tickets = 1:  Numbers  1,  2,  3,  4,  5,  6
>
>  Coverage is 1589, therefore WCD = 1589/3003 * 100/1 = 52.91 %
>
>  Number of tickets = 2:  Numbers  1,  2,  3,  4,  5,  6
>                                   7,  8,  9, 10, 11, 12
>
>  Coverage is 2778, therefore WCD = 2778/3003 * 100/2 = 46.25 %
>
>  Number of tickets = 3:  Numbers  1,  2,  3,  4,  5,  6
>                                   7,  8,  9, 10, 11, 12
>                                   1,  2,  3,  4, 13, 14
>
>  Coverage is 2988, therefore WCD = 2988/3003 * 100/3 = 33.17 %
>
>  Number of tickets = 4:  Numbers  1,  2,  3,  4,  5,  6
>                                   7,  8,  9, 10, 11, 12
>                                   1,  2,  3,  4, 13, 14
>                                   5,  6,  7,  8, 13, 14
>
>  Coverage is 3003, therefore WCD = 3003/3003 * 100/4 = 25.00 %
>
>Which would you say is the best wheel, and how does your
>judgement compare to the WCD rating?
>
>The point I'm trying to make is that the WCD is always going to go
>down with an increasing number of tickets, because you can't avoid
>coverage redundancy as you introduce more tickets. The same applies
>to the various wheels in 6/49. Consider again a 'wheel' of 1 ticket.
>Its coverage is 260524 out of 13983816, giving a WCD of 1.863%. This

This may be a typo: it should be 260624; it is the maxCover value, ie. the
nbr of combinations one block ideally should cover. I'm currently working
on a new method, but the relations for the given examples remain similar,
I fear. The advantage of the new formula is that it gives a true p-value
(ie. a percent value if multiplied by 100).

>is to be compared with the best current 27 ticket wheel with a
>coverage of about 44.5%, ie a WCD of 1.648%. Then the 168 wheel has
>a WCD of just 100/168, or 0.595%. Again, does the WCD give a good
>guide to which is the best wheel? Essentially the WCD will _always_
>give the best result for a wheel of 1 ticket, and will favour wheels
>with lower numbers of tickets. Now I'm not a great expert or user of
>wheels, but I'm fairly sure that most people who are keen on wheels
>do not consider a wheel of one ticket to be a good wheel!
>
>So what I'm really trying to say is that I don't think the WCD
>is a good means to compare two wheels with different numbers of tickets.
>However, I can't say I've got any better suggestion.

#####
>Hi Stephen and all the others,
>
>I've moved the discussion about the Performance Index for Wheels (WCD etc.)
>to a new subject called "Performance Index for Covering Designs, Wheels".
>The first posting under the new subject contains IMHO all the relevant
>excerpts from the previous postings. Ie. it's exactly this posting :-)

(BTW, the old subject was "Dimitris Challenge" in r.g.l.)
Return to Top
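Stephen's 6/14 example is small enough to verify by brute force. A minimal Python sketch (not part of the original thread; function names are illustrative) that counts coverage directly and applies the thread's WCD formula:

```python
from itertools import combinations

def coverage(tickets, v=14, k=6, min_win=3):
    """Count the k-subsets of {1..v} that share at least min_win
    numbers with at least one ticket (brute-force enumeration)."""
    sets = [set(t) for t in tickets]
    count = 0
    for draw in combinations(range(1, v + 1), k):
        s = set(draw)
        if any(len(t & s) >= min_win for t in sets):
            count += 1
    return count

def wcd(win_percent, n_tickets):
    """WCD = winPercent / nTickets, as defined in the thread."""
    return win_percent / n_tickets

# Stephen's 6/14 wheels, built up one ticket at a time.
wheel = [(1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12),
         (1, 2, 3, 4, 13, 14), (5, 6, 7, 8, 13, 14)]
total = 3003  # C(14, 6)

# Expected: coverages 1589, 2778, 2988, 3003 and
# WCDs 52.91, 46.25, 33.17, 25.00, matching the post.
for n in range(1, 5):
    cov = coverage(wheel[:n])
    pct = cov / total * 100
    print(f"{n}  {cov:4d}  {pct:6.2f}  {wcd(pct, n):6.2f}")
```

This reproduces Stephen's numbers exactly, and also illustrates his objection: the brute-force coverage grows with each ticket, but the per-ticket WCD value falls monotonically.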
Subject: Re: PC & CCA
From: Pablo del Monte Luna
Date: Sat, 18 Jan 1997 12:41:40 -0800
Richard F Ulrich wrote:
> If you used ALL of the principal components from each analysis, then
> the Canonical Correlation could have been run on all the initial
> variables, and it would have accounted for the same variance. Is
> that reasonable? "80% of the variability" is enough redundancy
> that I would be cautious about a) overfitting, if there were not a
> large number of cases per variable; or b) correlated error, if
> your data sources are not independent.
>
> Maybe that will help you consider what you have, but your question
> was not specific enough for me to say much more.
>
> Rich Ulrich, biostatistician              wpilib+@pitt.edu
> http://www.pitt.edu/~wpilib/index.html    Univ. of Pittsburgh

On the one hand, I have 8 climatic indices (20 years of monthly standardized anomalies, say N=192) that represent the oceanic and atmospheric conditions in the Tropical Pacific. On the other, I have 6 areas (extratropics), each subdivided into 6 2x2 boxes. Each subdivision is a historical record of sea surface temperature, sea level pressure and scalar wind, separately, in the form of anomalies too. The variables are naturally related.

From the 8 original indices, Principal Components Analysis defined 3 factors accounting for 89% of the original variability; and for the 6 series of each box (for each variable) the first three modes explain 80% of the variance 90% of the time, with 4 modes needed for the last 10%.

Then I ran Canonical Correlation Analysis between the factor scores of the indices and every set of factor scores of each area (for each variable) to obtain 9 canonical R's per variable.

I apologize for my poor vocabulary and syntax. In advance, thank you very much for your time and help.

Pablo del Monte
Marine Biologist
Fisheries Group
Center for Biological Research
La Paz, B.C.S., Mexico
http://www.cibnor.mx
delmonte@cibnor.mx
Return to Top
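The two-stage procedure Pablo describes, PCA factor scores followed by canonical correlation between the score sets, can be sketched as follows. This is not his code: the data here are random stand-ins with assumed shapes (192 months x 8 indices, and one set of 6 box series), and the CCA is computed by the standard QR-plus-SVD route:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the data described above (shapes assumed):
# 192 monthly anomalies x 8 climatic indices, and x 6 box series.
indices = rng.standard_normal((192, 8))
boxes = rng.standard_normal((192, 6))

def pca_scores(data, n_components):
    """Principal-component (factor) scores via SVD of the centered data."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def canonical_correlations(x, y):
    """Canonical correlations via QR of each centered score matrix,
    then SVD of the cross-product of the orthonormal bases."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)

# 3 factors for the indices and 3 modes for the box set, as in the post.
r = canonical_correlations(pca_scores(indices, 3), pca_scores(boxes, 3))
print("canonical R's:", np.round(r, 3))
```

With 3 factors on each side this yields 3 canonical correlations, sorted in decreasing order; repeating it for each area and variable would give the battery of canonical R's Pablo mentions.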
Subject: CFP: 1997 Fall Technical Conference
From: "Randall D. Tobias"
Date: Thu, 16 Jan 1997 16:02:47 GMT
41st Annual Fall Technical Conference
1997 Call for Papers

"Mining Data for Quality Improvement"

Omni Inner Harbor Hotel
Baltimore, Maryland
October 16-17, 1997

Co-sponsored by:
American Society for Quality Control
 - Chemical and Process Industries Division
 - Statistics Division
American Statistical Association
 - Section on Physical and Engineering Sciences

Applied and expository papers are needed for parallel sessions in Statistics, Quality Control, and Tutorial / Case Study. Detailed submission instructions are available on the Web at

   http://www.sas.com/ftc97/

or you can request them from one of the following members of the program committee:

Susan L. Albin
Department of Industrial Engineering
Rutgers University
PO Box 909
Piscataway, NJ 08855-0909
phone: 908-445-2238
email: salbin@rci.rutgers.edu
FAX: 908-445-5467

Sharon Fronheiser (to whom paper correspondence should be addressed)
Eastman Kodak Company
151 Mill Hollow Crossing
Rochester, NY 14626
phone: 716-588-2014
email: sharonf@kodak.com
FAX: 716-722-4415

Randy Tobias (to whom electronic correspondence should be addressed)
SAS Institute Inc.
SAS Campus
Cary, NC 27513-2414
tel: 919-677-8000 x7933
email: sasrdt@unx.sas.com
FAX: 919-677-8123

The submission process will start on August 1, 1996 and conclude on January 17, 1997. Papers should be strongly justified by application to a problem in quality control, or the chemical, physical, or engineering sciences. The mathematical level of papers may range from none, to that of the Journal of Quality Technology, or that of Technometrics.

--
Randy Tobias                                   SAS Institute Inc.
sasrdt@unx.sas.com  (919) 677-8000 x7933       SAS Campus Dr.
us024621@interramp.com  (919) 677-8123 (Fax)   Cary, NC 27513-2414

Faith, faith is an island in the setting sun.
But proof, yes: proof is the bottom line for everyone.
   -- Paul Simon
Return to Top
Subject: polychoric matrix in CALIS
From: Robert Flynn Corwyn
Date: Sat, 18 Jan 1997 15:28:06 -0600
I'm sorry if this has been discussed before. I've spent weeks looking through manuals and browsing the SAS web site. There appears to be very little documentation on using a polychoric correlation matrix in the CALIS procedure. I've found a snippet on the polychoric macro, and another section on the PLCORR option in PROC FREQ. However, after an enormous amount of wasted time, I have not been able to use a polychoric matrix in CALIS, or even find out whether it can be done.

After first learning SPSS, and realizing that SAS has a lot to offer, I must admit that I've found SAS documentation of very little use, except for basic stuff. I've even gotten the excellent Hatcher book on CALIS, but even it doesn't answer my polychoric question.

Can someone please help me?
1) Is it possible to use polychoric matrices in CALIS?
2) If so, does an example exist anywhere?

Thanks in advance,
Robert Flynn Corwyn
Return to Top