The signs of eigenvector elements can be inverted without changing validity. A x = ev * x defines the eigenvalue ev and eigenvector x. Clearly A (-x) = ev * (-x) is also a solution. Different algorithms make different choices. Trickier is the case where there are multiple eigenvectors for a single eigenvalue. For a symmetric matrix A, we can then form an infinite number of sets of equivalent eigenvector solutions by appropriate linear combinations of the initial eigenvectors (we want to maintain orthogonality). A consulting firm once balked at paying me modest consulting fees for a problem such as this and flailed around for 2 weeks (several programmers at work!) when converting a program from a Univac to an IBM 360. To their horror they finally learned that the program was working fine on both machines, but the rounding in the arithmetic was different enough to change the output eigenvectors considerably. One of the programmers did the learning by simply asking me if there was a problem, and I took pity on him, since the suits were starting to get nasty. JN John C. Nash, Professor of Management, Faculty of Administration, University of Ottawa, 136 Jean-Jacques Lussier Private, P.O. Box 450, Stn A, Ottawa, Ontario, K1N 6N5 Canada email: jcnash@uottawa.ca, voice mail: 613 562 5800 X 4796 fax 613 562 5164, Web URL = http://macnash.admin.uottawa.ca
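A tiny illustration of the sign ambiguity described above (my own example, not Prof. Nash's): for the symmetric matrix A = [[2, 1], [1, 2]], the eigenvalue 3 has eigenvector x = (1, 1)/sqrt(2), and -x satisfies the same equation, so two libraries (or two machines with different rounding) can legitimately return eigenvectors of opposite sign.

```python
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]                     # symmetric 2x2 matrix
ev = 3.0                             # one of its eigenvalues
x = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]

def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

neg_x = [-c for c in x]
# A x = ev * x holds for x and equally for -x
assert all(abs(a - ev * b) < 1e-12 for a, b in zip(matvec(A, x), x))
assert all(abs(a - ev * b) < 1e-12 for a, b in zip(matvec(A, neg_x), neg_x))
```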
While on the subject, does anyone know of a polynomial approximation for a two-dimensional normal integral? Also, for an example of multidimensional normal integration by Gaussian quadrature, with apparently suitable accuracy, see the article by Bock & Gibbons in one of the latest Biometrics issues. -- John Uebersax Flagstaff, AZ 71302.2362@compuserve.com
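I don't have a polynomial approximation to offer, but as a sanity check for whatever method you end up using, the bivariate standard normal orthant probability has a closed form, P(X < 0, Y < 0) = 1/4 + arcsin(rho)/(2*pi). A brute-force Simpson-quadrature sketch (all choices of grid and truncation point are mine and purely illustrative):

```python
import math

def bvn_quadrant(rho, lo=-8.0, hi=0.0, n=200):
    """Simpson quadrature of the bivariate standard normal density
    with correlation rho over the square [lo, hi] x [lo, hi]."""
    det = 1.0 - rho * rho
    norm = 1.0 / (2.0 * math.pi * math.sqrt(det))
    h = (hi - lo) / n

    def w(i):                        # Simpson weights: 1, 4, 2, 4, ..., 1
        if i == 0 or i == n:
            return 1.0
        return 4.0 if i % 2 else 2.0

    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        for j in range(n + 1):
            y = lo + j * h
            q = (x * x - 2.0 * rho * x * y + y * y) / det
            total += w(i) * w(j) * math.exp(-0.5 * q)
    return norm * total * (h / 3.0) ** 2

rho = 0.5
exact = 0.25 + math.asin(rho) / (2.0 * math.pi)
assert abs(bvn_quadrant(rho) - exact) < 1e-4
```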
"DESIGN DATABASES AND DRIVE MICROSOFT ACCESS" Last year I taught a night school class which dealt with database design and the use of Microsoft Access. The students were mainly business people wanting to make practical use of either Access 2.0 or Access for Windows 95. Some students wanted to be able to manage hobbies or sports organisations. Their first problem was to design their database; then they wanted it in action as quickly as possible. Existing textbooks were expensive, did not deal with design, and contained more than the basic essential information. The notes I wrote as the courses proceeded have been compiled into a learning guide for those who want to build a database quickly. Should you be interested in a copy of the learning guide "Design Databases and Drive Microsoft Access", the cost is US $29.95, which includes postage and packaging. Just send a cheque or credit card number, expiry date and name. Do not forget your name and address for postal delivery. My address is: Robert Shaw, 49 Sea Vista Drive, Pukerua Bay, Porirua City, NEW ZEALAND. Comments on the usefulness of the book are most welcome. ___ Blue Wave/QWK v2.20 [NR]
Thank you for your fast answer. I would like to ask what would be the best software for data entry, cleaning the errors, and coding, for a not-too-large survey. Right now I have SPSS 7.5. I can do the data entry with this software, but what should I use for cleaning the errors and for the rest? Best regards, Mirjana Stojanovic
Given the stochastic process Y(k) = Y(k-1) + d + w(k) (1) where d is a constant gain and w(k) is a normal process with mean m and variance v. The objective is to determine the distribution of the stochastic variable X, defined as X = {k for which Y(k) >= t and Y(k-i)
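The message is cut off, but X looks like a first-passage (hitting) time for the threshold t. Under that reading, which is my assumption (X = smallest k with Y(k) >= t, starting from Y(0) = 0), its distribution is easy to explore by Monte Carlo; all parameter values below are illustrative:

```python
import random

def first_passage(d=0.5, m=0.0, v=1.0, t=10.0, max_k=100_000):
    """Smallest k with Y(k) >= t for Y(k) = Y(k-1) + d + w(k),
    w(k) ~ Normal(m, v), Y(0) = 0 (assumed initial condition)."""
    y, k = 0.0, 0
    while y < t and k < max_k:
        k += 1
        y += d + random.gauss(m, v ** 0.5)
    return k

random.seed(1)
samples = [first_passage() for _ in range(2000)]
mean_X = sum(samples) / len(samples)
# With per-step drift d + m = 0.5, Wald's identity suggests E[X] near t/(d+m) = 20
```

An empirical histogram of `samples` then approximates the distribution asked about; for positive drift it is the discrete analogue of an inverse-Gaussian shape.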
Subject: Re: STAT-L Digest - 15 Jan 1997 to 16 Jan 1997
From: Franz-Josef Mueter
Date: Fri, 17 Jan 1997 15:15:03 -0900
> ANNE KNOX wrote: > > I am using an abundance cover scale (Braun-Blanquet) and have taken the > midpoints of the percent cover classes (3, 15.5, 38, 63, 88). I was trying to > examine the effect of several independent variables > on cover. A regression didn't work, however, since the residuals were not even > close to being normally distributed (an effect of the discrete ratios I was > told). The best advice that I have received so > far has been to either use Spearman's rank correlation coefficient or bootstrap > analysis. Is that correct, and does anyone have any additional advice for me? The arcsine transformation is a useful transformation for percentages. Percentages follow a binomial distribution and can be made nearly normal if the square root of each proportion is transformed to its arcsine (or inverse sine, sin^-1): p' = arcsin(sqrt(p)). The transformation is not very good at the extreme ends of the data (near 0 and 100%). A discussion of the arcsine transformation can be found in: Zar, J.H. 1984. "Biostatistical Analysis". Prentice Hall.
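The transformation above is one line of code. A quick sketch applying it to the cover-class midpoints quoted in the question (the midpoints are from the post; the function name is mine):

```python
import math

def arcsine_transform(p):
    """p' = arcsin(sqrt(p)) for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

midpoints_pct = [3, 15.5, 38, 63, 88]                # Braun-Blanquet midpoints
transformed = [arcsine_transform(p / 100) for p in midpoints_pct]
# p = 0.5 maps to pi/4; the transform is monotone, so ordering is preserved
assert abs(arcsine_transform(0.5) - math.pi / 4) < 1e-12
```

The transformed values (in radians) can then be used as the response in an ordinary regression, with the caveat noted above about values near 0 and 100%.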
Subject: matrixing and bracketing
From: carl-gustav.johansson@ferring.se
Date: Fri, 17 Jan 1997 14:02:46 -0800
Hello, I would like to have information about matrixing and bracketing in stability studies for pharmaceuticals. Does anyone know how to make a layout design, and how to proceed when values are outside the spec limits? Sincerely, Carl-Gustav Johansson
Subject: POLYCHORIC MATRIX
From: Robert Flynn Corwyn
Date: Fri, 17 Jan 1997 22:09:12 -0600
Could someone please explain how to use a polychoric matrix in CALIS? The programming language is what we need. Thanks, Robert Flynn Corwyn
Subject: Re: Combining Neural/Fuzzy Models with Statistical Models
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Date: Sat, 18 Jan 1997 00:57:53 GMT
In article <32DA5910.2AA2@colorado.edu>, Robert Dodier writes: |> Andrew Gray wrote: |> |> > I'm working on combining neural networks and fuzzy logic models |> > with statistical techniques (regression and data reduction) for |> > software metrics (for example, predicting development time based on |> > the type and size of system). While there has been a lot of work on |> > neural-fuzzy, neural-genetic, fuzzy-genetic, etc. type systems I've |> > only ever found a small number of researchers using AI/statistical |> > techniques (presumably at least partially an indication of how few AI |> > researchers follow the statistical side of things, and vice versa). |> |> At the risk of reviving age-old threads about fuzzy logic vs. |> probability, let me advise you not to bother with fuzzy logic. |> There are two parts to fuzzy logic, one defensible and the other |> not. The ``fuzzy'' part is one solution to the problem of |> representing uncertain knowledge -- this is the defensible part. |> The ``logic'' part is an attempt to reason with uncertain knowledge |> -- this is an indefensible hack. The trouble with fuzzy logic as usually presented is that you can't reason about two uncertain propositions without knowing how the uncertainties are related--it's like trying to work with marginal probability distributions without knowing the joint distribution. But some fuzzy logicians have noticed this problem and developed fuzzy logics involving "correlations" between fuzzy propositions. See the post included below. |> ... As discussed at length in |> Warren Sarle's neural networks FAQ (sorry, I don't have pointer) |> it's useful to consider neural networks as extensions of or variations |> on conventional regression and classification schemes. 
ftp://ftp.sas.com/pub/neural/FAQ.html ______________________________________________________________________ From: William Siler Newsgroups: comp.ai.fuzzy Subject: New Fuzzy Logic Date: Fri, 20 Sep 96 20:25:09 -0500 Organization: Delphi (info@delphi.com email, 800-695-4005 voice) Lines: 107 Message-ID: Reposting article removed by rogue canceller. DIGEST OF BUCKLEY, JJ AND SILER, W: A NEW T-NORM. SUBMITTED TO FUZZY SETS AND SYSTEMS, 1996. Fuzzy systems theory has been criticized for not obeying all the laws of classical set theory and classical logic. The t-norm and t-conorm here presented obey all the laws of the corresponding classical theory. A somewhat similar theory has been proposed by Thomas (1994), except that he does not claim that the distributive property is maintained. We first propose a source of fuzziness. We suppose that the truth value > 0 and < 1 of a fuzzy logical statement A is drawn from a number of underlying (probably implicit) correlated random variables whose values alpha[i] are binary, i.e. 0 or 1 with a Bernoulli distribution, and that the truth value of A is a simple average of these binary values. (George Klir (1994) proposed a similar process where the random values are binary opinions of experts as to truth or falsehood of a statement.) If this is so, then a = truth(A) = sum(alpha[i]) / n b = truth(B) = sum(beta[i]) / n r = correlation coefficient(alpha[i], beta[i]) aANDb = truth value(A AND B) aORb = truth value(A OR B) sa = standard deviation(alpha) = sqrt(p(alpha)*(1 - p(alpha))) sb = standard deviation(beta) = sqrt(p(beta)*(1 - p(beta))) aANDb = a*b + r*sa*sb aORb = a + b - a*b - r*sa*sb rmax = (min(a,b) - a*b) / (sa*sb) for r = rmax, min(a,b) = a*b + rmax*sa*sb rmin = (max(a+b-1, 0) - a*b) / (sa*sb) for r = rmin, max(a+b-1, 0) = a*b + rmin*sa*sb Proofs of the following theorems are in the appendices of our paper. Theorem 1: 1. rmax = ru, ru <= 1 2. rmin = rl, rl >= -1 3. rl <= r <= ru Theorem 2: 1. 
aANDb = a*b + r*sa*sb = a*b + cov(a,b) 2. aORb = a + b - a*b - r*sa*sb = a + b - a*b - cov(a,b) Theorem 3: 1. If r = ru, aANDb = min(a,b) and aORb = max(a,b) 2. If r = 0, aANDb = a*b and aORb = a+b-ab 3. If r = rl, aANDb = max(a+b-1, 0) and aORb = min(a+b, 1) We now suppose that this basic process is inaccessible to us, but that we do have a history of a number of instances of the truths of statement A and statement B. Now, given a value of r, the correlation coefficient between a, the truth values of A, and b, the truth values of B, the t-norm and t-conorm appropriate to this history, T (t-norm) and C (t-conorm) are defined for [a, b] on S, a restricted subset of [0,1]x[0,1]. Theorem 4. (The 5 parts of this theorem define the subset S of [0,1]x[0,1] possible for r = 1, 0 < r < 1, r = 0, -1 < r < 0 and r = -1. Given a value of r, it may be that not all (a,b) combinations are possible; e.g. a = .25, b = .75 is not possible for r = 1 in the binary process described above.) Theorem 5. 1. (Shows that for 0 < r < 1 and (a,b) in S, ab < T(a,b) <= min(a,b).) 2. (Shows that for -1 < r < 0 and (a,b) in S, max(a+b-1, 0) <= T(a,b) < ab.) 3. (Shows that for -1 <= r <= 1 and (a,b) in S, max(a+b-1, 0) <= T(a,b) <= min(a,b) and max(a,b) <= C(a,b) <= min(a+b, 1).) Theorem 6. Shows that T is a t-norm and C is a t-conorm on S. Theorem 7. 1. A AND A = A, r is 1. 2. A AND 0 = 0, any r. 3. A AND X = A, any r. 4. A AND NOT-A = 0, r is -1. 5. A OR A = A, r is 1. 6. A OR X = X, any r. 7. A OR 0 = A, any r. 8. A OR NOT-A = X, r is -1. 9. NOT-(A AND B) = NOT-A OR NOT-B, any r. 10. NOT-(A OR B) = NOT-A AND NOT-B, any r. 11. A OR (A AND B) = A, any appropriate r. 12. A AND (A OR B) = A, any appropriate r. 13. A AND (B OR C) = (A AND B) OR (A AND C), any appropriate r. 14. A OR (B AND C) = (A OR B) AND (A OR C), any appropriate r. References: Klir, GJ (1994). Multivalued logics versus modal logics: alternate frameworks for uncertainty modelling. 
In: Advances in Fuzzy Theory and Technology, Vol II: 3-47. Duke University Press, Durham, NC. Thomas, SF (1994). Fuzzy Logic and Probability. ACG Press, Wichita, KS. ______________________________________________________________________ -- Warren S. Sarle SAS Institute Inc. The opinions expressed here saswss@unx.sas.com SAS Campus Drive are mine and not necessarily (919) 677-8000 Cary, NC 27513, USA those of SAS Institute. *** Do not send me unsolicited commercial or political email! ***
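The correlation-adjusted connectives in the digest above are easy to check numerically. A small sketch (function names are mine; the formulas are those quoted): r = 0 recovers the product t-norm and r = rmax recovers min(a, b), as Theorem 3 states.

```python
import math

def t_norm(a, b, r):
    """aANDb = a*b + r*sa*sb, per Buckley & Siler as quoted above."""
    sa = math.sqrt(a * (1.0 - a))
    sb = math.sqrt(b * (1.0 - b))
    return a * b + r * sa * sb

def r_max(a, b):
    """Largest admissible correlation: (min(a,b) - a*b) / (sa*sb)."""
    sa = math.sqrt(a * (1.0 - a))
    sb = math.sqrt(b * (1.0 - b))
    return (min(a, b) - a * b) / (sa * sb)

a, b = 0.6, 0.3
assert abs(t_norm(a, b, 0.0) - a * b) < 1e-12            # r = 0 -> product
assert abs(t_norm(a, b, r_max(a, b)) - min(a, b)) < 1e-12  # r = rmax -> min
```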
Subject: CART Workshop in Chicago
From: Patrick Fleury
Date: Fri, 17 Jan 1997 21:04:58 -0800
***Final Notice*** The Chicago Chapter of the American Statistical Association is happy to announce that it will present a half-day workshop on CART (Classification and Regression Trees) conducted by Dr. Dan Steinberg, Salford Systems. CART methodology concerns the use of tree-structured algorithms to classify data into discrete classes. The terminology was invented by Breiman et al. in the early 1980s. The technique has found uses in both medical and market research statistics. For example, one tree-structured classifier uses blood pressure, age and sinus tachycardia to classify heart patients as either high risk or not. Another might use age-related variables and other demographics to decide who should appear on a mailing list. There will be other examples from both fields discussed in the workshop. Dr. Steinberg is President of Salford Systems of San Diego, California. He holds a Ph.D. degree in Economics from Harvard and has held positions at both the University of California at San Diego and San Diego State. He is the leader of the team that ported the original version of CART to the PC. He is well known for his previous work on statistical methods in economics and especially for his work on logistic regression. Time: 1:00 PM - 5:00 PM Registration: 12:30 PM - 1:00 PM Date: January 31, 1997 Place: The University of Illinois at Chicago, College of Nursing, Third Floor Lounge, 845 South Damen Avenue Admission: Members of the Chicago Chapter ASA: $80.00 Non-Members of the Chicago Chapter: $92.00 Student Members: $40.00 Student Non-Members: $46.00 (The difference between Student and Regular admission is subsidized by the Lucile Derrick Fund) Advance registration is encouraged because there will be handouts and we would like to have an idea of how many to make. Registrations at the door will be accepted as space permits. We regret that we cannot accept credit cards. Payment is to be by cash or check only. Payment may be sent in early or paid at registration. 
Please make checks payable to "ASA-Chicago". Directions: The UIC School of Nursing is just north of the corner of Damen and Taylor in Chicago. To get to the School of Nursing, take the Eisenhower Expressway (either east or west) and get off at Damen. Proceed to 845 South Damen, three blocks south of the expressway. Parking is available in parking structure D1 at 1100 South Wood. The parking structure is at the corner of Taylor and Wood, two blocks east of Damen. For more information please e-mail pfleury@mcs.com or send mail to: CART Workshop, opNUMERICS, Suite 4A, 151 N. Kenilworth, Oak Park, IL 60301
Subject: Correlations involving ratios
From: chris@agri.upm.edu.my
Date: Fri, 17 Jan 1997 21:41:06 -0600
I have another "puzzle" that's been on my mind. It concerns correlating a ratio with another variable. To illustrate my point, consider this simple example (by the way, the correlations are actual, not hypothetical). Let's have two independent variables: soil carbon (C) and soil nitrogen (N). I'm interested in their linear relationship with another soil property, soil stability (S). Correlations (r) with S are as follows: r between C & S = .63** r between N & S = .00 r between C:N & S = .51* Here, C:N is a ratio obtained by dividing C by N. As shown, the r between C:N and S is 0.51*. Something peculiar happens when I start to play around with the data. Now, suppose I were to change the scale of C by adding the value 10 to every value in the variable C; that is, tC = C + 10. Using this new variable tC, I now form a new ratio between carbon and nitrogen, that is tC:N. Correlations with S are now as follows: r between C & S = .63** r between tC & S = .63** (ok) r between N & S = .00 r between C:N & S = .51* r between tC:N & S = .07 (!!!) As shown, changing the scale of carbon did not affect its r with S -- correctly so. If you were to correlate tC with C you would obtain a perfect 1.00 relationship. In simple regression, changing the scale of a variable only shifts the line upwards or downwards - the slope remains equal. However, the r between tC:N and S is clearly different from the r between C:N and S! If I were to use tC to form the ratio between carbon and nitrogen, my interpretation of the results would be different - I would conclude that the carbon:nitrogen ratio is insignificantly related to S. But had I used C, I would instead conclude that the carbon:nitrogen ratio has a significant relationship! Why is this so? What's happening here? Note: this peculiarity also happens with other data, so there is no data entry or calculation error here. Anyone? P.S. By the way, in soil science using such ratios is quite common. 
This is why this is quite a shocking finding.
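The effect is real and easy to reproduce with toy data (my own simulated values, not the poster's): correlation is invariant under adding a constant to C, but C/N is a nonlinear function of C, so the shifted ratio (C+10)/N is dominated by the 10/N term, which carries no information about S, and its correlation with S typically collapses.

```python
import random

def corr(x, y):
    """Pearson correlation, plain Python."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

random.seed(0)
C = [random.uniform(1, 5) for _ in range(200)]     # "carbon"
N = [random.uniform(0.1, 1) for _ in range(200)]   # "nitrogen", independent of C
S = [c + random.gauss(0, 1) for c in C]            # S related to C only

# Shifting C leaves corr(C, S) exactly unchanged...
assert abs(corr(C, S) - corr([c + 10 for c in C], S)) < 1e-12

# ...but the two ratio correlations generally differ, as in the post:
r_ratio   = corr([c / n for c, n in zip(C, N)], S)
r_shifted = corr([(c + 10) / n for c, n in zip(C, N)], S)
```

The moral matches the poster's observation: a ratio is not scale-shift invariant in its numerator, so conclusions drawn from corr(C:N, S) depend on where the zero of the C scale sits.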
Subject: Re: Underdispersion causes
From: Bob Wheeler
Date: Fri, 17 Jan 1997 21:36:14 -0500
Benjamin Chan wrote: > > I'm curious as to possible causes for underdispersion in generalized > linear models. I'm fitting a Poisson model to some count data and the > model I'm fitting looks underdispersed. > > -- > > +-------------------------------------------------+ > | Benjamin Chan, M.S., Assistant Statistician I | > | UC Davis Medical Center, Primary Care Center | > | 2221 Stockton Blvd., Room 3107 | > | Sacramento CA 95817 | > | Voice = 916-734-7004; Fax = 916-734-2732 | > +-------------------------------------------------+ Dependence among the observations is a common problem. Bob Wheeler, ECHIP, Inc.
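A sketch of the usual diagnostic behind "looks underdispersed" (my own illustration, not from the thread): the Pearson dispersion statistic, Pearson chi-square over residual degrees of freedom, which should be near 1 for a well-specified Poisson model; values well below 1 suggest underdispersion, e.g. from the dependence Bob Wheeler mentions.

```python
def dispersion(counts, fitted, n_params):
    """Pearson chi-square / residual df for a Poisson fit."""
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted))
    return pearson / (len(counts) - n_params)

# Illustrative data: counts far less variable than Poisson would predict
counts = [2, 3, 2, 3, 3, 2, 3, 2]
fitted = [2.5] * 8                   # intercept-only fit: mean of the counts
phi = dispersion(counts, fitted, 1)  # well below 1 here -> underdispersion
```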
Subject: Re: 4 dim normal integral
From: orourke@utstat.toronto.edu (Keith O'Rourke)
Date: Tue, 14 Jan 1997 18:19:13 GMT
Any recent text on numerical integration should suffice. My favourite is Sloan IH, Joe S. Lattice Methods for Multiple Integration. Oxford: Clarendon Press, 1994, but you may wish to look for available software for expedience. I believe Splus has a "sub-group adaptive" multiple quadrature routine that may handle your 4-dimensional problem. Keith O'Rourke The Toronto Hosp.
Subject: Re: Introduction to non-parametric stats.
From: "Vassili P. Leonov"
Date: Sat, 18 Jan 97 15:31:30 +0700
> Date: Tue, 14 Jan 1997 16:05:06 -0700 > From: DAN.HUSTON@ASU.EDU > Subject: introduction to non-parametric stats > > Hello Folks- > > I am looking for an introductory book on non-parametric stats. Any > suggestions? > > Thanks- > > Dan Huston Phone (602) 965-2420 > Measurement, Statistics and Methodological Studies Fax (602) 965-0300 > Psychology in Education dan.huston@asu.edu > 325 Payne Hall http://seamonkey.ed.asu.edu/~huston > Arizona State University > Tempe, AZ 85287-0611 I recommend the following good books: 1. Runyon, R.P. (1977). Nonparametric Statistics: A Contemporary Approach. Reading, Massachusetts: Addison-Wesley. 2. Hollander, M., Wolfe, D.A. (1973). Nonparametric Statistical Methods. New York: Wiley. 3. Lehmann, E.L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San Francisco: Holden-Day. 4. Noether, G.E. (1967). Elements of Nonparametric Statistics. New York: Wiley. 5. Walsh, J.E. (1962). Handbook of Nonparametric Statistics, vol. 1. Princeton, N.J.: Van Nostrand. 6. Handbook of Applicable Mathematics (Chief Editor: Walter Ledermann), Volume VI: Statistics, Parts A & B. Chichester: Wiley-Interscience (1984). I wish you success! --- Centre of Applied Statistics Stat-Point Vassili P. Leonov E-mail: point@statleo.tomsk.su
Subject: Re: Getting means and ... proc GLM
From: "Vassili P. Leonov"
Date: Sat, 18 Jan 97 15:29:09 +0700
Date: Tue, 7 Jan 1997 15:54:04 EST From: Tim Benner Subject: Getting means and sd from proc GLM Tim Benner wrote: > I do a lot of within and between subject analysis using the GLM procedure > and I'd like to have SAS print the means and standard deviations in the GLM > output. I'd like to get a mean and sd for each of my dependent variables. > Is there a way of getting proc GLM to print these out, without me having to > do a separate proc means? > Tim Benner > Dept of Kinesiology > Penn State University To get means and comparisons among them from PROC GLM, you can use the MEANS statement with the SCHEFFE and CLDIFF options. An example SAS program for this case:
PROC GLM DATA = A1;
  CLASS B;
  MODEL X1-X5 = B;
  MEANS X1-X5 / SCHEFFE CLDIFF;
RUN;
I wish you success in studying the SAS statistical package! --- Centre of Applied Statistics Stat-Point Vassili P. Leonov E-mail: point@statleo.tomsk.su
Subject: Performance Index for Covering Designs, Wheels
From: bm373592@muenchen.org (Uenal Mutlu)
Date: Sat, 18 Jan 1997 19:37:35 GMT
(I've moved this thread to a meaningful subject and also included sci.stat.consult) Here are some relevant excerpts from the previous postings. The task is twofold: creating wheels (covering designs) which cover as many of the possible cases as possible in as few blocks as possible (ie. a combinatorial optimization problem), and a formula which allows the comparison of designs with different nbr of blocks etc. (ie. a practical performance index for such designs): ##### >>A few years ago, I wrote a program to improve partial wheels. I have >>ran this program overnight with the improved wheel and it has improved >>it further to 44.45% And it's not done yet... >> >I think 2 conditions have to be fulfilled to find an optimal solution for >that problem: >a) all tickets should be as different as possible (any two tickets should >share as less as possible numbers) >b) each number should appear as often as any other number > >Here is a solution based on these conditions: >1 9 17 25 33 41 >2 10 18 26 34 42 >3 11 19 27 35 43 >4 12 20 28 36 44 >5 13 21 29 37 45 >6 14 22 30 38 46 >7 15 23 31 39 47 >8 16 24 32 40 48 >1 16 23 30 37 44 >2 9 24 31 38 45 >3 10 17 32 39 46 >4 11 18 25 40 47 >5 12 19 26 33 48 >6 13 20 27 34 41 >7 14 21 28 35 42 >8 15 22 29 36 43 >1 15 21 27 38 48 >2 16 22 28 39 41 >3 9 23 29 40 42 >4 10 24 30 33 43 >5 11 17 31 34 44 >6 12 18 32 35 45 >7 13 19 25 36 46 >8 14 20 26 37 47 >1 14 19 32 34 49 >2 15 20 25 35 49 >3 16 21 26 36 49 >4 9 22 27 37 49 >5 10 23 28 38 49 >6 11 24 29 39 49 >7 12 17 30 40 49 >8 13 18 31 33 49 > >Using the first 27 tickets your percentage for at least a 3 win is >44.383% >Using all 32 tickets percentage is at 50.83% > >I would be interested to see the 44.45% wheel. 
##### >>Using the first 27 tickets your percentage for at least a 3 win is >>44.383% Using all 32 tickets percentage is at 50.83% > >We should use a general formula for comparing of such wheels with >differing nbr of tickets: > > WCD = Percent / Tickets > >where > - WCD = "Wheel Covering Degree". The higher, the better. > - Percent is in the range 0.0 to 100.0 for minWin (ie. for > "at least x-win") > - Tickets is the nbr of single tickets the wheel consists of > >Warning: > it can be used only for the same type of wheels (ie. k must > be equal), and percent should be "for at least minWin" > >Example for minWin = 3+ (ie. for at least 3): > > Percent Tickets -> WCD > --------------------------- > 44.383 27 1.64381 > 50.83 32 1.58844 > 70.0 54 1.29630 > 100.0 168 0.59524 > >The higher the WCD, the better the wheel. It simply says >"each ticket covers WCD points of the Percent value" ##### >I had a similar discussion on this with Normand and currently in 1st >place is John Rawson with 44.43945% (27 lines - 230,160 combs covered >per line avg = 1.6459%). > >But if one comes up with 28 lines would it be fair to expect them to >have at least 1.6459% per line ? ##### >>But if one comes up with 28 lines would it be fair to expect them to >>have at least 1.6459% per line ? > >The WCD formula already covers this case. You're mixing up the >different kinds of percent values and your calculation doesn't >take into account the nbr of tickets, but the WCD formula does. > >(Your 1.6459% is simply the result of 230160 / 13983816 * 100 >ie. simple percent calculation, but the actual number of tickets >isn't included anywhere, so this won't help in case of differing >nbr of tickets) > >Cf. 
for example the first 2 entries in the table above, then you >should see that the formula indeed allows the comparison of your >example of 28 tickets too, but one needs both the winPercent (or the >number of winning combinations to get the winPercent) and the nTickets >(here 28 and 27) values of both wheels for the WCD formula. > >And, your percent value (1.6459%) has nothing to do with the WCD value >(ie. they are very different things; the WCD is NOT a percent value, >it simply is a number). Both can't be compared together. Apple and Orange :-) > >Anyone know of a better formula for comparing such wheels of similar >type but with different nbr of tickets? ##### >>I think 2 conditions have to be fulfilled to find an optimal solution for >>that problem: >>a) all tickets should be as different as possible (any two tickets should >>share as less as possible numbers) >>b) each number should appear as often as any other number > >Right. But, it's not that simple. If we keep adding tickets, at one >point it will become possible to cover more combinations by having >more redundancy between tickets. As an example of this, when the 174 >ticket cover was discovered for the Lotto 6/49 it had 18 tickets that >had 5 identical numbers, yet we couldn't see how to improve it further >with tickets that had less redundancy. > >Is 27 tickets below or above that point? > >Also, if you randomly generate several sets of 27 tickets that have no >duplicating pairs of numbers you will notice that they do not cover >the same number of combinations! So, what is the best cover possible >using only such tickets? > >This may or may not be the cover we are looking for. ##### >>Also, if you randomly generate several sets of 27 tickets that have no >>duplicating pairs of numbers you will notice that they do not cover >>the same number of combinations! So, what is the best cover possible >>using only such tickets? >> >>This may or may not be the cover we are looking for. 
> >It is the right direction to go I would say. Of course the best is, >and always has been, designing wheels which have the highest number >of winning combinations in as few tickets as possible. If we take this >as the only major criteria then we end up with this kind of wheels, >which IMHO are much better, and also cheaper, for the player than the >other 100% guaranteed wheels. And, IMHO exactly here should be the >true difference between wheels and covering designs. > >But the designing of such wheels isn't easy either (this may be because >it's a relatively new field for me; no simple combine formula and so on... :-) >But it's a good challenge for wheel designers, group and design theorists. > >So, for this type of wheels we need a different way to measure their >strength. In an other posting I introduced the WCD formula, which seems >suitable for this task; ie. > > WCD = winPercent / nTickets > >where > winPercent = NbrOfWinningCombinations / C(v,k) * 100 > nTickets = nbr of tickets the wheel has > >(One also should always state, besides the other usual params of the >wheel, the minWin all these values are for. Ie. "for 3 or more winning >numbers" etc.) > >The WCD is a number only (ie. no percent value). The higher the WCD, >the better the wheel. It should be used for comparison of 2 or more >wheels of same type with possibly differing number of tickets. > >(cf. also the other postings on this which also include some examples) ##### >|> The WCD is a number only (ie. no percent value). The higher the WCD, >|> the better the wheel. It should be used for comparison of 2 or more >|> wheels of same type with possibly differing number of tickets. >|> >|> (cf. also the other postings on this which also include some examples) >|> > > I'm not sure it's that easy to compare wheels of differing number of >tickets by this method. Let's take a very simple example to make this >clear. Consider a 6/14 lottery with the winning criteria being a >3+ match. 
> The number of combinations is 3003 and the best wheels I can think of >are, for 1 to 4 tickets: > > Number of tickets = 1: Numbers 1, 2, 3, 4, 5, 6 > Coverage is 1589, therefore WCD = 1589/3003 * 100/1 = 52.91 % > > Number of tickets = 2: Numbers 1, 2, 3, 4, 5, 6 and 7, 8, 9, 10, 11, 12 > Coverage is 2778, therefore WCD = 2778/3003 * 100/2 = 46.25 % > > Number of tickets = 3: Numbers 1, 2, 3, 4, 5, 6 and 7, 8, 9, 10, 11, 12 and 1, 2, 3, 4, 13, 14 > Coverage is 2988, therefore WCD = 2988/3003 * 100/3 = 33.17 % > > Number of tickets = 4: Numbers 1, 2, 3, 4, 5, 6 and 7, 8, 9, 10, 11, 12 and 1, 2, 3, 4, 13, 14 and 5, 6, 7, 8, 13, 14 > Coverage is 3003, therefore WCD = 3003/3003 * 100/4 = 25.00 % > > Which would you say is the best wheel, and how does your >judgement compare to the WCD rating? > > The point I'm trying to make is that the WCD is always going to go >down with increasing number of tickets, because you can't avoid >coverage redundancy as you introduce more tickets. The same applies >to the various wheels in 6/49. Consider again a 'wheel' of 1 ticket. >Its coverage is 260524 out of 13983816, giving a WCD of 1.863%. This may be a typo: it should be 260624; it is the maxCover value, ie. the nbr of combinations 1 block ideally should cover. I'm currently working on a new method, but the relations for the given examples remain similar, I fear. The advantage of the new formula is that it gives a true p-value (ie. a percent value if multiplied by 100). >is to be compared with the best current 27 ticket wheel with a >coverage of about 44.5%, ie a WCD of 1.648%. Then the 168 wheel has >a WCD of just 100/168 or 0.595%. Again, does the WCD give a good >guide to which is the best wheel? Essentially the WCD will _always_ >give the best result for a wheel of 1 ticket, and will favour wheels >of lower number of tickets. 
Now I'm not a great expert or user of >wheels, but I'm fairly sure that most people who are keen on wheels >do not consider a wheel of one ticket to be a good wheel! > > So what I'm really trying to say is that I don't think the WCD >is a good means to compare two wheels of different numbers of tickets. >However, I can't say I've got any better suggestion. ##### >Hi Stephen and all the others, > >I've moved the discussion about the Performance Index for Wheels (WCD etc.) >to a new subject called "Performance Index for Covering Designs, Wheels". >The first posting under the new subject contains IMHO all the relevant >excerpts from the previous postings. Ie. it's exactly this posting :-) (BTW, the old subject was "Dimitris Challenge" in r.g.l.)
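Two of the figures quoted in this thread can be checked in a few lines. This sketch (my own code; the numbers are the thread's) computes the WCD table and the single-ticket 6/14 coverage for a 3+ match by brute force over all C(14,6) = 3003 draws:

```python
from itertools import combinations

def wcd(percent, tickets):
    """Wheel Covering Degree as defined above: coverage percent per ticket."""
    return percent / tickets

# (percent covered, nbr of tickets) pairs from the thread's table
table = [(44.383, 27), (50.83, 32), (70.0, 54), (100.0, 168)]
wcds = [round(wcd(p, t), 5) for p, t in table]

# 6/14 example: coverage of the single ticket {1..6}, counted exhaustively
ticket = set(range(1, 7))
coverage = sum(1 for draw in combinations(range(1, 15), 6)
               if len(ticket & set(draw)) >= 3)
# coverage matches the 1589 stated in the thread
```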
Subject: Performance Index for Covering Designs, Wheels
From: bm373592@muenchen.org (Uenal Mutlu)
Date: Sat, 18 Jan 1997 19:37:35 GMT
(I've moved this thread to a meaningful subject and also included sci.stat.consult) Here are some relevant excerpts from the previous postings. The task is twofold: creating wheels (covering designs) which cover the most of the possible cases in less blocks as possible (ie. a combinatorial optimization problem), and a formula which allows the comparison of designs with different nbr of blocks etc. (ie. a practical performance index for such designs): ##### >>A few years ago, I wrote a program to improve partial wheels. I have >>ran this program overnight with the improved wheel and it has improved >>it further to 44.45% And it's not done yet... >> >I think 2 conditions have to be fulfilled to find an optimal solution for >that problem: >a) all tickets should be as different as possible (any two tickets should >share as less as possible numbers) >b) each number should appear as often as any other number > >Here is a solution based on these conditions: >1 9 17 25 33 41 >2 10 18 26 34 42 >3 11 19 27 35 43 >4 12 20 28 36 44 >5 13 21 29 37 45 >6 14 22 30 38 46 >7 15 23 31 39 47 >8 16 24 32 40 48 >1 16 23 30 37 44 >2 9 24 31 38 45 >3 10 17 32 39 46 >4 11 18 25 40 47 >5 12 19 26 33 48 >6 13 20 27 34 41 >7 14 21 28 35 42 >8 15 22 29 36 43 >1 15 21 27 38 48 >2 16 22 28 39 41 >3 9 23 29 40 42 >4 10 24 30 33 43 >5 11 17 31 34 44 >6 12 18 32 35 45 >7 13 19 25 36 46 >8 14 20 26 37 47 >1 14 19 32 34 49 >2 15 20 25 35 49 >3 16 21 26 36 49 >4 9 22 27 37 49 >5 10 23 28 38 49 >6 11 24 29 39 49 >7 12 17 30 40 49 >8 13 18 31 33 49 > >Using the first 27 tickets your percentage for at least a 3 win is >44.383% >Using all 32 tickets percentage is at 50.83% > >I would be interested to see the 44.45% wheel. 
#####
>>Using the first 27 tickets your percentage for at least a 3 win is
>>44.383%. Using all 32 tickets the percentage is 50.83%.
>
>We should use a general formula for comparing such wheels with
>differing nbrs of tickets:
>
>    WCD = Percent / Tickets
>
>where
>  - WCD = "Wheel Covering Degree". The higher, the better.
>  - Percent is in the range 0.0 to 100.0 for minWin (ie. for
>    "at least x-win")
>  - Tickets is the nbr of single tickets the wheel consists of
>
>Warning: it can be used only for the same type of wheels (ie. k must
>be equal), and Percent should be "for at least minWin".
>
>Example for minWin = 3+ (ie. for at least 3):
>
>    Percent   Tickets  ->  WCD
>    ---------------------------
>     44.383      27       1.64381
>     50.83       32       1.58844
>     70.0        54       1.29630
>    100.0       168       0.59524
>
>The higher the WCD, the better the wheel. It simply says
>"each ticket covers WCD points of the Percent value".

#####
>I had a similar discussion on this with Normand, and currently in 1st
>place is John Rawson with 44.43945% (27 lines; 230,160 combs covered
>per line avg = 1.6459%).
>
>But if one comes up with 28 lines, would it be fair to expect them to
>have at least 1.6459% per line?

#####
>>But if one comes up with 28 lines, would it be fair to expect them to
>>have at least 1.6459% per line?
>
>The WCD formula already covers this case. You're mixing up the
>different kinds of percent values, and your calculation doesn't
>take into account the nbr of tickets, but the WCD formula does.
>
>(Your 1.6459% is simply the result of 230160 / 13983816 * 100,
>ie. a simple percent calculation, but the actual number of tickets
>isn't included anywhere, so this won't help in case of differing
>nbrs of tickets.)
>
>Cf. for example the first 2 entries in the table above; then you
>should see that the formula indeed allows the comparison of your
>example of 28 tickets too, but one needs both the winPercent (or the
>number of winning combinations, to get the winPercent) and the nTickets
>(here 28 and 27) values of both wheels for the WCD formula.
>
>And, your percent value (1.6459%) has nothing to do with the WCD value
>(ie. they are very different things; the WCD is NOT a percent value,
>it simply is a number). The two can't be compared. Apple and Orange :-)
>
>Anyone know of a better formula for comparing such wheels of similar
>type but with different nbrs of tickets?

#####
>>I think 2 conditions have to be fulfilled to find an optimal solution for
>>that problem:
>>a) all tickets should be as different as possible (any two tickets should
>>share as few numbers as possible)
>>b) each number should appear as often as any other number
>
>Right. But it's not that simple. If we keep adding tickets, at one
>point it will become possible to cover more combinations by having
>more redundancy between tickets. As an example of this, when the 174
>ticket cover was discovered for the Lotto 6/49, it had 18 tickets that
>had 5 identical numbers, yet we couldn't see how to improve it further
>with tickets that had less redundancy.
>
>Is 27 tickets below or above that point?
>
>Also, if you randomly generate several sets of 27 tickets that have no
>duplicating pairs of numbers, you will notice that they do not cover
>the same number of combinations! So, what is the best cover possible
>using only such tickets?
>
>This may or may not be the cover we are looking for.

#####
>>Also, if you randomly generate several sets of 27 tickets that have no
>>duplicating pairs of numbers, you will notice that they do not cover
>>the same number of combinations! So, what is the best cover possible
>>using only such tickets?
>>
>>This may or may not be the cover we are looking for.
>
>It is the right direction to go, I would say. Of course the best is,
>and always has been, designing wheels which have the highest number
>of winning combinations in as few tickets as possible. If we take this
>as the only major criterion, then we end up with this kind of wheel,
>which IMHO is much better, and also cheaper, for the player than the
>other 100% guaranteed wheels. And IMHO exactly here lies the
>true difference between wheels and covering designs.
>
>But the designing of such wheels isn't easy either (maybe because
>it's a relatively new field for me; no simple combine formula and so on... :-)
>But it's a good challenge for wheel designers, group and design theorists.
>
>So, for this type of wheel we need a different way to measure its
>strength. In another posting I introduced the WCD formula, which seems
>suitable for this task; ie.
>
>    WCD = winPercent / nTickets
>
>where
>    winPercent = NbrOfWinningCombinations / C(v,k) * 100
>    nTickets   = nbr of tickets the wheel has
>
>(One should also always state, besides the other usual params of the
>wheel, the minWin all these values are for. Ie. "for 3 or more winning
>numbers" etc.)
>
>The WCD is a number only (ie. not a percent value). The higher the WCD,
>the better the wheel. It should be used for comparison of 2 or more
>wheels of the same type with possibly differing numbers of tickets.
>
>(cf. also the other postings on this, which include some examples)

#####
>|> The WCD is a number only (ie. not a percent value). The higher the WCD,
>|> the better the wheel. It should be used for comparison of 2 or more
>|> wheels of the same type with possibly differing numbers of tickets.
>|>
>|> (cf. also the other postings on this, which include some examples)
>|>
>
>I'm not sure it's that easy to compare wheels of differing numbers of
>tickets by this method. Let's take a very simple example to make this
>clear. Consider a 6/14 lottery with the winning criterion being a
>3+ match.
>
>The number of combinations is 3003, and the best wheels I can think of
>are, for 1 to 4 tickets:
>
>  Number of tickets = 1:  Numbers  1,  2,  3,  4,  5,  6
>
>  Coverage is 1589, therefore WCD = 1589/3003 * 100/1 = 52.91 %
>
>  Number of tickets = 2:  Numbers  1,  2,  3,  4,  5,  6
>                                   7,  8,  9, 10, 11, 12
>
>  Coverage is 2778, therefore WCD = 2778/3003 * 100/2 = 46.25 %
>
>  Number of tickets = 3:  Numbers  1,  2,  3,  4,  5,  6
>                                   7,  8,  9, 10, 11, 12
>                                   1,  2,  3,  4, 13, 14
>
>  Coverage is 2988, therefore WCD = 2988/3003 * 100/3 = 33.17 %
>
>  Number of tickets = 4:  Numbers  1,  2,  3,  4,  5,  6
>                                   7,  8,  9, 10, 11, 12
>                                   1,  2,  3,  4, 13, 14
>                                   5,  6,  7,  8, 13, 14
>
>  Coverage is 3003, therefore WCD = 3003/3003 * 100/4 = 25.00 %
>
>Which would you say is the best wheel, and how does your
>judgement compare to the WCD rating?
>
>The point I'm trying to make is that the WCD is always going to go
>down with an increasing number of tickets, because you can't avoid
>coverage redundancy as you introduce more tickets. The same applies
>to the various wheels in 6/49. Consider again a 'wheel' of 1 ticket.
>Its coverage is 260524 out of 13983816, giving a WCD of 1.863%. This

This may be a typo: it should be 260624; it is the maxCover value, ie. the
nbr of combinations one block ideally should cover. I'm currently working
on a new method, but the relations for the given examples remain similar,
I fear. The advantage of the new formula is that it gives a true p-value
(ie. a percent value if multiplied by 100).

>is to be compared with the best current 27 ticket wheel with a
>coverage of about 44.5%, ie a WCD of 1.648%. Then the 168 wheel has
>a WCD of just 100/168, or 0.595%. Again, does the WCD give a good
>guide to which is the best wheel? Essentially the WCD will _always_
>give the best result for a wheel of 1 ticket, and will favour wheels
>with lower numbers of tickets. Now I'm not a great expert or user of
>wheels, but I'm fairly sure that most people who are keen on wheels
>do not consider a wheel of one ticket to be a good wheel!
>
>So what I'm really trying to say is that I don't think the WCD
>is a good means to compare two wheels with different numbers of tickets.
>However, I can't say I've got any better suggestion.

#####
>Hi Stephen and all the others,
>
>I've moved the discussion about the Performance Index for Wheels (WCD etc.)
>to a new subject called "Performance Index for Covering Designs, Wheels".
>The first posting under the new subject contains IMHO all the relevant
>excerpts from the previous postings. Ie. it's exactly this posting :-)

(BTW, the old subject was "Dimitris Challenge" in r.g.l.)
Return to Top
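Stephen's 6/14 example is small enough to verify by brute force. A minimal Python sketch (not part of the original thread; function names are illustrative) that counts coverage directly and applies the thread's WCD formula:

```python
from itertools import combinations

def coverage(tickets, v=14, k=6, min_win=3):
    """Count the k-subsets of {1..v} that share at least min_win
    numbers with at least one ticket (brute-force enumeration)."""
    sets = [set(t) for t in tickets]
    count = 0
    for draw in combinations(range(1, v + 1), k):
        s = set(draw)
        if any(len(t & s) >= min_win for t in sets):
            count += 1
    return count

def wcd(win_percent, n_tickets):
    """WCD = winPercent / nTickets, as defined in the thread."""
    return win_percent / n_tickets

# Stephen's 6/14 wheels, built up one ticket at a time.
wheel = [(1, 2, 3, 4, 5, 6), (7, 8, 9, 10, 11, 12),
         (1, 2, 3, 4, 13, 14), (5, 6, 7, 8, 13, 14)]
total = 3003  # C(14, 6)

# Expected: coverages 1589, 2778, 2988, 3003 and
# WCDs 52.91, 46.25, 33.17, 25.00, matching the post.
for n in range(1, 5):
    cov = coverage(wheel[:n])
    pct = cov / total * 100
    print(f"{n}  {cov:4d}  {pct:6.2f}  {wcd(pct, n):6.2f}")
```

This reproduces Stephen's numbers exactly, and also illustrates his objection: the brute-force coverage grows with each ticket, but the per-ticket WCD value falls monotonically.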
Subject: Re: PC & CCA
From: Pablo del Monte Luna
Date: Sat, 18 Jan 1997 12:41:40 -0800
Richard F Ulrich wrote:
> If you used ALL of the principal components from each analysis, then
> the Canonical Correlation could have been run on all the initial
> variables, and it would have accounted for the same variance. Is
> that reasonable? "80% of the variability" is enough redundancy
> that I would be cautious about a) overfitting, if there were not a
> large number of cases per variable; or b) correlated error, if
> your data sources are not independent.
>
> Maybe that will help you consider what you have, but your question
> was not specific enough for me to say much more.
>
> Rich Ulrich, biostatistician              wpilib+@pitt.edu
> http://www.pitt.edu/~wpilib/index.html    Univ. of Pittsburgh

On the one hand, I have 8 climatic indices (20 years of monthly standardized anomalies, say N=192) that represent the oceanic and atmospheric conditions in the Tropical Pacific. On the other, I have 6 areas (extratropics), each subdivided into 6 2x2 boxes. Each subdivision is a historical record of sea surface temperature, sea level pressure and scalar wind, separately, in the form of anomalies too. The variables are naturally related.

From the 8 original indices, Principal Components Analysis defined 3 factors accounting for 89% of the original variability; and for the 6 series of each box (for each variable) the first three modes explain 80% of the variance 90% of the time, with 4 modes needed for the last 10%.

Then I ran Canonical Correlation Analysis between the factor scores of the indices and every set of factor scores of each area (for each variable) to obtain 9 canonical R's per variable.

I apologize for my poor vocabulary and syntax. In advance, thank you very much for your time and help.

Pablo del Monte
Marine Biologist
Fisheries Group
Center for Biological Research
La Paz, B.C.S., Mexico
http://www.cibnor.mx
delmonte@cibnor.mx
Return to Top
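The two-stage procedure Pablo describes, PCA factor scores followed by canonical correlation between the score sets, can be sketched as follows. This is not his code: the data here are random stand-ins with assumed shapes (192 months x 8 indices, and one set of 6 box series), and the CCA is computed by the standard QR-plus-SVD route:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the data described above (shapes assumed):
# 192 monthly anomalies x 8 climatic indices, and x 6 box series.
indices = rng.standard_normal((192, 8))
boxes = rng.standard_normal((192, 6))

def pca_scores(data, n_components):
    """Principal-component (factor) scores via SVD of the centered data."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def canonical_correlations(x, y):
    """Canonical correlations via QR of each centered score matrix,
    then SVD of the cross-product of the orthonormal bases."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, _ = np.linalg.qr(x)
    qy, _ = np.linalg.qr(y)
    return np.linalg.svd(qx.T @ qy, compute_uv=False)

# 3 factors for the indices and 3 modes for the box set, as in the post.
r = canonical_correlations(pca_scores(indices, 3), pca_scores(boxes, 3))
print("canonical R's:", np.round(r, 3))
```

With 3 factors on each side this yields 3 canonical correlations, sorted in decreasing order; repeating it for each area and variable would give the battery of canonical R's Pablo mentions.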
Subject: CFP: 1997 Fall Technical Conference
From: "Randall D. Tobias"
Date: Thu, 16 Jan 1997 16:02:47 GMT
41st Annual Fall Technical Conference
1997 Call for Papers

"Mining Data for Quality Improvement"

Omni Inner Harbor Hotel
Baltimore, Maryland
October 16-17, 1997

Co-sponsored by:
American Society for Quality Control
 - Chemical and Process Industries Division
 - Statistics Division
American Statistical Association
 - Section on Physical and Engineering Sciences

Applied and expository papers are needed for parallel sessions in Statistics, Quality Control, and Tutorial / Case Study. Detailed submission instructions are available on the Web at

   http://www.sas.com/ftc97/

or you can request them from one of the following members of the program committee:

Susan L. Albin
Department of Industrial Engineering
Rutgers University
PO Box 909
Piscataway, NJ 08855-0909
phone: 908-445-2238
email: salbin@rci.rutgers.edu
FAX: 908-445-5467

Sharon Fronheiser (to whom paper correspondence should be addressed)
Eastman Kodak Company
151 Mill Hollow Crossing
Rochester, NY 14626
phone: 716-588-2014
email: sharonf@kodak.com
FAX: 716-722-4415

Randy Tobias (to whom electronic correspondence should be addressed)
SAS Institute Inc.
SAS Campus
Cary, NC 27513-2414
tel: 919-677-8000 x7933
email: sasrdt@unx.sas.com
FAX: 919-677-8123

The submission process will start on August 1, 1996 and conclude on January 17, 1997. Papers should be strongly justified by application to a problem in quality control, or the chemical, physical, or engineering sciences. The mathematical level of papers may range from none, to that of the Journal of Quality Technology, or that of Technometrics.

--
Randy Tobias                                   SAS Institute Inc.
sasrdt@unx.sas.com  (919) 677-8000 x7933       SAS Campus Dr.
us024621@interramp.com  (919) 677-8123 (Fax)   Cary, NC 27513-2414

Faith, faith is an island in the setting sun.
But proof, yes: proof is the bottom line for everyone.
   -- Paul Simon
Return to Top
Subject: polychoric matrix in CALIS
From: Robert Flynn Corwyn
Date: Sat, 18 Jan 1997 15:28:06 -0600
I'm sorry if this has been discussed before. I've spent weeks looking through manuals and browsing the SAS web site. There appears to be very little documentation on using a polychoric correlation matrix in the CALIS procedure. I've found a snippet on the polychoric macro, and another section on the PLCORR option in PROC FREQ. However, after an enormous amount of wasted time, I have not been able to use a polychoric matrix in CALIS, or even find out whether it can be done.

After first learning SPSS, and realizing that SAS has a lot to offer, I must admit that I've found SAS documentation of very little use, except for basic stuff. I've even gotten the excellent Hatcher book on CALIS, but even it doesn't answer my polychoric question.

Can someone please help me?
1) Is it possible to use polychoric matrices in CALIS?
2) If so, does an example exist anywhere?

Thanks in advance,
Robert Flynn Corwyn
Return to Top