I've recently read an advertisement (in French) with the following mention: the efficiency of the product (a mascara for eyelashes) was proved by the "third quartile method" ("methode du troisieme quartile" in French). Can someone explain what this third quartile method is, or give any reference? Thanks.

*********************************
Stephane CHAMPELY
IUT STID
Ile du Saulcy
57045 METZ cedex 1
tel: 87 31 51 62 fax: 87 31 51 55
E-mail: champely@iut.univ-metz.fr
*********************************
>Received: 5/12/96 10:03 am
>From: G Asha, asha@CAS.IISC.ERNET.IN
>My friend has collected data which has the following info.
>
>Dependent variable: Achievement in Biology
>Ind variables: self-conf, home adjustment, health adjustment etc. -- 11 in total.
>
>She needs to do multiple (step-wise) regression analysis. I am familiar with multiple regression, but do not know how to go about step-wise regression. Can someone please advise me about this? Even a ref book or some simple stat package will do.

No she doesn't. She must now look at the data and ask what shape (if any) the relationship between the predictor and the achievement score is. Then she might ask if the relationship is genuinely cause-and-effect or (partly) a product of the relationship between the predictor variable and another variable. She needs to build an intelligent model, and that's something computers don't do for you. Otherwise we could all go home right now.

One bit of advice: do the analysis separately for males and females. Several people, including myself, have found that achievement is more strongly related to student characteristics in females - that is, that women perform more in line with their apparent abilities and motivations.

Ronan M Conroy
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
voice +353 1 402 2431 fax +353 1 402 2329

'Do not try to be a genius in every bar' [Brendan Behan? - No, Faure!]
You may want to take a look at this site: http://www.usps.gov/websites/depart/inspect/chainlet.htm It is the site of the US Postal Service. Yes, these chain letters and other pyramid schemes ARE illegal, and can result in some pretty heavy fines and/or imprisonment. And still they post their garbage here. But one day... there WILL BE a knock on the door.

webmaster@marcap.com
http://www.marcap.com
Please recommend a source for a novice to learn about modeling dose levels and exploring the effect of different ingestion patterns. I have been exploring difference equations but my math is weak. I hope that this is an appropriate question for this list. Thanks.

Fancher E. Wolfe, Professor
Mathematics and Statistics
Metropolitan State University
730 Hennepin Ave. Minneapolis, MN 55403-1897
fwolfe@msus1.msus.edu 612-341-7256
In article <32a6fb03.22439144@news.southeast.net>, Matt Beckwith wrote:
>tgee@superior.carleton.ca (Travis Gee) wrote:
>>beckwith@pop.southeast.net (Matt Beckwith) writes:
>>>(1) Alpha is the probability that you have rejected that which is true.
>>Not exactly. In classic hypothesis testing, alpha is the probability that you will falsely reject the "null hypothesis" when it is true.
>You said: Given that the null hypothesis is true, alpha is the probability that my test design will (inappropriately) reject it anyway.
>I said: Given that my experiment has rejected the null hypothesis, alpha is the probability that it is actually true.
>Are these not logically equivalent?

Very definitely not. And no matter how often this is pointed out, this mistake is made, and I believe it is subconsciously made even by those who know better.

Suppose that the null hypothesis is that the coin has probability .5 of coming up heads (an "honest" coin). We toss it 100 times, and produce a test which will reject with probability .05 if the coin is truly honest. Now suppose the probability that the coin comes up heads is close, say .4999. The probability that the hypothesis will be rejected is not much different. Why should someone think that the probability is .5000 rather than .4999? The test essentially does not distinguish between them.

In general, the point null hypothesis is never true. Is that coin EXACTLY honest? I suggest you think about the question you really want to ask. The problem is not trivial then, but "alpha" is also not the answer.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu Phone: (317)494-6054 FAX: (317)494-0558
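Rubin's coin example is easy to check by simulation. The sketch below (illustrative modern Python, with an arbitrary two-sided cutoff; not part of the original exchange) estimates the rejection rate for an exactly honest coin and for one with P(heads) = .4999:

```python
import random

random.seed(1)

def rejection_rate(p, trials=10000):
    # Toss a coin with P(heads) = p 100 times per trial; reject the
    # "honest coin" null when the head count is 10 or more away from
    # 50 (a two-sided cutoff with alpha in the neighborhood of .05).
    rejections = 0
    for _ in range(trials):
        heads = sum(random.random() < p for _ in range(100))
        if abs(heads - 50) >= 10:
            rejections += 1
    return rejections / trials

alpha = rejection_rate(0.5)      # rejection rate when the null is exactly true
near = rejection_rate(0.4999)    # rejection rate when it is false by a hair
print(alpha, near)
```

The two rates come out essentially equal, which is Rubin's point: the test cannot tell .5000 from .4999, so a rejection says little about whether the point null is exactly true.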
In article <1996Dec4.155947.1@rddvax.decnet.lockheed.com>, the poster wrote:
>Can anyone provide information or suggest references about the accuracy and dependability of using PSEUDO-random sequences to simulate rare events (10^-3 to 10^-7). The occurrence of a rare event at time n depends in a complicated way on the values of the preceding 3 to 30 pseudo-random numbers.

Even if it was 3, I would be suspicious. With 30, even more so. Also, if you are using acceptance-rejection procedures, even very long-term effects can enter. There are generally better ways to do it, but this is an art, not a science.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
hrubin@stat.purdue.edu Phone: (317)494-6054 FAX: (317)494-0558
Sean Lahman wrote:
> 4 - Conclusions
>
> If I had to sum it up in one sentence, I would say that the means have remained fairly consistent, while the variance has gradually dropped. This would lead me to believe there is no evidence of a "segregation-inflation" effect. I'm guessing that the decrease in variance indicates a general increase in the level of play.

The problem, as I see it, is that you are looking for an effect in a data set that may or may not be present, but you have not tried (or were not able) to remove other effects happening simultaneously or nearly simultaneously in the population you are studying. Perhaps if you removed the effects of increasing usage of relief pitchers, night baseball, better training methods, coast-to-coast travel, etc., there might be an easily recognizable "segregation-inflation" effect. Maybe not. I'm unconvinced that your data support the conclusion you come to.

+---------------------------------+------------------------------+
| Paige Miller, Eastman Kodak Co. | "Let's play some basketball" |
| PaigeM@kodak.com                |  Michael Jordan in Space Jam |
+---------------------------------+------------------------------+
| The opinions expressed herein do not necessarily reflect the   |
| views of the Eastman Kodak Company.                            |
+----------------------------------------------------------------+
Sean Lahman (lahmans@vivanet.com) wrote:
<< deleted: comments and first questions about analysing batting data from major league baseball >>

: 5 - Questions
: a) Am I misinterpreting the data?
: b) Does this analysis adequately address the original problem, or is it a much too simplistic approach?

Yes, I would say, simplistic and misinterpreting...

They lowered the pitching mound in about 1955, or maybe 1965, in order to reduce the pitcher's advantages, because batters were doing too poorly. The first generations of players had different rules for what constituted the "batting average". In the history of Major League Baseball, "sacrifices" or walks (or maybe both) have been treated differently than the way they are treated today.

Maybe you should consider the *physical size* of players as demonstration that major league ball is different than it was. Today, a player who is not more than 6 feet tall is a rather short one. Pitchers who were 6'5" used to be rare, instead of the median.

Rich Ulrich, wpilib+@pitt.edu
User wrote:
>
> Hi, I am trying to learn principal component analysis by myself. At the moment I am a bit stuck on getting the pc-scores using scaled eigenvectors. Could someone give me some pointers on how to get the z-scores based on eigenvectors scaled by the square-root of the eigenvalues?
>
> I understand that A=UL^0.5
> (A=scaled eigenvectors, U=eigenvectors, L=eigenvalues)
>
> and to get the z-scores, the equation is
>
> Zi=UiT[X-Xbar]
>
> T=transpose, Xbar=mean vector of the sample
>
> The operation is to multiply the deviation matrix by the eigenvectors (I think). I am using matlab for the matrix manipulation, but can't seem to get the right answer. First, UiT rearranges the eigenvectors in rows and the variables in columns; for a data matrix of n specimens by p variables, the transposed matrix of eigenvectors is m by p, which is different in dimension to the deviation matrix, which is n by p, and the matrices cannot be multiplied. If I use [X-Xbar]*U, the resultant matrix is n by m, yet the values are not correct. I think I am missing some vital steps; could someone give me a couple of pointers on how to get the z-scores for principal component analysis using matrix operations?

First, you are very close. The problem is that in the formula Zi=UiT[X-Xbar] as you have written above, it should be noted that [x-xbar] is a px1 vector deviating a single observation vector x from the mean vector xbar. Note that this is different from the way we normally think of a data matrix X, in which rows are normally observations and columns are normally variables. In the formulation that you are attempting to reproduce, x must be a column vector containing a single data point. (Naturally, you could make X a matrix if you wanted, but then X-xbar isn't a completely legal use of notation, as X is a matrix and xbar is a vector.) See Jackson (1991) for a more complete description.

Reference: Jackson, J. E. (1991), "A User's Guide to Principal Components", John Wiley and Sons, New York. See formula 1.4.1 on page 11.

Furthermore, let me add the following sequence of random numbers so that my news server doesn't give me the error "more included lines than new": 4 2 3.14159 -81

+---------------------------------+------------------------------+
| Paige Miller, Eastman Kodak Co. | "Let's play some basketball" |
| PaigeM@kodak.com                |  Michael Jordan in Space Jam |
+---------------------------------+------------------------------+
| The opinions expressed herein do not necessarily reflect the   |
| views of the Eastman Kodak Company.                            |
+----------------------------------------------------------------+
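For what it's worth, the score computation discussed above can be sketched in a few lines of matrix code. This is an illustration with made-up data (NumPy rather than the poster's MATLAB), but the operation [X-Xbar]*U on a rows-as-observations data matrix is the same one being debated:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # n=50 specimens, p=3 variables (made up)

xbar = X.mean(axis=0)               # mean vector, length p
S = np.cov(X, rowvar=False)         # p x p sample covariance matrix
eigvals, U = np.linalg.eigh(S)      # columns of U are unit eigenvectors

# Jackson's z_i = u_i' (x - xbar) for one observation; with rows as
# observations this is a single matrix product, giving an n x p
# matrix whose columns are the principal component scores.
Z = (X - xbar) @ U

# Check: scores on different components are uncorrelated, and each
# score's variance equals its eigenvalue (cov(Z) = U' S U = diag(L)).
print(np.allclose(np.cov(Z, rowvar=False), np.diag(eigvals)))

# Dividing by the square roots of the eigenvalues (the "scaled
# eigenvector" version the poster mentions) standardizes the scores
# to unit variance.
Z_std = Z / np.sqrt(eigvals)
```

If the values still look wrong after this, a common culprit is mixing eigenvectors of the correlation matrix with an uncentered or unstandardized data matrix.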
Richard F Ulrich wrote:
> I wonder what THAT means. The example seems to show COUNTERS, which are incremented at different times by the different raters. That would make sense of the word "sequential", but in that case, the rest of the explanation (mentioning kappa, etc.) makes no sense that I can see.

Rich,

Perhaps your criticisms could be more constructive?

> Is there a procedure which can at the same time say the two methods agree very little, and also suggest that method B performs relatively well in that it tends to assign a higher value to sentences than method A and therefore it could be used as a predictor of the values that method A would assign?

Tony,

The Wilcoxon Signed Rank test may be of use to you. It is a nonparametric test for paired data. While it cannot be used for model building and prediction, it can be used to test if method A = method B.

Karen Gilbert

** I'm having trouble with my news server. My apologies if this is a duplicate post!
I have been asked by a colleague a question regarding the scaling of marks, i.e. which is preferable - a sliding scale, proportional scaling, or some better method?

For example, if a set of marks has an average of 55% and it is required to bring the average down to, say, 50%, then one method would be to subtract 5% from all marks. Another would be to multiply them by 50/55.

I'm not sure if this is the right forum for this question, and I apologise for this in advance. My personal opinion is that the choice of method is as much a "political" issue as a statistical one. However I would appreciate some comments from experts. It may appear a trifling matter but a lot of time is spent in committees discussing such transformations on marks.

David Sedgwick
scdas@twp.ac.nz
We have just started the beta testing of a new software package for sequential experimental design and optimization. The methods used are the simplex algorithm and fuzzy set handling of multiple criteria. The MultiSimplex software is an Excel 5 and 7 add-on running on all Windows platforms. We are now inviting those interested to apply for participation in the beta test. More information can be found on our web page, http://ourworld.compuserve.com:80/homepages/Bergstrom_Oberg, or by sending me an email.

Regards,
Tomas Oberg
Bergstrom & Oberg
email: tomas.oberg@bergstrom-oberg.se
fax: +46 455 27922
The RadEFX(sm) Radiation Health Effects Research Resource at Baylor College of Medicine has recently completed development of a WWW data base to compile information on all radiation health effects studies. In particular, we are especially interested in Chernobyl-related projects and other projects involving radiation epidemiology. Investigators can submit their information at URL http://radefx.bcm.tmc.edu.

Leif Peterson
It is unclear from your query just what the problem is. I've done similar kinds of things, and it basically involved slaving over the manuals, learning how to use functions and output datasets from the descriptive PROCs. If you have run into specific insoluble problems, describe them, and I or others would probably be of more use to you.

Chris
>Several different procedures have been proposed to test for normality. Graphical assessment may be useful, especially in conjunction with other methods. In general, goodness-of-fit methods (e.g., chi-square or Kolmogorov-Smirnov) do not perform well (in that they have rather low power). The power of the Shapiro-Wilk test is quite good but the procedure is cumbersome.

With good statistical packages it is just a mouse-click away...

_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde, Universitaet Kassel
Steinstrasse 19, 37213 Witzenhausen, Germany
WWW: http://www.wiz.uni-kassel.de/fts/
Mail: piepho@wiz.uni-kassel.de
Fax: +49 5542 98 1230  Phone: +49 5542 98 1248
In article <9612031234.AA14617@fserv.wiz.uni-kassel.de>, Hans-Peter Piepho wrote:

>>>In article <9612020859.AA19921@fserv.wiz.uni-kassel.de>, Hans-Peter Piepho wrote:
>>>>>In article, Mike wrote:
>>>>>>Andrew Kukla started a thread:
>
>>> ................
>
>>>>If you assume that X and Y are bivariate normal, and you want to fit a line to the x,y scatter plot to describe the relationship (without need to predict either y from x or x from y), a principal component analysis (PCA) seems quite appropriate. Of course you would do the PCA on the variance-covariance matrix (unstandardized data), not on the correlation matrix (normalized data).
>
>>>I agree that one should not do it on the correlation matrix, for which the principal component line is the 45 degree line (positive correlation) or the -45 degree line (negative correlation). But the principal component line depends on scaling, and apart from the direction coming from the sign of the correlation, nothing specific can be said.
>
>>If x and y are bivariate normal, the following can be said: We can draw a confidence region centered at the mean of x and y. This will be an ellipse. The first principal component coincides with the "long" axis of this ellipse. (See Johnson and Wichern, Applied Multivariate Statistical Analysis.)
>
>There is no problem with this. But now change the scale of one of the variables without changing that of the other. The long axis of the ellipse changes.
>
>If there is non-zero covariance, the new long axis can come from any line in the same quadrant, measured from the means, as the old. So the set of all principal component lines for different scalings conveys no information other than the sign of the covariance.

But why should one want to change the scale? Usually, we have chosen to measure x and y on a certain scale, and we want to describe the relationship on that scale, not any other scale.

_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde, Universitaet Kassel
Steinstrasse 19, 37213 Witzenhausen, Germany
WWW: http://www.wiz.uni-kassel.de/fts/
Mail: piepho@wiz.uni-kassel.de
Fax: +49 5542 98 1230  Phone: +49 5542 98 1248
Hans-Peter Piepho wrote:
>
> >Several different procedures have been proposed to test for normality. Graphical assessment may be useful, especially in conjunction with other methods.
>
> With good statistical packages it is just a mouse-click away...

I am enthusiastic about using transformations like log(X+k) and repeated use of normal probability plotting, as described in Afifi AA, Clark V. Computer-Aided Multivariate Analysis. (2nd ed.) New York: Van Nostrand Reinhold Co, 1990:505. In multiple linear regressions I always get very solid results, judging from the residual analyses that are part of the Statistica software.
Hello,

I'd appreciate some help on the following question. Suppose one measures a variable in control and then in an experimental condition. One way to test whether the variable is different in these two conditions is to perform a paired t-test (in which for each experiment, the control and experimental values are subtracted, and the mean of these differences is compared with their variance). This approach works very well if the data is additive: that is, if one expects the same absolute magnitude of change in the variable each time one does the experiment.

However, suppose one expects that the experimental effect is multiplicative, not additive. For example, one might expect that a particular concentration of an antibiotic will kill off around half the total number of bacterial cells in a dish, no matter how many cells the dish contains to start with -- 10, 100, 1000, or a billion. If the number of cells in the control condition varies quite a bit, the variance in the differences between control and experimental conditions will be enormous, and the paired t-test is of little use.

There are several ways to take care of this problem; I suspect that not all of them are correct. One is to divide each experimental value by the control value instead of to subtract them. Another is to subtract the logarithm of the experimental value from the log of the control value, and then do the test on the resulting values. (Of course the difference between two logs is the same as the log of one value divided by the other [i.e., log(E)-log(C) = log(E/C)].)

I have been told that the latter method (taking the log transform) is the correct one, but I'm not sure about this, and I'd like an explanation for why it is acceptable to use log values. If anyone could provide one, or at least point me towards a textbook or other reference that discusses this in detail, I would really appreciate it.

Thanks a lot,
Saleem Nicola

"Freedom is like oxygen. One may not even notice it when you have it, but only appreciate it when you don't have it." --Wu'er Kaixi
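A small simulation makes the point of the question concrete (made-up numbers, illustrative modern Python): with a multiplicative effect, raw differences are dominated by the largest dishes, while log differences cluster tightly around log(0.5), which is why a paired t-test behaves sensibly on the log scale.

```python
import math
import random

random.seed(2)

# Made-up paired data with a multiplicative effect: each experimental
# dish ends up with roughly half its control count, whatever the
# control's size (10 to 10000 cells).
controls = [random.choice([10, 100, 1000, 10000]) for _ in range(30)]
experimentals = [c * random.uniform(0.4, 0.6) for c in controls]

# On the raw scale the differences are dominated by the big dishes;
# on the log scale the effect is a near-constant shift of log(0.5).
raw_diffs = [e - c for c, e in zip(controls, experimentals)]
log_diffs = [math.log(e) - math.log(c) for c, e in zip(controls, experimentals)]

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Spread relative to the mean effect: huge for the raw differences,
# modest for the log differences.
ratio_raw = abs(sd(raw_diffs) / mean(raw_diffs))
ratio_log = abs(sd(log_diffs) / mean(log_diffs))
print(ratio_raw, ratio_log)
```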
many MANY years ago ... when i was working in canada ... i heard about this little thing called the MAGIC MEDIAN MODIFIER ... seems as though there was some dictum that all averages of marks (grades) in schools had to be 65% ... so, if a teacher had a class that had a final average higher or lower ... out whipped the MMM ... no relation to the 3M company ... and here is about how it worked.

there was a grid ... and ... a small moveable ruler. one aligned YOUR average with a certain position on the grid ... and then would read off all the adjusted grades for students ... such that the net result was a new class average of 65%. i had a short article once that described/justified this MMM ... and i will try to find it again. anyway ... i found the entire "notion" to be rather stupid ... it was like saying that all classes were equal ... which we clearly knew was NOT the case.

as for adjusting a given set of marks ... keep in mind that adding/subtracting a constant will only shift the average up or down but have NO impact on the spread of scores ... but, if you change each mark by some constant PERCENTAGE of itself (like 5%) ... then the average will change by that percentage AND the spread will change too ...

if there is any additional interest in this ... i will try to dig up the paper mentioned above ... and expand a bit on it.

At 12:42 PM 12/6/96 +1300, you wrote:
>I have been asked by a colleague a question regarding the scaling of marks, i.e. which is preferable - a sliding scale, proportional scaling, or some better method?
>For example if a set of marks has an average of 55% and it is required to bring the average down to say, 50%, then one method would be to subtract 5% from all marks. Another would be to multiply them by 50/55.
>I'm not sure if this is the right forum for this question, and I apologise for this in advance. My personal opinion is that the choice of method is as much a "political" issue as a statistical one. However I would appreciate some comments from experts. It may appear a trifling matter but a lot of time is spent in committees discussing such transformations on marks.
>David Sedgwick
>scdas@twp.ac.nz
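The arithmetic behind the two proposals is easy to see with a toy set of marks (hypothetical numbers, illustrative Python): both adjustments hit the target average, but only the multiplicative one changes the spread.

```python
import statistics

marks = [35, 45, 55, 65, 75]           # hypothetical marks, average 55

shifted = [m - 5 for m in marks]       # subtract 5 from every mark
scaled = [m * 50 / 55 for m in marks]  # multiply every mark by 50/55

# Both bring the average to 50, but only the multiplicative version
# changes the spread (the standard deviation shrinks by 50/55).
print(statistics.mean(shifted), statistics.stdev(shifted))
print(statistics.mean(scaled), statistics.stdev(scaled))
```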
Can someone recommend reference books/articles and software for performing receiver operating characteristic (ROC) analysis? Thanks!

Janice
Hi,

I have a question regarding the construction of a statistical test for the following problem. I have K populations, and I have a sample, S(i), of size n(i) from the i_th population, i=1,2,3,...,K. Each element in S(i) can be categorized binomially, i.e., it can be assigned a value of 1 or 0, and the probability that it is assigned 1 is given by P(i). P(i) is unknown, and all I really can measure is the frequency of 1s in my sample, and this I denote as f(i).

I now have an item, Q, that has the value 1, and I want to know to which population Q belongs. The way I thought of doing this - probably long-winded but it seems to make sense to me - is to maximise the following probability:

   L(i) = P( Q=1 | Q is from population i)

Obviously, for each population, L(i) = P(i), but P(i) is unknown, and although I can estimate P(i) from f(i), some samples are going to have small values of n(i) (<5), so that f(i) is going to be an unreliable estimate of P(i). Therefore, it occurred to me that I could calculate a weighted estimate of P(i) as follows:

   a = integral from 0 to 1 of P(i) * C(n(i),f(i)) * P(i)^f(i) * (1-P(i))^(n(i)-f(i)) dP(i)

   b = integral from 0 to 1 of C(n(i),f(i)) * P(i)^f(i) * (1-P(i))^(n(i)-f(i)) dP(i)

   P(i) = a/b

This is simply the parameter P(i) multiplied by the binomial probability of getting f(i) elements equal to 1 in a sample of size n(i), all integrated with respect to P(i) from 0 to 1 - and then divided by the binomial probability of getting f(i), integrated wrt P(i) from 0 to 1. This estimate of P(i) converges to f(i) as the sample size gets larger. However, doing it this way, I don't think I am penalising populations that are represented by small samples.

So - finally, I can ask my question. Since I've effectively calculated a likelihood for group membership, I want to test whether the ML group is statistically better than other groups. How should I do this? Should I use a likelihood ratio test between the ML group and the group with the next highest likelihood? Or should I use a likelihood ratio test (LRT) between the ML group and the sum of the likelihoods of all other groups not including the ML group? Should I use a LRT at all?

I apologise for this long-winded post, and I appreciate any assistance. Thanks.

Allen Rodrigo
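Incidentally, the two integrals in the post have a closed form: with a uniform prior on P(i), the ratio a/b works out to (f(i)+1)/(n(i)+2), the posterior mean (Laplace's rule of succession). A quick numeric check by quadrature (illustrative Python, hypothetical f and n values):

```python
from math import comb

def posterior_mean_numeric(f, n, steps=100000):
    # Midpoint-rule quadrature of the two integrals in the post.
    c = comb(n, f)
    a = b = 0.0
    for k in range(steps):
        p = (k + 0.5) / steps
        w = c * p ** f * (1.0 - p) ** (n - f)
        a += p * w   # integrand of a: P(i) times the binomial term
        b += w       # integrand of b: the binomial term alone
    return a / b

# The ratio matches the closed form (f+1)/(n+2) for any f, n.
for f, n in [(1, 3), (2, 4), (9, 10)]:
    print(posterior_mean_numeric(f, n), (f + 1) / (n + 2))
```

This also shows how small samples are handled: with n(i) small, the estimate is pulled toward 1/2, which is exactly the shrinkage the poster was after.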
On Fri, 29 Nov 1996 18:38:39 -0300, Julio Cesar Voltolini wrote:
>
>Dear friends,
>
>I am a biologist and we are collecting mammals at different strata of the Brazilian Rainforests.
>
>I would like to do some statistical tests but I need to know if my data have a normal distribution. I am starting to use ESTATISTICA and SYSTAT and I would like to do the tests in these packages. May I test the normality in EXCEL too?

As far as I know, Statistica does not perform normality tests in versions prior to 5.0, only probability plots. You can do it with Systat though. In the NPAR module, go to Kolmogorov-Smirnov; you will have the choice of many tests. No Shapiro-Wilk nor Anderson-Darling though. In Statistica 5.0, you have the choice between Kolmogorov-Smirnov, Lilliefors and Shapiro-Wilk under the menu Frequency Tables.

R
Is there a test for H0: Pearson-Rho=1?

I found tests for Rho=0 and Rho=Rho0 with Rho0<1. Can't seem to find one for testing Rho=1. Any suggestion?

R
In article <586qf6$9ah@usenet.srv.cis.pitt.edu>, wpilib+@pitt.edu (Richard F Ulrich) wrote:
> -- You have me baffled already. Spot the cells? I do not know how log-linear analyses may do that, that Pearsonian analyses cannot do. Do you have some particular computer program in mind, which has a very nice implementation? I have seen studentized residuals, for looking at contributions of cells, from ordinary Pearson tables.

I am perhaps going to baffle you a third time, sorry for that. Once a log-linear model has been selected, you can assess the contribution of an effect to a cell. For this you divide the parameter estimate by its standard deviation and compare this ratio with the critical z.

For the cluster question, thanks for your comments. I think this is a question of personal preferences. But I'll keep yours in mind.

> Hope this helps.
> Rich Ulrich, biostatistician wpilib+@pitt.edu
> Western Psychiatric Inst. and Clinic, Univ. of Pittsburgh

Yes, it helps!

--
F. Bellour
PhD Student, U.C.L. Belgium
E-mail: bellour@upso.ucl.ac.be
Phone office: 00-32-10-478640
nakhob@mat.ulaval.ca (Renaud Langis) wrote:
>Is there a test for H0:Pearson-Rho=1?
>
>I found tests for Rho=0 and Rho=Rho0 with Rho0<1. Can't seem to find one for testing Rho=1.
>
>Any suggestion?
>
>R

Generate k bivariate datasets (with the same sample size as your observed dataset) under the null hypothesis of a perfect positive linear relationship, compute the Pearson correlation for each dataset, and compare those simulated Pearson correlations to the observed Pearson correlation. All the simulated correlations will be one, so your (directional) P-value will be essentially zero (unless your observed Pearson correlation itself is also 1). Or in other words, I think this isn't a very interesting null hypothesis (are there any?) because it will always be rejected unless you have a perfect positive linear relationship in your data.

With kind regards,
Pat.
_____________________________________________________________________________
Patrick Onghena                    patrick.onghena@ped.kuleuven.ac.be
Katholieke Universiteit Leuven, Department of Educational Sciences
Vesaliusstraat 2, B-3000 Leuven (Belgium)
Tel1: +32 16 32.59.54  Tel2: +32 16 32.62.01  Fax: +32 16 32.59.34
http://www.kuleuven.ac.be/facdep/psy/eng/onderz/methped.htm
_____________________________________________________________________________
Antony wrote:
>Missings are qualitatively different from other data so coding them as a numeric value does not appeal. Our plots (as you can see by linking from our webpage: http://www1.math.uni-augsburg.de/ to MANET) treat missings quite differently so that you are always aware they are missings and not actual values. More specifically there are technical problems: 999 is a numeric value and could arise naturally. If it is used in plots it can seriously distort the scale, and what it does to statistics if one is not careful does not bear thinking about.

I meant '999' as an impossible value. I would not use such a code for (e.g.) a distance-between-two-cities variable (unless I was having a particularly bad day :->) but I might use (e.g.) '9' for a binary categorical variable coded with 1/2. I will have a look at your page.

>Good software is like that, it helps you to get results more quickly and more easily, although some people always prefer to walk.

Proper (i.e. versatile) missing value handling is essential to any data-analysis package worthy of the name. I often read posts on this group by people using packages such as Excel for data analysis and despair.

--
Mark Myatt
Does anybody have any software (e.g. SAS macros or the like) for propensity score matching? Specifically I am interested in Mahalanobis metric matching within propensity score calipers.

Alan Zaslavsky
zaslavsk@hcp.med.harvard.edu
HERE IS A TEST OF THE NULL OF RHO BEING 1 ... TAKE A SAMPLE AND SEE WHAT THE CORRELATION IS ... IF IT IS ANYTHING OTHER THAN 1 ... REJECT THE NULL.

At 09:31 AM 12/6/96 GMT, you wrote:
>nakhob@mat.ulaval.ca (Renaud Langis) wrote:
>>Is there a test for H0:Pearson-Rho=1?
>>
>>I found tests for Rho=0 and Rho=Rho0 with Rho0<1. Can't seem to find one for testing Rho=1.
>>
>>Any suggestion?
>>
>>R
>
>Generate k bivariate datasets (with the same sample size as your observed dataset) under the null hypothesis of a perfect positive linear relationship, compute the Pearson correlation for each dataset, and compare those simulated Pearson correlations to the observed Pearson correlation. All the simulated correlations will be one, so your (directional) P-value will be essentially zero (unless your observed Pearson correlation itself is also 1). Or in other words, I think this isn't a very interesting null hypothesis (are there any?) because it will always be rejected unless you have a perfect positive linear relationship in your data.
>
>With kind regards,
>Pat.

===========================
Dennis Roberts, Professor EdPsy !!! GO NITTANY LIONS !!!
208 Cedar, Penn State, University Park, PA 16802 AC 814-863-2401
WEB (personal) http://www2.ed.psu.edu/espse/staff/droberts/drober~1.htm
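The degenerate null distribution Onghena describes is easy to see in a few lines (illustrative Python, with an arbitrary made-up slope and intercept): every dataset generated under the null of a perfect positive linear relationship has r = 1, up to rounding.

```python
import random

random.seed(3)

def pearson(xs, ys):
    # Sample Pearson correlation from first principles.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Simulate datasets under the null of a perfect positive linear
# relationship: r is always 1, so the null distribution is a point
# mass at 1 and any observed r < 1 rejects.
sims = []
for _ in range(5):
    xs = [random.uniform(0, 10) for _ in range(25)]
    ys = [2.0 * x + 3.0 for x in xs]
    sims.append(pearson(xs, ys))
print(sims)
```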
>As far as i know, Statistica does not perform normality tests in versions prior to 5.0. Only probability plot. You can do it with Systat though. In the NPAR module, go to Kolmogorov-Smirnov, you will have the choice of many tests. No Shapiro-Wilk nor Anderson-Darling though.
>In statistica 5.0, you have the choice between Kolmogorov-Smirnov, Lilliefors and Shapiro-Wilk under the menu Frequency Tables.

As I remember, pre-5.0 Statistica has the K-S test.

---
Timothy A. Dierauf, PE
Solar Energy Applications Laboratory
Department of Mechanical Engineering
Colorado State University
----------------------------Original message---------------------------- Dear list owner: This announcement may be appropriate for subscribers to your list. Please post if you think the subject matter would be of interest. Thank you for your time. Deborah Clark Center for Distance Learning Central Michigan University Mt. Pleasant, MI (517) 774-7143 --------------------------------------------------------------------------- Central Michigan University is going beyond the books with 10 undergraduate courses being offered on the Internet, beginning in January 1997. Statistics 382, Elementary Statistical Analysis will be offered on the World Wide Web. Registration for this and other CMU web courses begins Dec. 9, 1996. The 12-week term begins Jan. 6, 1997. Tuition for CMU web courses is $95.90 per credit hour. There is a one-time CMU admission fee of $50. If you have already been admitted to take CMU courses, then you are not required to pay the admission fee. Statistics involves collecting and organizing data, describing it and using it to infer information about the subject being studied. This course will examine topics such as different types of data, probability, variables, distribution, hypothesis testing, and correlation. Come along for an educational lift into cyberspace and the world of statistics with Dr. Ken W. Smith, professor of mathematics at Central Michigan University and director of institutional research at CMU. CMU's College of Extended Learning has always been at the forefront in offering quality off-campus degree programs. For 25 years, CMU has expanded that quality throughout the state of Michigan, the United States, Canada and Mexico. Now, CMU has taken that quality into cyberspace through its Center for Distance Learning with learning package courses on the World Wide Web. Each student will receive a syllabus, lectures and other material for the course via the World Wide Web. 
CMU's virtual learning center will feature multiple levels of communication for students enrolled in the course. Interactive chat sessions will be scheduled between the students and the instructor in each course. A message center that operates as a forum for student e-mail has been established for students to communicate among themselves about informal topics and about course topics such as assignments, projects and upcoming examinations. Students will also have access to e-mail addresses for all members of the class and the instructor. CMU is also offering the following courses on the World Wide Web, beginning in January:

Accounting 201 (Principles of Accounting) 3 credit hours
Astronomy 111 (Astronomy) 3 credit hours
Astronomy 112 (Introduction to Astronomical Observations) 1 credit hour
Business Information Systems 106 (Spreadsheet Concepts) 1 credit hour
English 323 (Fantasy and Science Fiction) 3 credit hours
Health Promotion and Rehabilitation 523 (AIDS Education) 1 credit hour
Health Promotion and Rehabilitation 529 (Alcohol Education Workshop) 1 credit hour
Health Promotion and Rehabilitation 530 (Drug Abuse Workshop) 1 credit hour
Religion 334 (Death and Dying) 3 credit hours

Technical recommendations for taking these classes are:

SYSTEM: Multimedia PC 486 or Pentium, Macintosh 68040, or Power PC
SOFTWARE: Netscape 2.0 or Microsoft Internet Explorer, Adobe Acrobat Reader 2.0, Netscape Mail or Internet compatible mail system

Those interested in taking the course may look at a preview of some of the courses at the following web site: http://www.cel.cmich.edu/dlonline.htm To register, or for more information about the courses, please send e-mail to john.mcmahon@cmich.edu or call 1-800-688-4268.
Richard F Ulrich wrote: > Clay Helberg (chelberg@spss.com) wrote: > : NO! From a statistical standpoint, the null can make any specific > : prediction about the relationship you want. It is true that social > : scientists *usually* specify "no difference" or "no effect", but there > : is no reason in the world for it to be that way. In fact I have argued > : elsewhere that this preponderance of performing tests against such > : straw-man null hypotheses (which are often clearly false a priori) holds > : back social science from fulfilling its potential for scientific > : discovery. > > : You can specify a null hypothesis which states "the difference between > : group 1 and group 2 is exactly 5 Zuleks", or "the regression slope of > : Foo regressed on Bar is less than or equal to 3." There is no need to > : use (and no excuse for using) the default "no effect" null hypothesis > : when you have something more specific in mind. > > Maybe I am confused, too, but I tend to agree with Chauncey, that, > "null is nil." For an Odds ratio, for instance, *ordinarily* > 'nil' is OR= 1.0; but the statistical test, if you want to write > out the terms, is > absolute value of (Group1-Group2) minus 1 equal 0 > or ' ... minus 5 Zuleks equal 0'. This is tautological--you can always rearrange an equation so that there is a zero on one side. The point I was objecting to was the automatic assumption that it refers to "no difference" or "no effect" (not, as in your example, where a specific difference is given, but the equation is rearranged so the hypothesis reads "the observed difference minus the hypothesized difference equals zero"). > There is a 'nil' in there somewhere, or you have a funny idea > of a null hypothesis. Usually, there is a very rational/logical > reason for what constitutes the null=nil though I do imagine > the lax case as being, arbitrarily, 'some value previously > observed', which is what people look at on process-control charts. 
Unfortunately, all too often the default null of "no difference" is used because it is convenient (it is generally what you get from computer-generated output), or because the theory under investigation is so vague as to preclude reasonable point predictions. > I do NOT see a string of hypotheses, of which H-sub-zero is simply > the lowest number. Well, in Hays (Statistics), he lists the symbol for the null hypothesis as H-sub-zero, and the symbol for the alternative as H-sub-one. This usage is also given in Hogg & Craig (Introduction to Mathematical Statistics) and Vogt (Dictionary of Statistics & Methodology). In fact, here is a relevant quote from Hays (4th ed., p 249): Incidentally, there is an impression in some quarters that the term "null hypothesis" refers to the fact that in experimental work the parameter value specified in Ho is very often zero. Thus, in many experiments the hypothetical situation "no experimental effect" is represented by a statement that some mean or difference between means is exactly zero. However, as we have seen, the tested hypothesis can specify any of the possible values for one or more parameters, and this use of the word *null* is only incidental. It is far better for the student to think of the null hypothesis Ho as simply designating that hypothesis actually being tested, the one which, if true, determines the sampling distribution referred to in the test. I couldn't have said it better myself.... --Clay -- Clay Helberg | Internet: helberg@execpc.com Publications Dept. | WWW: http://www.execpc.com/~helberg/ SPSS, Inc. | Speaking only for myself....
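A concrete illustration of Clay's point, for readers who want one: nothing stops you from testing a null other than "no effect". A sketch in Python with invented data (the 5-"Zulek" figure is just his made-up example):

```python
# Testing a non-nil null hypothesis: H0 says the mean difference
# equals 5 (the imaginary "Zuleks"), not 0.  Data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
diffs = rng.normal(loc=5.4, scale=2.0, size=40)  # hypothetical paired differences

# ttest_1samp accepts any hypothesized mean via popmean
t, p = stats.ttest_1samp(diffs, popmean=5.0)
print(f"t = {t:.2f}, p = {p:.3f}")  # a small p would reject "difference is exactly 5"
```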
Dave Sedgwick (SCDAS@TWP.AC.NZ) wrote: : I have been asked by a colleague a question regarding the scaling of : marks, i.e. which is preferable - a sliding scale, or proportional : scaling, . . . . . . or some better method? : For example if a set of marks has an average of 55% and it is : required to bring the average down to say, 50%, then one method would : be to subtract 5% from all marks. << rest, deleted... >> "IT IS REQUIRED" is an awfully bland statement of a mandate. What are these scores used for later on? If you want the median to be at 50%, then transform the scores into centiles - when you start with ranks and number of cases, there are several variations on the formula, but just choose one. If you want to say that you have a standardized score with a mean of 50 and a standard deviation of 10, then you can generate T-scores - either by starting with ranks and assuming a normal distribution, or by using the simple computations on the mean and standard deviation. : Another would be to multiply them by 50/55. : I'm not sure if this is the right forum for this question, and I : apologise for this in advance. My personal opinion is that the choice : of method is as much a "political" issue as a statistical one. : However I would appreciate some comments from experts. It may appear : a trifling matter but a lot of time is spent in committees discussing : such transformations on marks. If your only examples are 50 vs 55, and there are not a lot of scores at 100, then there is little difference between the two options that you describe. Since you do not mention what they are for, there is very little intrinsic to recommend either. Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of Pittsburgh
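The adjustments under discussion are simple enough to sketch. A hypothetical set of marks with mean 55, showing the subtract-a-constant and multiply-by-50/55 options alongside the T-score transformation Rich mentions (all numbers are illustrative, not from the thread):

```python
import numpy as np

marks = np.array([30, 45, 52, 55, 60, 66, 77], dtype=float)  # mean is 55

# Option 1: subtract a constant -- mean shifts to 50, spread unchanged
shifted = marks - (marks.mean() - 50)

# Option 2: multiply by 50/55 -- mean becomes 50, but the spread shrinks
# too; a mark of 100 would map to about 90.9 here versus 95 under option 1,
# which is why the two options differ most near the top of the scale.
scaled = marks * (50 / marks.mean())

# T-scores: standardized to mean 50, standard deviation 10
t_scores = 50 + 10 * (marks - marks.mean()) / marks.std(ddof=1)

print(shifted.mean(), scaled.mean(), t_scores.std(ddof=1))
```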
In article, ebohlman@netcom.com (Eric Bohlman) wrote: > > Oops, you've conflated two separate incidents. The Literary Digest poll > was for the 1936 election, and predicted that Landon would defeat > Roosevelt. The vote in that election split along economic lines, with > wealthier people favoring Landon and poorer people favoring Roosevelt. > In 1936, telephone subscribers tended to be wealthier than the general > population, and thus the sampling procedure oversampled Landon voters and > undersampled Roosevelt voters (the same biased sampling method came up > with correct predictions for the 1928 and 1932 elections, because the > vote in those elections didn't have such a strong economic split). > > The incorrect prediction in 1948 (Dewey vs. Truman) wasn't due to invalid > methodology; political and economic events that occurred in between the > poll and the election caused a lot of voters to change their minds. Yes, I had the wrong election (which shows why one should not stay up late and make posts), but the sampling bias question is another matter. I am remembering an article that appeared in the American Statistician circa 1980 with a title something like: "The Making of a Statistical Myth: The 1936 Literary Digest Poll" Unfortunately, I lost that issue in a fire, but I remember the author's conclusion. The sampling procedure did not oversample Landon voters to the extent that it would explain the large prediction error. The author asserts that the problem was response bias and not sampling bias. His reasoning was as follows. By 1936 those people who were opposed to Roosevelt tended to hold those opinions very strongly and would thus be more likely to mail back their questionnaire to express their feelings. It is common for mailed responses to run less than 10%, and in such cases the response bias will significantly distort the results. 
The author presented an analysis to back up his conjecture, including why telephone sampling did not significantly oversample Landon voters. He also gives a history of how the myth got started and why it is so enduring. Paul Johnson in his book "Modern Times" (Harper and Row 1983) discusses the depth of feeling against Roosevelt in the 1930's.
Sometimes, on sloppy snow day Fridays, one waxes philosophical ... Earlier today, I received the following in my readerlist. Some of the post has been omitted ... ------------------ Central Michigan University is going beyond the books with 10 undergraduate courses being offered on the Internet, beginning in January 1997. Statistics 382, Elementary Statistical Analysis will be offered on the World Wide Web. Registration for this and other CMU web courses begins Dec. 9, 1996. The 12-week term begins Jan. 6, 1997. Tuition for CMU web courses is $95.90 per credit hour. There is a one-time CMU admission fee of $50. If you have already been admitted to take CMU courses, then you are not required to pay the admission fee. ---------------- Why should this bother me? (NOTE: My post here is not meant AT ALL to be a criticism of the course above ... nor suggest that CMU does not have the right to do this) Well, here at Penn State for example, there is a push (like I am sure there is at other places) to become a member of the "World University" ... ie, offer things that people all over the world can take advantage of. So far ... so good. But, what does this really mean? The bottom line for doing something like the above ... is to seek more sources of revenue generation and, if we can offer something that will be bought by people in Oregon, and Texas, and the UK ... that means bucks for us. Notice that while much of this material might be accessed at will, if you want CREDIT for it ... you have to pay. So, why is this bad ... and how is this related to Walmart? We all know what happens when Walmart comes to town ... smaller businesses suffer ... some going out of business ... and while that might be good for consumers in general if prices fall ... it hurts people/jobs and the like. Now ... if one place offers an introductory stat course via the web ... and starts to generate some revenue for it ... then you know that it will not be long before place B, and C, and .... etc. 
will put their own course on the web since, the fact that place A does it means the POTENTIAL of taking some revenue away from you! And ... we must be competitive! Thus, it is only a matter of time before place B and C bring out their versions of the course and, instead of charging $95.90 per credit hour, offer the course for $90 ... or then D gets into the act and only charges $87.50 for each credit hour. Then, since some of the smaller institutions that are living on the edge ... that have to have that $95.90 per credit hour to survive ... find that the way that they can recoup their investment and keep their course attractive ... is to make their course EASIER to complete ... less difficult/challenging for those who might register. Get the picture? The big players will wipe out the smaller ones ... and the greed continues. The Walmart schools will knock out the Central Michigans ... either by taking a loss financially or ... making their courses more "user friendly". I see this coming ... and it will be here sooner than you think. Any thoughts? =========================== Dennis Roberts, Professor EdPsy !!! GO NITTANY LIONS !!! 208 Cedar, Penn State, University Park, PA 16802 AC 814-863-2401 WEB (personal) http://www2.ed.psu.edu/espse/staff/droberts/drober~1.htm
All the discussion about CIs is fascinating me ... why? I am not sure. Look at the following. I generated a random sample with n=100 ... from a population with fixed mu and sigma values, and built the 95 and 68 percent CIs ... for the population mean.

MTB > tint c1

Confidence Intervals

Variable    N      Mean    StDev  SE Mean        95.0 % CI
C1        100    102.19    14.90     1.49  (  99.23, 105.14)

MTB > tint 68 c1

Confidence Intervals

Variable    N      Mean    StDev  SE Mean        68.0 % CI
C1        100    102.19    14.90     1.49  ( 100.70, 103.68)

MTB >

--------------------- My first question is ... what EXACTLY can we state verbally ... that is accurate ... about the SPECIFIC interval of 99.23 to 105.14 ... or the SPECIFIC interval of 100.7 to 103.68? My second question is ... what EXACTLY can we state verbally ... that is accurate ... about the TWO intervals together ... ie, what can we correctly and accurately say when comparing the interval of 99.23 to 105.14 ... with the interval of 100.7 to 103.68? (NOTE: please don't say that ... the first is wider than the second ... I DO know that!) More to come ... =========================== Dennis Roberts, Professor EdPsy !!! GO NITTANY LIONS !!! 208 Cedar, Penn State, University Park, PA 16802 AC 814-863-2401 WEB (personal) http://www2.ed.psu.edu/espse/staff/droberts/drober~1.htm
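For anyone who wants to replicate the two intervals outside Minitab, here is a sketch with SciPy; the data are freshly simulated from mu=100, sigma=15, so the endpoints will not match the c1 output above exactly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=100, scale=15, size=100)

def t_interval(data, confidence):
    """t-based confidence interval for the population mean."""
    n = len(data)
    se = data.std(ddof=1) / np.sqrt(n)
    return stats.t.interval(confidence, n - 1, loc=data.mean(), scale=se)

print("95% CI:", t_interval(x, 0.95))
print("68% CI:", t_interval(x, 0.68))  # narrower, by construction
```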
Donn C. Young wrote: > I'd suggest taking a look at this book - Gould is a baseball freak > and includes much information on why batting averages have fluctuated > over the years - and why baseball is the only sport where these > phenomena can be studied. > Thanks for reminding me about that. I saw Gould on the Charlie Rose show about a month ago and meant to grab the book, but I forgot about it. Thanks. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sean Lahman - lahmans@vivanet.com Sean Lahman's Baseball Archive http://www.vivanet.com/~lahmans/baseball.html
In most standard books on statistics you can read how to compare two normal distributions, i.e. how to test whether the means of the two distributions are equal. If you have a small sample taken without replacement, the relevant distribution will be the hypergeometric distribution. If you furthermore have two samples of this kind and you want to compare their means, the computations become a bit difficult. Does anybody know how to construct a test for such a situation, or maybe know where I can read about it? Any help is appreciated. (Sorry for my poor English.) Klaus, Denmark
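Not an answer from the standard books Klaus mentions, but one distribution-free way to compare the means of two small samples drawn without replacement is an exact permutation test: enumerate every relabelling of the pooled values and see how often the mean difference is at least as extreme as the observed one. A sketch with invented data:

```python
import itertools
import numpy as np

a = np.array([12, 15, 11, 14], dtype=float)  # hypothetical sample 1
b = np.array([16, 18, 13, 17], dtype=float)  # hypothetical sample 2
observed = a.mean() - b.mean()

pooled = np.concatenate([a, b])
n_a = len(a)
extreme = 0
total = 0
# Enumerate all C(8, 4) = 70 ways to split the pooled values into two groups
for idx in itertools.combinations(range(len(pooled)), n_a):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diff = pooled[mask].mean() - pooled[~mask].mean()
    if abs(diff) >= abs(observed) - 1e-12:  # tolerance guards float ties
        extreme += 1
    total += 1

print(f"two-sided exact p-value: {extreme / total:.3f}")
```

With samples this small the full enumeration is cheap; for larger samples one would sample random relabellings instead of enumerating all of them.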