I have data that I wish to analyze using an unbalanced incomplete block design, to test for differences among treatments. I would greatly appreciate information on what software exists for this sort of analysis. Briefly: There are 5 blocks and 5 treatments. There are no replicates for most of the cells. I would like to use a nonparametric technique if possible. This might include the Durbin Test or some sort of permutation/resampling technique. Thanks very much for any advice you can share. Rob Ross Institute of Life and Earth Sciences Shizuoka University 836 Oya Shizuoka 422 JAPAN serross@sci.shizuoka.ac.jp
On Sat, 23 Nov 1996, Igor Kozintsev wrote: > (sorry for this latex code) If you are really sorry, why not provide an ASCII translation? Greg -- Gregory E. Heath, M.I.T. Lincoln Lab, Lexington, MA 02173-9185, USA heath@ll.mit.edu Phone: (617) 981-2815 FAX: (617) 981-0908 The views expressed here are not necessarily shared by M.I.T./LL or its sponsors
Greetings to all (and Happy Thanksgiving!), I hope this isn't too much of a distraction from the technical issues addressed in this forum, but I need some information about consulting rates. I've recently joined an organization that provides consulting services in engineering and process improvement. Up until my hire, the organization was somewhat lacking in formal statistical methods. My role has been to provide internal and external consulting and training in applied statistics. The external customer is invariably an automotive manufacturer or one of its higher level suppliers. Can anyone tell me what an appropriate hourly rate might be for these services? It seems to me that our rate is a bit low, but I really don't know. I'd be interested in hearing as many opinions on this as possible. Thanks in advance, M.
Responsibilities Include: Measurement and analysis of integrated drug, medical, self-report, and patient profile databases to assess the effectiveness of clinical programs (including Seniors', disease management, and DUR). The incumbent will coordinate across departments, ensuring that established outcomes and analytic goals are met. The areas to be managed include data management, statistical analysis, and project reporting. The consultant should be able to execute statistical programming and generate standard and ad-hoc reports. Qualifications: Must have significant experience programming SAS and a Master's degree in statistics or a related mathematical/epidemiological/computer field. It is preferred that the incumbent have 3-5 years of post-graduate work experience and proficiency with data. A detail-oriented and organized individual who can work independently and on multiple projects simultaneously is preferred. Ref-3652. Our regional offices, located in Waukegan, IL; Durham, NC; Palo Alto, CA; and Princeton, NJ, provide off-site and on-site services to clients in over 30 states. Our Clinical Trial Management group assists Research and Development organizations in the monitoring and quality assurance of clinical trial studies. Our Systems Professionals function in all aspects of the application development life cycle, manage data, and develop reports. In Research Statistics, we design statistical experiments and evaluate the results for the areas of Life Science, Finance, Marketing, Economics, and Engineering. Please reply to: priresumes@trilogycnslt.com Trilogy Consulting Corporation, 101 Carnegie Center, Suite 211, Princeton, NJ 08540 http://www.trilogycnslt.com/TCC_Home fax: 609.520.0730
The Bass Statistical System was developed as a SAS clone in the 1980s. I am interested in acquiring a (legitimate) copy of this, preferably both programs and manuals. Henry Voss, 11 Llanfair Lane, Ewing NJ 08618-1011, phone (609) 882-2612
My question relates to using the t-test for the difference between means for uncorrelated data when one actually has correlated data, as a way of being conservative. I realize this is iffy on methodological grounds, but I am trying to understand the issue statistically and would appreciate help. An examination of the formula for the standard error of the difference between two means shows that when data are correlated, the standard error is reduced. How much it is reduced depends on the size of the correlation coefficient. Since the standard error of the difference between means is the denominator of the t-test, it follows that a researcher who performs the test using the formula for uncorrelated data when he actually has correlated data is using an unnecessarily stringent test. My question is: *how* unnecessarily stringent; that is, given a particular set of data, how do I calculate how extra stringent I've been? Here's an example (you'll recognize it from my previous posting): a runner tests shoes A and shoes B using correct methodology, does a t-test for the difference between means using the formula for uncorrelated data, and concludes there is a statistically significant difference between the shoes at p=.05. According to Downie and Heath 1970, p. 73, in the section headed "Differences Between Means--Correlated Data", "We say that data are correlated when they consist of 2 sets of measurements on the same individuals..." Thus the runner is in a correlated-t situation, and the standard error (the denominator of the t-test) calculated under the correlated-t formula will be smaller than the one from the uncorrelated formula, by an amount depending on the size of the correlation. Is this correlation 1, since it's the same runner? I don't think that's right, because I think maybe then the standard error becomes 0! If it's not 1, then how do I calculate it from my data?
I believe I have the correct formula for quantifying "extra stringency"; my problem is that I don't know how to get the correlation coefficient required by that formula. When data are "correlated" in a correlated t-test scenario, how does one quantify *how* correlated they are? Any help much appreciated.
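For concreteness: the correlation being asked about is simply the Pearson correlation of the paired measurements themselves. A minimal Python sketch (using scipy, with invented shoe times; the numbers are placeholders, not real data) shows how it is computed and how much the choice of test matters:

```python
import numpy as np
from scipy import stats

# Invented paired shoe times: one pair of measurements per run/course
a = np.array([301.0, 295.0, 310.0, 288.0, 305.0, 299.0])  # shoe A
b = np.array([298.0, 291.0, 307.0, 284.0, 303.0, 295.0])  # shoe B

# The correlation in question is the Pearson r of the paired columns
r, _ = stats.pearsonr(a, b)

# Unpaired (independent-samples) t ignores the pairing...
t_ind, p_ind = stats.ttest_ind(a, b)
# ...while the paired t works on the differences; when r > 0 its
# standard error is smaller, so the test is more powerful
t_rel, p_rel = stats.ttest_rel(a, b)

print(f"r = {r:.3f}")
print(f"unpaired: t = {t_ind:.2f}, p = {p_ind:.3f}")
print(f"paired:   t = {t_rel:.2f}, p = {p_rel:.5f}")
```

With strongly positively correlated pairs like these, the unpaired test can be far from significant while the paired test is overwhelming, which is exactly the "extra stringency" the post asks about.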
On Wed, 27 Nov 1996, Ronan M Conroy wrote: > Does anyone know of software that performs tree-structured survival > analysis or logistic regression? There is code in the statlib S archive (called survcart) for survival analysis. S-PLUS also does binary tree-structured regression. thomas lumley UW biostatistics
I have a data set containing data for 20 countries from 1980 to 1990. I would like to calculate the average value across the 20 countries for each year from 1980 to 1990. My problem is that in some years, several countries' data are missing. When I do the calculation, the averages for those years are not consistent with the averages for years without missing data. In particular, if an important country's data are missing, the average is affected a lot. I am trying to use the existing data to estimate the missing values in those years, but the time series are too short for the estimates to be convincing. Any help and suggestions will be greatly appreciated. Junjia
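One simple way to see (and partly fix) the problem is to center each country on its own mean before averaging, so that a missing large country does not shift the level of that year's average. A rough NumPy sketch with invented numbers (a stop-gap illustration, not a substitute for a proper missing-data model):

```python
import numpy as np

# Hypothetical 4 countries x 5 years, with two missing values (NaN)
data = np.array([
    [100., 102., np.nan, 108., 110.],   # one large country, missing year 3
    [ 10.,  11.,  12.,   13.,  np.nan], # three smaller countries
    [ 12.,  12.,  13.,   14.,  15.],
    [  9.,  10.,  10.,   11.,  12.],
])

# Naive per-year mean over whichever countries happen to be present:
# year 3 drops sharply only because the large country is missing there
naive = np.nanmean(data, axis=0)

# Alternative: average each country's deviation from its own mean, then
# add back the overall level, so absences don't drag the average around
country_means = np.nanmean(data, axis=1, keepdims=True)
adjusted = np.nanmean(data - country_means, axis=0) + np.nanmean(country_means)

print(naive)
print(adjusted)
```

For years with complete data the two calculations agree; for the year where the large country is missing, the naive mean collapses while the centered version stays at a comparable level.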
Are the following two equations correct? Expected Value = exp(mu^2 + .5*sigma^2) Std. Dev. = (Expected Value^2 * (exp(sigma^2) - 1))^.5 I generated 2000 pseudo-random numbers from a lognormal distribution with parameters mu=2 and sigma=3, and noticed that the above two formulae give values much larger than the simple average and standard deviation of the actual sample. Isn't the sample average a point estimate of the expected value of the lognormal pdf, and likewise the sample standard deviation that of the lognormal pdf?
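Two things are worth checking here. First, the usual lognormal mean formula has mu, not mu^2, in the exponent: E[X] = exp(mu + sigma^2/2). Second, even with the corrected formula, when sigma is as large as 3 the distribution is so heavy-tailed that the average of 2000 draws typically falls well below the true mean, because the mean is dominated by rare huge values. A quick simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
x = rng.lognormal(mean=mu, sigma=sigma, size=2000)

# Theoretical lognormal moments (note mu, not mu^2, in the exponent)
ev = np.exp(mu + 0.5 * sigma**2)
sd = np.sqrt(ev**2 * (np.exp(sigma**2) - 1))

print(f"theoretical mean {ev:.1f} vs sample mean {x.mean():.1f}")
print(f"theoretical sd   {sd:.1f} vs sample sd   {x.std(ddof=1):.1f}")
```

The sample mean is a consistent estimator of the expected value, but its own sampling distribution is extremely skewed here, so most samples of this size understate both the mean and (especially) the standard deviation.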
Mike wrote: >Andrew Kukla started a thread: >> >>The formula for the regression line of Y on X is Y=b*X+a, where >> >>"b" is a slope and "a" is an intercept parameter. However, if I >> >>calculate the regression of X on Y (X=c*Y+d), then the regression >> >>line will differ from the previous substantially. I understand >> >>that different direction of deviations causes a different result >> >>of calculation of the least squares. >> >>1: If this is a mathematical phenomenon then shouldn't we use >> >>some other formula that could "average" these two regression >> >>lines and have only one? I suggest that you decide what you want it for, and then act accordingly. If you want to predict Y given X, the regression of Y on X is appropriate. If you want to predict X given Y, use the regression of X on Y. If you have a structural model, and the deviation from the line is due to an "error" which is uncorrelated with X, use the regression of Y on X. If you have a structural model, and both variables are subject to error, consult someone who understands the problem. NO simple regression procedure of any kind is appropriate. If the systematic part is normal, the problem is not identified, and there is no way of finding what should be wanted, unless one has additional information. >With a response from Hans-Peter Peipho >> >You could do a principal component analysis and fit the data cloud by the >> >first component. >The principal component idea is a good one, in that it minimizes sum of >squared residuals perpendicular to its major axis, and is the same "line" >without needing to decide which variable is "X" or "independent" and which >is "Y" or "dependent". This is an atrocious idea. It is dependent on the scales, and normalizing the variances introduces errors all over the place. If the appropriate problem is being considered, selection or scaling becomes irrelevant, but this is not the case for principal components.
Do not try to find religious mantras. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 hrubin@stat.purdue.edu Phone: (317) 494-6054 FAX: (317) 494-0558
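A small simulation makes the original point concrete: the slope of X on Y, replotted on the same axes, exceeds the slope of Y on X by a factor of 1/r^2, so the two lines coincide only when the correlation is perfect. (A sketch with arbitrary simulated numbers.)

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=1.0, size=200)  # true structural slope 2

# Slope of Y on X minimizes vertical residuals: b = cov(x, y) / var(x)
b_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
# Slope of X on Y, inverted onto the same axes: var(y) / cov(x, y)
b_xy_inv = np.var(y, ddof=1) / np.cov(x, y)[0, 1]

r = np.corrcoef(x, y)[0, 1]
# The two lines agree only when |r| = 1; here b_xy_inv = b_yx / r**2
print(b_yx, b_xy_inv, r)
```

Neither slope is "the" right one in general; as the post says, which regression (if either) is appropriate depends on what you want the line for.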
Michael Kleiman (SOC) wrote: >My question relates to using the t-test for the difference between means >for uncorrelated data when one actually has correlated data, as a way of >being conservative. This may or may not be conservative; if the correlation is negative, it is the other way. Also, using paired data eliminates the problem of unequal variances, and is generally less sensitive to normality. -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907-1399 hrubin@stat.purdue.edu Phone: (317) 494-6054 FAX: (317) 494-0558
It's said that in a Markov chain P(X_n | X_{n-1}, X_{n-2}, ...) = P(X_n | X_{n-1}). Is this _always_ the case? Can one use Markov methods when it does not apply?
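The property quoted is the *definition* of a (first-order) Markov chain, so it is an assumption about the process, not a theorem. When it fails because the dependence reaches back k steps, the standard remedy is to augment the state: the vector of the last k values is itself first-order Markov. A sketch with a made-up second-order binary chain:

```python
import numpy as np

# A second-order binary chain: X_n depends on (X_{n-1}, X_{n-2}).
# It violates the first-order Markov property, but the pair
# S_n = (X_{n-1}, X_n) IS a first-order Markov chain on 4 states.
rng = np.random.default_rng(0)
# Hypothetical table of P(X_n = 1 | X_{n-1}, X_{n-2})
p1 = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.9}

x = [0, 1]
for _ in range(10_000):
    x.append(int(rng.random() < p1[(x[-1], x[-2])]))

# Empirical transition matrix of the augmented 4-state chain
states = [(a, b) for a in (0, 1) for b in (0, 1)]
counts = np.zeros((4, 4))
for (a, b), (c, d) in zip(zip(x, x[1:]), zip(x[1:], x[2:])):
    counts[states.index((a, b)), states.index((c, d))] += 1
P = counts / counts.sum(axis=1, keepdims=True)
print(P.round(2))
```

Note the built-in structure of the augmented chain: from state (a, b) the next state must be (b, something), so half the entries of P are exactly zero.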
Many many thanks for all those who answered my question. Since I am unable to get the books some recommended, I used Excel to create a table of values in the range I want and inserted it into my program. It worked fine. thanks again, Rafael.
In article <572cut$bg5@usenet.srv.cis.pitt.edu>, wpilib+@pitt.edu (Richard F Ulrich) wrote: > 1) Q: Is there a *reason* for ever preferring log-linear evaluation > of a contingency table over the Pearson test? or a reference? > > 2) A: Siegel does provide rank-order tests, etc. > > 3) Q: Since most cluster algorithms assume a meaningful DISTANCE > metric, rather than just ORDINAL variables, is Lorr(1983) special? and > how does this address the original question? (If the intention here > was to advise, `if you have a bunch of related variables, then you > should try to create a composite score'... then I like the intention.) > Dear Mr. Ulrich, I am glad to read your comments because you might correct my thinking, which may well be faulty. So I am going to answer your two questions, but not in a defensive tone; I would just like to have a constructive discussion on the topic. (1) As I have read many times, log-linear analyses are a means of grasping effects over all the cells of a model and spotting the cells which are responsible for those effects. When you have a, let's say, 3-by-4 table, the classical tests only allow you either to test the discrepancy between observed and expected for the whole table at once, or to test whether two cells differ from each other, then test this again for two other cells, and so on until you have covered the whole table. But doing so many tests inflates the error rate without controlling for it. That's why log-linear models are more adequate. What do you think? Is this a good reason for using log-linear models? (2) I mentioned Lorr as a quick reference. Within distance measures, he distinguishes (a) metric measures, which are suitable for interval scales, and (b) non-metric measures, which are suitable for ordinal scales. My advice was that the choice of distance measure depends on the type of scale, as Lorr explains. Afterwards, the data can obviously undergo a cluster analysis in order to reduce the information they provide.
What do you think? Is there any problem with such advice? I hope you don't mind helping me to get a better understanding. Thank you for your previous comments anyway. Best thoughts, F. Bellour -- F. Bellour, PhD Student, U.C.L., Belgium E-mail: bellour@upso.ucl.ac.be Phone office: 00-32-10-478640
David Nichols wrote: > > In article <56tft4$3oba@news.doit.wisc.edu>, > Dr Mike wrote: > >One of our grad students wishes to generate some randomly selected data > >arranged like those for a 1-way repeated-measures design. She would > >like to > >be able to specify > >(i) mean and standard deviation for the normally-distributed population > >for each variable and (ii) the correlations between the variables > >and then have some software package do the generation of some samples. > >We have software that conveniently generates individual-variable data > >sets > >assuming a normal distribution but not any that allows for generation > >of > >dependent samples. > > > >Does anyone know of a package that they would recommend for generating > >such multivariate samples? > > > >Any email responses will be greatly appreciated. > > > >- Mike Hogan < mehogan@facstaff.wisc.edu > > > Sr. Database Admin, CWC Project > > U of Wisconsin Psych Dept, Madison, WI > > > > You can do this with just about any software package. To do all aspects > of it, you'd need one that could do a Cholesky decomposition of a matrix > for you. Postmultiplying a data matrix created as pseudo-random normal > deviates by the upper triangular Cholesky decomposition of the correlation > matrix desired will produce the desired intercorrelations (if the variables > are perfectly uncorrelated to start with) or will produce correlations > as if the variables were created from a population with the desired > intercorrelations. > Mark Shevlin and I have written a paper, presented at Computers in Psychology '96 in York, UK, which presents the SPSS code to do this. Those friendly people at the CTI Psychology Centre have put it on the web at: http://www.york.ac.uk/inst/ctipsych/web/CiP96CD/MILES/XHTML/PAPER.HTM
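David Nichols' recipe translates directly into, e.g., NumPy; the correlation matrix, means, and SDs below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Desired correlation matrix and per-variable means / SDs (all hypothetical)
R = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
means = np.array([10.0, 20.0, 30.0])
sds = np.array([2.0, 4.0, 6.0])

# Start from independent standard normals, then post-multiply by the
# upper-triangular Cholesky factor of R, as described in the post:
# Cov(Z @ U) = U' I U = R when R = U'U
Z = rng.standard_normal((n, 3))
U = np.linalg.cholesky(R).T          # numpy returns lower L; transpose it
X = Z @ U * sds + means              # then scale and shift each column

print(np.corrcoef(X, rowvar=False).round(2))
```

With n this large the sample correlations land close to R; because Z is only approximately uncorrelated, the result matches the post's caveat: the sample behaves as if drawn from a population with the desired intercorrelations, rather than reproducing them exactly.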