I'm going to be performing a study which investigates hypothesized correlations between certain variables and an outcome. The outcome is binary, so I intend to use logistic regression. The purpose of the first phase of the study is to come up with a predictive index for the outcome, using those variables which turn out to be predictive. The second stage of the study will then be to test the index to see whether it is correlated with the outcome.

I was planning to design the predictive index itself based on the correlation coefficients determined in the first phase of the study. For example, let's say the regression equation turns out to be y = .1a + .2b + .3c + .4d + .5e. Having previously determined that any correlation below, say, .247 could be due to chance, I reject variables a and b. The predictive index then becomes something like I = (3/12)c + (4/12)d + (5/12)e (12 being the sum of 3, 4 and 5), with threshold values of "I" being associated with certain probabilities of the outcome. For example, an "I" of .4 might be associated with a probability of 40% for the outcome. Can such probabilities be estimated out of the first stage of my study?

Should I use the relative correlations (r squared) instead of the correlations (r) for my index coefficients? I suspect not, since I want to come up with a high percentage result if only one of the variables is highly predictive. For example, let's say that variable "e" is so highly predictive that, if its value is above a certain threshold, the outcome is almost certain. But let's say the same can be said of variable "d". If I use the relative correlations, the index coefficients of these two variables will be deflated so much that, in a situation where "e" is very high but "d" is low, the outcome may not be predicted. On the other hand, if I use the correlation coefficients for the index coefficients, either "d" or "e" being above a certain value will result in a nearly 100% outcome prediction. Thanks.

Matt Beckwith
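The index construction described in the post can be sketched in a few lines (a toy illustration; the retained coefficients and the 0-1 scaled variable values are taken from the post's own example, everything else is assumed). Note also that a fitted logistic model already yields outcome probabilities directly through the inverse logit, so the first-stage model itself is one place to read off the probability attached to a given index value.

```python
# Toy sketch of the post's index: keep the coefficients judged non-chance
# (c, d, e in the example) and rescale them so the weights sum to 1.
coeffs = {"c": 0.3, "d": 0.4, "e": 0.5}   # retained after screening
total = sum(coeffs.values())              # 1.2, i.e. the "12" in (3/12) etc.
weights = {k: v / total for k, v in coeffs.items()}

def index(values):
    """Weighted index I for one subject's (assumed 0-1 scaled) variables."""
    return sum(weights[k] * values[k] for k in weights)

# If every retained variable sits at 0.4, the index is 0.4 as well:
print(weights["c"], index({"c": 0.4, "d": 0.4, "e": 0.4}))
```
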
This is a second posting for anyone who might have missed the post of last month. The trading arm of a major investment firm is seeking a quantitative specialist for its New York based Analytical Equity Trading Group to work with its senior professionals in the on-going development of sophisticated statistical/econometric trading models and strategies.

QUALIFICATIONS: The successful candidate will have in-depth knowledge of financial economics, time series econometrics, stochastic processes and the requisite skills necessary to design and implement strategies in a sophisticated computer environment. Comfort in dealing with probabilistic notions such as random walks, Brownian motion and martingale theory, combined with econometric ideas such as stationarity, cointegration, error-correction models and ARCH/GARCH, is essential. This position would be ideal for someone with Wall Street experience in statistical arbitrage (i.e., pairs trading, basket trading, swaps), and/or academic training near or at the Ph.D. level.

CONTACT: E-mail: nyrtd@ny.ubs.com Please reply via email with either a resume or a short informal description of yourself. Please include a day & evening phone number. We are an Equal Opportunity employer.
A full-scale version of Survo working until the 1st of May, 1997 is now freely available from http://www.helsinki.fi/survo/

SURVO 84C (Survo) is a general environment for statistical computing and related areas. It is an integrated software system having a unique editorial interface of its own. Main activities of Survo are:

- Statistical analysis and computing (standard methods + special features in multivariate analysis, e.g. graphical rotation in factor analysis, stepwise Wilks' lambda in cluster analysis, randomized tests in various situations, e.g. for contingency tables in different types of experiments; in nonlinear regression, automatic detection of parameters and symbolic computation of derivatives of the model function, etc.),
- Text processing,
- Graphics (also user-defined types of statistical graphs),
- Desktop publishing,
- Editorial computing and computing in the touch mode,
- Matrix interpreter (integrated with statistical functions),
- General management of large numerical and textual databases,
- Making of expert applications and teaching programs.

The center of activities in Survo is an edit field that at all times is partially visible on the screen. The edit field is maintained by the Survo Editor. The user works in Survo by typing both free text and commands in the edit field. When commands are activated, their results will appear in the same edit field. The results are also saved in files for subsequent processing. Warning! Survo goes far beyond the old-fashioned command-oriented or mouse-driven user interfaces.

Experienced Survo users can extend Survo activities by using a macro language of Survo as sucros. Even a great deal of the basic functions of Survo have been programmed as sucros. General job management in Survo is based on (hierarchical) menus generated automatically according to the needs of the user.

Survo is written in C. It is a collection of DOS programs, but the user sees it as an integrated environment. Many of the restrictions and defects of DOS have been relieved. Survo is an open system. Anyone can extend its capabilities either by the sucro language or in C. The programming tools are freely available. Survo works on any current PC and also under multitasking environments like Windows NT, Windows 95, OS/2, and Linux. In desktop publishing and in demanding report-generating applications a PostScript printer is essential. Survo is at its best on 486 and Pentium PCs, but it works also on lighter alternatives.

To illustrate one of the smart features of the editorial approach of Survo, think about the following tiny but revealing example. Assume that we are writing text with a word processor as follows: "Population statistics of Finland (31st October 1996): The number of males is 2499415 and the number of females 2630612. Thus the total population is ..." The question is, how do you proceed in order to calculate and write the total number of inhabitants at the end of this statement as quickly as possible? Most people (although sitting at a PC) still use pocket calculators or separate programs - really inconvenient! Anyhow, if it takes 15-60 seconds (as it does in typical statistical systems, editors and word processors), it is worthwhile to study the capabilities of Survo, where it can be done in less than 5 seconds (by means of touch-mode computing).

More information from http://www.helsinki.fi/survo/

Seppo Mustonen Seppo.Mustonen@Helsinki.Fi Professor Department of Statistics P.O.Box 54 00014 University of Helsinki Finland
I think you're overlooking one factor; namely the "brand name"--in other words, not all colleges are created equal. CMU courses compete with Penn State or other schools not just on price, but on reputation, perceived quality, prestige, etc.
Saleem Nicola (nicola@phy.ucsf.edu) wrote: << asking about using a log-transform on biological data. See below>> When looking at biological variables like growth of cell populations or chemical concentrations, the 'natural' metric is often log-normal, or close enough to it. If you 'naturally' do talk about doubling periods, or half-lives, then taking the log will give you a straight line when you plot 'quantity' against time. Biological activity that is investigated with concentrations considered in multiples is a probable candidate for log-transform of the concentration. Any time the largest data value is much bigger than the smallest, and there is a natural zero, Tukey has written that transformations should be *considered*. There are some bad instructors out there who misinform, believing as they do that being countable meets the relevant standard for being 'equal interval'. -- That is not so. Hoaglin, Mosteller and Tukey (ed.): "Understanding Robust and Exploratory Data Analysis" -- has a chapter on transformation. Tukey's "EDA" has one, too. D. J. Finney, "Statistical Method in Biological Assay" has a larger discussion about 'bounded' growth, for which the logit (P/(1-P)) or other symmetrical transformations are indicated. Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of Pittsburgh ======================original note========================== : Hello, : I'd appreciate some help on the following question. Suppose one measures : a variable in control and then in an experimental condition. One : way to test whether the variable is different in these two conditions : is to perform a paired t-test (in which for each experiment, the : control and experimental values are subtracted, and the mean of these : differences is compared with their variance). This approach works very : well if the data is additive: that is, if one expects the same absolute : magnitude of change in the variable each time one does the experiment.
: However, suppose one expects that the experimental effect is : multiplicative, not additive. For example, one might expect that : a particular concentration of an antibiotic will kill off around half : the total number of bacterial cells in a dish, no matter how many cells : the dish contains to start with -- 10, 100, 1000, or a billion. If the : number of cells in the control condition varies quite a bit, the : variance in the differences between control and experimental conditions : will be enormous, and the paired t-test is of little use. : There are several ways to take care of this problem; I suspect : that not all of them are correct. One is to divide each experimental : value by the control value instead of to subtract them. Another : is to subtract the logarithm of the experimental value from the log of : the control value, and then do the test on the resulting values. (Of : course the difference between two logs is the same as the log of one : value divided by the other [ie, log(E)-log(C) = log(E/C)].) I have been : told that the latter method (taking the log transform) is the correct : one, but I'm not sure about this, and I'd like an explanation for why it : is acceptable to use log values. If anyone could provide one, or at : least point me towards a textbook or other reference that discusses this : in detail, I would really appreciate it.
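The log-scale paired test that Rich recommends can be sketched directly (the data here are invented for illustration; the point is only that if the effect is multiplicative, log(E) - log(C) = log(E/C) is roughly constant, so an ordinary paired t-test applies to the logs even when the raw counts span five orders of magnitude):

```python
import math

# Invented counts: each experimental value is roughly half its control,
# whether the dish starts with ten cells or nearly a million.
control      = [10.0, 120.0, 1500.0, 9.0e5]
experimental = [4.8, 63.0, 700.0, 4.7e5]

# Paired differences on the log scale: log(E) - log(C) = log(E/C).
diffs = [math.log(e) - math.log(c) for c, e in zip(control, experimental)]
n = len(diffs)
mean = sum(diffs) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
t = mean / (sd / math.sqrt(n))   # ordinary paired t-statistic, on the logs
print(mean, t)   # mean is close to log(0.5); t is strongly negative
```

On the raw scale the differences range from about 5 cells to 430,000 cells and the paired t is useless; on the log scale they all sit near log(0.5), which is exactly the point of the transform.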
For a study I need tables, graphics, stats and data about megacities. If you have something, please send it to me with an indication of the source from which you took it. paco@freenet.hut.fi
one can estimate what the correlation might be IF the predictor measures are made MORE reliable .. but, the question is: if you do this ... are you merely fooling yourself? unless there is some real reason why you know why your data are NOT as reliable as they WILL be ... then i don't think correction for attenuation is a good idea ... At 09:16 PM 12/8/96 -0500, you wrote: >I have a number of variables (categorical and continuous) that have shown >some evidence of poor test-retest reliability. > >I heard that there is a procedure called "attenuation of parameter >estimates" due to poor reliability, however I am unsure what this is, or if >there are any references that I might be able to begin with. > >I am aware that in PROC CALIS (SAS) it is possible to include the >reliability of the variable in the model. However, this limits my analysis >boundaries, as I was intending to use repeated measures. > >If this makes sense to anybody, then I would welcome any comments or >suggestions that they might have. > >Thanks in advance, > >Peter Baade > > =========================== Dennis Roberts, Professor EdPsy !!! GO NITTANY LIONS !!! 208 Cedar, Penn State, University Park, PA 16802 AC 814-863-2401 WEB (personal) http://www2.ed.psu.edu/espse/staff/droberts/drober~1.htm
Yuichi Watanabe (yuichi@HAWAII.EDU) wrote: : I have collected data with a 101-item questionnaire on motivation etc. : from 1000 students. All the responses were 1-5 Likert-type scale : (Strongly disagree - Strongly agree). 52 items were on motivation. : I have run an MDS based on a correlation matrix of 52 items, converting : the correlation coefficients into 1+r to obtain a similarity matrix with : all positive numbers. An MDS with n=1 accounted for 90% of variance, with : n=2 92%. All the negative items, mainly Anxiety, were on the left and all : the confidence items, such as Expectancy of success, were on the right : extreme in the first dimension. With a factor analysis with the same 52 : items, 8 factors accounted for only 50% of variance. Did I do something : wrong? : I am confident that the data input was done correctly. The same : correlation matrix yielded the same factor solutions. I have run MDS both : with SAS and Systat and got almost identical results. : Could someone explain why I got so different results between factor : analysis and MDS? -- Two items with a big negative correlation would be VERY FAR APART or NOT AT ALL SIMILAR according to the data you put in your MDS. Factor analysis, on the other hand, would put them on the same factor with opposite loadings. I do not know how you want to enter your correlations, but adding 1.0 is probably a very poor solution, because it HAS to give you the results that you see. Accounting for 50% of the variance of that kind of attitude item is probably accounting for all the 'reliable' variance, so that is a decent-sounding result. I do not know what is comparable for MDS; MDS has not been useful the couple of times that I have tried it (that was also, like yours, on rating-scale data). Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of Pittsburgh
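One standard alternative to the "1+r" conversion (an assumption on my part; neither poster prescribes it) is to turn correlations into distances via d = sqrt(2 * (1 - r)), which is the Euclidean distance between standardized variables. It maps r = 1 to distance 0 and r = -1 to the maximum distance 2, so strongly negatively correlated items land far apart by construction rather than by the arbitrary shift:

```python
import math

# Distance between two standardized variables with correlation r:
# d = sqrt(2 * (1 - r)).  Monotone decreasing in r, bounded in [0, 2].
def corr_to_distance(r):
    return math.sqrt(2 * (1 - r))

print(corr_to_distance(1.0), corr_to_distance(0.0), corr_to_distance(-1.0))
```
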
Hello all, Ok here is the premise to my problem. I had 2 different sets of subjects make ratings on the same set of stimuli. I took those ratings and used them as distance measures to do a cluster analysis for each group. I now have 2 different sets of clusters containing some overlapping items. How do I compare these 2 cluster analyses (let's say one yielded 4 clusters and the other yielded 3)? Is there any quantifiable measure of overlap of 2 cluster analyses? Does anyone even know where to start? Thanks for any type of help. Rickard D. Robbins P.S.: email me personally, if you don't mind. email: rrobbins@colab.brooks.af.mil
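One quantifiable overlap measure for the question above is the Rand index (a sketch, under the assumption that both solutions can be written as cluster labels over the items the two analyses share; it works fine when one solution has 4 clusters and the other 3). A chance-corrected version, the adjusted Rand index, also exists.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree:
    both put the pair in the same cluster, or both put it apart."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical labels for 6 shared items under a 4-cluster and a
# 3-cluster solution; only the pair (items 4, 5) is treated differently.
print(rand_index([1, 1, 2, 2, 3, 4], [1, 1, 2, 2, 3, 3]))
```
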
I know that alpha inflation occurs when doing more than a-1 contrasts on the same set of data, where a = the number of groups. Does alpha inflation occur in the following scenario: If I do a MANOVA with three dv's and then later decide that I want to do another MANOVA with three different variables (from the same data set, but they are not the same variables from test one), does performing the second MANOVA constitute alpha inflation? Or is there some other problem with doing 2 separate MANOVAs? Any insights would be appreciated. TIA Jay ============================================================================ Jay Alberts jay.alberts@asu.edu
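The inflation in question is easy to quantify under the (idealized) assumption that the two tests are independent: the familywise error rate for k tests at nominal level alpha is 1 - (1 - alpha)^k, which for two MANOVAs at .05 is already close to .10.

```python
# Familywise type I error probability for k independent tests at level alpha.
def familywise_alpha(alpha, k):
    return 1 - (1 - alpha) ** k

print(familywise_alpha(0.05, 2))   # about 0.0975 for two separate MANOVAs
```

A Bonferroni-style fix (testing each MANOVA at .025) caps the familywise rate at .05 even without the independence assumption.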
When parsimony is achieved by examining relationships with the response variables, all statistical inference can be invalidated. In particular, standard errors are too small and P-values are too low. See for example the two references below. The Grambsch and O'Brien article has a nice demonstration of the fact that if you fit a model with age and age-squared, test the age^2 term for significance and drop it, the one degree of freedom test for age almost needs to be judged against a 2 d.f. critical value. In other words, dropping the age^2 term can do nothing but hurt the power of the test of association between age and Y if one in fact preserves the type I error.

@article{gra91, author = "Grambsch, P. M. and {O'Brien}, P. C.", journal = SM, pages = "697-709", title = "The effects of transformations and preliminary tests for non-linearity in regression", volume = "10", year = "1991" }

@Article{cha95mod, author = {Chatfield, C.}, title = {Model uncertainty, data mining and statistical inference (with discussion)}, journal = JRSSA, year = 1995, volume = 158, pages = {419-466}, annote = {bias by selecting model because it fits the data well; bias in standard errors; P. 420: ... need for a better balance in the literature and in statistical teaching between {\em techniques} and problem solving {\em strategies}. P. 421: It is `well known' to be `logically unsound and practically misleading' (Zhang, 1992) to make inferences as if a model is known to be true when it has, in fact, been selected from the {\em same} data to be used for estimation purposes. However, although statisticians may admit this privately (Breiman (1992) calls it a `quiet scandal'), they (we) continue to ignore the difficulties because it is not clear what else could or should be done. P. 421: Estimation errors for regression coefficients are usually smaller than errors from failing to take into account model specification. P. 422: Statisticians must stop pretending that model uncertainty does not exist and begin to find ways of coping with it. P. 426: It is indeed strange that we often admit model uncertainty by searching for a best model but then ignore this uncertainty by making inferences and predictions as if certain that the best fitting model is actually true. P. 427: The analyst needs to assess the model selection {\em process} and not just the best fitting model. P. 432: The use of subset selection methods is well known to introduce alarming biases. P. 433: ... the AIC can be highly biased in data-driven model selection situations. P. 434: Prediction intervals will generally be too narrow. In the discussion, Jamal R. M. Ameen states that a model should be (a) satisfactory in performance relative to the stated objective, (b) logically sound, (c) representative, (d) questionable and subject to on-line interrogation, (e) able to accommodate external or expert information and (f) able to convey information.} }
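The "quiet scandal" that Chatfield quotes is easy to reproduce in a toy simulation (my own sketch, not from either paper): screen ten pure-noise predictors against y, keep the one with the largest |r|, and see how often it clears the nominal 5% cutoff for a single preplanned test.

```python
import random
import math

random.seed(1)
n, n_predictors, n_sims = 10, 10, 2000
cutoff = 0.632  # approximate two-sided 5% critical |r| for n = 10 (df = 8)

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

hits = 0
for _ in range(n_sims):
    y = [random.gauss(0, 1) for _ in range(n)]
    best = max(abs(corr([random.gauss(0, 1) for _ in range(n)], y))
               for _ in range(n_predictors))
    if best > cutoff:
        hits += 1

rate = hits / n_sims
print(rate)   # far above the nominal 0.05; roughly 1 - 0.95**10, about 0.40
```

Under the null everything here is noise, yet the "best" predictor looks significant about 40% of the time, which is exactly the bias in selected-model inference the references describe.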
Please recommend a source for a novice to learn about modeling dose levels and exploring the effect of different ingestion patterns. I have been exploring difference equations but my math is weak. Hope that this is an appropriate question for this list. Thanks. Fancher E. Wolfe, Professor Mathematics and Statistics Metropolitan State University 730 Hennepin Ave. Minneapolis, MN 55403-1897 fwolfe@msus1.msus.edu 612-341-7256
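The difference-equation approach mentioned above has a classic, very approachable instance: the one-compartment repeated-dose model, in which the amount in the body just after dose n+1 satisfies A[n+1] = A[n] * exp(-k * tau) + dose. A minimal sketch, with purely illustrative parameter values:

```python
import math

dose = 100.0        # mg per dose (assumed)
half_life = 6.0     # hours (assumed)
tau = 8.0           # hours between doses (assumed)
k = math.log(2) / half_life
decay = math.exp(-k * tau)   # fraction surviving one dosing interval

# Iterate the difference equation A[n+1] = A[n] * decay + dose.
amount = 0.0
for _ in range(30):          # 30 doses is plenty to approach steady state
    amount = amount * decay + dose

# The fixed point of the difference equation (steady-state peak amount):
steady_state = dose / (1 - decay)
print(amount, steady_state)
```

Changing `tau` or `dose` lets you explore different ingestion patterns; the fixed-point formula shows immediately why more frequent dosing raises the plateau.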
I might conceptualize it as a question of whether the "effect" of B on A varies with T (time): in other words, is there a B x T interaction. Assuming that B is continuous and T is categorical, I'd analyze with the Lorch and Myers repeated measures regression technique (J. Experimental Psychology: Learning, Memory, and Cognition, 1990). Their technique is straightforward for anyone familiar with OLS regression. If I were more of a statistician I might consider something like hierarchical linear modelling. Of course, the more conceptual questions raised by Rich Ulrich also deserve attention--consideration of those might resolve issues about whether and how to standardize prior to applying the L&M technique. Also, if B is categorical, then the L&M approach might not be applicable. Ed Cook, Assoc Prof of Psychology, Univ of Alabama at Birmingham, USA
Peter Flom wrote: > > Sean Lahman wrote about studying the effects of integration and > expansion on baseball. > > He asked if he was "missing anything" > > Well, one thing that seems to me to be missing is that batting > averages are NOT an absolute measure of ability....they are affected > by lots of things, but the main one I can think of here is PITCHING > and fielding ability. > You're right, there is no absolute measure of hitting ability. But I would expect that by using several different statistical measures, you would be able to see the effects of outside forces. There are obviously many other factors that affect hitting performance, and I suspect multi-variable regression analysis might help to identify and quantify them. > If integration improved the quality of play (as seems intuitively > likely to me as well as to Sean) wouldn't it improve all aspects of > play? But I'm attempting to draw a distinction between quality of play and statistical performance. Specifically, "does comparable statistical performance imply comparable quality of play?" My theory was something like this (as restated by someone else). If ability to hit is distributed normally among the population from which major league players are drawn, expanding that population while keeping the number of players the same will result in a higher percentage of players being drawn from the extreme right end of the normal curve. Increasing the number of players drawn from a stable population will have the opposite effect. If we accept this premise, we would expect the MEANS of standard measures of performance like BA, OBP and SLG to be unaffected (since the "average" batter should improve at the same rate as the average pitcher). We would, however, expect expansion to increase the standard deviations, and integration to have the opposite effect.

Because the data does not conform to that model, it suggests to me that the other effects (night baseball, artificial turf, livelier ball, changes in style of play, etc.) are more significant in affecting player statistics. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Sean Lahman - lahmans@vivanet.com Sean Lahman's Baseball Archive http://www.vivanet.com/~lahmans/baseball.html
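The order-statistics premise in the post can be checked with a toy simulation (invented "ability" draws, not real baseball data): widening the talent pool while holding roster size fixed (integration) selects a more extreme and more homogeneous top group, while adding roster spots from the same pool (expansion) does the opposite.

```python
import random
import statistics

random.seed(42)

def selected_sd(pool_size, rosters):
    """SD of 'ability' among the top `rosters` players in a normal pool."""
    pool = sorted(random.gauss(0, 1) for _ in range(pool_size))
    return statistics.stdev(pool[-rosters:])

sd_integration = selected_sd(pool_size=100_000, rosters=400)  # wider pool
sd_baseline    = selected_sd(pool_size=20_000,  rosters=400)
sd_expansion   = selected_sd(pool_size=20_000,  rosters=700)  # more spots
print(sd_integration, sd_baseline, sd_expansion)
```

The spread among the selected players shrinks under "integration" and grows under "expansion", matching the qualitative prediction about standard deviations (though, as the post notes, the real data carry many confounds this toy model ignores).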
In article <199611300000.AAA24894@wildnet.co.uk>, John Whittington wrote: > Of course. My point is that, in the real world, one normally will not know > for certain that the serial numbers do start at 1 - not the least because > there are plenty of examples in which this is not the case. > > Is it not best to try to model on the basis of all the available > information, with the minimum of guessing about the nature of the model? It > would seem (to me) reasonable to use a model which assumed no more than that > the serial numbers were consecutive (usually the case, even when the start > is not 1), with both 'start' and 'finish' numbers as parameters of the model > to be estimated. I don't imagine that such a model would be all that > difficult to deal with - intuitively, one might expect it would be based, > inter alia, on both xmin and xmax in the sample. Indeed you are right. However, if we observe xmax = 1287 and xmin = 16, we realise that the starting number was probably 1. Who would start with a small number other than 1? Of course this is a Bayesian type of argument. What is more, the estimate would be no more than 16 different from using something like (xmax - xmin) + bias correction. Terry Moore, Statistics Department, Massey University, New Zealand. Imagine a person with a gift of ridicule [He might say] first that a negative quantity has no logarithm; secondly that a negative quantity has no square root; thirdly that the first non-existent is to the second as the circumference of a circle is to the diameter. Augustus de Morgan
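The two-parameter model John proposes (consecutive serial numbers with unknown start and finish) has a simple spacing-based estimator, sketched here under the continuous-uniform approximation: extend each end of the sample by the average gap between order statistics.

```python
def estimate_range(sample):
    """Estimate (start, finish) of consecutive serial numbers from a sample:
    start ~ xmin - (xmax - xmin)/(n - 1), finish ~ xmax + (xmax - xmin)/(n - 1)."""
    n = len(sample)
    xmin, xmax = min(sample), max(sample)
    gap = (xmax - xmin) / (n - 1)   # average spacing between order statistics
    return xmin - gap, xmax + gap

sample = [16, 212, 540, 887, 1101, 1287]   # hypothetical observed serials
start, finish = estimate_range(sample)
print(start, finish)
```

Note that with these (invented) numbers the start estimate actually comes out negative, which illustrates Terry's point nicely: when xmin is small relative to the typical spacing, the data alone say little more than "the start is at most xmin", and the prior belief that numbering begins at 1 does most of the work.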
On Sun, 8 Dec 1996 14:24:00 -0500, Dennis Roberts wrote: >most stat packages will read off any F value you want ... for any percentile >point ... certainly Minitab can > >At 09:48 AM 12/4/96 GMT, you wrote: >>Hi all, >> >>For some error analysis I need to use tables >>of the F-distribution, both the F_n;m;0.025 >>tables and the F_n;m;0.05 tables. >>There are enough books in which I can find them, >>but since I need them on a computer I wondered >>whether instead of typing in all those numbers, >>there is some WWW site where I could retrieve them. >> >>Machiel. > Excel has it. R
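For anyone doing this programmatically today, the same critical values come from the inverse CDF of the F distribution, so no tables need typing in. A sketch using SciPy (one common option; the posters themselves mention Minitab and Excel):

```python
from scipy.stats import f

# Upper 5% and 2.5% critical values of F with n = 5 and m = 10 d.f.,
# i.e. the F_n;m;0.05 and F_n;m;0.025 table entries:
f_05  = f.ppf(1 - 0.05,  dfn=5, dfd=10)
f_025 = f.ppf(1 - 0.025, dfn=5, dfd=10)
print(f_05, f_025)   # about 3.33 and 4.24, matching the printed tables
```
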
On Sun, 8 Dec 1996 05:34:41 -0500, ChrisM11@AOL.COM wrote: >--------------------- >Forwarded message: >Subj: Search for Paradise >Date: 96-12-08 03:41:25 EST >From: ChrisM 11 >To: ChrisM 11 > > This is to inform you about the new adult game that VCS Magazine rated >"The best game of '96" and gave an "Outstanding ****" (4 stars). "The Search >for Paradise is no doubt one of the greatest XXX Adult games available." The >first game where it is as much fun as it is a turn on. Travel the world to >every continent, every country you can think of, and meet some of the most >beautiful women in existence. These women will treat you like a king and obey >your every command. Any sexual wish you can think of, these women know it >all. There is a different paradise for every guy out there, and this game >will have them all. This game uses real models, digital video, and digital >sound >to make it as realistic as possible. You will feel like you're in the same >room as the girl you're talking to. --- Required: 386 or better, 4 meg ram >or better, Windows 3.1 or higher (Win95 is fine), sound card is optional, >CD-Rom is optional. Game is given either CD-rom, or compressed 3.5" >diskettes.) - $19.95. > > The last adult game we are going to inform you about is the newly >released "Club Celebrity X". Imagine being in a club with some very >beautiful, well known, ACTUAL celebrities that with skill, will be making you >breakfast in bed the next day. These girls you have seen on television, >magazines, and billboard ads, and now they are on your computer, begging >for action. Each girl you will recognize and you won't believe your eyes when >you got them in your own bedroom. This game is hot, and once you start >playing, you won't be able to stop. --- Required: 386 or better, 4 meg ram >or better, Windows 3.1 or higher (Win95 is fine), sound card is optional, >CD-Rom is optional. Game is given either CD-rom, or compressed 3.5" >diskettes.) - $19.95. 
> >Software arrives in a plain, unmarked, brown package. Delivery takes no >longer than 7 to 8 working days. Both your email address, and mailing >address are NOT added to any mailing lists whatsoever. Once you are mailed >this email, your name is deleted from all lists to ensure you are not mailed >again. > >Each game is $19.95, but for a limited time, you can get both "The Search for >Paradise" and "Club Celebrity X" for just $29.95. Shipping and handling is >$2.00 for each game ordered. There are no additional charges or fees. > >Please make checks or money orders out to: Chris Mark > I'll surely buy it with probability zero. R
Peter Baade (baade@SPIDER.HERSTON.UQ.OZ.AU) wrote: : I have a number of variables (categorical and continuous) that have shown : some evidence of poor test-retest reliability. : I heard that there is a procedure called "attenuation of parameter : estimates" due to poor reliability, however I am unsure what this is, or if : there are any references that I might be able to begin with. -- This is mainly hocus-pocus, which serves to mislead the unwary. It is well-intended, to serve a particular theoretical purpose, in a few particular times and places of application; but 'correction for attenuation', when I have seen it a couple of times, has gone along with ignoring *any* appropriate tests of significance. And claiming that two variables 'seem' to be (or *are*) correlated perfectly, or thereabouts, is a bit misleading when the accompanying test of significance might not reach (even) the 5% test level. The idea is this: If variable A has reliability of .5, and variable B has reliability of only .6 (estimated, perhaps, from the data on hand), then the observed correlation of .55 between A and B would be 'corrected' to be a correlation of 1.00. I find the procedure more believable when the original correlations are in the vicinity of .9. For instance, I have read some similar arguments concerning the 'content' of two IQ tests. Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of Pittsburgh
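The correction Rich describes is Spearman's correction for attenuation: divide the observed correlation by the square root of the product of the two reliabilities. His numerical example works out exactly as stated:

```python
import math

def disattenuate(r_ab, rel_a, rel_b):
    """Spearman's correction for attenuation:
    r_corrected = r_AB / sqrt(r_AA * r_BB)."""
    return r_ab / math.sqrt(rel_a * rel_b)

# Rich's example: r = .55 with reliabilities .5 and .6 is
# 'corrected' to essentially 1.00.
print(round(disattenuate(0.55, 0.5, 0.6), 2))
```

Note the formula can exceed 1.0 when the reliability estimates are too low, which is one concrete symptom of the over-correction he is warning about.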
Sean Lahman wrote: > > Peter Flom wrote: > > > > Sean Lahman wrote about studying the effects of integration and > > expansion on baseball. > > > > He asked if he was "missing anything" > > > > Well, one thing that seems to me to be missing is that batting > > averages are NOT an absolute measure of ability....they are affected > > by lots of things, but the main one I can think of here is PITCHING > > and fielding ability. > > You're right, there is no absolute measure of hitting ability. But I > would expect that by using several different statistical measures, you > would be able to see the effects of outside forces. There are obviously > many other factors that affect hitting performance, and I suspect > multi-variable regression analysis might help to identify and quantify > them. Are you hypothesizing that variable X2 might show the effect of outside force A, and therefore you could infer from X2 how outside force A affected variable X1? If so, I'm skeptical that you can do this from the data. You are dealing with highly collinear variables here. Multiple regression will enable you to predict with highly-collinear variables; it will not enable you to separate the effects of the variables. Basically, the problem is in the data, not in the statistical method. -- +---------------------------------+------------------------------+ | Paige Miller, Eastman Kodak Co. | "Let's play some basketball" | | PaigeM@kodak.com | Michael Jordan in Space Jam | +---------------------------------+------------------------------+ | The opinions expressed herein do not necessarily reflect the | | views of the Eastman Kodak Company. | +----------------------------------------------------------------+
We have an ongoing debate in our lab about nested factors and whether they should be fixed or random. There is no consensus among authors on the subject. Some say that nested factors are always random while others state that it is possible (though unlikely) that a nested factor will be fixed. Zar (1996) pg. 308 gives an example of a nested analysis where the nested factor (drug source) is random.

         Drug 1   Drug 2   Drug 3
Source:    A        Q        D
           B        L        S

1. What is the basis for considering source as a random factor? Is nesting alone sufficient? Are there not valid reasons to make sources a fixed factor (e.g., if sources are not randomly chosen)?

2. The F test for the nested factor will tell us whether there is variation among drug sources. Suppose that was significant and we were specifically interested in comparing sources within each drug. Is it appropriate to make contrasts between sources within drugs (-1 1 0 0 0 0; 0 0 -1 1 0 0; 0 0 0 0 1 -1)?
If I do an oblique factor analysis, and get two factors, each representing a different proportion, what is the best way to represent this graphically? a. How long do I make the two oblique axes? (Do I make them proportional to the eigenvalues?) b. What is the angle between the two oblique axes? Thanx Chuck
Erik H Williams wrote: I am just becoming familiar with newsgroups and saw your posting. I don't exactly know all the etiquette but I do know time series. What you are referring to is called intervention detection or outlier detection. There are lots of assumptions regarding time series, and when you talk about robust procedures there are a variety of extensions to time series modeling which allow one to proceed, always cautiously, when some of the assumptions are not met. One of the assumptions that is nearly always violated by real-world data is the assumption that the mean of the errors is invariant and is not statistically significantly different from zero at all points in time. This led directly to outlier detection. If you wish to discuss pulses, seasonal pulses, level shifts or time trends as characteristics of a robust time series, please call me at 215-675-0652. DAVE REILLY Another standard assumption, often violated but eminently treatable, is the assumption that the variance of the errors is constant. I have dealt with this by extending time series models to GENERALIZED LEAST SQUARES by bootstrapping the diagonal elements of the variance-covariance matrix of the residuals. Another possible violation, again treatable, is the assumption that the model parameters are invariant over time. I have treated this in our commercial package called AUTOBOX. References and downloadables are available at http://darkstar.icdc.com/~autobox
Can anyone suggest references on the loss of degrees of freedom in a regression setting under heteroscedasticity? Alternatively, the equivalent effect in unbalanced one-way ANOVA may be of help. The simpler of my situations essentially has a model of a set of parallel lines, for which I'm interested in finding the p-values of the parameters representing the differences in height. The smaller samples generally have the smaller variances. If the two-sample t is any guide, this suggests the effect on d.f. should be small, but I'd like to see what is out there on this problem. Most of what I've been able to find so far either falls back on asymptotic normality or pretends that the degrees of freedom don't change. Glen
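The two-sample analogue alluded to is the Welch-Satterthwaite approximation, which replaces the pooled degrees of freedom with an effective value driven by the per-group variances and sample sizes. A minimal sketch with invented summary statistics, pairing the smaller sample with the smaller variance as in the post:

```python
# Welch-Satterthwaite effective degrees of freedom for comparing two
# group means with unequal variances (sample summaries are invented).
def welch_df(s1_sq, n1, s2_sq, n2):
    v1, v2 = s1_sq / n1, s2_sq / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Smaller sample paired with the smaller variance:
df = welch_df(s1_sq=1.0, n1=10, s2_sq=4.0, n2=30)
print(round(df, 1))  # 31.6, versus the naive pooled n1 + n2 - 2 = 38
```

The loss relative to the pooled d.f. is modest here, consistent with the intuition in the post that the effect should be small when small samples have small variances.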
Just want to say thanks to all the people who responded to my plea for help on basic techniques and problem areas in formulating and specifying models, and also on techniques for exploratory analysis and visual interpretation. Now that I have a bunch of responses, I will post a summary of them in the next day or two so as to possibly be of use to others. Again, thanks for all the suggestions! Daniel Parker
Sorry for the repeats, my browser glitched and appeared not to be sending anything. But then I noticed the flood of messages. Daniel Parker
>Is there a test for H0: Pearson rho = 1?
>
>I found tests for rho = 0 and for rho = rho0 with rho0 < 1. I can't seem to find one for testing rho = 1.
>
>Any suggestion?

Reject whenever your sample r is < 1.

_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde, Universitaet Kassel
Steinstrasse 19, 37213 Witzenhausen, Germany
WWW: http://www.wiz.uni-kassel.de/fts/
Mail: piepho@wiz.uni-kassel.de
Phone: +49 5542 98 1248    Fax: +49 5542 98 1230
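The one-line answer is not just a joke: under H0: rho = 1 the two variables are perfectly linearly related, so the sample r equals 1 with probability one, and any observed r < 1 is already incompatible with the null. A minimal sketch with made-up data satisfying an exact linear relation:

```python
import random

random.seed(1)

# Under H0: rho = 1, Y is an exact linear function of X, so the sample
# correlation is exactly 1 (up to floating-point rounding) in every sample.
x = [random.gauss(0, 1) for _ in range(20)]
y = [2.0 * xi + 3.0 for xi in x]

def pearson_r(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

print(round(pearson_r(x, y), 10))  # 1.0
```

Because the sampling distribution of r is degenerate at 1 under the null, the "test" has level zero and power one: no critical value other than 1 itself is needed.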
>We have an ongoing debate in our lab about nested factors and whether
>they should be fixed or random. There is no consensus among authors on
>the subject. Some say that nested factors are always random while
>others state that it is possible (though unlikely) that a nested factor
>will be fixed.
>
>Zar (1996) pg. 308 gives an example of a nested analysis where the
>nested factor (drug source) is random.
>
>          Drug 1   Drug 2   Drug 3
>  Source:   A        Q        D
>            B        L        S
>
>1. What is the basis for considering source as a random factor? Is
>nesting alone sufficient? Are there not valid reasons to make sources a
>fixed factor (e.g., if sources are not randomly chosen)?
>
>2. The F test for the nested factor will tell us whether there is
>variation among drug sources. Suppose that was significant and we were
>specifically interested in comparing sources within each drug. Is it
>appropriate to make contrasts between sources within drugs (-1 1 0 0 0 0;
>0 0 -1 1 0 0; 0 0 0 0 1 -1)?

1. The random assumption is a stronger one than the fixed assumption.

2. I would consider a factor as random if the levels included in the study can be regarded as a RANDOM sample from a population (as you indicate in 1.). In the example, there may be a population of possible sources for each drug. If the levels used in the study are a RANDOM sample from that population, the nested factor can be regarded as random.

3. If the nested factor is random, the standard errors for drugs will be larger, and the inference space will be broader, i.e. you can draw inferences with respect to the whole population of sources. When the nested factor is fixed, inferences for drugs are restricted to the levels of the nested factor you have investigated.

4. Since you are interested in contrasts among levels of the nested factor, it may be that you have purposefully selected the levels, so the factor is not random. If the factor is really random, you could compare BLUPs rather than BLUEs.
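If the sources are treated as fixed, the within-drug contrasts from the original post are ordinary linear contrasts on the six source means. A minimal sketch with invented cell means, replicate count, and error mean square (the sources ordered within drugs to match the contrast vectors):

```python
# Within-drug contrasts on six source means, ordered as in the post:
# (drug 1: sources 1-2), (drug 2: 3-4), (drug 3: 5-6). The cell means,
# replicate count, and error mean square below are all invented.
means = [5.1, 4.7, 6.2, 6.9, 8.0, 7.4]
n_rep = 4        # replicates per source (assumed)
mse = 0.8        # error mean square from the nested ANOVA (assumed)

contrasts = [
    [-1, 1, 0, 0, 0, 0],   # within drug 1
    [0, 0, -1, 1, 0, 0],   # within drug 2
    [0, 0, 0, 0, 1, -1],   # within drug 3
]

results = []
for c in contrasts:
    est = sum(ci * mi for ci, mi in zip(c, means))
    se = (mse * sum(ci ** 2 for ci in c) / n_rep) ** 0.5
    results.append((round(est, 2), round(se, 3)))

print(results)  # estimates -0.4, 0.7, 0.6, each with SE 0.632
```

If the sources are instead random, point 4 above applies: the same comparisons would be made on BLUPs, which shrink these raw differences toward zero.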
Hans-Peter

_______________________________________________________________________
Hans-Peter Piepho
Institut f. Nutzpflanzenkunde, Universitaet Kassel
Steinstrasse 19, 37213 Witzenhausen, Germany
WWW: http://www.wiz.uni-kassel.de/fts/
Mail: piepho@wiz.uni-kassel.de
Phone: +49 5542 98 1248    Fax: +49 5542 98 1230