Can someone clarify the distinction between reference priors and Jeffreys priors? I believe that one is a subset of the other (i.e., the Jeffreys prior is a reference prior) or that they overlap in some way. To me they seem the same, but apparently reference priors depend on the ordering of the parameters, and I know Jeffreys priors do not. Thanks.Return to Top
I have a large number of data sets, each of which contains a time series of consecutive annual observations, with a maximum of ten years for each set. There is a lot of fluctuation in the data. I need an algorithm that will section the data (according to the values of a particular variable) into periods characterized by growth, decline, steady state (roughly), or just outright fluctuation. Each period should last at least two or three years and the characterization should agree fairly well with subjective judgment when looking at a graph of the data. As I see it, one problem lies in determining inflection points that define the beginning/end of each period. If possible, please also give hints to how the algorithm could be implemented in SPSS! Thanx. (And yes; this is an econometric problem.) Mail, please, to Hakon.Finne@ifim.sintef.noReturn to Top
Robert Gelb wrote: > > I need to find the standard coordinatization of 1+x+x^3 in P3. The answer > at the back of the book says: > [1] > [1] > [0] > [1] > basically a 4x1 matrix. My question is how can there be a 4x1 matrix in a > 3 dimensional space. > Hi Robert: in fact P3 is a 4-dimensional space. The dimension is the length of a basis, and a basis of P3 is obviously given by the polynomials 1, x, x^2 and x^3. The '3' in P3 just means that only polynomials of degree less than or equal to three are considered. It has nothing to do with the dimension of that space. Rainer DyckerhoffReturn to Top
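To see the coordinatization concretely, here is a small Python sketch (my own illustration, not part of the thread) that reads the coordinates of 1 + x + x^3 off the standard basis {1, x, x^2, x^3} of the 4-dimensional space P3:

```python
# A polynomial in P3 is a vector in the 4-dimensional space with
# standard basis {1, x, x^2, x^3}; its coordinate vector is just the
# list of coefficients, lowest degree first.
def coordinatize(poly, degree=3):
    """poly maps exponent -> coefficient; returns the coordinate vector."""
    return [poly.get(k, 0) for k in range(degree + 1)]

p = {0: 1, 1: 1, 3: 1}      # p(x) = 1 + x + x^3 (no x^2 term)
print(coordinatize(p))       # [1, 1, 0, 1] -- the book's 4x1 answer
```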
Given a sample X_1,...X_n from a normal distribution N(mu,sigma**2). What is the dist'n of D=sum_{i=1}^n |X_i-mean| ? or at least, what is its expectation?Return to Top
One of the posters to this group had a signature which went something like this: ...possibility of the improbabilities vs the impossibility of the probabilities .... I would really appreciate it, if someone could recreate this correctly along with the name of the person who originally quoted this. Sincerely, Pushkar Tamhane.Return to Top
Without question, Linux is the way to go. My experience with Linux is with the Slackware release, and I have found it to be the most feature-rich of them all. Linux has more included value than any other UNIX release available (excepting Berkeley Software Design's BSD/OS, a commercial product). Its kernel is constantly updated, and includes specific and high-performance support for more cards, motherboards, chipsets, etc. than anything else. That hardware-specific support alone makes it the most valuable. Linux is very easy to configure and provides solid and stable networking code. It comes with a vast variety of applications and has the single widest application support of any UNIX OS. Again, I don't think I can stress enough the value of the kernel support and application code available. Linux is free, powerful, and has all you need. You should definitely give it a try. Neil neil@rmi.net In article <329CB17A.C9F@ucla.edu>, baustin@ucla.edu says... > >Hello all, > >I am in the market for a UNIX operating system. I have narrowed the >search down to three prospects: SCO UNIX 2.1, Solaris x86 UNIX, and >Linux. My question is, which of the three is the best choice, and more >importantly, why? I will be using the operating system for business and >personal use. > >I am positive that all three OSs have some strengths and weaknesses. >This has been my method of evaluation so far. If anyone can help please >reply. >-- >_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/ > > _/ _/ _/_/ _/ _/_/ Bryan Austin > _/ _/ _/ _/ _/ _/ _/ Dept. of Economics > _/ _/ _/ _/ _/_/_/_/ University of California > _/ _/ _/ _/ _/ _/ _/ Los Angeles > _/_/ _/_/_/ _/_/_/_/ _/ _/ > >_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/Return to Top
POSITION ANNOUNCEMENT Applications are invited for a possible tenure-track faculty position beginning August, 1997, subject to university approval. A Ph.D. degree or its equivalent and a commitment to research and excellence in teaching at all levels are required. The department offers B.S., M.S., Master of Industrial Statistics, and Ph.D. programs and consists of 10 faculty with varied research interests. See our home page, http://www.stat.sc.edu/, for details. Minority and female candidates are especially encouraged to apply. Send a transcript (for recent graduates), resume, and four letters of reference to the Chair of the Faculty Search Committee, Department of Statistics, University of South Carolina, Columbia, SC 29208. AA/EOE ---- -tony (Anthony Rossini) Asst. Prof. Department of Statistics rossini@stat.sc.edu University of South Carolina http://www.stat.sc.edu/~rossini/ Columbia, SC 29208 803-777-3578(O) ..7-4048 (fax)Return to Top
Michael Kamen (mbkamen@facstaff.wisc.edu) wrote: : Greetings All! : How is it that rho^2 = (sigma^2_2 - sigma^2_1)/sigma^2_2 shows a : higher correlation between sample 1 and 2, as rho^2 gets closer : to 1? -- Even without understanding the right side of your equation, I suggest that THAT is a silly question: Assuming rho is a correlation, "closer to 1" is "higher". rho^2 is monotonic in rho, rho positive. : The larger is (sigma^2_2 - sigma^2_1), surely the larger is : the difference between the samples. Rewriting your formula, R= (B-A)/B = 1 - A/B So if B is bigger/A is smaller, "the difference" and R both increase. But, offhand, I do not recognize your formula. Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of PittsburghReturn to Top
Bryan Austin wrote: > > Larry Culver wrote: > > Do you have a need to run multiple processors? Does Linux support more > > than a single CPU yet? I'm not sure of some of the others U mentioned, > > but Solaris does ... one of the reasons I went with Solaris (2.5.1) was > > the fact that it does support multiple CPUs. > > > > Larry > I don't know about linux, but I know that SCO's new UNIXware has the > dual CPU capability, but I am not positive if it has the multi-cpu > capability. > > Bryan Yes, UnixWare supports multi-cpu configurations. The largest number I've ever seen operating was 12, but any number that the HW supports ought to work, in theory. In practice, SMP scaling is limited by how well the hardware can scale and how efficient the OS is at doing multi-cpu stuff. UnixWare is one of the _most_ efficient OS's at this sort of thing as evidenced by the fact that so many AIM hot-iron awards and TPC-C records have been on UnixWare. -- Christopher J. Calabrese Lab System Architect (Consultant) Hewlett Packard Engineering Services Group cjc@fpk.hp.comReturn to Top
Hi, I hope someone out there can help with this. I'm fairly new to neural nets and so I guess this could be one of those obvious-once-you-know questions, but nevertheless.... I am trying to build a 7-10-10-3 feedforward neural net, with full connectivity between successive layers but no (direct) connectivity between non-adjacent layers. I am currently using the standard sigmoid function as my activation function. The problem is that I have training and test data where all seven inputs typically vary in a small range 0-0.3. I am quite happy to normalise this data; however, my 3 output units are tricky to deal with. One of them varies in the range 200-20000, the second from 100-300 and the third 20-70. Clearly, with such large numbers, the network will find it difficult (impossible?) to be trained on such data, as my activation function only gives output in the [0,1] range. I have heard it is possible to adapt the sigmoid function to give a nonlinear activation function with a larger range. How is this done exactly, and would it be a suitable technique for solving the problem? The network is being used to infer these values from measurements, so I need to preserve the original values. One possibility I have considered is to apply a function to each of the outputs to scale it into the [0,1] interval, and to apply the inverse of that function when I require the values back again. Is this a safe approach, if I have to apply a different scaling function to each output unit? I was thinking of something as simple as a scaling function such as (unit value - min value the unit can take) / (max value that unit can take - min value) Is this suitable (assuming all of this is a valid approach) or are there better ways to scale? Thanks for any help that anyone can offer. Ed Burcher Joint Honours Maths / Computer Science Year 3 University of Nottingham EnglandReturn to Top
Is there any (preferably free) software available to compute smoothing splines for data on points in 2 and 3 dimensions? I'd prefer that it allow irregular data points, but if it required points to be on a regular grid, I can live with it. Thanks Ed Hughes Edward Hughes Consulting 130 Slater St., Ste. 1100 Ottawa, Ontario, Canada K1P 6E2 Voice: 613-238-4831 Fax: 613-238-7698 ehughes@mrco2.carleton.caReturn to Top
Robert Dodier (dodier@colorado.edu) wrote: : Hello all, : I'm trying to test hypotheses of the form ``Parameters a and b are : both nearly zero,'' ``Parameter a is nearly zero and b is substantially : more than zero,'' ``Parameter a is substantially more than zero and : parameter b is nearly zero.'' -- I am curious as to what context you have, that gives you those hypotheses, in that form. A much more normal situation, it seems to me, is to consider a pair of tests, S and T: S= a+b and T=a-b . Where a and b are scaled the same (same standard deviation), these are two orthogonal tests; which means you can look at the power (or whatever) for two tests at the value of half-alpha. Would you really oppose using those tests instead of what you described? Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of PittsburghReturn to Top
tom (tjb@acpub.duke.edu) wrote: : I'm a medical student with a question about P values. P values are : frequently referred to in the medical literature, and my understanding is : that they represent the probability that you would see results "like" : your test results based on random chance given certain assumptions about : distribution and so on. : Now, in that case, would a P value representing *no* significance : whatsoever be 0.5 or 0.1? I can envision both being the case. I.e., if : your test is two tailed, there is a 0.5 chance there would be test results : shifted one way from your results based on chance, and 0.5 chance that : they would be shifted the other way. : Or, conversely, I can see P values approaching 1 being the correct : answer. I.e., you would expect to see results like the results you have in : most random trials, and if you did an infinite number of trials the P : value would approach 1. : Which is correct? -- Think of it like this: the p-value of .5 represents the middle of the expected outcomes under the simple null hypothesis. That is, the cumulative distribution runs from 0 to 1, where only the upper extreme of scores is in the 'rejection area'. If you are grabbing BOTH extremes of the distribution as part of your rejection area, as you do with the two-tailed t-test, then it is not very natural to refer to other cut-off points. By arbitrarily doubling the value (p) or (1-p), you can label cutoff points so that what was .5 is now 1.0. : An email cc: of posted replies is appreciated -- my Reader does not make that convenient. Do you have Newsfeed problems, or are you too lazy to pop in and look for responses? -- An E-mail address included within *your* text would be appreciated, too, since some NewsReaders don't show what it was, and what is shown is wrong a sizable fraction of the time, anyway. Rich Ulrich, biostatistician wpilib+@pitt.edu Western Psychiatric Inst. and Clinic Univ. of PittsburghReturn to Top
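A quick simulation (my own sketch, not from the thread) makes the point concrete: under the null hypothesis the two-sided p-value is uniformly distributed on [0,1], so 0.5 is simply the middle of its distribution rather than a special "no significance" value. The z-test and sample sizes below are arbitrary choices for illustration:

```python
import math
import random

def two_sided_p(xs):
    """Two-sided z-test p-value for H0: mu = 0, with known sigma = 1."""
    n = len(xs)
    z = sum(xs) / math.sqrt(n)                        # test statistic
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

random.seed(0)
# Draw many samples under the null and look at the resulting p-values.
ps = [two_sided_p([random.gauss(0, 1) for _ in range(20)])
      for _ in range(5000)]
print(sum(ps) / len(ps))   # close to 0.5: p is uniform on [0,1] under H0
```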
We have a job opportunity for a Statistical Quality Engineer and wonder if you accept job postings? Please let me know where to post, if so. Thank you. Position is with a worldwide leader of heat transfer technology located in Racine, Wisconsin.Return to Top
Neil Schroeder (neil@rmi.net) wrote: : Without question, Linux is the way to go. That's your opinion. It really depends on what he's going to be using it for. For me, and this is my opinion, it boils down to 2 issues: 1) Motif/CDE 2) source code Q: Do you want Motif and/or CDE (eventually) A: Unixware is the way to go because Motif is free in SCO now, Free in unixware in January. CDE will be free with the release of Gemini in mid-97. CDE for Linux is $250.00 last I heard. Motif is $100.00 or so. If ya just need the source code for the OS you're running, then Linux. If ya don't need source code, get SCO Unix now, or hold off for Unixware in January. You can still reference the Linux source implementation if you are using Unixware if you need an example of how something may have been implemented. For me, all things being equal, I prefer a commercial Operating System. Just expressing my right to voice my opinion. Here's a question for anyone who would like to take a stab at it: Q: Which kernel is really better? The linux kernel developed by Linus, or the SVR4 kernel developed by Sun/Bell Labs ? And why? Explain what makes one better than the other. Jim -- Jim Balson balson@world.std.com
Letting z_i be iid N(0,1) r.v.'s, we can write x_i = mean + sigma*z_i, so D = sum|x_i - mean| = sum|sigma*z_i| = sigma*sum|z_i|. Then E(D) = sigma*n*E(|z_1|) = sigma*n*sqrt(2/pi) and V(D) = (sigma^2)*n*V(|z_1|) = (sigma^2)*n*[E(|z_1|^2) - (E|z_1|)^2] = (sigma^2)*n*[1 - 2/pi] (hope I did the expected value integrals right). Robert E Sawyer soen@pacbell.net _________________ Roland Jeske wrote in article ... | Given a sample X_1,...X_n from a normal distribution N(mu,sigma**2). | | What is the dist'n of | | D=sum_{i=1}^n |X_i-mean| ? | | or at least, what is its expectation? |Return to Top
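A Monte Carlo check of the closed form above (my own sketch, not from the thread; like the reply, it treats the mean mu as known rather than estimated from the sample, which would change the distribution):

```python
import math
import random

def expected_D(n, sigma):
    """Closed form from the post: E(D) = sigma * n * sqrt(2/pi)."""
    return sigma * n * math.sqrt(2 / math.pi)

random.seed(1)
n, sigma, mu = 5, 2.0, 10.0
trials = 200_000
# D = sum over the sample of |x_i - mu|, with mu treated as known.
est = sum(
    sum(abs(random.gauss(mu, sigma) - mu) for _ in range(n))
    for _ in range(trials)
) / trials
print(est, expected_D(n, sigma))   # simulation agrees with the formula
```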
>The "moronic" idea is to let some quasi-religious mantra decide what >action to take. You have decisions to make; statistical decision theory >is designed to help YOU make the decision appropriate for YOU; use it, >instead of following the blind. >-- >Herman Rubin Yes I agree if you have a bona fide decision-making situation on your hands, better use decision theory. But I don't think we are usually in that situation in science. We typically want to estimate some parameter rather than make a decision. Bill SimpsonReturn to Top
Richard, wpilib+@pitt.edu (Richard F Ulrich) wrote: >Michael Kamen (mbkamen@facstaff.wisc.edu) wrote: >: Greetings All! > >: How is it that rho^2 = (sigma^2_2 - sigma^2_1)/sigma^2_2 shows a >: higher correlation between sample 1 and 2, as rho^2 gets closer >: to 1? > > -- Even without understanding the right side of your equation, >I suggest that THAT is a silly question: Assuming rho is a >correlation, "closer to 1" is "higher". rho^2 is monotonic >in rho, rho positive. I think it was Herby Dinkins who once said (and very well too), "the only silly question is the one that is not asked." :) >: The larger is (sigma^2_2 - sigma^2_1), surely the larger is >: the difference between the samples. > > >Rewriting your formula, > R= (B-A)/B = 1 - A/B > >So if B is bigger/A is smaller, "the difference" and R both >increase. But, offhand, I do not recognize your formula. If you have a copy of Probability and Statistics for Engineers, 4th edition, Miller et al, Prentice Hall, New Jersey, please turn to page 360 and note at the bottom that the aforementioned formula is the definition of the Population Correlation Coefficient. In your version of the formula, note that if B >> A, A/B gets small making R-->1. |One| is defined as "perfect correlation." That is exactly my problem! The question of terms like "higher" or "lower" aside -- if the variances of 2 samples differ a lot they are usually thought of as not likely to come from the same population. Why then, in the case of the correlation coefficient do we draw a somewhat reverse conclusion? Please do not take me to be trying to refute decades (centuries?) of statistical precedent :). I am only trying to understand this as best as I can (with the admittedly frail instrument at my disposal). Thanks, MichaelReturn to Top
For ordinary least squares regression, what is meant by the term "dependence" (in the error term)? Many, many thanks for an answer and an authoritative cite.Return to Top
Eric Bartels wrote: > > Hello, > > I am looking for pointers to publications in the biology/statistics > community where the following problem arose. > > An experiment is conducted in which n individuals participate. Some > individuals can compete with other individuals for a resource. For > each individual the list of the possible opponents is fixed at the > beginning. During the observation time some of these competitions take > place and for each one the winner is noted. Some competitions may end > in a tie in which case the competition is being treated as if it has > not taken place. > At the end we obtain for each individual the number of wins it has > achieved during the observation time. > > This outcome data is to be tested statistically in order to test > certain hypotheses about the population and certain subsets of it. > > I am curious to learn whether this or related problems have been dealt > with in the statistics / biology literature. Any paper where a similar > problem was analysed or pointers to the statistical method of choice > would be very much appreciated. > > thanks, > > Eric Bartels It seems possible that you could view this as a 1-1 case-control data set. Each game that had a winner would be a stratum and the winner would be the case, the loser the control. Each player except one for reference would have a dummy variable whose contribution (positive or negative) to the odds of being the winner would be estimated. A reference is Hosmer and Lemeshow. Ellen HertzReturn to Top
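The win/loss setup described here is also closely related to the Bradley-Terry model for paired comparisons, which can be fit with a simple iteration. The sketch below is my own illustration (the win counts are made up), not something from the thread:

```python
# Bradley-Terry strength estimation by the classic MM iteration:
#   p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j),  then renormalize,
# where wins[i][j] = number of times player i beat player j
# (ties dropped, as in the original post).
def bradley_terry(wins, iters=200):
    n = len(wins)
    games = [[wins[i][j] + wins[j][i] for j in range(n)] for i in range(n)]
    w_tot = [sum(row) for row in wins]          # total wins per player
    p = [1.0] * n                               # uniform starting strengths
    for _ in range(iters):
        new = [
            w_tot[i] / sum(games[i][j] / (p[i] + p[j])
                           for j in range(n) if j != i)
            for i in range(n)
        ]
        s = sum(new)
        p = [x / s for x in new]                # strengths sum to 1
    return p

wins = [[0, 8, 9],      # player 0 beat player 1 eight times, etc.
        [2, 0, 7],
        [1, 3, 0]]
print(bradley_terry(wins))   # strengths decreasing: player 0 strongest
```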
I'm looking for references and/or other guidance to help me formulate sample size and power calculation expressions under _both_ completely randomized and matched paired assignment to treatment and control groups given: (1) a simple difference of means test is to be preformed; (2) k pre-treatment and j post-treatment measures are available for both groups; (3) I have reasonably good estimates of means and variances of the criterion for k pre-treatment time periods for the population of interest; (4) I have a hypothesized value for the post-intervention percentage difference between treatment and control groups on the criterion variable. Direct replies via e-mail would be appreicated. Thanks in advance.Return to Top
ebx@cs.nott.ac.uk (Edward A G Burcher) writes: >I am trying to build a 7-10-10-3 feedforward neural net, with full >connectivity between successive layers but no (direct) connectivity between >non-adjacent layers. I am currently using the standard sigmoid function >as my activation function. The problem is that I have training and test data >where all seven inputs typically vary in a small range 0-0.3 . I am quite happy >to normalise this data; However, my 3 output units are tricky to deal with. >One of them varies in the range 200-20000, the second from 100-300 and the >third 20-70. Clearly, with such large numbers, the network will find it >difficult (impossible?) to be trained on such data, as my activation function >only gives output in the [0,1] range. I have heard it is possible to adapt the >sigmoid function to give a nonlinear activation function with a larger range. >How is this done exactly, and would it be a suitable technique for solving the >problem ? If the network is to be used as a regression model (as you are doing), it is appropriate to use linear output units. (If you have a nonlinear output unit and graft on a scaling function, you have added a linear unit, in effect.) About scaling output values: whether this is necessary depends on your training algorithm. Simple gradient descent (i.e. backpropagation) will fail if the magnitudes of the outputs are very different, so you should scale them all to be roughly the same size. In this situation, I always subtract the mean and divide by the standard deviation so that I get numbers that are more or less in the range -3 to +3 or so. Smarter algorithms, e.g. conjugate gradient, are not sensitive to output magnitudes, so scaling is not needed. >The network is being used to infer these values from measurements, so I need >to preserve the original values. 
One possibility I have considered is to apply >a function to each of the outputs to scale it into the [0,1] interval, and >to apply the inverse of that function when I require the values back again. >Is this a safe approach, if I have to apply a different scaling function to >each output unit ? Yes, just apply the inverse of the scaling function to recover values in the original units. BTW with a linear output unit, you can compose the hidden-layer-to-output-unit function with the inverse scaling to get a modified set of weights for the final layer; then you don't have to explicitly worry about the scaling business once the network is trained. (Same with input scaling, if you're using it.) >I was thinking of something simple as a scaling function such as > >(unit value - min value the unit can take ) / ( max value that unit can take - min value ) > >Is this suitable (assuming all of this is a valid approach) or are there better >ways to scale ? No, sooner or later you'll find a value that goes beyond the boundaries. In any event, scaling to [0,1] is unnecessary when the output unit is linear -- as it should be for your problem. Any reasonable scheme will work -- I suggest subtracting the mean and dividing by standard deviation. There is a good reason to prefer the (x-mean)/stddev scheme, though -- which is that if you compute mean-square-error using output values transformed in that way, you get the MSE of the untransformed values divided by the variance of the output. Some people call this the `normalized MSE.' It is 1 if your network does no better than putting out (outputting?) the mean; if it is more than 1 you are in bad shape! A value of, say, 0.10 has an obvious interpretation -- the MSE of prediction is 10 percent as large as the variance of the output. Hope this helps. Believe it or not, in the not-so-distant past I asked questions just like this too... Wait, I _still_ ask questions like this. 
:) Robert Dodier -- ``Ainda nos faz lembrar os belos tempos'' -- on the prow of a fishing boat.Return to Top
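The standardize-then-invert recipe above, and the "normalized MSE" it describes, can be sketched in a few lines (my own illustration, not code from the post; the example target values are invented):

```python
# Standardize targets to roughly [-3, 3] for training, invert afterwards,
# and report the normalized MSE described in the post: MSE / variance.
def fit_scaler(ys):
    mean = sum(ys) / len(ys)
    std = (sum((y - mean) ** 2 for y in ys) / len(ys)) ** 0.5
    scale = lambda y: (y - mean) / std        # what the network trains on
    unscale = lambda z: z * std + mean        # recover original units
    return scale, unscale

def normalized_mse(targets, preds):
    mean = sum(targets) / len(targets)
    var = sum((t - mean) ** 2 for t in targets) / len(targets)
    mse = sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)
    return mse / var

ys = [200.0, 5000.0, 12000.0, 20000.0]        # e.g. the 200-20000 output
scale, unscale = fit_scaler(ys)
assert all(abs(unscale(scale(y)) - y) < 1e-9 for y in ys)
# Predicting the mean everywhere gives a normalized MSE of exactly 1,
# the "no better than the mean" baseline mentioned in the post.
mean_pred = [sum(ys) / len(ys)] * len(ys)
print(normalized_mse(ys, mean_pred))
```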
If you fit a "straight-line" linear model (linear in the parameters A, B) y_i = A + B*x_i + err_i to data {(x_i, y_i), i=1,n} to get yhat_i = a + b*x_i, with a,b the resulting estimators for A,B then the coefficient of *linear* correlation R is such that R^2 = (MST - MSE)/MST = 1 - MSE/MST where MST = "mean square total" = (sum(y_i-ybar)^2)/n MSE = "mean squared error"= (sum(yhat_i -y_i)^2)/n. This form lends itself to the interpretation: MST = "total (y-)variation" in the independent variable y, MSE = "(y-)variation UNexplained" by the linear model (MSE = 0 iff all yhat_i = y_i, i.e. all the data points lie on a straight line) MST-MSE = "(y-)variation explained" by the linear model This should answer your question: "better fit" <-> proportionately smaller MSE <-> proportionately greater MST-MSE -- Robert E Sawyer soen@pacbell.net _________________ Michael KamenReturn to Topwrote in article <57ssnj$1cra@news.doit.wisc.edu>... | Greetings All! | | How is it that rho^2 = (sigma^2_2 - sigma^2_1)/sigma^2_2 shows a | higher correlation between sample 1 and 2, as rho^2 gets closer | to 1? | | The larger is (sigma^2_2 - sigma^2_1), surely the larger is | the difference between the samples. | | Can anyone help me out here? | | | |
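A small numeric check of the identity above (my own sketch; the data are made up): for a straight-line least-squares fit, R^2 = 1 - MSE/MST computed from the residuals agrees with the squared Pearson correlation of x and y:

```python
import math

def fit_line(xs, ys):
    """Least-squares a, b for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    return ybar - b * xbar, b

def r_squared(xs, ys):
    a, b = fit_line(xs, ys)
    n = len(xs)
    ybar = sum(ys) / n
    mst = sum((y - ybar) ** 2 for y in ys) / n                  # total variation
    mse = sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / n  # unexplained
    return 1 - mse / mst

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]     # nearly collinear, so R^2 near 1
print(r_squared(xs, ys))
```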
I am wondering what the algorithms and the convergence criteria TSP uses for Logit and Probit model. Thanks in advance. Tatsuo Ochiai tochiai@students.wisc.eduReturn to Top