

Newsgroup sci.stat.math 11983

Directory

Subject: reference priors versus jeffreys priors -- From: lthompso@s.psych.uiuc.edu (Laura Thompson)
Subject: growth, decline, steady state (roughly), or just outright fluctuation -- From: "Håkon Finne"
Subject: Re: Linear Transformations & Matrix representation. Help -- From: Rainer Dyckerhoff
Subject: Dist'n of mean abs. Dev. -- From: WSHEIL03atnyx.uni-konstanz.e@uni-konstanz.de (Roland Jeske)
Subject: Looking for a quotation; (probability of impossibilities etc.). -- From: "Pushkar N. Tamhane"
Subject: Re: UNIX OPERATING SYSTEM, WHICH ONE!!!!!!! -- From: neil@rmi.net (Neil Schroeder)
Subject: Faculty Position Announcement -- U South Carolina -- From: Anthony Rossini
Subject: Re: Meaning of Correlation Coefficient -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: UNIX OPERATING SYSTEM, WHICH ONE!!!!!!! -- From: Chris Calabrese
Subject: Output unit scaling ? -- From: ebx@cs.nott.ac.uk (Edward A G Burcher)
Subject: Query: smoothing-spline software -- From: Ed Hughes
Subject: Re: Bayesian hypothesis testing confusion -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Re: Basic question on P values -- From: wpilib+@pitt.edu (Richard F Ulrich)
Subject: Job Opportunity to post, if permitted (Statistical Quality Engineer) -- From: ambgrp@aol.com
Subject: Re: UNIX OPERATING SYSTEM, WHICH ONE!!!!!!! -- From: balson@world.std.com (Jim Balson)
Subject: Re: Dist'n of mean abs. Dev. -- From: "Robert E Sawyer"
Subject: Re: Implausible null hypotheses -- From: Bill Simpson
Subject: Re: Meaning of Correlation Coefficient -- From: Michael Kamen
Subject: OLS - Dependence in the error term -- From: d4t@jano.com (d4t)
Subject: Re: Statistics of outcomes of competitions -- From: Ellen Hertz
Subject: Sample & power calc advice needed -- From: mehla@netcom.com (Mike Hollis)
Subject: Re: Output unit scaling ? -- From: dodier@bechtel.Colorado.EDU (Robert Dodier)
Subject: Re: Meaning of Correlation Coefficient -- From: "Robert E Sawyer"
Subject: Logit & Probit by TSP -- From: Tatsuo Ochiai

Articles

Subject: reference priors versus jeffreys priors
From: lthompso@s.psych.uiuc.edu (Laura Thompson)
Date: 2 Dec 1996 06:59:42 GMT
Can someone clarify the distinction between reference priors and
Jeffreys priors?  I believe that one is a subset of the other (i.e.,
Jeffreys is a reference prior) or they overlap in some way. To me they seem the
same, but apparently reference priors depend on the order of
parametrization, and I know Jeffreys priors do not.
thanks.
Return to Top
Subject: growth, decline, steady state (roughly), or just outright fluctuation
From: "Håkon Finne"
Date: 2 Dec 1996 09:25:42 GMT
I have a large number of data sets, each of which contains a time series of consecutive annual
observations, with a maximum of ten years for each set. There is a lot of fluctuation in the data.
I need an algorithm that will section the data (according to the values of a particular variable)
into periods characterized by growth, decline, steady state (roughly), or just outright
fluctuation.
Each period should last at least two or three years and the characterization should agree
fairly well with subjective judgment when looking at a graph of the data. As I see it, one problem
lies in determining inflection points that define the beginning/end of each period.
If possible, please also give hints to how the algorithm could be implemented in SPSS!
Thanx. (And yes; this is an econometric problem.)
Mail, please, to Hakon.Finne@ifim.sintef.no
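The classification step of such an algorithm can be sketched in Python (not SPSS): fit an OLS trend line to each candidate period and label it by slope and residual size. The slope and residual thresholds below are purely illustrative assumptions, and the search for period boundaries (inflection points) is left out:

```python
def classify_period(values, slope_tol=0.05, resid_tol=0.15):
    """Classify one short run of annual values as 'growth', 'decline',
    'steady', or 'fluctuation' from an OLS trend line.  slope_tol and
    resid_tol are relative to the period mean and purely illustrative."""
    n = len(values)
    xbar = (n - 1) / 2.0
    ybar = sum(values) / n
    sxx = sum((i - xbar) ** 2 for i in range(n))
    slope = sum((i - xbar) * (y - ybar) for i, y in enumerate(values)) / sxx
    # mean squared residual about the fitted line
    msr = sum((y - (ybar + slope * (i - xbar))) ** 2
              for i, y in enumerate(values)) / n
    scale = abs(ybar) or 1.0
    if (msr ** 0.5) / scale > resid_tol:
        return 'fluctuation'
    if slope / scale > slope_tol:
        return 'growth'
    if slope / scale < -slope_tol:
        return 'decline'
    return 'steady'

# e.g. classify_period([100, 110, 121, 133]) -> 'growth'
```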
Return to Top
Subject: Re: Linear Transformations & Matrix representation. Help
From: Rainer Dyckerhoff
Date: Mon, 02 Dec 1996 13:17:03 -0800
Robert Gelb wrote:
> 
> I need to find the standard coordinatization of 1+x+x^3 in P3.  The answer
> at the back of the book says:
>                                 [1]
>                                 [1]
>                                 [0]
>                                 [1]
> basically a 4x1 matrix.  My question is how can there be a 4x1 matrix in a
> 3 dimensional space.
>
Hi Robert:
In fact, P3 is a 4-dimensional space. The dimension is the length of
a basis, and a basis of P3 is obviously given by the polynomials
1, x, x^2 and x^3. 
The '3' in P3 just means that only polynomials of degree less than
or equal to three are considered. It has nothing to do with the
dimension of that space.
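A minimal sketch of the coordinatization in Python (illustrative only; the polynomial is given as a dict of power -> coefficient):

```python
def coordinatize(poly, degree=3):
    """Coordinates of a polynomial, given as {power: coefficient},
    in the standard basis {1, x, ..., x^degree} of P_degree."""
    return [poly.get(k, 0) for k in range(degree + 1)]

# 1 + x + x^3 in P3 -- a vector in R^4, since dim(P3) = 4:
print(coordinatize({0: 1, 1: 1, 3: 1}))  # [1, 1, 0, 1]
```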
Rainer Dyckerhoff
Return to Top
Subject: Dist'n of mean abs. Dev.
From: WSHEIL03atnyx.uni-konstanz.e@uni-konstanz.de (Roland Jeske)
Date: Mon, 2 Dec 1996 15:00:01
Given a sample X_1,...X_n from a normal distribution N(mu,sigma**2).
What is the dist'n of
D=sum_{i=1}^n |X_i-mean| ?
or at least, what is its expectation?
Return to Top
Subject: Looking for a quotation; (probability of impossibilities etc.).
From: "Pushkar N. Tamhane"
Date: Mon, 02 Dec 1996 10:47:34 -0500
One of the posters to this group had a signature which went something
like this:
...possibility of the improbabilities vs the impossibility of the
probabilities ....
I would really appreciate it, if someone could recreate this correctly
along with the name of the person who originally quoted this.
Sincerely,
Pushkar Tamhane.
Return to Top
Subject: Re: UNIX OPERATING SYSTEM, WHICH ONE!!!!!!!
From: neil@rmi.net (Neil Schroeder)
Date: 2 Dec 1996 16:55:42 GMT
Without question, Linux is the way to go.  
My experience with Linux is with the Slackware release, and I have found it to 
be the most feature-rich of them all.
Linux has more included value than any other UNIX release available (excepting 
Berkeley Software Design's BSD/OS, a commercial product).  Its kernel is 
constantly updated, and includes specific and high-performance support for 
more cards, motherboards, chipsets, etc. than anything else. That hardware 
specific support alone makes it the most valuable.
Linux is very easy to configure and provides solid and stable networking code.
It comes with a vast variety of applications and has the single widest
application support of any UNIX OS.
Again, I don't think I can stress enough the value of the kernel support and 
application code available.  Linux is free, powerful, and has all you need.  
You should definitely give it a try. 
Neil
neil@rmi.net
In article <329CB17A.C9F@ucla.edu>, baustin@ucla.edu says...
>
>Hello all,
>
>I am in the market for a UNIX operating system. I have narrowed the
>search down to three prospects: SCO UNIX 2.1, Solaris x86 UNIX, and
>Linux. My question is, which of the three is the best choice, and more
>importantly, Why? I will be using the operating system for business and
>personal use.  
>
>I am positive that all three OSs have some strengths and weaknesses. 
>This has been my method of evaluation so far. If anyone can help please
>reply.
>-- 
>_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
>
>    _/    _/    _/_/     _/          _/_/           Bryan Austin
>   _/    _/   _/   _/   _/        _/    _/       Dept. of Economics
>  _/    _/   _/        _/        _/_/_/_/     University of California
> _/    _/   _/    _/  _/        _/    _/            Los Angeles
>  _/_/      _/_/_/   _/_/_/_/  _/    _/         
>
>_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
Return to Top
Subject: Faculty Position Announcement -- U South Carolina
From: Anthony Rossini
Date: 02 Dec 1996 11:28:57 -0500
POSITION ANNOUNCEMENT
        Applications are invited for a possible tenure-track faculty
position beginning August, 1997, subject to university approval.  A
Ph.D. degree or its equivalent and a commitment to research and
excellence in teaching at all levels are required.  The department
offers B.S., M.S., Master of Industrial Statistics, and Ph.D. programs
and consists of 10 faculty with varied research interests.  See our
home page, http://www.stat.sc.edu/, for details.  Minority and female
candidates are especially encouraged to apply.  Transcript (for recent
graduates), resume and four letters of reference to Chair of Faculty
Search Committee, Department of Statistics, University of South
Carolina, Columbia, SC 29208.  AA/EOE
----
-tony (Anthony Rossini)		Asst. Prof.
Department of Statistics        rossini@stat.sc.edu
University of South Carolina	http://www.stat.sc.edu/~rossini/
Columbia, SC 29208              803-777-3578(O) ..7-4048 (fax)
Return to Top
Subject: Re: Meaning of Correlation Coefficient
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 2 Dec 1996 17:25:55 GMT
Michael Kamen (mbkamen@facstaff.wisc.edu) wrote:
: Greetings All!
: How is it that rho^2 = (sigma^2_2 - sigma^2_1)/sigma^2_2 shows a 
: higher correlation between sample 1 and 2, as rho^2 gets closer 
: to 1?
  -- Even without understanding the right side of your equation,
I suggest that THAT is a silly question:  Assuming  rho  is a
correlation, "closer to 1"  is  "higher".   rho^2  is monotonic
in rho, rho positive.
: The larger is (sigma^2_2 - sigma^2_1), surely the larger is 
: the difference between the samples.  
Rewriting your formula,
   R= (B-A)/B  =  1 - A/B
So if B is bigger/A is smaller, "the difference" and R both 
increase.  But, offhand, I do not recognize your formula.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: UNIX OPERATING SYSTEM, WHICH ONE!!!!!!!
From: Chris Calabrese
Date: Mon, 02 Dec 1996 14:50:52 -0500
Bryan Austin wrote:
> 
> Larry Culver wrote:
> > Do you have a need to run multiple processors?  Does Linux support more
> > than a single CPU yet? I'm not sure of some of the others U mentioned,
> > but Solaris does ... one of the reasons I went with Solaris (2.5.1) was
> > the fact that it does support multiple CPUs.
> >
> > Larry
> I don't know about linux, but I know that SCO's new UNIXware has the
> dual CPU capability, but I am not positive if it has the multi-cpu
> capability.
> 
> Bryan
Yes, UnixWare supports multi-cpu configurations.
The largest number I've ever seen operating was 12, but
any number that the HW supports ought to work, in theory.
In practice, SMP scaling is limited by how well the hardware
can scale and how efficient the OS is at doing multi-cpu stuff.
UnixWare is one of the _most_ efficient OS's at this sort of thing
as evidenced by the fact that so many AIM hot-iron awards
and TPC-C records have been on UnixWare.
-- 
Christopher J. Calabrese
Lab System Architect (Consultant)
Hewlett Packard Engineering Services Group
cjc@fpk.hp.com
Return to Top
Subject: Output unit scaling ?
From: ebx@cs.nott.ac.uk (Edward A G Burcher)
Date: Mon, 2 Dec 1996 12:41:27 GMT
Hi, I hope someone out there can help with this. I'm fairly new to 
neural nets and so I guess this could be one of those obvious-once-you-
know questions, but nevertheless....
I am trying to build a 7-10-10-3 feedforward neural net, with full
connectivity between successive layers but no (direct) connectivity between
non-adjacent layers. I am currently using the standard sigmoid function
as my activation function. The problem is that I have training and test data
where all seven inputs typically vary in a small range 0-0.3 . I am quite happy
to normalise this data; However, my 3 output units are tricky to deal with.
One of them varies in the range 200-20000, the second from 100-300 and the
third 20-70. Clearly, with such large numbers, the network will find it 
difficult (impossible?) to be trained on such data, as my activation function
only gives output in the [0,1] range. I have heard it is possible to adapt the
sigmoid function to give a nonlinear activation function with a larger range.
How is this done exactly, and would it be a suitable technique for solving the
problem ?
The network is being used to infer these values from measurements, so I need
to preserve the original values. One possibility I have considered is to apply
a function to each of the outputs to scale it into the [0,1] interval, and
to apply the inverse of that function when I require the values back again.
Is this a safe approach, if I have to apply a different scaling function to
each output unit ? 
I was thinking of something simple as a scaling function such as
(unit value - min value the unit can take ) / ( max value that unit can take - min value )
Is this suitable (assuming all of this is a valid approach) or are there better
ways to scale ?
Thanks for any help that anyone can offer.
Ed Burcher 
Joint Honours Maths / Computer Science Year 3 
University of Nottingham
England
Return to Top
Subject: Query: smoothing-spline software
From: Ed Hughes
Date: 2 Dec 1996 19:00:52 GMT
Is there any (preferably free) software available to 
compute smoothing splines for data on points in 2 and 3 
dimensions?  I'd prefer that it allow irregular data 
points, but if it required points to be on a regular
grid, I can live with it.  
Thanks
Ed Hughes 
Edward Hughes Consulting
130 Slater St., Ste. 1100
Ottawa, Ontario, Canada   K1P 6E2
Voice:  613-238-4831
Fax:    613-238-7698
ehughes@mrco2.carleton.ca
Return to Top
Subject: Re: Bayesian hypothesis testing confusion
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 2 Dec 1996 20:21:12 GMT
Robert Dodier (dodier@colorado.edu) wrote:
: Hello all,
: I'm trying to test hypotheses of the form ``Parameters a and b are
: both nearly zero,'' ``Parameter a is nearly zero and b is substantially
: more than zero,'' ``Parameter a is substantially more than zero and
: parameter b is nearly zero.''
  -- I am curious as to what context you have, that gives you those
hypotheses, in that form.
A much more normal situation, it seems to me, is to consider a pair
of tests, S and T:  S= a+b  and T=a-b .   Where  a  and  b  are
scaled the same (same standard deviation), these are two orthogonal
tests;  which means you can look at the power (or whatever)  for two
tests at the value of half-alpha.
Would you really oppose using those tests instead of what you 
described?
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Re: Basic question on P values
From: wpilib+@pitt.edu (Richard F Ulrich)
Date: 2 Dec 1996 20:41:38 GMT
tom (tjb@acpub.duke.edu) wrote:
: I'm a medical student with a question about P values. P values are 
: frequently referred to in the medical literature, and my understanding is 
: that they represent the probability that you would see  results "like" 
: your test results based on random chance given certain assumptions about 
: distribution and so on.
: Now, in that case, would a P value representing *no* significance 
: whatsoever be 0.5 or 1.0? I can envision both being the case. I.e., if 
: your test is two tailed, there is a 0.5 chance there would be test results 
: shifted one way from your results based on chance, and 0.5 chance that 
: they would be shifted the other way.
: Or, conversely, I can see  P values approaching 1 being the correct 
: answer. I.e., you would expect to see results like the results you have in 
: most random trials, and if you did an infinite number of trials the P 
: value would approach 1.
: Which is correct?
 -- Think of it like this:   the  p-value  of .5  represents
the middle of the expected outcomes under the simple null hypothesis.
That is, the cumulative distribution runs from 0 to 1, where only
the upper extreme of scores is in the 'rejection area'.
If you are grabbing BOTH extremes  of the distribution as part of 
your rejection area, as you do with the two-tailed t-test, then
it is not very natural to refer to other cut-off points.  By 
arbitrarily doubling the value (p)  or (1-p), you can label 
cutoff points so that what was .5  is now 1.0.
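That doubling rule can be sketched in Python, using the standard normal as the reference distribution (an illustration, not part of the original exchange):

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value for a z statistic: double the smaller of the
    two tail areas, so a statistic at the centre of the null
    distribution gets p = 1, not p = 0.5."""
    upper = 1.0 - NormalDist().cdf(z)
    return 2.0 * min(upper, 1.0 - upper)

print(two_tailed_p(0.0))   # 1.0 -- dead centre of the null
print(two_tailed_p(1.96))  # ~0.05
```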
: An email cc: of posted replies is appreciated
  -- my Reader does not make that convenient.  Do you have Newsfeed
problems, or are you too lazy to pop in and look for responses?
  -- An E-mail address included within  *your*  text would be
appreciated, too, since some NewsReaders don't show what it was,
and what is shown is wrong a sizable fraction of the time, anyway.
Rich Ulrich, biostatistician              wpilib+@pitt.edu
Western Psychiatric Inst. and Clinic   Univ. of Pittsburgh
Return to Top
Subject: Job Opportunity to post, if permitted (Statistical Quality Engineer)
From: ambgrp@aol.com
Date: Mon, 02 Dec 1996 14:11:03 -0600
We have a job opportunity for a Statistical Quality Engineer and wonder if you accept job postings?  Please let me know where to post, if so.  
Thank you.
Position is with a worldwide leader of heat transfer technology located
in Racine,
Wisconsin.
-------------------==== Posted via Deja News ====-----------------------
      http://www.dejanews.com/     Search, Read, Post to Usenet
Return to Top
Subject: Re: UNIX OPERATING SYSTEM, WHICH ONE!!!!!!!
From: balson@world.std.com (Jim Balson)
Date: Mon, 2 Dec 1996 21:28:31 GMT
Neil Schroeder (neil@rmi.net) wrote:
: Without question, Linux is the way to go.  
	That's your opinion.
	It really depends on what he's going to be using it for. For me,
and this is my opinion,  it boils down to 2 issues:
	1) Motif/CDE
	2) source code
	Q: Do you want Motif and/or CDE (eventually)?
	A: Unixware is the way to go because Motif is free in SCO now,
	   Free in unixware in January. CDE will be free with the release of
	   Gemini in mid-97. CDE for Linux is $250.00 last I heard. Motif
	   is $100.00 or so.
	If ya just need the source code for the OS you're running, then Linux.
	If ya don't need source code, get SCO Unix now, or hold off for 
Unixware in January. You can still reference the Linux source implementation 
if you are using Unixware if you need an example of how something may have 
been implemented.
	For me, all things being equal, I prefer a commercial Operating System.
	Just expressing my right to voice my opinion.
	Here's a question for anyone who would like to take a stab at it:
	Q: Which kernel is really better? The linux kernel developed by Linus,
	or the SVR4 kernel developed by Sun/Bell Labs ?  And why? Explain what
	makes one better than the other.
Jim
-- 
Jim Balson 
balson@world.std.com
Return to Top
Subject: Re: Dist'n of mean abs. Dev.
From: "Robert E Sawyer"
Date: 2 Dec 1996 23:10:49 GMT
Letting z_i be iid N(0,1) r.v.'s, we can write x_i=mean+sigma*z_i
so D=sum|x_i - mean|=sum|sigma*z_i|=sigma*sum|z_i|.
Then 
E(D)=sigma*n*E(|z_1|)=sigma*n*sqrt(2/pi) 
V(D)=(sigma^2)*n*V(|z_1|)=(sigma^2)*n*[E(|z_1|^2)-(E|z_1|)^2]=(sigma^2)*n*[1-2/pi]
(hope I did the expected value integrals right).
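A quick Monte Carlo check of the E(D) formula in Python (deviations taken about the known mean mu, as in the derivation above; with the sample mean in place of mu, D would come out slightly smaller):

```python
import math
import random

def mean_abs_dev_check(n=10, sigma=2.0, reps=50_000, mu=0.0):
    """Monte Carlo estimate of E(D), D = sum_i |X_i - mu|,
    for X_i ~ N(mu, sigma^2), deviations about the known mean mu."""
    random.seed(1)
    total = 0.0
    for _ in range(reps):
        total += sum(abs(random.gauss(mu, sigma) - mu) for _ in range(n))
    return total / reps

# theory: E(D) = n * sigma * sqrt(2/pi) = 10 * 2 * 0.7979... ~= 15.96
print(mean_abs_dev_check())
```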
Robert E Sawyer 
soen@pacbell.net
_________________
Roland Jeske  wrote in article
...
| Given a sample X_1,...X_n from a normal distribution N(mu,sigma**2).
| 
| What is the dist'n of
| 
| D=sum_{i=1}^n |X_i-mean| ?
| 
| or at least, what is its expectation?
| 
Return to Top
Subject: Re: Implausible null hypotheses
From: Bill Simpson
Date: Mon, 2 Dec 1996 15:17:24 -0600
>The "moronic" idea is to let some quasi-religious mantra decide what
>action to take.  You have decisions to make; statistical decision theory
>is designed to help YOU make the decision appropriate for YOU; use it,
>instead of following the blind.
>-- 
>Herman Rubin
Yes I agree if you have a bona fide decision-making situation on your
hands, better use decision theory.  But I don't think we are usually in
that situation in science.  We typically want to estimate some parameter
rather than make a decision.
Bill Simpson
Return to Top
Subject: Re: Meaning of Correlation Coefficient
From: Michael Kamen
Date: 3 Dec 1996 02:19:54 GMT
Richard,
wpilib+@pitt.edu (Richard F Ulrich) wrote:
>Michael Kamen (mbkamen@facstaff.wisc.edu) wrote:
>: Greetings All!
>
>: How is it that rho^2 = (sigma^2_2 - sigma^2_1)/sigma^2_2 shows a 
>: higher correlation between sample 1 and 2, as rho^2 gets closer 
>: to 1?
>
>  -- Even without understanding the right side of your equation,
>I suggest that THAT is a silly question:  Assuming  rho  is a
>correlation, "closer to 1"  is  "higher".   rho^2  is monotonic
>in rho, rho positive.
I think it was Herby Dinkins who once said (and very well too), 
"the only silly question is the one that is not asked,":).  
>: The larger is (sigma^2_2 - sigma^2_1), surely the larger is 
>: the difference between the samples.  
>
>
>Rewriting your formula,
>   R= (B-A)/B  =  1 - A/B
>
>So if B is bigger/A is smaller, "the difference" and R both 
>increase.  But, offhand, I do not recognize your formula.
If you have a copy of Probability and Statistics for Engineers, 
4th edition, Miller et al, Prentice Hall, New Jersey,
please turn to page 360 and note at the bottom that the 
aforementioned formula is the definition of the Population 
Correlation Coefficient.
In your version of the formula, note that if B >> A, A/B gets 
small making R-->1.  |One| is defined as "perfect correlation."  
That is exactly my problem!  The question of terms like "higher" 
or "lower" aside -- if the variances of 2 samples differ a lot 
they are usually thought of as not likely to come from the same 
population.  Why then, in the case of Cor. Coef. do we draw a 
somewhat reverse conclusion?
Please do not take me to be trying to refute decades (centuries?) 
of statistical precedent :).  I am only trying to understand this 
as best as I can (with the admittedly frail instrument at my 
disposal).  
Thanks,
Michael
Return to Top
Subject: OLS - Dependence in the error term
From: d4t@jano.com (d4t)
Date: Tue, 03 Dec 1996 03:45:20 GMT
For ordinary least squares regression, what is meant by the term
Dependence (in the error term) ?
Many, many thanks for an answer and an authoritative cite.
Return to Top
Subject: Re: Statistics of outcomes of competitions
From: Ellen Hertz
Date: Mon, 02 Dec 1996 22:06:14 -0500
Eric Bartels wrote:
> 
> Hello,
> 
> I am looking for pointers to publications in the biology/statistics
> community where the following problem arose.
> 
>         An experiment is conducted in which n individuals participate. Some
>         individuals can compete with other individuals for a resource. For
>         each individual the list of the possible opponents is fixed at the
>         beginning. During the observation time some of these competitions take
>         place and for each one the winner is noted. Some competitions may end
>         in a tie in which case the competition is being treated as if it has
>         not taken place.
>         At the end we obtain for each individual the number of wins it has
>         achieved during the observation time.
> 
> This outcome data is to be tested statistically in order to test
> certain hypotheses about the population and certain subsets of it.
> 
> I am curious to learn whether this or related problems have been dealt
> with in the statistics / biology literature. Any paper where a similar
> problem was analysed or pointers to the statistical method of choice
> would be very much appreciated.
> 
> thanks,
> 
> Eric Bartels
It seems possible that you could view this as a 1-1 case-control
data set. Each game that had a winner would be a stratum
and the winner would be the case, the loser the control. Each player
except one for reference would have a dummy variable whose contribution 
(positive or negative) to the odds of being the winner
would be estimated. A reference is Hosmer and Lemeshow.
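With one dummy per player, that setup reduces to the Bradley-Terry model, P(i beats j) = 1/(1 + exp(-(b_i - b_j))), which can be fit by gradient ascent. A minimal sketch (not the Hosmer-Lemeshow machinery; learning rate and iteration count are arbitrary choices):

```python
import math

def fit_bradley_terry(games, n_players, iters=1000, lr=0.5):
    """Maximum-likelihood strengths for P(i beats j) = sigmoid(b_i - b_j),
    fit by gradient ascent on the averaged log-likelihood.
    games: list of (winner, loser) index pairs; ties already dropped."""
    beta = [0.0] * n_players
    m = float(len(games))
    for _ in range(iters):
        grad = [0.0] * n_players
        for w, l in games:
            p_win = 1.0 / (1.0 + math.exp(beta[l] - beta[w]))
            grad[w] += (1.0 - p_win) / m
            grad[l] -= (1.0 - p_win) / m
        for i in range(n_players):
            beta[i] += lr * grad[i]
    ref = beta[0]                   # only differences are identified,
    return [b - ref for b in beta]  # so anchor player 0 at strength 0
```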
Ellen Hertz
Return to Top
Subject: Sample & power calc advice needed
From: mehla@netcom.com (Mike Hollis)
Date: Tue, 3 Dec 1996 01:59:44 GMT
I'm looking for references and/or other guidance to help me formulate sample 
size and power calculation expressions under _both_ completely randomized 
and matched paired assignment to treatment and control groups given:
(1) a simple difference of means test is to be performed;
(2) k pre-treatment and j post-treatment measures are available for both 
    groups;
(3) I have reasonably good estimates of means and variances of the 
    criterion for k pre-treatment time periods for the population of 
    interest;
(4) I have a hypothesized value for the post-intervention percentage 
    difference between treatment and control groups on the criterion 
    variable.
Direct replies via e-mail would be appreciated.
Thanks in advance.
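For the completely randomized case with a single post-treatment measure, the standard normal-approximation formula is a starting point (a sketch only; it ignores the matched-pair and repeated-measures structure of the design):

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z test to detect a true
    difference in means delta with common s.d. sigma (normal approx.):
    n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2"""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)

# detect a half-s.d. difference at alpha = .05 with 80% power:
print(n_per_group(0.5, 1.0))  # 63 per group
```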
Return to Top
Subject: Re: Output unit scaling ?
From: dodier@bechtel.Colorado.EDU (Robert Dodier)
Date: 3 Dec 1996 05:32:39 GMT
ebx@cs.nott.ac.uk (Edward A G Burcher) writes:
>I am trying to build a 7-10-10-3 feedforward neural net, with full
>connectivity between successive layers but no (direct) connectivity between
>non-adjacent layers. I am currently using the standard sigmoid function
>as my activation function. The problem is that I have training and test data
>where all seven inputs typically vary in a small range 0-0.3 . I am quite happy
>to normalise this data; However, my 3 output units are tricky to deal with.
>One of them varies in the range 200-20000, the second from 100-300 and the
>third 20-70. Clearly, with such large numbers, the network will find it 
>difficult (impossible?) to be trained on such data, as my activation function
>only gives output in the [0,1] range. I have heard it is possible to adapt the
>sigmoid function to give a nonlinear activation function with a larger range.
>How is this done exactly, and would it be a suitable technique for solving the
>problem ?
If the network is to be used as a regression model (as you are doing),
it is appropriate to use linear output units. (If you have a nonlinear
output unit and graft on a scaling function, you have added a linear unit,
in effect.)
About scaling output values: whether this is necessary depends on your
training algorithm. Simple gradient descent (i.e. backpropagation) will
fail if the magnitudes of the outputs are very different, so you should
scale them all to be roughly the same size. In this situation, I always
subtract the mean and divide by the standard deviation so that I get 
numbers that are more or less in the range -3 to +3 or so. Smarter 
algorithms, e.g. conjugate gradient, are not sensitive to output magnitudes,
so scaling is not needed.
>The network is being used to infer these values from measurements, so I need
>to preserve the original values. One possibility I have considered is to apply
>a function to each of the outputs to scale it into the [0,1] interval, and
>to apply the inverse of that function when I require the values back again.
>Is this a safe approach, if I have to apply a different scaling function to
>each output unit ? 
Yes, just apply the inverse of the scaling function to recover values
in the original units. BTW with a linear output unit, you can compose the 
hidden-layer-to-output-unit function with the inverse scaling to get
a modified set of weights for the final layer; then you don't have to
explicitly worry about the scaling business once the network is trained.
(Same with input scaling, if you're using it.)
>I was thinking of something simple as a scaling function such as
>
>(unit value - min value the unit can take ) / ( max value that unit can take - min value )
>
>Is this suitable (assuming all of this is a valid approach) or are there better
>ways to scale ?
No, sooner or later you'll find a value that goes beyond the boundaries.
In any event, scaling to [0,1] is unnecessary when the output unit is
linear -- as it should be for your problem. Any reasonable scheme will
work -- I suggest subtracting the mean and dividing by standard deviation.
There is a good reason to prefer the (x-mean)/stddev scheme, though --
which is that if you compute mean-square-error using output values 
transformed in that way, you get the MSE of the untransformed values
divided by the variance of the output. Some people call this the
`normalized MSE.' It is 1 if your network does no better than putting out
(outputting?) the mean; if it is more than 1 you are in bad shape!
A value of, say, 0.10 has an obvious interpretation -- the MSE of
prediction is 10 percent as large as the variance of the output.
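The normalized MSE described above can be sketched as (a minimal illustration):

```python
def normalized_mse(y_true, y_pred):
    """MSE of the predictions divided by the variance of the targets --
    equivalently, MSE computed after the (x - mean)/stddev transform.
    1.0 means 'no better than always predicting the mean'."""
    n = len(y_true)
    mean = sum(y_true) / n
    var = sum((y - mean) ** 2 for y in y_true) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    return mse / var

# predicting the mean for every case gives exactly 1.0:
print(normalized_mse([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5]))  # 1.0
```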
Hope this helps. Believe it or not, in the not-so-distant past I asked
questions just like this too... Wait, I _still_ ask questions like this. :)
Robert Dodier
-- 
``Ainda nos faz lembrar os belos tempos'' (``It still reminds us of the good old days'') -- on the prow of a fishing boat.
Return to Top
Subject: Re: Meaning of Correlation Coefficient
From: "Robert E Sawyer"
Date: 3 Dec 1996 05:54:44 GMT
If you fit a "straight-line" linear model (linear in the parameters A, B) 
y_i = A + B*x_i + err_i 
to data {(x_i, y_i), i=1,n} to get 
yhat_i = a + b*x_i,
with a,b the resulting estimators for A,B
then the coefficient of *linear* correlation R is such that
R^2 = (MST - MSE)/MST = 1 - MSE/MST  
where 
MST = "mean square total" = (sum(y_i-ybar)^2)/n   
MSE = "mean squared error"= (sum(yhat_i -y_i)^2)/n.
This form lends itself to the interpretation:
MST = "total (y-)variation" in the dependent variable y,
MSE = "(y-)variation UNexplained" by the linear model
(MSE = 0 iff all yhat_i = y_i, i.e. all the data points lie on a straight line)
MST-MSE = "(y-)variation explained" by the linear model 
This should answer your question:
"better fit" <-> proportionately smaller MSE <-> proportionately greater MST-MSE
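The R^2 = 1 - MSE/MST computation can be sketched in Python (a minimal straight-line fit; the divisors by n cancel in the ratio):

```python
def r_squared(x, y):
    """R^2 = 1 - MSE/MST for the least-squares fit yhat = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
    a = ybar - b * xbar
    mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / n
    mst = sum((yi - ybar) ** 2 for yi in y) / n
    return 1 - mse / mst

# all points exactly on a line -> R^2 = 1 (MSE = 0):
print(r_squared([0, 1, 2, 3], [1, 3, 5, 7]))  # 1.0
```

For a straight-line fit this agrees with the square of the sample correlation coefficient.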
-- 
Robert E Sawyer 
soen@pacbell.net
_________________
Michael Kamen  wrote in article <57ssnj$1cra@news.doit.wisc.edu>...
| Greetings All!
| 
| How is it that rho^2 = (sigma^2_2 - sigma^2_1)/sigma^2_2 shows a 
| higher correlation between sample 1 and 2, as rho^2 gets closer 
| to 1?
| 
| The larger is (sigma^2_2 - sigma^2_1), surely the larger is 
| the difference between the samples.  
| 
| Can anyone help me out here?  
| 
| 
| 
| 
Return to Top
Subject: Logit & Probit by TSP
From: Tatsuo Ochiai
Date: Tue, 03 Dec 1996 01:28:19 -0800
I am wondering what the algorithms and the convergence criteria TSP uses
for Logit and Probit model.
Thanks in advance.
Tatsuo Ochiai
tochiai@students.wisc.edu
Return to Top

Downloaded by WWW Programs
Byron Palmer