Dr. Torben Wiede (torben.wiede@sowi.uni-bamberg.de) wrote:

: I am looking for measures of homogeneity (or distance) for two
: distributions of nominally scaled variables. These measures should
: not compare only the location or scale parameters of these
: distributions but the whole information in the distributions.

It's not meaningful to talk about location or scale parameters for distributions of nominally scaled variables. In most cases, measuring the distance between two multinomial distributions is equivalent to measuring the association between the nominal variable and a dichotomous variable indicating group membership. If that's a reasonable assumption in your case, you can use any standard measure of nominal-nominal association, such as the measures based on Pearson's chi-square (Cramer's V, the contingency coefficient) or on proportional reduction in variation (Goodman & Kruskal's lambda, Goodman & Kruskal's tau, or Theil's U).

: Where can I find some literature about such measures?

Try _Categorical Data Analysis_ by Alan Agresti (Wiley, 1990).
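As a concrete illustration of the chi-square-based approach, here is a minimal Python sketch (the function name and the example counts are made up for illustration) that computes Cramer's V from a contingency table whose rows are the two groups and whose columns are the categories of the nominal variable:

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson's chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Two groups with identical category counts: no association, V = 0
print(cramers_v([[10, 20, 30], [10, 20, 30]]))  # -> 0.0
# Two groups in completely separate categories: perfect association, V = 1
print(cramers_v([[10, 0], [0, 10]]))            # -> 1.0
```

V runs from 0 (the two distributions are identical) to 1 (they share no categories), which is what makes it usable as a distance between the two multinomial distributions.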
Does anyone know what the assumptions of Structural Equation Modelling are? Can they be viewed as extensions of path analysis, which itself is an extension of OLS regression and its assumptions? I know that there are many books on the subject, most notably the series published by SAGE, but these are either checked out of my university library or are missing or mis-shelved.

Helpless in Purdue!

Thanks in advance,
Bill Stroup
ahmed shabbir wrote:
>
> A function f(x,w), where w is a random variable and x is deterministic,
> is convex in x for fixed w, and is also convex in w for fixed x. We know
> that the expectation E[f(x,w)] is then convex in x. Is the variance:
daley@albany.net in <848209470.18934@dejanews.com> writes:

> Thanks Aaron for pointing me in the right direction :) Using
> a function CumNormal(x, Mean, StDev) I am calculating this:
> (S) = {1 - CumNormal(0,2,40)} / CumNormal(0,2,40), so effectively
> I am dividing the area under my pdf from 0 to oo by the area
> under my pdf from -oo to 0. Now when I compare (S) with the
> number (T) I get by generating 50,000 members of the N(2,40)
> distribution and dividing the sum of all positives by the sum
> of all negatives, (S) and (T) differ significantly!

Yes, they would differ, for two reasons. Your CumNormal function gives the probability that a Normal variate will be less than x; you want the sum of the variates themselves. For a standard Normal variate Z, the expected value of Z counted only when Z > c is exp(-c^2/2)*(2*pi)^(-1/2). In your case c = -0.05 (since 0 is 0.05 standard deviations below your mean), so each N(2,40) variate contributes on average 2*0.5199 + 40*0.3984, or about 16.98, to the sum of positives. With 50,000 variates your positive x's will sum to about 848,900 on average. Your negative x's will sum to 100,000 minus this, or about -748,900 (the positive and negative x's have to sum, on average, to 2 * 50,000 = 100,000). With 50,000 observations the standard deviations of these totals are on the order of 10,000, so the simulated ratio should be very close to the ratio of expected sums. However, the expected value of the ratio of sums is still infinite, although you would have to do an awful lot of repetitions of your sample of 50,000 to demonstrate this.

Aaron C. Brown
New York, NY
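These expectations are easy to check by simulation. Here is a sketch (variable names are mine) that draws 50,000 N(2,40) variates and compares the sum of the positives with the closed-form value k * (mu*Phi(mu/sigma) + sigma*phi(mu/sigma)):

```python
import math
import random

random.seed(1)
mu, sigma, k = 2.0, 40.0, 50_000

xs = [random.gauss(mu, sigma) for _ in range(k)]
pos_sum = sum(x for x in xs if x > 0)
neg_sum = sum(x for x in xs if x <= 0)

# Closed form: E[X; X > 0] = mu * Phi(mu/sigma) + sigma * phi(mu/sigma)
z = mu / sigma                                # 0 is 0.05 SDs below the mean
Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(X > 0)
phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
expected_pos = k * (mu * Phi + sigma * phi)   # about 848,900

print(round(expected_pos))
print(round(pos_sum), round(neg_sum))  # each within a few SDs of its expectation
```

The simulated sums land within a few standard deviations (order 10,000) of the expected values, while the ratio pos_sum / abs(neg_sum) stays close to the ratio of the expected sums, just as described above.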
S.C. DeJaegher in <56h5cp$b60@canopus.cc.umanitoba.ca> asks for comments on a study design.

Your design is standard and reasonable. You should think about whether you want to treat physical abuse and sexual abuse as separate dimensions, but this should not significantly affect your results one way or the other. One important caution is that all your data come from tests administered at the same time in the same circumstances, so you may get a spurious effect based on the testing situation. It is always better to study associations among data from independent measurements.

Aaron C. Brown
New York, NY
I could really use the answers to the article published in Mathematics Magazine, Vol. 69, No. 4, October 1996. It was Steve Gadbois's "Poker with Wild Cards" problem, where he tries to find the frequency of occurrence of certain hands when the two jokers are introduced as wild cards into a 5-card poker hand. Please either post or send to bryan2feeding.frenzy.com ASAP.
|>< Radford Neal:
|><
|>< One often sees people using priors that are such that the
|>< effective complexity of the model increases as the amount of
|>< data increases. This makes no sense - it amounts to using a
|>< prior that one knows is going to be contradicted by future
|>< data.
|
| Neil Nelson wrote:
|
|> ... Of course the difficulty here is the
|> determination of the prior probabilities and algorithmic
|> relation, for which our only effective recourse is an analysis
|> of the previously and currently available data. This implies
|> that our prior probabilities and algorithm may change depending
|> on any increase in the available data; or more simply, we would
|> not want to hold to our previous judgment if new information
|> indicated we were previously in error.
|
| Radford Neal:
|
| This is not the case for a full Bayesian analysis, since the
| prior decided on before any data is collected will implicitly
| contain all the revisions of judgement that would be prompted
| by any possible data set.
|
| In practice, a Bayesian is likely to use a model and prior
| that do not contain certain possibilities that seem very
| unlikely at first, simply because formalising all these
| possibilities is too much work. If the actual data indicate
| that these possibilities need to be considered, then the
| Bayesian might revise the prior and model, perhaps adopting a
| more complex one.
|
| However, I think that this scenario has little to do with the
| usual reasons why people think that you can't use complex
| models with small datasets. The usual reasons are not
| compatible with a Bayesian viewpoint.

I would not assert that complex models can't be used with small datasets, knowing that the complexity of a model may be increased to any arbitrary degree beyond the minimum required to describe a data set while still maintaining conformance to that small description.
That is, we may add any number of sentences to our explanation without modifying the initial sentences that were sufficient to provide the required content. Given that complexity beyond the necessary minimum reduces the efficiency of the description--in that we have to maintain a portion of the description for which no use is subsequently made--we may question the use of complex models with small datasets.

It may be that the intent of a more complex model is to provide more general application to a variety of small datasets. If, say, we had a complex Bayesian model that could be applied to a large number (portion) of small datasets, it would be convenient, upon the appearance of a new small dataset, to apply the prior complex Bayesian model, on the probable expectation of a good fit, instead of spending the effort to build a new model from scratch. If we could create a composite model covering a large portion of small datasets, with an effective means to select the applicable components of the composite model, we might reduce our overall effort.

In complexity theory it is common to consider data to be a binary string of 0's and 1's, and an objective is to identify the smallest program in a given language that will generate the considered binary data. If we specify the property 'small dataset' as those binary strings of length n, and 'complex model' as the binary string of length m representing a program P that will identify--in the sense of matching some representation internal to the program--all strings of length n, we can note that m (the length of P) should be less than about n*2^n--the sum of the lengths of all possible binary strings of length n--and, given that most of these strings are incompressible, m will be roughly greater than n*2^(n-1), or (n*2^n)/2. For all 32-bit binary strings, m will be greater than 6.8E10 bits. The use of approximation and/or domain restriction has the effect of reducing n to n-k, such that matching takes place on the first n-k bits.
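The bound quoted above is simple arithmetic to verify; this small sketch (function names are mine) evaluates n*2^(n-1) and n*2^n for n = 32:

```python
def lower_bound_bits(n):
    """Rough lower bound on the model size m: most length-n strings are
    incompressible, so P must carry about half of the n * 2**n total bits,
    i.e. n * 2**(n-1)."""
    return n * 2 ** (n - 1)

def upper_bound_bits(n):
    """Sum of the lengths of all binary strings of length n."""
    return n * 2 ** n

print(lower_bound_bits(32))  # 68719476736, i.e. about 6.9E10 bits
```

For n = 32 the lower bound is 32 * 2^31 = 68,719,476,736 bits, matching the "greater than 6.8E10 bits" figure above.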
The essential result of the previous sequence is that a prior model intending to address a significant portion of all small datasets quickly becomes prohibitively large. Fortunately, real-life applications have available a large number of well-known restrictions (useful properties), so that a prior model of large complexity can be assembled that can be expected to have wide application. E.g., education may be viewed as the competitive advantage of constructing, via known restrictions, a complex model whose alternate assembly by trial-and-error experience would require a prohibitive binary search of an extremely large potential model containing all models of complexity less than or equal to the educated model. Prior models are useful in proportion to how well they correspond to the known restrictions (properties) of the application.

Neil Nelson
saswss@hotellng.unx.sas.com (Warren Sarle) writes:

> In article <56dgil$fcs@netserv.waikato.ac.nz>, maj@waikato.ac.nz (Murray Jorgensen) writes:
> |> I have looked at Geoff Webb's article in
> |> http://www.cs.washington.edu/research/jair/table-of-contents-vol4.html
> |> and it seems to conflict with all my intuition built up as a practising
> |> statistician.
> |> ... It is widely accepted in the statistical
> |> community that 'overfitting' of a data set [using a needlessly complex
> |> model] results in a fitted model closely tuned to that particular data
> |> set that has poor predictive power. This is not to say that there is not
> |> additional complexity to be discovered, just that the data set under
> |> consideration does not contain enough information about possible
> |> elaborations to the model to make it safe to fit them.
>
> I will try briefly to appease Murray's statistical intuition. The
> problem with Geoff Webb's interpretation of his interesting and possibly
> very useful work has to do with the meaning of "complexity". Whether the
> number of splits or leaves in a tree-based model is a measure of the
> model's complexity depends on how the tree is grown. Consider a
> nonlinear regression (i.e. function approximation) problem. Suppose my
> prior beliefs indicate that the regression function is smooth, as is
> often the case in real life. Regression trees tend to sacrifice
> smoothness for interpretability. But I could obtain a smooth regression
> tree by doing some form of smooth regression, such as kernel regression,
> and then growing a tree with billions of leaves to approximate the
> smooth kernel regression surface instead of approximating the original
> data. The size of the resulting tree would not be a measure of the
> tree's complexity--in fact, one could argue that the bigger the tree,
> the simpler it is!

I would like to make two responses.
First, my paper addresses the common application of a technique or principle in machine learning, often called Occam's razor, that seeks to minimise the surface syntactic complexity of the inferred classifier in the expectation that doing so will in general increase predictive accuracy. I believe that I have provided strong evidence that this is misguided.

Second, I think that this analysis gives reason to rethink a general, often uncritical, acceptance of Occam's razor in a broader context.

If I understand you, you suggest that the more complex decision trees that C4.5x produces might map onto less complex things at some other level of analysis. This seems like wishful thinking to me. To be convinced of such an argument I would need to be convinced that there was a single correct complexity metric. Otherwise it will always be possible that there is some other metric in which something turns out to be less complex, and the whole debate becomes pointless.

Geoff.
----------
Geoff Webb
School of Computing and Mathematics, Deakin University,
Victoria, 3217, Australia.
E-mail: webb@deakin.edu.au
Geoff Webb wrote:

> First, my paper addresses the common application of a technique or principle
> in machine learning, often called Occam's razor, that seeks to minimise
> the surface syntactic complexity of the inferred classifier in the expectation
> that doing so will in general increase predictive accuracy. I believe that
> I have provided strong evidence that this is misguided.

I agree that the approach you refute is a misguided approach.

> Second, I think that this analysis gives reason to rethink a general, often
> uncritical, acceptance of Occam's razor in a broader context.

This doesn't compute. The "surface syntactic complexity" minimization technique is completely locked up in the syntactic rules that are arbitrarily chosen. As long as such syntactic rules are in the mix, neither those syntactic rules, nor anything which refutes them, is generalizable. All such attempts can be disproven by presenting the system with something that "breaks" the syntax. (This is also my main objection to Goedel's "Incompleteness".)

> If I understand you, you suggest that the more complex decision trees
> that C4.5x produces might map onto less complex things at some other level of
> analysis.

"Trees" can be discounted in their entirety. The answer exists at a more fundamental level. This can be taken as proven from an analysis of evolutionary dynamics. Evolution began with no "trees". "Tree"-like stuff, including the cognitive capabilities to conceive of "trees", was created as evolutionary dynamics unfolded.

> This seems like wishful thinking to me. To be convinced of
> such an argument I would need to be convinced that there was a single
> correct complexity metric.

This is the whole point of why I am arguing from the perspective of what's described by the 2nd Law of Thermodynamics (WDB2T). I know of nothing that does not reduce directly to WDB2T.
The only "difficulty" that needs to be understood is how one gets from a gradient, which is what WDB2T is, to structures of any type and any complexity. And this is solved. Let me know if you want to explore a bit.

ken collins
Since no one answered my first request, maybe if I give the problem someone could answer it for me.

We are given a standard deck of 52 cards plus 2 jokers, with the jokers counting as wild cards, and are dealt a 5-card hand. You must find the frequency with which each hand occurs.

As an example, for a four of a kind there are three distinct, non-overlapping cases:

a) No joker of the two: select one denomination from the thirteen, select all four of that denomination, select one denomination from the remaining twelve, and select one of the four of that denomination.

b) One joker of the two: select one denomination from the thirteen, select three of the four of that denomination, select one denomination from the remaining twelve, and select one of the four of that denomination.

c) Two jokers of the two: select one denomination from the thirteen, select two of the four of that denomination, select one denomination from the remaining twelve, and select one of the four of that denomination.

Which gives an equation like this, where C(2,1) means the combination "2 choose 1":

a) C(2,0) * C(13,1) * C(4,4) * C(12,1) * C(4,1)
b) C(2,1) * C(13,1) * C(4,3) * C(12,1) * C(4,1)
c) C(2,2) * C(13,1) * C(4,2) * C(12,1) * C(4,1)

a + b + c = 9360

Now try the other hands, such as: five of a kind, royal flush, straight flush, full house, flush, straight, three of a kind, two pair, one pair, and a hand of junk.
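The three cases can be checked mechanically. This Python sketch (using the standard library's math.comb for the C(n,k) combinations) implements the enumeration exactly as given above:

```python
from math import comb

# Four of a kind from 52 cards + 2 jokers (jokers wild), 5-card hand,
# following the three cases enumerated above.
no_joker  = comb(2, 0) * comb(13, 1) * comb(4, 4) * comb(12, 1) * comb(4, 1)
one_joker = comb(2, 1) * comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 1)
two_joker = comb(2, 2) * comb(13, 1) * comb(4, 2) * comb(12, 1) * comb(4, 1)

print(no_joker, one_joker, two_joker)    # 624 4992 3744
print(no_joker + one_joker + two_joker)  # 9360
```

The same pattern (split the count by the number of jokers in the hand, then multiply the choices for each slot) carries over to the other hand types listed below the example.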