Dr. Torben Wiede (torben.wiede@sowi.uni-bamberg.de) wrote:

: I am looking for measures of homogeneity (or distance) for two
: distributions of nominally scaled variables. These measures should
: not compare only the location or scale parameters of these
: distributions but the whole information in the distributions.

It's not meaningful to talk about location or scale parameters for distributions of nominally scaled variables. In most cases, measuring the distance between two multinomial distributions is equivalent to measuring the association between the nominal variable and a dichotomous variable indicating group membership. If that's a reasonable assumption in your case, you can use any standard measure of nominal-nominal association, such as the measures based on Pearson's chi-square (Cramer's V, the contingency coefficient) or on proportional reduction in variation (Goodman & Kruskal's lambda, Goodman & Kruskal's tau, or Theil's U).

: Where can I find some literature about such measures?

Try _Categorical Data Analysis_ by Alan Agresti (Wiley, 1990).
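As a concrete illustration of the chi-square-based approach, here is a minimal Python sketch (the function name and the example counts are made up for illustration) that computes Cramer's V from a contingency table whose rows are the two groups and whose columns are the categories of the nominal variable:

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    # Pearson's chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
               / (row_tot[i] * col_tot[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Two groups with identical category counts: no association, V = 0
print(cramers_v([[10, 20, 30], [10, 20, 30]]))  # -> 0.0
# Two groups in completely separate categories: perfect association, V = 1
print(cramers_v([[10, 0], [0, 10]]))            # -> 1.0
```

V runs from 0 (the two distributions are identical) to 1 (they share no categories), which is what makes it usable as a distance between the two multinomial distributions.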
Does anyone know what the assumptions of Structural Equation Modelling are? Can they be viewed as extensions of path analysis, which itself is an extension of OLS regression and its assumptions? I know that there are many books on the subject, most notably the series published by SAGE, but these are either checked out of my university library or are missing or mis-shelved.

Helpless in Purdue!

Thanks in advance,
Bill Stroup
ahmed shabbir wrote:
>
> A function f(x,w), where w is a random variable and x is deterministic,
> is convex in x for fixed w, and is also convex in w for fixed x. We know
> that the expectation E[f(x,w)] is then convex in x. Is the variance:
daley@albany.net in <848209470.18934@dejanews.com> writes:

> Thanks Aaron for pointing me in the right direction :) Using
> a function CumNormal(x, Mean, StDev) I am calculating this:
> (S) = {1 - CumNormal(0,2,40)} / CumNormal(0,2,40), so effectively
> I am dividing the area under my pdf from 0 to oo by the area
> under my pdf from -oo to 0. Now when I compare (S) with the
> number (T) I get by generating 50,000 members of the N(2,40)
> distribution and dividing the sum of all positives by the sum
> of all negatives, (S) and (T) differ significantly!

Yes, they would differ, for two reasons. Your CumNormal function gives the probability that a Normal variate will be less than x; you want the sum of the variates themselves. For a standard Normal variate Z, the expected value of Z counted only when Z > c is exp(-c^2/2)*(2*pi)^(-1/2). In your case c = -0.05 (since 0 is 0.05 standard deviations below your mean), so each N(2,40) variate contributes on average 2*0.5199 + 40*0.3984, or about 16.98, to the sum of positives. With 50,000 variates your positive x's will sum to about 848,900 on average. Your negative x's will sum to 100,000 minus this, or about -748,900 (the positive and negative x's have to sum, on average, to 2 * 50,000 = 100,000). With 50,000 observations the standard deviations of these totals are on the order of 10,000, so the simulated ratio should be very close to the ratio of expected sums. However, the expected value of the ratio of sums is still infinite, although you would have to do an awful lot of repetitions of your sample of 50,000 to demonstrate this.

Aaron C. Brown
New York, NY
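These expectations are easy to check by simulation. Here is a sketch (variable names are mine) that draws 50,000 N(2,40) variates and compares the sum of the positives with the closed-form value k * (mu*Phi(mu/sigma) + sigma*phi(mu/sigma)):

```python
import math
import random

random.seed(1)
mu, sigma, k = 2.0, 40.0, 50_000

xs = [random.gauss(mu, sigma) for _ in range(k)]
pos_sum = sum(x for x in xs if x > 0)
neg_sum = sum(x for x in xs if x <= 0)

# Closed form: E[X; X > 0] = mu * Phi(mu/sigma) + sigma * phi(mu/sigma)
z = mu / sigma                                # 0 is 0.05 SDs below the mean
Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(X > 0)
phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
expected_pos = k * (mu * Phi + sigma * phi)   # about 848,900

print(round(expected_pos))
print(round(pos_sum), round(neg_sum))  # each within a few SDs of its expectation
```

The simulated sums land within a few standard deviations (order 10,000) of the expected values, while the ratio pos_sum / abs(neg_sum) stays close to the ratio of the expected sums, just as described above.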
S.C. DeJaegher in <56h5cp$b60@canopus.cc.umanitoba.ca> asks for comments on a study design.

Your design is standard and reasonable. You should think about whether you want to treat physical abuse and sexual abuse as separate dimensions, but this should not significantly affect your results one way or the other. One important caution is that all your data come from tests administered at the same time in the same circumstances, so you may get a spurious effect based on the testing situation. It is always better to study associations among data from independent measurements.

Aaron C. Brown
New York, NY
I could really use the answers to the article published in Mathematics Magazine, Vol. 69, No. 4, October 1996. It was Steve Gadbois's "Poker with Wild Cards" problem, where he tries to find the frequency of occurrence of certain hands when the two jokers are introduced as wild cards into a 5-card poker hand. Please either post or send to bryan2feeding.frenzy.com ASAP.
|>< Radford Neal:
|><
|>< One often sees people using priors that are such that the
|>< effective complexity of the model increases as the amount of
|>< data increases. This makes no sense - it amounts to using a
|>< prior that one knows is going to be contradicted by future
|>< data.
|
| Neil Nelson wrote:
|
|> ... Of course the difficulty here is the
|> determination of the prior probabilities and algorithmic
|> relation, for which our only effective recourse is an analysis
|> of the previously and currently available data. This implies
|> that our prior probabilities and algorithm may change depending
|> on any increase in the available data; or more simply, we would
|> not want to hold to our previous judgment if new information
|> indicated we were previously in error.
|
| Radford Neal:
|
| This is not the case for a full Bayesian analysis, since the
| prior decided on before any data is collected will implicitly
| contain all the revisions of judgement that would be prompted
| by any possible data set.
|
| In practice, a Bayesian is likely to use a model and prior
| that do not contain certain possibilities that seem very
| unlikely at first, simply because formalising all these
| possibilities is too much work. If the actual data indicate
| that these possibilities need to be considered, then the
| Bayesian might revise the prior and model, perhaps adopting a
| more complex one.
|
| However, I think that this scenario has little to do with the
| usual reasons why people think that you can't use complex
| models with small datasets. The usual reasons are not
| compatible with a Bayesian viewpoint.

I would not assert that complex models can't be used with small datasets, knowing that the complexity of a model may be increased to any arbitrary degree beyond the minimum required to describe a data set while still maintaining conformance to that small description.
That is, we may add any number of sentences to our explanation without modifying the initial sentences that were sufficient to provide the required content. Given that complexity beyond the necessary minimum reduces the efficiency of the description--in that we have to maintain a portion of the description for which no use is subsequently made--we may question the use of complex models with small datasets.

It may be that the intent of a more complex model is to provide more general application to a variety of small datasets. If, say, we had a complex Bayesian model that could be applied to a large number (portion) of small datasets, it would be convenient, upon the appearance of a new small dataset, to apply the prior complex Bayesian model, on the probable expectation of a good fit, instead of spending the effort to build a new model from scratch. If we could create a composite model covering a large portion of small datasets, with an effective means to select the applicable components of the composite model, we might reduce our overall effort.

In complexity theory it is common to consider data to be a binary string of 0's and 1's, and an objective is to identify the smallest program in a given language that will generate the considered binary data. If we specify the property 'small dataset' as those binary strings of length n, and 'complex model' as the binary string of length m representing a program P that will identify--in the sense of matching some representation internal to the program--all strings of length n, we can note that m (the length of P) should be less than about n*2^n--the sum of the lengths of all possible binary strings of length n--and, given that most of these strings are incompressible, m will be roughly greater than n*2^(n-1), or (n*2^n)/2. For all 32-bit binary strings, m will be greater than 6.8E10 bits. The use of approximation and/or domain restriction has the effect of reducing n to n-k, such that matching takes place on the first n-k bits.
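The bound quoted above is simple arithmetic to verify; this small sketch (function names are mine) evaluates n*2^(n-1) and n*2^n for n = 32:

```python
def lower_bound_bits(n):
    """Rough lower bound on the model size m: most length-n strings are
    incompressible, so P must carry about half of the n * 2**n total bits,
    i.e. n * 2**(n-1)."""
    return n * 2 ** (n - 1)

def upper_bound_bits(n):
    """Sum of the lengths of all binary strings of length n."""
    return n * 2 ** n

print(lower_bound_bits(32))  # 68719476736, i.e. about 6.9E10 bits
```

For n = 32 the lower bound is 32 * 2^31 = 68,719,476,736 bits, matching the "greater than 6.8E10 bits" figure above.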
The essential result of the previous sequence is that a prior model intending to address a significant portion of all small datasets quickly becomes prohibitively large. Fortunately, real-life applications have available a large number of well-known restrictions (useful properties), so that a prior model of large complexity can be assembled that can be expected to have wide application. E.g., education may be viewed as the competitive advantage of constructing, via known restrictions, a complex model whose alternate assembly by trial-and-error experience would require a prohibitive binary search of an extremely large potential model containing all models of complexity less than or equal to the educated model. Prior models are useful in proportion to how well they correspond to the known restrictions (properties) of the application.

Neil Nelson
saswss@hotellng.unx.sas.com (Warren Sarle) writes:

> In article <56dgil$fcs@netserv.waikato.ac.nz>, maj@waikato.ac.nz (Murray Jorgensen) writes:
> |> I have looked at Geoff Webb's article in
> |> http://www.cs.washington.edu/research/jair/table-of-contents-vol4.html
> |> and it seems to conflict with all my intuition built up as a practising
> |> statistician.
> |> ... It is widely accepted in the statistical
> |> community that 'overfitting' of a data set [using a needlessly complex
> |> model] results in a fitted model closely tuned to that particular data
> |> set that has poor predictive power. This is not to say that there is not
> |> additional complexity to be discovered, just that the data set under
> |> consideration does not contain enough information about possible
> |> elaborations to the model to make it safe to fit them.
>
> I will try briefly to appease Murray's statistical intuition. The
> problem with Geoff Webb's interpretation of his interesting and possibly
> very useful work has to do with the meaning of "complexity". Whether the
> number of splits or leaves in a tree-based model is a measure of the
> model's complexity depends on how the tree is grown. Consider a
> nonlinear regression (i.e. function approximation) problem. Suppose my
> prior beliefs indicate that the regression function is smooth, as is
> often the case in real life. Regression trees tend to sacrifice
> smoothness for interpretability. But I could obtain a smooth regression
> tree by doing some form of smooth regression, such as kernel regression,
> and then growing a tree with billions of leaves to approximate the
> smooth kernel regression surface instead of approximating the original
> data. The size of the resulting tree would not be a measure of the
> tree's complexity--in fact, one could argue that the bigger the tree,
> the simpler it is!

I would like to make two responses.
First, my paper addresses the common application of a technique or principle in machine learning, often called Occam's razor, that seeks to minimise the surface syntactic complexity of the inferred classifier in the expectation that doing so will in general increase predictive accuracy. I believe that I have provided strong evidence that this is misguided.

Second, I think that this analysis gives reason to rethink a general, often uncritical, acceptance of Occam's razor in a broader context.

If I understand you, you suggest that the more complex decision trees that C4.5x produces might map onto less complex things at some other level of analysis. This seems like wishful thinking to me. To be convinced of such an argument I would need to be convinced that there was a single correct complexity metric. Otherwise it will always be possible that there is some other metric in which something turns out to be less complex, and the whole debate becomes pointless.

Geoff.
----------
Geoff Webb
School of Computing and Mathematics, Deakin University,
Victoria, 3217, Australia.
E-mail: webb@deakin.edu.au
Geoff Webb wrote:

> First, my paper addresses the common application of a technique or principle
> in machine learning, often called Occam's razor, that seeks to minimise
> the surface syntactic complexity of the inferred classifier in the expectation
> that doing so will in general increase predictive accuracy. I believe that
> I have provided strong evidence that this is misguided.

I agree that the approach you refute is a misguided approach.

> Second, I think that this analysis gives reason to rethink a general, often
> uncritical, acceptance of Occam's razor in a broader context.

This doesn't compute. The "surface syntactic complexity" minimization technique is completely locked up in the syntactic rules that are arbitrarily chosen. As long as such syntactic rules are in the mix, neither those syntactic rules, nor anything which refutes them, is generalizable. All such attempts can be disproven by presenting the system with something that "breaks" the syntax. (This is also my main objection to Goedel's "Incompleteness".)

> If I understand you, you suggest that the more complex decision trees
> that C4.5x produces might map onto less complex things at some other level of
> analysis.

"Trees" can be discounted in their entirety. The answer exists at a more fundamental level. This can be taken as proven from an analysis of evolutionary dynamics. Evolution began with no "trees". "Tree"-like stuff, including the cognitive capabilities to conceive of "trees", was created as evolutionary dynamics unfolded.

> This seems like wishful thinking to me. To be convinced of
> such an argument I would need to be convinced that there was a single
> correct complexity metric.

This is the whole point of why I am arguing from the perspective of what's described by the 2nd Law of Thermodynamics (WDB2T). I know of nothing that does not reduce directly to WDB2T.
The only "difficulty" that needs to be understood is how one gets from a gradient, which is what WDB2T is, to structures of any type and any complexity. And this is solved. Let me know if you want to explore a bit.

ken collins
Since no one answered my first request, maybe if I give the problem someone could answer it for me.

We are given a standard deck of 52 cards plus 2 jokers, with the jokers counting as wild cards, and are dealt a 5-card hand. You must find the frequency with which each hand occurs.

As an example, for a four of a kind there are three distinct, non-overlapping cases:

a) No joker of the two: select one denomination from the thirteen, select all four of that denomination, select one denomination from the remaining twelve, and select one of the four of that denomination.

b) One joker of the two: select one denomination from the thirteen, select three of the four of that denomination, select one denomination from the remaining twelve, and select one of the four of that denomination.

c) Two jokers of the two: select one denomination from the thirteen, select two of the four of that denomination, select one denomination from the remaining twelve, and select one of the four of that denomination.

Which gives an equation like this, where C(2,1) means the combination "2 choose 1":

a) C(2,0) * C(13,1) * C(4,4) * C(12,1) * C(4,1)
b) C(2,1) * C(13,1) * C(4,3) * C(12,1) * C(4,1)
c) C(2,2) * C(13,1) * C(4,2) * C(12,1) * C(4,1)

a + b + c = 9360

Now try the other hands, such as: five of a kind, royal flush, straight flush, full house, flush, straight, three of a kind, two pair, one pair, and a hand of junk.
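The three cases can be checked mechanically. This Python sketch (using the standard library's math.comb for the C(n,k) combinations) implements the enumeration exactly as given above:

```python
from math import comb

# Four of a kind from 52 cards + 2 jokers (jokers wild), 5-card hand,
# following the three cases enumerated above.
no_joker  = comb(2, 0) * comb(13, 1) * comb(4, 4) * comb(12, 1) * comb(4, 1)
one_joker = comb(2, 1) * comb(13, 1) * comb(4, 3) * comb(12, 1) * comb(4, 1)
two_joker = comb(2, 2) * comb(13, 1) * comb(4, 2) * comb(12, 1) * comb(4, 1)

print(no_joker, one_joker, two_joker)    # 624 4992 3744
print(no_joker + one_joker + two_joker)  # 9360
```

The same pattern (split the count by the number of jokers in the hand, then multiply the choices for each slot) carries over to the other hand types listed below the example.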