John Rogers (jrogers@arg.org) wrote:
: Recently, in connection with a discussion of STATA software, I mentioned a situation in
: which sampling weights were being used that did not sum to the sample size. Some
: questions have come up about the use of these weights, and I would be grateful for any
: opinions and especially for any references on this topic.
:
: The survey in question was carried out by a nationally recognized academic survey
: research facility according to standard methods (design effects from cluster sampling
: are involved, but that is not the issue here). Three subsamples were drawn to ensure
: representation among the African American and Hispanic communities. The main sample was
: selected at random from all Listing Areas, and so includes some members of the latter
: two populations (2178 interviews). The "oversample" was selected at random from Listing
: Areas with at least 5% African American or Hispanic populations (2747 interviews).
: Among other weighting adjustments, post-stratification weights were calculated to
: match known population distributions for region, race/ethnicity, and sex. Thus,
: members of the oversampled groups were given lower relative weights than the White,
: non-Hispanic members of the main sample. The sampling weights were scaled so that they
: sum to the total sample size, 4925 cases.
:
: The procedure described above is a common one, but some in our office have concerns
: about it. Specifically, it seems that at least in some analyses this system effectively
: takes the N from the oversample and allocates it to the main sample. Imagine a
: cross-classification that overlaps to some extent with sample membership, and you can
: see that White, non-Hispanic respondents will be treated as if they existed in far
: greater numbers than are actually present. Consequently, variances will be deflated for
: any statistics. The solution seems to be to scale the weights to some smaller number,
: possibly the size of the main sample (this has been done in some previous studies).
:
: The problem? Some statistical programs cannot accept weights that don't sum to the
: sample size. One of those programs (Stata) is otherwise highly desirable because of its
: facility to handle clustered samples, and there are others I'd like to be able to use as
: well. The choice is not a trivial one - literally dozens of reports are likely to be
: published from this data, and there is concern among the scientists involved that the
: published analyses be as consistent as possible with one another.
:
: Any comments on the statistical ramifications of this weighting issue would be most
: welcome (Hint to the folks at Stata - here's your chance to make a sale!). I would
: particularly appreciate any references to credible literature, as I have not been able
: to find much on the subject.

Let me make a few comments here.

1. It is standard practice in survey statistics for the weights to sum to N (the population size), not n (the sample size). This is not an oversight: it lets descriptive statistics for the population be computed directly. For example, an estimate of total expenditure in the population is simply the weighted sum of the expenditures of the individual sample units.

2. Software packages designed for survey data handle the survey weights from (1) without any adjustment by the user.

3. I am not a Stata user. I know, however, that John L. Eltinge (Texas A&M) and William M. Sribney (Stata Corp.) have written "macros" (if that is what they are called) that enable one to use Stata for analyzing survey data. John is a respected survey statistician. I don't think these macros are yet officially supported by Stata, so you may not hear of them through Stata's technical support.

4. The May 1996 issue of the Stata Technical Bulletin has five papers on using Stata with survey data.
One can download (presently, but maybe for not much longer) the first paper off the Web:
http://www.stata.com/info/products/stb/prevstbfeature.html

5. Just rescaling your weights will not yield a correct analysis: it changes only the apparent sample size, and does not make the variance estimates reflect the clustered, weighted design. If you want to use Stata, I recommend the approach in (3).

I hope this helps.

--
Michael P. Cohen                     home phone   202-232-4651
1615 Q Street NW #T-1                office phone 202-219-1917
Washington, DC 20009-6331            office fax   202-219-2061
mcohen@cpcug.org
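A minimal Python sketch of the points above about weights summing to N versus n, and of the variance concern raised in the quoted post. Everything in it is a hypothetical illustration: the five-respondent data set, the function names, and the two variance formulas are assumptions chosen for clarity, not Stata's or any survey package's actual code. `naive_se` treats the weights as frequency weights (as if `sum(w)` independent respondents had been observed), so its answer changes when the weights are rescaled. `linearized_se` is a Taylor-linearization estimate for the simplest possible design, with each respondent treated as its own primary sampling unit drawn with replacement; it ignores the clustering and post-stratification that a real analysis of this survey would have to handle.

```python
import math

def weighted_total(y, w):
    """Estimated population total: the weighted sum of the sample values."""
    return sum(wi * yi for yi, wi in zip(y, w))

def weighted_mean(y, w):
    """Weighted (ratio) mean; unaffected by rescaling all weights."""
    return weighted_total(y, w) / sum(w)

def rescale(w, target):
    """Multiply every weight by one constant so the weights sum to target."""
    factor = target / sum(w)
    return [wi * factor for wi in w]

def naive_se(y, w):
    """SE of the mean if w were frequency weights, i.e. as if sum(w)
    independent respondents had been observed. Depends on sum(w)."""
    total_w = sum(w)
    m = weighted_mean(y, w)
    var = sum(wi * (yi - m) ** 2 for yi, wi in zip(y, w)) / (total_w - 1)
    return math.sqrt(var / total_w)

def linearized_se(y, w):
    """Taylor-linearization SE of the weighted mean, treating each respondent
    as its own primary sampling unit drawn with replacement (hypothetical
    simplification: no strata or clusters). Invariant to rescaling w."""
    n = len(y)
    total_w = sum(w)
    m = weighted_mean(y, w)
    z = [wi * (yi - m) / total_w for yi, wi in zip(y, w)]
    z_bar = sum(z) / n  # zero up to rounding; kept for the general formula
    return math.sqrt(n / (n - 1) * sum((zi - z_bar) ** 2 for zi in z))

# Hypothetical toy sample: five expenditures with weights summing to N = 1000.
y = [10.0, 12.0, 30.0, 8.0, 20.0]
w_pop = [100.0, 100.0, 400.0, 200.0, 200.0]
w_n = rescale(w_pop, len(y))  # the same weights scaled to sum to n = 5

print("estimated population total:", weighted_total(y, w_pop))
for label, w in [("sum to N", w_pop), ("sum to n", w_n)]:
    print(label,
          round(weighted_mean(y, w), 3),
          round(naive_se(y, w), 3),
          round(linearized_se(y, w), 3))
```

In this toy example both scalings give the same weighted mean. The naive standard error is far smaller than the linearized one when the weights sum to N, and it still differs from the linearized one when they sum to n, which is the sense in which rescaling alone does not fix the analysis. The linearized standard error is the same under either scaling.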
Please recommend a source for a novice to learn about modeling dose levels and exploring the effect of different ingestion patterns. I have been exploring difference equations, but my math is weak. I hope this is an appropriate question for this list.

Thanks.

Fancher E. Wolfe, Professor
Mathematics and Statistics
Metropolitan State University
730 Hennepin Ave.
Minneapolis, MN 55403-1897
fwolfe@msus1.msus.edu
612-341-7256
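For readers starting, like the poster, from difference equations, here is a minimal sketch of the simplest case. It assumes a hypothetical one-compartment model with first-order elimination: a fixed fraction `retain` of the drug is still present one dosing interval later, and each dose is absorbed instantly. The function name, parameters, and dosing schedules are illustrative only, not taken from any reference.

```python
def drug_levels(doses, retain):
    """Amount of drug in the body just after each dose.

    Hypothetical one-compartment model with first-order elimination:
        x[k] = retain * x[k-1] + doses[k],  starting from x = 0,
    where `retain` is the fraction still present one interval later.
    """
    x = 0.0
    levels = []
    for d in doses:
        x = retain * x + d  # carry over what remains, then add the new dose
        levels.append(x)
    return levels

retain = 0.5                                 # hypothetical: half eliminated per interval
regular = [100.0] * 10                       # same dose every interval
skipped = [100.0, 100.0, 0.0] * 3 + [100.0]  # every third dose missed

print([round(v, 1) for v in drug_levels(regular, retain)])
print([round(v, 1) for v in drug_levels(skipped, retain)])
print("steady state for regular dosing:", 100.0 / (1 - retain))
```

With a constant dose d, the recursion x_k = r x_{k-1} + d has the closed form x_k = d(1 - r^k)/(1 - r), which approaches the steady state d/(1 - r). Different ingestion patterns are explored simply by changing the `doses` list.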