Tuesday, 12 February 2019

Q&As Collection - Stratified and Cluster Sampling

Define the term stratified random sampling.
  • The population is divided into strata, i.e., mutually exclusive and exhaustive subgroups, before sampling. 
  • Strata are mutually exclusive, i.e., every member of the population belongs to only one stratum. 
  • Strata are exhaustive, i.e., no member of the population is excluded. 
  • Random samples are then taken separately from each stratum. 
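
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `stratified_sample`, the `stratum_of` labelling function, and the urban/rural household population are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sizes, seed=0):
    """Draw a simple random sample without replacement from each stratum.

    population : list of units
    stratum_of : function mapping a unit to its stratum label (hypothetical helper)
    sizes      : dict {stratum label: sample size n_h}
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        # Each unit is placed in exactly one stratum, and no unit is left out.
        strata[stratum_of(unit)].append(unit)
    # Sample separately within each stratum.
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}

# Hypothetical population: 60 urban and 40 rural households.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_sample(population, lambda u: u[0], {"urban": 6, "rural": 4})
```

Because the label function returns a single label per unit, the strata are mutually exclusive and exhaustive by construction.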

Explain the circumstances when stratified random sampling may be expected to work well, and give an example of such a situation.
  • A stratified random sample can increase the precision of our estimates, i.e., reduce variance, if we stratify the population into relatively homogeneous strata so that most of the variation is between strata.
  • It is desirable to have stratum-specific estimates, e.g., for specific groups within the population.
  • A stratified random sample may be easier/cheaper to administer, i.e., the cost per observation may be reduced by stratification of the population into convenient strata for sampling.
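
The precision gain in the first point can be checked numerically. The sketch below uses the standard variance formulas for the sample mean under simple random sampling and under stratified sampling (both without replacement). The two strata and their values are made-up data, chosen so that most of the variation lies between strata.

```python
from statistics import variance  # sample variance with divisor (N - 1), i.e. S^2

def var_srs_mean(values, n):
    """Variance of the sample mean under simple random sampling without replacement."""
    N = len(values)
    return (1 - n / N) * variance(values) / n

def var_stratified_mean(strata, sizes):
    """Variance of the stratified estimator of the population mean."""
    N = sum(len(v) for v in strata.values())
    total = 0.0
    for h, values in strata.items():
        N_h, n_h = len(values), sizes[h]
        W_h = N_h / N  # stratum weight
        total += W_h**2 * (1 - n_h / N_h) * variance(values) / n_h
    return total

# Hypothetical population: two internally homogeneous strata with very different means.
strata = {
    "small": [10, 12, 11, 13, 9, 11, 10, 12],
    "large": [100, 104, 98, 102],
}
sizes = {"small": 4, "large": 2}  # proportional allocation of n = 6
all_values = [y for values in strata.values() for y in values]

v_srs = var_srs_mean(all_values, n=6)
v_str = var_stratified_mean(strata, sizes)
```

With these values `v_str` is far smaller than `v_srs`, because the large between-strata difference no longer contributes to the variance of the estimator.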

Explain in words what is meant by the expressions 
(1) stratification with proportional allocation,
(2) stratification with Neyman allocation, and
(3) stratification with optimal allocation.
  • Proportional allocation: the sampling fraction is the same for all strata, i.e., it assigns sample sizes to strata in proportion to the stratum population size.
  • Neyman allocation: the allocation that minimises the variance of the stratified sampling estimator of the population mean for a given total sample size, where the cost of sampling is the same in all strata.
  • Optimal allocation: the allocation that minimises the variance of the stratified sampling estimator of the population mean for a fixed total cost, where the cost of sampling varies from stratum to stratum.
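
The three allocations differ only in the weights used to split the sample. As a rough sketch, the hypothetical function `allocate` below distributes a given total n in proportion to Nh (proportional), Nh·Sh (Neyman), or Nh·Sh/√ch (optimal). It fixes n for simplicity; under a fixed budget the total n would itself be determined by the costs. All numbers are illustrative.

```python
from math import sqrt

def allocate(n, N, S=None, c=None):
    """Split a total sample size n across strata.

    N : stratum population sizes N_h
    S : stratum standard deviations S_h (None -> proportional allocation)
    c : per-observation costs c_h (None -> equal costs, i.e. Neyman allocation)
    Returns real-valued n_h that sum to n.
    """
    L = len(N)
    S = S if S is not None else [1.0] * L
    c = c if c is not None else [1.0] * L
    weights = [N_h * S_h / sqrt(c_h) for N_h, S_h, c_h in zip(N, S, c)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata.
N = [600, 300, 100]
S = [10.0, 20.0, 40.0]
c = [1.0, 4.0, 4.0]

proportional = allocate(100, N)        # [60.0, 30.0, 10.0]
neyman = allocate(100, N, S)           # [37.5, 37.5, 25.0]
optimal = allocate(100, N, S, c)       # shifts sample towards the cheap stratum
```

Relative to proportional allocation, Neyman allocation moves sample towards the more variable strata, and optimal allocation then moves it away from the expensive ones.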

Explain in what sense each of "Neyman allocation" and "optimal allocation" makes the most effective use of resources.
  • Neyman allocation is a special case of optimal allocation when the cost of obtaining an observation from each stratum is the same.
  • Neyman allocation is optimal in the sense that it minimises the variance of the stratified estimator of the population mean subject to a fixed total sample size n.
  • Optimal allocation is when the stratum-specific sample sizes nh are chosen to
    • minimise the variance of the stratified estimator of the population mean subject to fixed cost, or
    • minimise total cost subject to fixed variance.

Reasons why the optimal allocations are only theoretical.
  • The stratum standard deviations Sh are typically unknown.
  • Direct costs can at best be estimated.
  • The optimal allocation formula typically yields non-integer sample sizes.
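
For the last point, the real-valued allocation has to be turned into integers somehow. One simple possibility, chosen here purely for illustration and not taken from the original notes, is largest-remainder rounding, which keeps the total sample size unchanged. The function name `round_allocation` is hypothetical.

```python
from math import floor

def round_allocation(real_sizes):
    """Round real-valued n_h to integers with the same total (largest-remainder rule)."""
    total = round(sum(real_sizes))
    sizes = [floor(x) for x in real_sizes]
    shortfall = total - sum(sizes)
    # Hand the remaining units to the strata with the largest fractional parts.
    order = sorted(
        range(len(real_sizes)),
        key=lambda i: real_sizes[i] - floor(real_sizes[i]),
        reverse=True,
    )
    for i in order[:shortfall]:
        sizes[i] += 1
    return sizes

rounded = round_allocation([37.5, 37.5, 25.0])
```

The rounded allocation is no longer exactly optimal, and this sketch does not check that each rounded size stays between 1 and Nh.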

State the optimal allocation formula for the allocation of sample size in stratified random sampling. Define all the symbols used in the formula.

The optimal allocation formula, up to a constant of proportionality, is

$$n_h \propto \frac{N_h S_h}{\sqrt{c_h}}$$

where
  • nh - sample size for stratum h
  • Nh - total number of elements in stratum h
  • Sh - population standard deviation in stratum h
  • ch - direct cost of obtaining an observation from stratum h
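
Writing the constant of proportionality out explicitly gives the stratum sample sizes under the two usual constraints. This is a sketch under stated assumptions: the cost function is taken to be linear, C = Σ ch·nh, with no fixed overhead, and the symbols C (total budget), n (total sample size) and L (number of strata) are introduced here and are not in the original list.

```latex
% Assumptions: linear cost C = \sum_h c_h n_h with no fixed overhead; L strata.
\[
  \text{fixed total cost } C:\quad
  n_h = C \,\frac{N_h S_h / \sqrt{c_h}}{\sum_{k=1}^{L} N_k S_k \sqrt{c_k}},
  \qquad
  \text{fixed total sample size } n:\quad
  n_h = n \,\frac{N_h S_h / \sqrt{c_h}}{\sum_{k=1}^{L} N_k S_k / \sqrt{c_k}} .
\]
```

With equal costs in every stratum, the second form reduces to nh = n·Nh·Sh / Σ Nk·Sk, which is the Neyman allocation.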

Why should stratified sampling be expected to be an improvement on simple random sampling? When would this improvement be greatest? Illustrate your answer with reference to a practical survey with which you are familiar. 

  • If the population can be divided into useful strata, the variance within strata should be much less than the overall population variance. 
  • Stratification will then improve the precision of estimates considerably.
  • Examples would be population surveys in urban and rural parts of a region, agricultural surveys in different climatic and soil conditions, social surveys in which age-groups are the strata, industrial surveys of small and large companies, and so on. 
  • The improvement by stratification is greatest when the population is stratified by the value of the quantity to be measured in the survey, or some variable highly correlated with it. 

A survey is being planned to study agricultural land use in a large region in a developing country. Explain how and why stratification and clustering might be useful in such a survey, and what practical problems they could help to overcome. 
  • A large region in a developing country is very likely to have communication and transport problems which will slow down the sampling process. 
  • There may be maps which will help to identify major areas of agriculture, although most of the region probably grows some crops where it can. 
  • If available, aerial photographs (or satellite images) could be very useful. 
  • Stratification might be by geographically or climatically different parts of the region. 
  • Stratification would ensure that all such sub-regions are studied, which might not be the case under simple random sampling. 
  • Clustering would seek to identify parts of the region that exhibit most of the characteristics of the whole region.
  • Survey work could then be restricted to a few clusters (maybe only one), instead of having to try to cover the whole region.
  • One possibility would be to carry out an initial stratification using all available information and local knowledge, then form clusters within each stratum and choose a sample of these. 
  • An administrative base for the survey could be set up in each stratum, where the work for the chosen clusters would be coordinated.
  • It is likely to be important not to have to visit isolated parts of the region more than once, and to maximise the information obtained from each such visit. 

What is meant by post-stratification (i.e. post-hoc stratification)? Explain the main distinction between this and ordinary stratified sampling. What are the main consequences of using post-stratification? 

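  • In post-stratification, the sample is selected first, without stratifying, and the selected members are grouped into strata afterwards using information collected in the survey. In ordinary stratified sampling the strata are formed before sampling and a sample is drawn from each.
  • Age would be a convenient post-stratification variable because that question could be asked without causing refusal to answer. 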
  • The process sorts the replies according to any question that gives useful information.
  • But we cannot control the numbers that will arise in each 'stratum' because we have not designed the sample to do so. 
  • Some strata may therefore not be sampled at all, and the results will have precision that can vary substantially between strata. 
  • The method can be useful in fairly large samples, as it should then be similar to proportional allocation in stratified random sampling. 
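
As a rough sketch of how a post-stratified estimate might be computed: group the responses after the fact, then weight each group mean by its population share. This assumes the share of each stratum in the population is known from an external source such as a census, which the notes above do not state. The function name and the data are hypothetical.

```python
from collections import defaultdict

def post_stratified_mean(sample, pop_shares):
    """Post-stratified estimate of the population mean.

    sample     : list of (stratum label, value) pairs from an unstratified sample
    pop_shares : dict {stratum label: known population share W_h}, summing to 1
                 (assumed to be available from an external source)
    """
    groups = defaultdict(list)
    for label, y in sample:
        groups[label].append(y)
    # The design does not control group sizes, so a stratum can end up empty.
    empty = [h for h in pop_shares if h not in groups]
    if empty:
        raise ValueError(f"no sampled units in strata: {empty}")
    return sum(W_h * sum(groups[h]) / len(groups[h]) for h, W_h in pop_shares.items())

# Hypothetical sample in which the "old" group is under-represented.
sample = [("young", 2.0), ("young", 4.0), ("old", 10.0)]
estimate = post_stratified_mean(sample, {"young": 0.5, "old": 0.5})
```

Here the unweighted sample mean is 16/3 ≈ 5.33, while the post-stratified estimate is 6.5 because the single "old" response is weighted up to its population share. The error branch corresponds to the case of a stratum that is not sampled at all.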

Define one-stage and two-stage cluster sampling. 

  • If a population (e.g. a geographical region) can be split into clusters (e.g. towns, villages), sampling can be based on these clusters.
  • Either a random sample of clusters is chosen and these are studied completely, which is "one-stage", or a subsample of units may be taken at random for study from each chosen cluster, which is "two-stage". 
  • The sample of clusters could be simple random, stratified random or systematic with random starting point. 
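
The two designs can be sketched with one hypothetical function, `cluster_sample`. The first stage here is a simple random sample of clusters (one of the options listed above), and the villages and households are made-up data.

```python
import random

def cluster_sample(clusters, n_clusters, m_per_cluster=None, seed=0):
    """One-stage or two-stage cluster sample.

    clusters      : dict {cluster label: list of units}
    n_clusters    : number of clusters drawn at the first stage
    m_per_cluster : None -> one-stage (every unit in each chosen cluster is studied);
                    int  -> two-stage (random subsample of at most m units per chosen cluster)
    """
    rng = random.Random(seed)
    # Stage 1: simple random sample of clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    if m_per_cluster is None:
        return {k: list(clusters[k]) for k in chosen}
    # Stage 2: simple random subsample of units within each chosen cluster.
    return {
        k: rng.sample(clusters[k], min(m_per_cluster, len(clusters[k])))
        for k in chosen
    }

# Hypothetical region: 8 villages with 10 households each.
villages = {f"village_{i}": [f"household_{i}_{j}" for j in range(10)] for i in range(8)}
one_stage = cluster_sample(villages, n_clusters=3)
two_stage = cluster_sample(villages, n_clusters=3, m_per_cluster=4)
```

Note that only the chosen clusters need a complete list of their units; the unchosen villages are never enumerated beyond their labels.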


How do cluster sampling and stratified sampling differ, both in their construction and in their use? Give an example of a survey in a country of your choice that uses both stratification and clustering in the sample design. 

  • Stratified sampling splits a population into various groups, according to some specified characteristic such as urban or rural areas, which are expected to be relatively homogeneous within themselves – which clusters might not be. 
  • Stratified sampling requires a complete listing of the whole population, whereas cluster sampling only requires that for the chosen clusters (and of course an initial list of clusters).
  • Cluster sampling is often used for administrative convenience, in limiting the area that is to be covered, and in reducing costs, while stratified sampling aims to give a precise estimate of the population parameters through careful choice of homogeneous strata; cluster sampling might not give any better precision than simple random sampling. 
  • In the UK, the Family Expenditure Survey stratifies into quite large geographical areas (by postcode) and uses cluster sampling to locate different communities within the areas. 

Similarities and differences between stratified sampling and one-stage cluster sampling

Difference between one-stage and two-stage cluster sampling

Conditions under which the cluster sampling is used
Cluster sampling is preferred when
  • No reliable listing of elements is available and it is expensive to prepare it. 
  • Even if the list of elements is available, the location or identification of the units may be difficult. 

A necessary condition for the validity of cluster sampling is that every unit of the population under study belongs to one and only one cluster, so that the clusters in the frame together cover the whole population without omission or duplication. When this condition is not satisfied, bias is introduced. 
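
When unit identifiers are available, the one-and-only-one-cluster condition can be checked mechanically. The helper below is a hypothetical sketch; in the situations where cluster sampling is preferred a full list of units may not exist, so in practice the check might only be possible on a subset.

```python
from collections import Counter

def check_cluster_frame(population_ids, clusters):
    """Return (omitted, duplicated) unit ids.

    Both lists are empty when every population unit appears in exactly one cluster.
    """
    counts = Counter(u for units in clusters.values() for u in units)
    omitted = sorted(u for u in population_ids if counts[u] == 0)
    duplicated = sorted(u for u, k in counts.items() if k > 1)
    return omitted, duplicated

# Hypothetical frame: unit 3 is listed twice and unit 5 is missing.
omitted, duplicated = check_cluster_frame(
    population_ids=[1, 2, 3, 4, 5],
    clusters={"A": [1, 2, 3], "B": [3, 4]},
)
```

In this example `omitted` is `[5]` and `duplicated` is `[3]`, so the frame would have to be corrected before sampling.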
