Sampling Procedure
A sampling strategy comprises two main elements: a sample design describing the scheme by which the sample of survey units is selected, and the estimators by which survey results can be computed from sample data. The two elements are usually closely interrelated, and determine the quality or reliability of survey estimates. In this section both elements will be described briefly. A more detailed description is provided in a separate working paper (Abu Hassan and Tamsfoss 1995).
The sample design adopted is a stratified three-stage design for the selection of households to be surveyed. At the first stage a sample of localities was selected. The sample localities were subdivided into cells of approximately equal size, and at the second stage a number of cells were selected randomly from each sample locality. At the third and final stage, a sample of households was selected from the sample cells. For all the demographic variables included in the survey, records were taken for all members of each sample household.
Although a two-stage design would have been preferable, the present, more complex design is partly the result of the limited availability of the data on which sample design is usually based, specifically data on the population size of small area units such as cells. The sample was designed in parallel with the updating of maps for the localities in the West Bank and Gaza Strip during the winter and spring of 1995, another ongoing PCBS project. Owing to the limited time available, the design had to be completed before a complete set of updated locality maps was ready, so the small area information needed was available for only a limited number of localities. However, the map updating was coordinated with the sample design in such a way that once the first-stage sample of localities had been selected, mapping of these localities was given the highest priority, making it possible to subdivide the sample localities into cells with a known measure of (population) size.
The present design is based on listings of localities provided by Barghouti and Daibas (1993) for the West Bank, and Abdeen and Abu-Libdeh (1993) for the Gaza Strip. The population figures are rough estimates as of 1992-93, either produced mainly by questioning local administration informants (e.g. Mukhtars) about the number of families in the locality, or projected. Even so, they appear to be fairly consistent with other sources (e.g. Benvenisti and Khayat 1988). Furthermore, the listings used as a frame comprise more localities than previous ones, and should thus be more complete. However, the coverage in terms of areas may still be less than, although close to, 100%.
The first stage comprises the designation of localities (as listed by Barghouti and Daibas 1993; Abdeen and Abu-Libdeh 1993) as the Primary Sampling Units (PSUs), the stratification of the PSUs, and the selection of sample PSUs from each stratum. The stratification is a subdivision of the PSUs according to district, administrative status of the locality, and estimated population size (number of households). The PSUs were selected independently within each stratum, with probability proportionate to estimated population size. In the Gaza Strip all localities were selected. The same applies to the district capitals, municipal localities and refugee camps in the West Bank, except in two strata in Ramallah district. Whenever all PSUs in a stratum are selected, the design is a two-stage one, and each PSU is to be regarded as a separate substratum. The two-stage design also applies to several of the small villages (single-cell localities). In fact, the major part of the sample was selected in two stages only, which contributes to a smaller sampling error than a strict three-stage design would give.
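The selection of PSUs with probability proportionate to size within a stratum can be illustrated by the following minimal sketch. It assumes a systematic PPS procedure; the text does not state which PPS method was actually used, and the function names and example sizes are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=None):
    """Select n units from one stratum by systematic PPS sampling.

    sizes: estimated number of households per PSU (hypothetical input).
    Returns the indices of the selected PSUs.
    Assumption: PSUs larger than the sampling interval (certainty units)
    have already been set aside, as in strata where all PSUs are taken.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    interval = total / n
    if max(sizes) > interval:
        raise ValueError("set certainty units aside before PPS selection")
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))
    selected, i = [], 0
    for k in range(n):
        point = start + k * interval
        # advance to the PSU whose cumulative size range contains the point
        while i < len(cumulative) - 1 and cumulative[i] <= point:
            i += 1
        selected.append(i)
    return selected


def first_stage_probabilities(sizes, n):
    """First-stage inclusion probability of each PSU: n * size / total."""
    total = sum(sizes)
    return [n * s / total for s in sizes]
```

For a stratum with estimated sizes 100, 300, 200 and 400 households and two PSUs to be drawn, the first-stage inclusion probabilities under this sketch are 0.2, 0.6, 0.4 and 0.8.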
The second-stage subdivision of sample PSUs into cells (or Secondary Sampling Units, SSUs) was done on maps indicating the location of buildings and a rough estimate of the number of dwelling units in each building. Thus, for each sample PSU or locality as a whole, two size measures are available: the estimated number of households, and the roughly estimated number of dwelling units. Although these two sets of measures proved to be positively correlated, they differed significantly in most cases. For the cells, however, the number of dwelling units was the only measure of size available. Therefore, when selecting the sample cells from each sample PSU with probability proportionate to size, the size in terms of dwelling units had to be applied, i.e. a size measure conceptually different from the one applied at the first stage of selection (households).
For each sample cell the population has been listed by enumeration of buildings (map reference) and dwelling units. It should be noted that the number of dwelling units in each building was assessed by listers from outside; no thorough inquiries were made as to whether the units were inhabited or not. It was thus expected that errors would occur rather frequently, a problem which is to be evaluated separately on the basis of data collected during the survey. The listing of dwelling units constitutes the Sampling Frame from which the household sample was selected at the third stage by systematic sampling.
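The third-stage systematic selection from the listing of dwelling units can be sketched as follows. This is an illustration only: the random-start, fixed-interval variant shown here is an assumption, and the function name and the form of the frame (a plain list of dwelling unit identifiers) are hypothetical.

```python
import random


def systematic_sample(frame, n, rng=None):
    """Draw n dwelling units from a listed frame by systematic sampling.

    frame: list of dwelling unit identifiers for one sample cell.
    A random start is drawn in the first interval, and every
    (len(frame) / n)-th unit is taken thereafter.
    """
    rng = rng or random.Random()
    size = len(frame)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the frame size")
    step = size / n                      # sampling interval, may be fractional
    start = rng.random() * step          # random start in [0, step)
    # min() guards against a floating-point result equal to the frame size
    return [frame[min(size - 1, int(start + k * step))] for k in range(n)]
```

Under this scheme every listed dwelling unit in the cell has the same selection probability, n divided by the number of listed units, which is in line with the equal within-cell probabilities described for the design.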
The planned sample size was 15,000 households. However, due to the sampling frame imperfections which were envisaged (several non-eligible units included), oversampling was carried out at a rate of approximately 30%, i.e. the gross sample selected at the outset comprised around 20,000 dwelling units.
The sampling design and sample allocation yield a household sample with varying inclusion probabilities. In order to obtain unbiased results, it is thus recommended that all estimates be based on weighted observations, the weights being the inverse of the respective inclusion probabilities.
All households in a cell have the same probability of being selected, but this probability varies from cell to cell. It should be noted that non-eligible dwelling units (i.e. units which are not inhabited by households) have been removed from the sample. This does not affect the inclusion probabilities or the weights. The actual values of the weights are in the range 0.3 to 3.0. However, 80% of the weights are in the range 0.7 to 1.4. Only very few (small) cells are near the extremes.
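The weighting principle can be illustrated with a short sketch. The overall inclusion probability of a household is taken as the product of the selection probabilities at the three stages, and the weight as its inverse. The rescaling of the weights to a mean of 1 is an assumption: it would be consistent with the reported range of 0.3 to 3.0, but the text does not say how the weights were scaled. All function names and figures are hypothetical.

```python
def inclusion_probability(p_psu, p_cell, p_household):
    """Overall inclusion probability as the product of the stage probabilities.

    For a two-stage selection (all PSUs in the stratum taken), p_psu is 1.0.
    """
    return p_psu * p_cell * p_household


def design_weights(probabilities):
    """Weights as the inverse of the inclusion probabilities."""
    return [1.0 / p for p in probabilities]


def relative_weights(weights):
    """Rescale weights to have mean 1 (assumed here, not stated in the text)."""
    mean_weight = sum(weights) / len(weights)
    return [w / mean_weight for w in weights]


def weighted_mean(values, weights):
    """Weighted estimate of a mean or a proportion."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Rescaling all weights by a common factor leaves weighted means and proportions unchanged, so the choice between raw and relative weights matters only for estimates of totals.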
Since the sampling design is a complex multi-stage one, variances must be calculated by methods other than those applicable to simple random sampling. The software CENVAR (US Bureau of the Census 1993) has been used to carry out the calculations.
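To indicate why simple random sampling formulas do not apply, the sketch below computes the variance of an estimated total with the generic "ultimate cluster" estimator, in which only the variation between weighted first-stage unit totals within each stratum enters the calculation. It is a textbook estimator given for illustration under the assumption of with-replacement selection at the first stage, and it is not necessarily the exact computation performed by CENVAR. The record layout and function name are hypothetical.

```python
from collections import defaultdict


def ultimate_cluster_variance(records):
    """Variance of a weighted total under a stratified cluster design.

    records: iterable of (stratum, psu, weight, value) tuples, one per
    sample household. In strata where all localities were selected, the
    cells would play the role of first-stage units (an assumption here).
    """
    # weighted total of the study variable within each first-stage unit
    psu_totals = defaultdict(float)
    for stratum, psu, weight, value in records:
        psu_totals[(stratum, psu)] += weight * value

    totals_by_stratum = defaultdict(list)
    for (stratum, _psu), total in psu_totals.items():
        totals_by_stratum[stratum].append(total)

    variance = 0.0
    for totals in totals_by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError("each stratum needs at least two first-stage units")
        mean_total = sum(totals) / n_h
        squared_dev = sum((t - mean_total) ** 2 for t in totals)
        variance += n_h / (n_h - 1) * squared_dev
    return variance
```

Because households within the same cell or locality tend to resemble one another, an estimator of this kind typically gives larger variances than the simple random sampling formula applied to the same data.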