OPTIMIZING THE SAMPLE SIZE FOR FIXED COST VIA STRATIFIED SAMPLING: A UNIVERSITY OF NAMIBIA CASE STUDY

 

 

Isak Neema

Statistics Department, University of Namibia, Namibia

 

Abstract

In this paper, we consider the problem of optimizing the sample size when the cost involved in the data collection process is fixed. The target population consists of the students residing in the campus residence (hostels) at the University of Namibia (UNAM) main campus during the 2002 academic year. A stratified sampling method was employed to divide the residents into strata, and a simple random sampling technique was then applied to draw a sample from each stratum.

 

Keywords: Optimum sample size, sampling cost, stratification, proportional allocation

 

1. Introduction

At the very beginning of the planning stage of a statistical investigation, the question of the magnitude of the sample size, according to Thompson (1992), forms a very critical aspect of the study, especially where the population of interest is stratified. In particular, questions such as how big the sample should be, or how many units of the population should be sampled from each stratum, dominate the early stages of the sampling process. However, estimating the sample size before the research is undertaken requires the postulation of an effect size, which may be related to a correlation, an F-value, or a non-parametric test. In the procedure implemented here, the effect is the difference between two means or two proportions. The effect size, which can be labelled 'd', is therefore largely subjective, because it represents the difference that the researcher or practitioner wants to detect as well as the difference that is considered relevant.

In addition, the cost involved in the data collection process can also affect the size of the sample drawn from the strata, since in many cases the cost is fixed at a certain amount. As a result, the cost of obtaining information from the various strata needs to be incorporated when calculating the total sample size. In this paper we therefore describe the process involved, as well as the calculation, for obtaining the optimum sample size from the population of students residing in the campus residence (hostels) at the University of Namibia (UNAM) main campus during the 2002 academic year. The data used in this paper were obtained from a study by Neema (2003) on resident student perceptions of on-campus living and study environments at UNAM and their relation to academic performance.

2. The structure of the campus residence

The UNAM campus residence consists of two separate units of blocks or dormitories which are normally referred to as the New and Old hostels. The hostels provide for the students’ housing as well as their catering needs.

2.1 Characteristics of the hostels

 

(i) Old hostels

 

The old hostels are the original student residences erected before the institution became a university. The old hostels consist of blocks A to C. Each of these blocks has 128 bedrooms for students.

(ii) New hostels

The new hostels were primarily constructed to address the increasing number of students seeking accommodation each year. These hostels consist of blocks ranging from A to K, with each of the blocks consisting of about 58 bedrooms. Although these hostels were designed to ease the housing problems on campus, they do not completely eradicate the problem as the lack of accommodation on campus is still one of the problems that students face each year. The total population of students residing on campus is 1095.

2.2 Hostel management

The management of the hostels is overseen by the accommodation management committee and by the housing committee (HC) members. The top management consists of the head of accommodation, a deputy head, and secretaries. The HC members are students who are elected annually by their hostel residents to represent their respective blocks and administer their welfare.

2.3 Limitation of the original study

The initial study was limited to the students receiving their tertiary education at the University of Namibia main campus, who were residing in the hostels. As pointed out by Neema (2003), first year students were not included in the study since it was decided as a prerequisite that the students should have spent at least a year in the institution hostels, so that they would be able to give informed judgment about the prevailing conditions.

3. Sample and sampling procedure

In order to undertake the sample selection process, a stratified sampling technique was employed, with each of the 14 residence blocks treated as a stratum (see Foreman, 1991). The decision to use stratified sampling was influenced by the fact that student experiences might differ from block to block, while each block is considered to be a homogeneous unit. The strata are as follows:

Stratum 1 is A block (OH)

Stratum 2 is B block (OH)

Stratum 3 is C block (OH)

Stratum 4 is A block (NH)

Stratum 5 is B block (NH)

Stratum 6 is C block (NH)

Stratum 7 is D block (NH)

Stratum 8 is E block (NH)

Stratum 9 is F block (NH)

Stratum 10 is G block (NH)

Stratum 11 is H block (NH)

Stratum 12 is I block (NH)

Stratum 13 is J block (NH)

Stratum 14 is K block (NH)

OH signifies old hostel and NH is the new hostel.

The total population of students residing in the hostels in the 2002 academic year was 1095 students, of whom 203 were first year students. However as indicated in section 2.3, since first year students were not part of the survey, this brought the total population of students to be surveyed to 892.

In the context of survey sampling, the cost can be divided into three parts, namely the total cost, the overhead or sampling cost, and the cost per head (Groves, 2004). The total cost constitutes all the expenses that are earmarked for the completion of the study. This covers everything from the planning and execution to the compilation of the report. The overhead or sampling cost, by contrast, is only the cost that is necessary for conducting the sampling process itself. In survey sampling this may include the cost of questionnaire printing, stationery, logistics such as transport to the survey place, follow-up costs and the overall cost of conducting interviews. The sampling cost is a subset of the total cost.

Similarly the cost per head is mainly the cost involved in conducting interviews, which is a fixed rate allocated per questionnaire. This rate however may differ from one organisation to another, depending on the magnitude and the structure of the questionnaire. The summarised information on the distribution of the students in the respective blocks of residence for the two hostels is displayed in Table 3.1 below.

 

Table 3.1: Distribution of students in the hostel blocks

Block    Students    1st year    Students excluding    Cost per    Proportion
         in block    students    1st year students     block       N_h/N

Old hostels
A        135         38          97                    N$120       0.1087
B        137         35          102                   N$120       0.1144
C        132         42          90                    N$120       0.1009

New hostels
A        66          6           60                    N$90        0.0673
B        65          3           62                    N$90        0.0695
C        64          3           61                    N$90        0.0684
D        63          10          53                    N$90        0.0594
E        64          3           61                    N$90        0.0684
F        62          10          52                    N$90        0.0583
G        65          9           56                    N$90        0.0628
H        65          13          52                    N$90        0.0583
I        60          11          49                    N$90        0.0549
J        60          12          48                    N$90        0.0538
K        57          8           49                    N$90        0.0549

Total    1095        203         892                   N$1350      1.0000

 

From the table, the population to be surveyed totals 892 students, and the costs per block sum to N$1350.


3.1 Sample size allocation

The method of proportional allocation, which was proposed by Bowley (1926) and appears in Foreman (1991), was used to determine the sample size to be drawn from each stratum. This implies that \(n_h\), the number of units sampled from stratum \(h\), is proportional to \(N_h\), the number of units in stratum \(h\). If \(n\) is the total sample size and \(N\) the population size, then Foreman (1991) defines \(n_h\) as

\[ n_h = n \, \frac{N_h}{N}. \]


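However, the information on the cost must be factored into the proportional allocation formula \(n_h = n N_h / N\) in order to determine the optimum sample size. Note that in this formula the total sample size \(n\) is not known, and hence it needs to be estimated. Now if we let

\(C\) = Total cost

\(c_0\) = Sampling cost (Overhead cost)

\(c_h\) = Cost for stratum \(h\)

then, taking the total cost to be linear in the stratum sample sizes, \(C = c_0 + \sum_h c_h n_h\), the total sample size \(n\) can be defined in terms of the costs above as

\[ n = \frac{C - c_0}{\sum_h c_h N_h / N}. \]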
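As an illustration only, the relation between the budget and the total sample size can be sketched in Python. The sketch assumes a linear cost model in which the stratum cost \(c_h\) is incurred per sampled student. The stratum sizes and block costs come from Table 3.1, whereas the total and overhead budgets are hypothetical values chosen for the example, and the function name is not taken from the original study.

```python
import math

# Students excluding first years (N_h) and cost (c_h, in N$) per stratum,
# taken from Table 3.1: three old-hostel blocks, then eleven new-hostel blocks.
stratum_sizes = [97, 102, 90, 60, 62, 61, 53, 61, 52, 56, 52, 49, 48, 49]
stratum_costs = [120] * 3 + [90] * 11


def total_sample_size(total_cost, overhead_cost, sizes, costs):
    """Largest n affordable under proportional allocation.

    Assumes the linear cost model C = c_0 + sum(c_h * n_h) with
    n_h = n * N_h / N, so that n = (C - c_0) / sum(c_h * N_h / N).
    """
    population = sum(sizes)
    # Average cost per sampled student, weighted by stratum proportions.
    mean_cost = sum(c * s for c, s in zip(costs, sizes)) / population
    # Round down so that the field cost never exceeds the available budget.
    return math.floor((total_cost - overhead_cost) / mean_cost)


# Hypothetical budget figures (NOT from the study), in N$.
hypothetical_total_cost = 12_000
hypothetical_overhead = 2_000
n = total_sample_size(hypothetical_total_cost, hypothetical_overhead,
                      stratum_sizes, stratum_costs)
print(n)
```

Rounding down rather than to the nearest integer is a design choice that keeps the survey within the fixed budget; with the Table 3.1 figures the weighted average cost is about N$99.72 per sampled student.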

3.2 Optimizing the sample size

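In order to optimize the sample size, suppose that the total cost \(C\), the sampling cost \(c_0\) and the stratum costs \(c_h\) of Table 3.1 are known. The total sample size \(n\) is then calculated by substituting these values into the cost relation of Section 3.1. As a result the optimum sample size is given as

\[ n = 459. \]

In addition, the size of the sample to be drawn from each stratum \(h\) is determined by substituting this value of \(n\) into the proportional allocation formula \(n_h = n N_h / N\), using the rounded proportions \(N_h / N\) listed in Table 3.1, to obtain the following:

\[ n_1 = 459 \times 0.1087 \approx 50, \quad n_2 = 459 \times 0.1144 \approx 53, \quad \dots, \quad n_{14} = 459 \times 0.0549 \approx 25. \]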

The calculated sub-sample sizes to be drawn proportionally from each stratum are therefore displayed in Table 3.2 below.

Table 3.2: Summary of the sub-sample size to be drawn from the strata

Stratum    Size of the stratum (N_h)    Sub-sample size (n_h)

1          97                           50
2          102                          53
3          90                           46
4          60                           31
5          62                           32
6          61                           31
7          53                           27
8          61                           31
9          52                           27
10         56                           29
11         52                           27
12         49                           25
13         48                           25
14         49                           25

Total      892                          459

(University of Namibia, 2002)
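The allocation in Table 3.2 can be reproduced with a short Python sketch. It assumes that each sub-sample size is obtained by multiplying the total sample size of 459 by the four-decimal proportion listed in Table 3.1 and rounding to the nearest whole number; this rounding rule is inferred from the published figures rather than stated in the text.

```python
# Rounded stratum proportions N_h / N as published in Table 3.1.
proportions = [0.1087, 0.1144, 0.1009, 0.0673, 0.0695, 0.0684, 0.0594,
               0.0684, 0.0583, 0.0628, 0.0583, 0.0549, 0.0538, 0.0549]
total_sample_size = 459

# Proportional allocation: n_h = n * (N_h / N), rounded to whole students.
sub_samples = [round(total_sample_size * p) for p in proportions]

for stratum, n_h in enumerate(sub_samples, start=1):
    print(f"Stratum {stratum:2d}: {n_h}")
print("Total:", sum(sub_samples))
```

With unrounded proportions, stratum 2 would give 459 × 102/892 ≈ 52.49, which rounds to 52 rather than 53, so the total of 459 appears to depend on the rounded proportions being used.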

A simple random sampling technique was then used to draw the sample from each stratum. That is, in each stratum the sample was selected in such a way that every unit in the stratum had an equal probability of being included in the sample. This was achieved by using a random number table generated by Rao et al. (1974).
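Simple random sampling within a stratum can be sketched in Python as follows. The original study used a random number table; here a pseudo-random number generator is swapped in for it, and the function name, the student identifiers and the seed are hypothetical placeholders.

```python
import random


def draw_simple_random_sample(unit_ids, sample_size, seed=None):
    """Draw sample_size units without replacement.

    Every subset of the requested size is equally likely, so each unit
    has the same probability of being included in the sample.
    """
    rng = random.Random(seed)
    return rng.sample(unit_ids, sample_size)


# Stratum 1 (A block, old hostel): 97 eligible students, 50 to be sampled.
# The identifiers 1..97 are hypothetical labels, not real student numbers.
stratum_1_ids = list(range(1, 98))
sample_1 = draw_simple_random_sample(stratum_1_ids, 50, seed=2002)
print(sorted(sample_1))
```

Fixing the seed makes the draw repeatable, which may be useful when the selection has to be documented or audited.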

4. Conclusion

In many cases when calculating the sample size, there is a need to consider the overall cost of sampling. This cost contributes significantly to the decision about the number of sampling units to be included in the sample (the sample size). In our study the optimum sample size obtained through the stratification process, when the cost of obtaining information from the population is taken into consideration, was found to be 459. Consequently, the optimum sample size is chosen to be \(n = 459\) in order to keep under budget. Hence, by using the equation

\[ n_h = n \, \frac{N_h}{N}, \]

where \(N_h\) is the size of stratum \(h\) and \(N\) the population size, the sample size can be distributed proportionally among the strata.

Acknowledgements

The author is thankful to Professor Joseph D. Petruccelli (Department of Statistics), Worcester Polytechnic Institute, U.S.A., Dr. N.O. Ama, formerly of the University of Namibia, as well as to the sponsors, the African American Institute (AAI) and the University of Namibia (UNAM), for their relevant input and support.

References

Dalenius, T. (1957). Sampling in Sweden: Contributions to the methods and theories of sample survey practice. Stockholm: Almqvist and Wiksell.

Foreman, E.K. (1991). Survey sampling principles. New York: Marcel Dekker Inc.

Ghosh, S.P. (1963). Optimum stratification with two characters. Annals of Mathematical Statistics, 34: 866-872.

Groves, R.M. (2004). Survey errors and survey costs. Hoboken, New Jersey: John Wiley and Sons–Interscience.

Neema, I. (2003). Resident student perceptions of on-campus living and study environments at the University of Namibia and their relation to academic performance (Master’s thesis, Worcester Polytechnic Institute (WPI), USA). Retrieved from http://www.wpi.edu/Pubs/ETD/Available/etd-0429103-184245.

Rea, L.M. and Parker, R.A. (1992). Designing and conducting survey research: A comprehensive guide. San Francisco, CA: Jossey-Bass.

Rijvi, S.E.H., Guptha, J.P. and Bhargava, M. (2002). Optimum stratification based on auxiliary variable for compromise allocation. Metron, 60(3-4): 201-216.

Thompson, S.K. (1992). Sampling. New York: John Wiley and Sons Inc.

