CRITICAL ISSUES IN MEASUREMENT:
CONSISTENT MEASURES

2005-01-05

MEASUREMENT:
Useable, Consistent Measures to Assess Effectiveness and Change

Measurement is the process of transforming observable events into quantitative information. In most instances, this process consists of assigning numbers to behaviors, attitudes, or occurrences, a course of action that allows the social scientist to summarize, analyze, and explain natural phenomena.

Requirements of ‘True’ Measurement
True measurement must satisfy two conditions to be useful. First, the quantification of events must be ordinal. In other words, it must be possible to order the events according to their relative amounts. For example, if an event can occur at different frequencies for different portions of the population, it could be said to occur rarely, sometimes, or often. Because “rarely” is less than “sometimes” and “sometimes” is less than “often,” the list can be ordered from low to high, making it an ordinal scale.

Non-ordinal information can be observed and recorded, but does not constitute measurement in this context. For example, if region is an element of an analysis, we may observe that events occur either in the Northeast or the Southwest, but one region cannot be called “more” and the other “less.” In this case, the items cannot be ordered from low to high. They are names only, and not indicative of quantities.
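The distinction between ordinal and name-only categories can be sketched in a few lines of Python. This is an illustration only: the class names Frequency and Region and the helper is_orderable are hypothetical, and the numeric codes 1 to 3 are an assumption that conveys rank, not distance between categories.

```python
from enum import Enum, IntEnum


class Frequency(IntEnum):
    """Ordinal scale: each category has a rank (assumed codes 1-3)."""
    RARELY = 1
    SOMETIMES = 2
    OFTEN = 3


class Region(Enum):
    """Names only: a plain Enum defines no low-to-high ordering."""
    NORTHEAST = "Northeast"
    SOUTHWEST = "Southwest"


def is_orderable(a, b):
    """Return True if the two values can be compared with '<'."""
    try:
        a < b
        return True
    except TypeError:
        return False


responses = [Frequency.OFTEN, Frequency.RARELY, Frequency.SOMETIMES]
ordered = sorted(responses)  # sorts from low to high by rank

print([r.name for r in ordered])
print(is_orderable(Frequency.RARELY, Frequency.OFTEN))
print(is_orderable(Region.NORTHEAST, Region.SOUTHWEST))
```

The frequency responses sort cleanly from low to high, while the attempt to rank one region against another fails, which mirrors the point that regions are names and not quantities.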

Second, the enumeration of events must be additive, in that the resulting indicators must be consistent and comparable, both mathematically and conceptually. An example of an additive measure is one where some subjects report seeing a doctor three times in a year and other subjects report seeing a doctor six times in a year. Each visit constitutes a single addition to our count of doctor visits, and a person with three doctor visits has exactly half as many as a person with six.
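The doctor-visit example can be worked through directly. The sketch below uses hypothetical subject labels and the counts from the example above; it simply shows that, because every visit is one equal unit, sums and ratios of the counts are meaningful.

```python
# Illustrative data only: self-reported doctor visits in one year.
doctor_visits = {"subject_a": 3, "subject_b": 6}

# Each visit adds exactly one unit, so totals are meaningful...
total_visits = sum(doctor_visits.values())

# ...and so are ratios: three visits is half as many as six.
ratio = doctor_visits["subject_a"] / doctor_visits["subject_b"]

print(total_visits)  # 9
print(ratio)         # 0.5
```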

The Need for Measurement in Scientific Research

In social scientific research, measurement is of paramount importance and is, in fact, the fundamental necessity of any social scientific endeavor. Indeed, Lord Kelvin notes that, “when you cannot measure, your knowledge is meager and unsatisfactory.” One could very well argue that such knowledge is also uninformed and potentially counterproductive.

Unfortunately, many social scientific researchers regard rigorous research methods with misunderstanding, disdain, or annoyance, because stringent methods may delay findings, cost more to execute, or require additional expertise. In the social world, we lack many of the more precise measuring instruments of the physical, biological, or economic worlds. To wit, there are more than 12 different IQ tests that purport to measure the same phenomenon. We are therefore left comparing apples to oranges or, even worse, comparing inconsistently defined inch marks on a ruler. A further problem is that of integrating empirical data with the underlying theory.

Building Better Measures
The challenge, then, for social scientific researchers is to formulate and commit to the use of measurement that is consistent, valid, and produces understandable and useable results. What is needed is more systematic attention to the processes involved in social classification in order to tackle some of the inconsistencies and inadequacies that result from the plethora of inadequate social measures in use (Bulmer, 2001). Social researchers already employ many aspects of an ideal measurement system, but a rigorous, integrated process has yet to be fully accepted by the social scientific community.

The process of developing a useable measure is elaborate, yet often neglected. In social scientific research, a good survey instrument is of utmost importance. It is the yardstick by which we measure our social phenomena. Without a consistent, accurate instrument, we are simply using a stick with moving inch marks.

Developing a sound measurement instrument is an involved, iterative process. While the construction of good survey items is both an art and a science, this vital step is often misunderstood and overlooked. When constructing survey questions, the researcher must consider project goals, respondent needs, comprehension levels, and how difficult the questions will be for respondents to answer. Once a pool of potential survey items has been constructed, it is very beneficial to have them reviewed by colleagues or experts in the field of interest. Item revisions are then made, if necessary, based on feedback and theory. Finally, the items are pilot tested on a small sample from the population under investigation. Based on the findings of the pilot test, the items may need to be revised again.

Traditional Techniques for Constructing a Measure
Traditional psychometric approaches to research insist that a measure have validity and reliability. An analysis of validity assesses whether or not the instrument is measuring what we think it should be measuring. A reliable instrument is one that consistently and predictably measures the same phenomenon. See the figure below for a demonstration of the relationship and distinction between validity and reliability.
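As one illustration of how reliability can be quantified, the sketch below computes Cronbach's alpha, a widely used index of internal consistency. The text above does not name a specific reliability statistic, so the choice of alpha is an assumption made here for demonstration, and the function name and pilot data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # regroup scores by item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical pilot data: three respondents answering two items.
pilot_scores = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(pilot_scores)
print(round(alpha, 3))
```

Values closer to 1 suggest that the items vary together across respondents. Note that alpha speaks only to consistency; it says nothing about whether the instrument measures the intended phenomenon, which is the separate question of validity.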

More Sophisticated Approaches to Constructing a Measure
Improved psychometric techniques have allowed social scientists to use more sophisticated approaches in the assessment of whether or not a measure is working well. These approaches, under the umbrella term of ‘item response theory,’ investigate the individual items that make up the instrument, and identify which are reliable, valid, redundant, or ineffective. Rasch modeling, for example, produces ‘fit statistics,’ which allow the researcher to make scientific decisions about the accuracy and effectiveness of each individual item, as well as the instrument as a whole.
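For readers unfamiliar with Rasch modeling, the core of the simplest (dichotomous) Rasch model is a single formula: the probability of endorsing or correctly answering an item depends only on the difference between the person's ability and the item's difficulty, both expressed in logits. The sketch below shows that formula only; the function name and the example values are hypothetical, and the fit statistics mentioned above, which are built from the gap between observed responses and these expected probabilities, are not computed here.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(positive response) given person ability
    and item difficulty, both in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values: one person (ability 1.0) facing three items.
for difficulty in (-1.0, 1.0, 3.0):
    p = rasch_probability(1.0, difficulty)
    print(f"difficulty {difficulty:+.1f}: P = {p:.3f}")
```

When ability equals difficulty the probability is exactly 0.5; easier items push it toward 1 and harder items toward 0.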

Using Appropriate and Effective Measurement in Research
Once a psychometrically sound measurement tool is developed, it is only useful if it is appropriately employed (much like the tape measure cartoon on the first page). A standardized protocol for administering the survey instrument is crucial. The SATs, for instance, require that test administrators use identical scripts and behaviors so that the results across different test administrations are truly comparable. Furthermore, also like the SATs, a newly developed instrument must be continually assessed and refined. We don’t use the Wright brothers’ airplane design anymore, and for good reason.

Researchers at ARN insist on following these stringent processes for constructing, testing, and administering survey instruments. Sound measurement is a core tenet of social scientific research, and the social scientists at ARN firmly believe that the quality of output depends almost entirely on the quality of input. As author JoAnn Hackos, national expert on quality and outcomes, proclaims, “Quality cannot simply be measured at the end of the project. Quality happens as the result of a well-managed, well-organized process.”