CRITICAL ISSUES IN MEASUREMENT:
MAKING SENSE OF RESPONSE CATEGORIES

2005-12-20


The difficulty with many of the approaches researchers use to analyze survey data is that they rely on the fuzzy categories applied to answer scales. Whether a scale asks how much a respondent agrees or disagrees with a survey item, or runs from one to ten, analysis shows that people use the categories in a slippery way that makes it hard to track changes over time.

A Respondent’s View of Adjectival Response Categories
The words below are centered exactly one inch from one another, depicting an evenly spaced scale. For a respondent whose cognitive picture of the answer scale looked like this, the codes applied to each category would constitute what we would call a "measure." A measure is simply an instrument that produces consistently spaced intervals. You wouldn't expect the inches at one end of a yardstick to be closer together or further apart than those at the other end, would you? In the same way, survey research assumes the categories of an answer scale are evenly spaced. But analysis has shown that in practice, different respondents perceive different distances between the same response categories.

[Figure: the five response categories (Strongly Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree, Strongly Disagree), with numeric codes 1 through 5, drawn at even, one-inch intervals.]

The scale below represents the cognitive distances that a different respondent might assign to the same answer scale. In this case, the respondent finds the categories labeled "strongly" to be substantially farther from the "somewhat" categories than the other categories are from each other. This may be someone for whom a service or experience would have to be absolutely perfect before they would use the "strongly agree" category.

[Figure: the same five categories as this respondent spaces them, with the "strongly" categories set well apart from the "somewhat" categories.]

The next scale suggests a respondent for whom disagreement is a category to avoid. In fact, category 3 essentially constitutes agreement for this respondent. It would take a tremendously awful experience to get them to disagree with a positively worded statement.

[Figure: the same five categories as this respondent spaces them, with the disagree categories pushed to the far end of the line and category 3 sitting near the agreement side.]

Consider that the width of a line of type on this page represents all the possible positions a response category might occupy, with the far left margin representing maximum agreement with the statement. Looking down the page across all three depictions of the scale, you can see that while the position of the "strongly agree" category stays consistent, the positions of the other categories shift around, especially at the far end of the scale. A respondent using category 5 in the first example is actually disagreeing less with the item than a respondent using the same category in the later examples: their category 5 sits further to the left, indicating less disagreement.
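A toy sketch can make this concrete. The position values below are illustrative inventions, not survey data; they simply place each respondent's categories on a 0-100 agreement line:

```python
# Hypothetical category positions on a 0-100 agreement line
# (0 = maximum agreement, 100 = maximum disagreement). The
# numbers are illustrative, not measured values.
positions = {
    "even spacing":            {1: 0, 2: 20, 3: 40, 4: 60, 5: 80},
    "distant 'strongly' ends": {1: 0, 2: 30, 3: 45, 4: 60, 5: 95},
    "avoids disagreement":     {1: 0, 2: 15, 3: 30, 4: 70, 5: 100},
}

# The same code hides different cognitive positions: category 5
# sits further right (more disagreement) for the later respondents.
for style, scale in positions.items():
    print(f"{style}: category 5 at position {scale[5]}")
```

All three respondents would be recorded identically in the data file, yet the same code stands for noticeably different amounts of disagreement.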

A Respondent’s View of Numeric Response Scales
People use numeric scales in similarly idiosyncratic ways, clustering categories in the middle or spreading out the ends. First, an evenly spaced scale for reference:

[Figure: the numbers 1 through 10 at even intervals.]

A respondent who thinks of the middle of the scale as being quite low and doesn’t differentiate much at the top end of the scale:

[Figure: the numbers 1 through 10 with the middle numbers placed low on the line and the top numbers crowded together.]

A respondent who finds the top end of the scale sufficient to differentiate their experiences. By using only 6 through 10, these respondents essentially create their own 5-point scale (a rescaling sketch follows the figure):

[Figure: the numbers 1 through 10 with 1 through 5 compressed and 6 through 10 spread across most of the line.]
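As a rough sketch of what that compression implies (illustrative only; the linear stretch and the known floor and ceiling are assumptions, not a method from this article), such responses can be stretched back over the nominal range:

```python
# Hypothetical rescaling of a respondent who only uses 6-10:
# linearly stretch their personal range onto the nominal 1-10
# range. Assumes we know their floor and ceiling, which a real
# analysis would have to estimate.
def rescale(response, lo=6, hi=10, new_lo=1, new_hi=10):
    return new_lo + (response - lo) * (new_hi - new_lo) / (hi - lo)

print([rescale(r) for r in [6, 7, 8, 9, 10]])
# -> [1.0, 3.25, 5.5, 7.75, 10.0]
```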

Because of these cognitive tendencies, researchers nearly always end up collapsing the ten responses into just three broad categories (high, medium, and low), or they report average scores to compensate for how little information the scales yield.
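A sketch of that common practice (the cut points below are illustrative choices; there is no single standard):

```python
# Collapse 10-point responses into three broad bands, and report
# the mean alongside. Cut points are illustrative, not standard.
def band(response):
    if response <= 3:
        return "low"
    if response <= 7:
        return "medium"
    return "high"

responses = [2, 5, 6, 8, 9, 10, 7, 3]
print([band(r) for r in responses])
print(sum(responses) / len(responses))  # the "average score" report
```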

How Researchers Manage Response Categories
Researchers generally need to assign numbers as codes to the descriptors respondents use in answering the survey. For example, a researcher may use 1 as a code for "strongly disagree," 2 for "somewhat disagree," 3 for "neither agree nor disagree," and so on.
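In code, the assignment is nothing more than an arbitrary lookup table; the numbering below follows the convention just described:

```python
# One conventional, and entirely arbitrary, coding of the
# five-category agreement scale.
codes = {
    "strongly disagree": 1,
    "somewhat disagree": 2,
    "neither agree nor disagree": 3,
    "somewhat agree": 4,
    "strongly agree": 5,
}
```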

Now, because numbers are assigned to the categories, users of the data often assume the numbers carry some kind of meaning. One such assumption is that of measurement; another is that of equal distance between categories.

The assumption of measurement treats the codes assigned to the categories as if they have intrinsic meaning or value simply because they are numbers. But saying "disagree" equals 6 is no more meaningful than saying "disagree" equals 97. The numbers are an arbitrary assignment, used by convention and carrying no real meaning.

If researchers assume the codes constitute measurement, they may start to treat survey responses as if they were scores rather than simply indicators of a response category. One inaccuracy such an assumption leads to is the tendency to report an average of these so-called scores: for a five-category response scale, the average of the numeric codes may be 3.2, or 2.8, and so on. If codes are used consistently across items, these averages may be nominally useful for comparing items. But they not only misrepresent the information (rating an attitude is not like measuring a person's height or weight), they also discard some of the richest information categorical data offer: the distribution of the responses.

Take this example. In each case the average is 3.0, but how would you feel about what you were learning in each case?
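Here is an illustrative pair of hypothetical distributions: two items whose codes both average 3.0 on a five-point scale, yet tell opposite stories.

```python
# Two hypothetical response distributions with identical means.
from statistics import mean

item_a = [3] * 100               # consensus at the midpoint
item_b = [1] * 50 + [5] * 50     # a polarized split to the extremes

print(mean(item_a), mean(item_b))  # both equal 3
```

A report of "3.0" for item A describes broad consensus; the same "3.0" for item B hides total polarization. Only the distribution reveals the difference.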

ARN's use of Rasch analysis in evaluating response scales and survey items provides customers with true measurement. Rasch analysis accounts for the variety of ways people use response categories and creates numeric values that represent a true score for each survey item. These scores are more precise than the simple codes.
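For reference, the dichotomous Rasch model expresses the probability that person n endorses item i in terms of a person parameter and an item parameter; polytomous variants such as the rating scale model add thresholds between adjacent categories:

```latex
% Dichotomous Rasch model: the probability that person n endorses
% item i, given person ability beta_n and item difficulty delta_i.
P(X_{ni} = 1 \mid \beta_n, \delta_i)
  = \frac{e^{\beta_n - \delta_i}}{1 + e^{\beta_n - \delta_i}}
```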

“Rasch must be credited with an outstanding contribution to one of the two central psychometric problems, the achievement of nonarbitrary measures.” – Jane Loevinger (1965)