What Makes a Good Test?

There are three basic elements to look for when judging the quality of a psychological test: reliability, validity, and standardization.

RELIABILITY is a measure of the test’s consistency. A useful test is consistent over time. As an analogy, think of a bathroom scale. If it gives you one weight the first time you step on it, and a different weight when you step on it a moment later, it is not reliable. Similarly, if an IQ test yields a score of 95 for an individual today and 130 next week, it is not reliable. Reliability can also be a measure of a test’s internal consistency. All of the items (questions) on a test should be measuring the same thing; from a statistical standpoint, the items should correlate with each other. Good tests have reliability coefficients that range from a low of .65 to above .90 (the theoretical maximum is 1.00).
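The two kinds of consistency described above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, the function names are hypothetical, and Cronbach's alpha is used here as one common index of internal consistency, not necessarily the one a given test publisher reports.

```python
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def cronbach_alpha(item_scores):
    """Internal consistency; item_scores holds one list per item,
    each containing every respondent's score on that item."""
    k = len(item_scores)
    # Total score per respondent (sum across items).
    totals = [sum(person) for person in zip(*item_scores)]
    item_var_sum = sum(pvariance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Hypothetical data: five people tested twice, one week apart.
week1 = [95, 110, 102, 88, 120]
week2 = [97, 108, 105, 90, 118]
print("test-retest reliability:", round(pearson_r(week1, week2), 2))

# Hypothetical data: three items answered by the same five people.
items = [
    [4, 5, 3, 2, 5],
    [4, 4, 3, 1, 5],
    [3, 5, 2, 2, 4],
]
print("Cronbach's alpha:", round(cronbach_alpha(items), 2))
```

A test-retest coefficient near 1.00 means people keep roughly the same rank order from one administration to the next, which is the bathroom-scale idea expressed as a number.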

VALIDITY is a measure of a test’s usefulness. Scores on the test should be related to some other behavior that reflects personality, ability, or interest. For instance, a person who scores high on an IQ test would be expected to do well in school or on jobs requiring intelligence. A person who scores high on a scale of depression should be diagnosed as depressed by the mental health professionals who assess that person. A validity coefficient reflects the degree to which such relationships exist. Most tests have validity coefficients (correlations) of up to .30 with “real world” behavior, although correlations as high as .50 are seen between IQ and academic performance. These are not high correlations, which emphasizes the need to use tests in conjunction with other information. Relatively low correlations mean that some people may score high on a scale of schizophrenia without being schizophrenic, and some people may score high on an IQ test and yet not do well in school.
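One way to see why a validity coefficient of .30 is modest is to square it: the squared correlation gives the proportion of variance the test score and the real-world behavior share. The short sketch below does that arithmetic for the two coefficients mentioned above; the function name is hypothetical.

```python
def shared_variance(r):
    """Proportion of variance two measures share, given their correlation r."""
    return r ** 2

# The two validity coefficients discussed in the text.
for r in (0.30, 0.50):
    print(f"r = {r:.2f} -> about {shared_variance(r):.0%} of the variance shared")
```

This prints roughly 9% for a correlation of .30 and 25% for a correlation of .50, so most of the variation in the behavior is left unaccounted for by the test score alone.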

STANDARDIZATION is the process of trying out the test on a group of people to see what scores are typically obtained. In this way, any test taker can make sense of his or her score by comparing it to typical scores. Standardization provides a mean (average) and standard deviation (spread) for a particular group, called the normative group. When an individual takes the test, he or she can determine how far above or below the average the score is, relative to that normative group. When evaluating a test, it is very important to determine how the normative group was selected. For instance, if everyone in the normative group took the test by logging into a website, you are probably being compared to a group that is very different from the general population.
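Comparing a score to the normative group usually comes down to a standard score (z-score): the distance from the normative mean, measured in standard deviations. The sketch below assumes a normative mean of 100 and a standard deviation of 15, a convention often used for IQ scales but chosen here only for illustration, and it assumes scores in the normative group are roughly normally distributed when converting to a percentile.

```python
from statistics import NormalDist

def z_score(score, norm_mean, norm_sd):
    """How many standard deviations a score lies above (+) or below (-) the norm mean."""
    return (score - norm_mean) / norm_sd

# Assumed normative values, for illustration only.
NORM_MEAN = 100
NORM_SD = 15

z = z_score(115, NORM_MEAN, NORM_SD)
# Share of the normative group expected to score lower, if scores are normal.
percentile = NormalDist().cdf(z)
print(f"z = {z:.2f}, percentile = {percentile:.0%}")
```

A score of 115 is one standard deviation above the mean, which places the test taker above about 84% of the normative group. The comparison is only as meaningful as the normative group itself.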