There are three basic elements to look for when judging the
quality of a psychological test -- reliability, validity, and
standardization.
RELIABILITY is a measure of the test's consistency. A useful
test is consistent over time. As an analogy, think of a bathroom
scale. If it gives you one weight the first time you step on it,
and a different weight when you step on it a moment later, it is
not reliable. Similarly, if an IQ test yields a score of 95 for
an individual today and 130 next week, it is not reliable.
Reliability can also be a measure of a test's internal
consistency. All of the items (questions) on a test should be
measuring the same thing -- from a statistical standpoint, the
items should correlate with each other. Good tests have
reliability coefficients which range from a low of .65 to above
.90 (the theoretical maximum is 1.00).
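As a rough sketch of how such coefficients might be computed, the
Python below estimates consistency over time as the Pearson
correlation between two administrations of a test, and internal
consistency with Cronbach's alpha. Alpha is one common statistic
for this purpose and is not named above; the function names and
all of the scores are hypothetical, made up for illustration.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Internal consistency; each row holds one person's item scores."""
    k = len(rows[0])                      # number of items on the test
    item_columns = list(zip(*rows))       # regroup the scores by item
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


# Hypothetical IQ scores for five people, tested one week apart.
week1 = [95, 100, 110, 120, 130]
week2 = [97, 99, 112, 118, 131]

# Hypothetical answers of five people to a three-item scale.
answers = [[4, 5, 4], [2, 2, 3], [5, 4, 5], [1, 2, 1], [3, 3, 3]]

print("test-retest r:", round(pearson_r(week1, week2), 2))
print("Cronbach's alpha:", round(cronbach_alpha(answers), 2))
```

If the second set of scores bore little relation to the first (95
today, 130 next week), the correlation would fall well below the
range described above.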
VALIDITY is a measure of a test's usefulness. Scores on the
test should be related to some other behavior, reflective of
personality, ability, or interest. For instance, a person who
scores high on an IQ test would be expected to do well in school
or on jobs requiring intelligence. A person who scores high on a
scale of depression should be diagnosed as depressed by mental
health professionals who assess him. A validity coefficient
reflects the degree to which such relationships exist. Most tests
have validity coefficients (correlations) of up to .30 with
"real world" behavior. This is not a high correlation, and it
underscores the need to use tests in conjunction with other
information. Relatively low correlations mean that some people
may score high on a scale of schizophrenia without being
schizophrenic and some people may score high on an IQ test and
yet not do well in school. Correlations as high as .50 are seen
between IQ and academic performance.
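One way to see why such correlations are considered modest is to
square them: the squared correlation is the proportion of
variance in one measure that is statistically accounted for by
the other. A minimal sketch, using the two coefficients mentioned
above (the function name is hypothetical):

```python
def variance_explained(r):
    """Proportion of variance shared by two measures correlated at r."""
    return r ** 2


for r in (0.30, 0.50):
    share = variance_explained(r)
    print(f"r = {r:.2f} accounts for {share:.0%} of the variance")
# prints:
# r = 0.30 accounts for 9% of the variance
# r = 0.50 accounts for 25% of the variance
```

Even at .50, three quarters of the variation in the "real world"
behavior is left unaccounted for by the test score.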
STANDARDIZATION is the process of trying out the test on a
group of people to see the scores which are typically obtained.
In this way, any test taker can make sense of his or her score by
comparing it to typical scores. This standardization provides a
mean (average) and standard deviation (spread) relative to a
certain group. When an individual takes the test, she can
determine how far above or below the average her score is,
relative to the normative group. When evaluating a test, it is
very important to determine how the normative group was selected.
For instance, if everyone in the normative group took the test by
logging into a website, you are probably being compared to a
group which is very different from the general population.
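The comparison to a normative group can be sketched in a few
lines of Python. The example assumes a normative mean of 100 and
a standard deviation of 15 (a common convention for IQ scores,
not a figure given above) and assumes that scores in the
normative group are roughly normally distributed; the function
names are hypothetical.

```python
from statistics import NormalDist


def z_score(score, norm_mean, norm_sd):
    """Distance of a score from the normative mean, in SD units."""
    return (score - norm_mean) / norm_sd


def percentile_rank(score, norm_mean, norm_sd):
    """Share of the normative group expected to score at or below
    this score, assuming the scores are normally distributed."""
    return NormalDist(norm_mean, norm_sd).cdf(score)


# Assumed norms: mean 100, standard deviation 15.
z = z_score(130, 100, 15)
pct = percentile_rank(130, 100, 15)
print(f"z = {z:.1f}, percentile rank = {pct:.1%}")
```

The result is only as meaningful as the norms fed into it: with a
mean and standard deviation taken from an unrepresentative group,
the same raw score would yield a misleading standing.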