Criterion-related validity
This is the "acid test" of screening instrumentation and takes a unique form in screening test construction. Generally referred to as accuracy indices, criterion-related validity for screening tests includes the following concepts:
- Sensitivity
- Specificity
- Positive predictive value
- Negative predictive value
- Hit rate
Sensitivity
In a random sample of children, if all were administered a diagnostic battery and categorized by the presence or absence of disabilities (e.g., by determining eligibility for services under IDEA), some would be found to have disabilities. If screening tests were then given to the same group, ideally all children with disabilities would score below cutoffs on the screen and thus be identified as needing referrals for diagnostic workups and special services. In reality, detection of disabilities is imperfect due to behavioral noncompliance, psychosocial malleability, age-related manifestations, and the brevity of screens. Thus sensitivity, sometimes called co-positivity, is the percentage of children with true problems correctly identified by a screening test (e.g., by failing, abnormal, or positive results). Ideally, 70% to 80% of those with difficulties should be identified. While this figure may seem low, many tests fail to attain even this level of accuracy, and none attains sensitivity that is substantially higher. More importantly, repeated screening is thought to improve detection rates over time.
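The computation is a simple ratio of correct detections to all children with true problems. A minimal sketch (the counts below are illustrative, not drawn from the text):

```python
def sensitivity(co_positives: int, false_negatives: int) -> float:
    """Fraction of children with true problems whom the screen flags.

    co_positives: children with disabilities who failed the screen
    false_negatives: children with disabilities whom the screen missed
    """
    return co_positives / (co_positives + false_negatives)

# Hypothetical sample: 20 children with confirmed disabilities, 16 flagged.
print(sensitivity(16, 4))  # 0.8, at the top of the 70%-80% target range
```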
Specificity
To continue the above example, most children in a random sample who are given diagnostic tests would be found to have normal development. Screening tests given to the same group would ideally identify all the children with typical development as normal (e.g., above cutoffs, passing or negative scores). Reality differs of course, and so specificity (or co-negativity) indicates the percentage of children without disabilities correctly identified, usually by passing or above cutoff scores on the screen. At least 70% to 80% of those with normal development should be correctly identified. Still, because there are many more children developing normally than not, specificity closer to 80% or higher is desirable.
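Specificity is likewise a simple ratio, here over the children who are actually developing typically. A sketch with illustrative counts:

```python
def specificity(co_negatives: int, false_positives: int) -> float:
    """Fraction of children without disabilities whom the screen passes.

    co_negatives: typically developing children who passed the screen
    false_positives: typically developing children the screen flagged
    """
    return co_negatives / (co_negatives + false_positives)

# Hypothetical sample: 80 typically developing children, 70 passed the screen.
print(specificity(70, 10))  # 0.875, above the desirable 80% mark
```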
Other Accuracy Indicators
Screening test research sometimes includes information on other accuracy indicators.
- Positive predictive value
- Negative predictive value
- Hit rate
Positive predictive value
Positive predictive value answers the question: to what extent does a suboptimal screening test score reflect a true problem? If all children performing poorly on a screening test are pooled and administered diagnostic tests, at least a few will perform in the broad range of normal (because of the limits of specificity) and the rest will have disabilities. For example, if 9 out of 10 children with failing scores on screening tests are later found to have developmental diagnoses, the test's positive predictive value is 9/10 or 90%, meaning that for any screening test failure, there would be a 90% chance of a true developmental problem. In reality, positive predictive value is rarely 90%; values ranging from 30% to 50% are far more common (i.e., one of every two or three referrals will render a diagnosis). While this may seem troublingly inaccurate, the cost of over-referral (approximately $1,000 for a comprehensive diagnostic evaluation) is substantially less than the cost of under-treatment (a lifetime loss to the child and society of more than $100,000 if needed early intervention is not offered) (Glascoe, Foster, & Wolraich, 1997; Barnett & Escobar, 1990).
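Unlike sensitivity and specificity, positive predictive value is computed over the pool of screen failures. A sketch contrasting the ideal 9-of-10 example above with a more typical result (the second set of counts is illustrative, chosen only to land in the common 30% to 50% band):

```python
def positive_predictive_value(co_positives: int, false_positives: int) -> float:
    """Fraction of screen failures that reflect a true developmental problem.

    co_positives: screen failures confirmed by diagnostic testing
    false_positives: screen failures later found to be in the normal range
    """
    return co_positives / (co_positives + false_positives)

# The 9-of-10 example from the text:
print(positive_predictive_value(9, 1))    # 0.9
# A more typical hypothetical result: 40 referrals, 14 confirmed diagnoses
print(positive_predictive_value(14, 26))  # 0.35
```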
Over-referral
Also reassuring are results from a recent study showing that approximately 70% of children over-referred on developmental screening tests have numerous psychosocial risk factors and score well below the 25th percentile (the point below which regular classroom instruction is less than optimally effective) on diagnostic measures of intelligence, language, and academic achievement (Glascoe, 2001). This suggests that almost all children performing poorly on screening tests need at least some additional scrutiny and intervention and that a range of responses is desirable (e.g., Head Start, Title I, parent training, as well as special education and related services).
Negative predictive value
Negative predictive value is somewhat less commonly presented but involves determining the degree to which an optimal (above cutoff, passing, or not-at-risk) score reflects typical or non-delayed development. For example, if 95 out of 100 children with passing scores on screening tests are later found on diagnostic testing to have typical development, the test's negative predictive value is 95/100 or 95%, meaning that for any passing score, there would be a 95% chance of no developmental problem. Some measures also report over- and under-referral rates. These reflect the proportion of the entire sample who should have been referred but were not (under-referral rate) or who were referred but should not have been (over-referral rate).
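Negative predictive value is computed over the pool of passing scores, and the two referral rates over the entire sample. A sketch using the 95-of-100 example above, with the referral-rate denominators assumed (not stated in the text) to be the whole screened sample:

```python
def negative_predictive_value(co_negatives: int, false_negatives: int) -> float:
    """Fraction of passing scores that reflect typical development."""
    return co_negatives / (co_negatives + false_negatives)

def under_referral_rate(false_negatives: int, total_sample: int) -> float:
    """Proportion of the whole sample who should have been referred but were not."""
    return false_negatives / total_sample

def over_referral_rate(false_positives: int, total_sample: int) -> float:
    """Proportion of the whole sample who were referred but should not have been."""
    return false_positives / total_sample

# 100 children passed the screen; 95 later confirmed as typically developing.
print(negative_predictive_value(95, 5))  # 0.95
```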
Hit rate
Hit rates are occasionally reported and are simply the total number of children for whom a screening test gave accurate information, i.e., co-positives and co-negatives are added together and then divided by the entire sample (co-positives + co-negatives + false-positives + false-negatives). Hit rates are an extremely misleading statistic and should not be used as an indicator of test accuracy. In the example shown in Figure 1, the hit rate is (70 + 16)/100 = 86%. Because there are far more co-negatives, specificity carries excessive weight in the computation of hit rates and, as can be seen, the hit rate is closer to the specificity index than to the sensitivity index. If, in the above example, sensitivity were only 50% (10 children with disabilities correctly identified and 10 under-detected), the hit rate would still remain deceptively attractive [(70 + 10)/100 = 80%] and mask serious flaws in accuracy.
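Figure 1 is not reproduced here, but assuming its sample of 100 breaks down as 16 co-positives, 70 co-negatives, 10 false positives, and 4 false negatives (a breakdown consistent with the 86% hit rate above), the distortion can be shown directly:

```python
def hit_rate(co_pos: int, co_neg: int, false_pos: int, false_neg: int) -> float:
    """Fraction of the entire sample for whom the screen was accurate."""
    total = co_pos + co_neg + false_pos + false_neg
    return (co_pos + co_neg) / total

# Assumed Figure 1 breakdown: sensitivity 16/20 = 80%, specificity 70/80 = 87.5%.
print(hit_rate(16, 70, 10, 4))   # 0.86
# Sensitivity collapses to 50% (10 detected, 10 missed), yet the hit rate
# barely moves, because the numerous co-negatives dominate the numerator.
print(hit_rate(10, 70, 10, 10))  # 0.8
```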
Utility
Utility is less a psychometric construct than a function of practical attributes: screens should be studied for their usefulness to diverse professionals in varied settings. Such studies often address length of administration and scoring, acceptability to parents and children, readability, amount of training required, cost of administration in terms of the professional time needed to deliver, score, and interpret the measure, and other amenities helpful to specific applications (e.g., ability to aggregate results for program evaluation, availability of growth indicators for use in plotting progress over time, etc.).
Prescreening
In an effort to conserve educational and health care dollars, it is obviously desirable to select measures with a high degree of positive predictive value. Given the subtlety and gradations of developmental outcomes, high positive predictive value remains elusive. Nevertheless, positive predictive value can be improved by administering a second screening test or by using part of a screen (e.g., a subtest) as a prescreen. Prescreening tests are extremely brief measures with a high degree of sensitivity but limited specificity. Prescreens are administered routinely and are followed by screening tests only when children fail the prescreen. Although prescreening can simply compound error and lead to under-referrals, accurate prescreening improves detection rates and saves considerable time, since prescreens reduce, often by one-half to two-thirds, the number of children requiring complete screening.
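The workload saving can be sketched with a simple two-stage funnel. All numbers below are hypothetical (a prevalence of 10% and prescreen accuracy figures chosen only to illustrate the high-sensitivity, limited-specificity trade-off the text describes):

```python
def prescreen_funnel(n_affected: int, n_typical: int,
                     prescreen_sens: float, prescreen_spec: float):
    """Estimate how many children a prescreen forwards to full screening.

    Returns (forwarded, missed): children sent on for the complete screen,
    and affected children lost to under-referral at the prescreen stage.
    """
    true_pos = round(n_affected * prescreen_sens)
    false_pos = round(n_typical * (1 - prescreen_spec))
    forwarded = true_pos + false_pos     # only these receive the full screen
    missed = n_affected - true_pos       # under-referrals at stage one
    return forwarded, missed

# Hypothetical cohort of 1,000: 100 affected, 900 typical; prescreen with
# 95% sensitivity but only 55% specificity.
forwarded, missed = prescreen_funnel(100, 900, 0.95, 0.55)
print(forwarded)  # 500 full screens instead of 1,000 (a one-half reduction)
print(missed)     # 5 affected children under-referred before full screening
```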