Screening test construction involves both traditional and unique psychometrics. Nevertheless, screens should adhere to the same standards as any other educational and psychological test, including evidence of:
- Standardization. Norms should be based on a large, nationally representative sample (rather than a referred population). Ideally, the sample should be a naturalistic one and not a concatenation of groups known to be either normal or abnormal, because such concatenation generally eliminates the gradations in functioning that characterize the children to whom screening tests are applied (e.g., those with below-average but not disabled performance).
- Reliability. Information should be included on internal consistency, inter-rater reliability, and test-retest reliability. Stability (longer-term test-retest reliability) is sometimes reported, although given the rapid changes in developmental performance set against a small set of items, stability indicators are not likely to be strong or meaningful.
- Validity. This includes concurrent validity (a comparison of screening measures with diagnostic measures). Ideally, concurrent validation should involve a test battery that samples the same range of developmental tasks as the screening test (e.g., if motor, language, and academic skills are screened, the diagnostic battery should include motor, language, and academic tasks). Discriminant validity studies are also desirable because they show how well a screening test detects specific kinds of problems: for broad-band developmental screens, the extent to which the more common disabling conditions such as language impairment, mental retardation, learning disabilities, autism, and cerebral palsy are detected, and for mental health screens, how well internalizing and externalizing disorders are detected. Predictive validity studies are less common but desirable because they reflect how well screening test items and overall screening test performance measure enduring and meaningful dimensions of child development.