What is Probabilistic Linkage?
Introduction: Ever wondered what effect
seatbelt use has on the amount of money spent on hospital admissions for
crash victims? Or whether appropriately sized splinting in the
pre-hospital setting reduces hospital admissions, lengths of stay, and
charges? The answers to such questions are rarely contained in a single
database. A researcher would therefore have to start from scratch and build a
new database: in the first case, by asking patients on admission whether they
were wearing a seatbelt; in the second, by following each patient from the
point of splinting to the emergency department and then determining whether
the patient was admitted. Building such databases is expensive in both time
and money. However, if suitable databases already exist, such as a motor
vehicle crash database collected by police agencies and a hospital discharge
database collected by the hospital association, then probabilistic record
linkage may be the tool for you.
Purpose: The purpose of probabilistic record
linkage is to combine multiple databases into one extensive database for
analysis.
Description: Probabilistic record
linkage works by comparing data fields, such as birth date or gender, between
records in two files. Comparisons across many fields lead to a judgment that
two records refer to the same patient and should be linked. This judgment
is based on the cumulative weight of agreement and disagreement among
field values, and each field's influence on the judgment depends on how much
identifying information it carries. For instance, agreement on the gender field
alone would not establish that two records refer to the same patient, but
agreement on Social Security Number nearly guarantees that two records refer
to the same individual. Assigning log-likelihood ratios to field
comparisons makes it possible to computerize the judgment process. Let $m_i$
be the probability that the $i$th field agrees, given that the records are
known to refer to the same person (a true match), and let $u_i$ be the
probability that the $i$th field agrees by chance among records known not to
match. Then, for a given pair of records, if field $i$ agrees it receives the
agreement weight $w_i = \log_2(m_i/u_i)$; if field $i$ disagrees it receives
the disagreement weight $w_i = \log_2\big((1-m_i)/(1-u_i)\big)$. The composite
weight for a record pair is the sum $W = \sum_i w_i$ of the agreement and
disagreement weights over all fields available for comparison.
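The weight calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a production linker: the field names and the m and u probabilities in `FIELD_PARAMS` are hypothetical values chosen only to reproduce the gender versus Social Security Number contrast, whereas in practice these probabilities are estimated from the data. Fields missing from either record are skipped, since the composite weight sums only over fields available for comparison.

```python
import math

# Hypothetical m- and u-probabilities (illustrative only, not estimated from real data).
# m = P(field agrees | records are a true match)
# u = P(field agrees by chance | records are not a match)
FIELD_PARAMS = {
    "gender": {"m": 0.98, "u": 0.50},
    "birth_date": {"m": 0.95, "u": 0.001},
    "ssn": {"m": 0.99, "u": 0.00001},
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood-ratio weight for a single field comparison."""
    if agrees:
        return math.log2(m / u)  # agreement weight
    return math.log2((1 - m) / (1 - u))  # disagreement weight (negative when m > u)


def composite_weight(rec_a: dict, rec_b: dict, params: dict = FIELD_PARAMS) -> float:
    """Sum agreement/disagreement weights over fields present in both records."""
    total = 0.0
    for field, p in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # field not available for comparison: contributes nothing
        total += field_weight(a == b, p["m"], p["u"])
    return total
```

With these illustrative values, agreement on gender contributes only about 0.97 to the composite weight, while agreement on Social Security Number contributes about 16.6, which is the asymmetry the text describes.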
Examining the composite weights of all record pairs
allows researchers to set upper and lower critical values. All pairs
with weights above the upper critical value are considered matches,
while those with weights below the lower critical value are considered
non-matches. Pairs with weights between the two critical values are reviewed
manually and assigned to either the matched or the unmatched set.
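The two-threshold decision rule above can be sketched as follows. The critical values, pair identifiers and weights in the usage lines are hypothetical; in practice the researcher chooses the cutoffs after examining the distribution of composite weights. The text does not say how a weight exactly equal to a critical value is handled, so this sketch assumes such pairs go to manual review.

```python
from enum import Enum


class Decision(Enum):
    MATCH = "match"
    NON_MATCH = "non-match"
    REVIEW = "manual review"


def classify(weight: float, lower: float, upper: float) -> Decision:
    """Assign one record pair to a set using lower/upper critical values."""
    if lower > upper:
        raise ValueError("lower critical value must not exceed the upper one")
    if weight > upper:
        return Decision.MATCH
    if weight < lower:
        return Decision.NON_MATCH
    return Decision.REVIEW  # between the critical values (inclusive, by assumption)


def partition(scored_pairs, lower: float, upper: float) -> dict:
    """Split (pair_id, composite_weight) tuples into match, non-match and review sets."""
    sets = {d: [] for d in Decision}
    for pair_id, weight in scored_pairs:
        sets[classify(weight, lower, upper)].append(pair_id)
    return sets


# Hypothetical pairs and critical values, for illustration only.
scored = [("A1-B7", 18.4), ("A2-B3", 6.2), ("A3-B9", -11.0)]
result = partition(scored, lower=0.0, upper=12.0)
```

Only the pairs collected under `Decision.REVIEW` need human attention, which is why widening or narrowing the gap between the two cutoffs trades manual effort against the risk of false links or missed links.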
To reduce computation time, both files are sorted on
one or more data fields, called blocking variables, and comparisons are made
only between records that agree on those fields. Consequently, if a blocking
variable contains an error, records that should match will never be compared.
To address this problem, records that fail to match are subjected to
further matching passes after the files are re-blocked on different data
fields.
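A minimal sketch of blocking with successive re-blocking passes is shown below. Two substitutions are worth naming plainly: it groups one file in a hash index rather than sorting both files (the candidate pairs are the same), and the hypothetical `is_match` predicate stands in for the full composite-weight comparison and thresholds. It also keeps the first qualifying link greedily, a simplification; a real linker would typically compare weights across competing candidates. The field names and records are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterator, List, Tuple

Record = dict
BlockKey = Callable[[Record], Hashable]


def blocked_pairs(file_a: List[Record], file_b: List[Record], key: BlockKey) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) index pairs whose records agree on the blocking key."""
    index = defaultdict(list)  # blocking-key value -> indices in file_b
    for j, rec in enumerate(file_b):
        index[key(rec)].append(j)
    for i, rec in enumerate(file_a):
        for j in index.get(key(rec), []):
            yield i, j


def multi_pass_link(file_a, file_b, passes, is_match) -> Dict[int, int]:
    """Link records over successive blocking passes; linked records skip later passes."""
    linked: Dict[int, int] = {}  # index in file_a -> index in file_b
    used_b = set()
    for key in passes:
        for i, j in blocked_pairs(file_a, file_b, key):
            if i in linked or j in used_b:
                continue  # already linked in an earlier pass
            if is_match(file_a[i], file_b[j]):
                linked[i] = j
                used_b.add(j)
    return linked


# Hypothetical crash and hospital records; the second SMITH birth date has a keying error.
crash = [
    {"last": "SMITH", "dob": "1980-05-02", "zip": "84101"},
    {"last": "JONES", "dob": "1975-11-30", "zip": "84102"},
]
hospital = [
    {"last": "JONES", "dob": "1975-11-30", "zip": "84102"},
    {"last": "SMITH", "dob": "1980-05-20", "zip": "84101"},
]
passes = [lambda r: r["dob"], lambda r: (r["last"], r["zip"])]


def same_name_and_zip(a: Record, b: Record) -> bool:
    return a["last"] == b["last"] and a["zip"] == b["zip"]


links = multi_pass_link(crash, hospital, passes, same_name_and_zip)
```

Blocking on birth date alone links only JONES, because the mistyped SMITH birth date places the two SMITH records in different blocks; the second pass, re-blocked on last name and ZIP code, recovers the SMITH pair.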
Examples: Probabilistic record linkage has been
used at the national level to examine the effects of seatbelts and motorcycle
helmets on medical outcomes [1]. We have used probabilistic linkage
in our center to study a variety of topics, including drivers with medical
conditions [2], the effect of wearing only a shoulder strap in a motor
vehicle crash [3], older and teenage drivers as well as children
involved in motor vehicle crashes [4-7], pediatric utilization of
pre-hospital emergency medical services [8], and injuries sustained
in shop classes at public schools [9].
Bibliography:
1. Johnson SW, Walker J. The Crash Outcome Data Evaluation System (CODES).
Washington, DC: National Highway Traffic Safety Administration; 1996.
2. Diller E, Cook LJ, Leonard DR, Dean M, Reading JM, Vernon DD. Evaluating
Drivers Licensed with Medical Conditions in Utah, 1992-1996. National Highway
Traffic Safety Administration; June 1999. Report No. DOT HS 809 023.
3. Knight S, Cook LJ, Nechodom PJ, Olson LM, Reading JC, Dean JM. Improper Use
of Shoulder Straps in Motor Vehicle Crashes: A Statewide Analysis of Restraint
Efficacy. Accident Analysis and Prevention. In press.
4. Cook LJ, Knight S, Olson LM, Nechodom PJ, Dean JM. Crash Characteristics and
Medical Outcomes of Older Drivers in Motor Vehicle Crashes in Utah, 1992-1995.
Annals of Emergency Medicine 2000;35(6):585-591.
5. Cvijanovich NZ, Cook LJ, Nechodom PJ, Dean JM. A Population-Based Study of
Teenage Drivers: 1992-1996. 43rd Annual Proceedings of the Association for the
Advancement of Automotive Medicine 1999;175-186.
6. Berg M, Cook LJ, Corneli H, Vernon D, Dean JM. Effect of Seating Position and
Restraint Use on Injuries to Children in Motor Vehicle Crashes. Pediatrics
2000;105(4):831-835.
7. Corneli HM, Cook LJ, Dean JM. Adults and Children in Severe Motor Vehicle
Crashes: A Matched-Pairs Study. Annals of Emergency Medicine. In press.
8. Suruda AJ, Vernon DD, Reading J, Cook LJ, Nechodom PJ, Leonard D, Dean JM.
Pre-Hospital Emergency Medical Services: A Population-Based Study of Pediatric
Utilization. Injury Prevention 1999;5(4):294-297.
9. Knight S, Junkins EP, Lightfoot AC, Cazier C, Olson LM. Injuries in School
Shop Classes. Pediatrics 2000;106(1):10-13.