Utah CODES 
      Crash Outcome Data Evaluation System
 
What is Probabilistic Linkage?

Introduction:
 Ever wondered what effect seatbelt usage has on the amount of money spent on hospital admissions for crash victims? Or whether size-appropriate splinting in the pre-hospital setting reduces hospital admissions, lengths of stay, and charges? The answers to questions like these are rarely contained in a single database. Without one, a researcher must start from scratch and build a new database: in the first case, asking patients at admission whether they were wearing a seatbelt; in the second, following each patient from the point of splinting to the emergency department and then determining whether the patient was admitted. Building such databases is expensive in both time and money. However, if existing databases are available, such as a motor vehicle crash database collected by police agencies and a hospital discharge database collected by the hospital association, then probabilistic record linkage may be the tool for you.

Purpose: The purpose of probabilistic record linkage is to combine multiple databases into one extensive database for analysis.

Description:  Probabilistic record linkage is accomplished by comparing data fields in two files, such as birth date or gender.  Comparisons of numerous data fields lead to a judgment that two records refer to the same patient (and should be linked).  This judgment is based upon the cumulative weight of agreement and disagreement among field values.  A field's impact on the judgment depends on the amount of information it contains.  For instance, agreement on the gender field alone would not determine that two records refer to the same patient, but agreement on Social Security Number nearly guarantees that two records refer to the same individual.  By assigning log-likelihood ratios to field comparisons, it is possible to computerize the judgment process.  Let m_i equal the probability that the i-th field agrees, given that the records are known to refer to the same person (a true match).  Let u_i equal the probability that the i-th field agrees by chance among records known not to match.  Then for a given pair of records, if field i agrees, the agreement weight is w_i = log2(m_i / u_i).  If field i disagrees, a disagreement weight w_i = log2((1 - m_i) / (1 - u_i)) is assigned.  The composite weight for a record pair is the sum of the agreement and disagreement weights over all fields available for comparison.
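The weight calculation described above can be sketched in a few lines of Python. This is only an illustration, not the software used by any particular linkage project: the field names and the m and u probabilities below are made-up values chosen for the example, and a field that is missing from either record is simply skipped as "not available for comparison".

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative values only.
FIELD_PROBS = {
    "gender": (0.98, 0.50),       # agrees by chance about half the time
    "birth_date": (0.95, 0.003),  # rarely agrees by chance
}

def field_weight(m, u, agrees):
    """Agreement weight log2(m/u) or disagreement weight log2((1-m)/(1-u))."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))

def composite_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the field weights over all fields present in both records."""
    total = 0.0
    for name, (m, u) in field_probs.items():
        a, b = rec_a.get(name), rec_b.get(name)
        if a is None or b is None:
            continue  # field not available for comparison
        total += field_weight(m, u, a == b)
    return total

crash = {"gender": "F", "birth_date": "1980-01-01"}
hospital = {"gender": "F", "birth_date": "1980-01-01"}
print(composite_weight(crash, hospital))
```

With these example values, agreement on birth date contributes far more to the composite weight than agreement on gender, which mirrors the point made above about how much information each field carries.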

Examining composite weights for all record pairings allows researchers to assign upper and lower critical values.  All pairings with weights higher than the upper critical value are considered matches, while those with weights below the lower critical value are not matched.  Pairings with weights between the two critical values are reviewed manually and assigned to either the matched or unmatched sets.
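As a rough sketch of this three-way decision, the Python below sorts scored record pairs into matches, non-matches, and pairs for manual review. The critical values and pair labels are hypothetical; in practice they would be chosen after examining the distribution of composite weights. A weight exactly equal to a critical value is sent to review here, which is an assumption of this sketch.

```python
def classify(weight, lower, upper):
    """Classify one composite weight against the two critical values."""
    if weight > upper:
        return "match"
    if weight < lower:
        return "non-match"
    return "review"  # between the critical values: manual review

def partition(scored_pairs, lower, upper):
    """Group (pair_id, weight) tuples by their classification."""
    groups = {"match": [], "non-match": [], "review": []}
    for pair_id, weight in scored_pairs:
        groups[classify(weight, lower, upper)].append(pair_id)
    return groups

# Hypothetical composite weights for three record pairs.
scored = [("pair-1", 14.2), ("pair-2", -6.5), ("pair-3", 3.1)]
print(partition(scored, lower=0.0, upper=10.0))
```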

To improve computation time, both files are sorted on one or several data fields, called blocking variables, and comparisons are made only between records that agree on those fields.  The drawback is that if an error occurs in a data field used for blocking, records that should match are never compared.  To account for this problem, records that fail to match are put through subsequent linkage passes after re-blocking on different data fields.
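A minimal sketch of blocking with several passes is shown below. It groups records with a dictionary keyed on the blocking fields rather than sorting the files, which restricts comparisons to the same candidate pairs. The record layout (an "id" field plus "dob", "zip", and "sex"), the choice of blocking fields, and the is_match function are all hypothetical stand-ins; a real linkage would decide matches from composite weights.

```python
from collections import defaultdict

def block_pairs(file_a, file_b, key_fields):
    """Yield only the record pairs that agree on every blocking field."""
    index = defaultdict(list)
    for rec in file_b:
        key = tuple(rec.get(f) for f in key_fields)
        if None not in key:
            index[key].append(rec)
    for rec_a in file_a:
        key = tuple(rec_a.get(f) for f in key_fields)
        for rec_b in index.get(key, []):
            yield rec_a, rec_b

def multi_pass_link(file_a, file_b, passes, is_match):
    """Run one blocking pass per entry in `passes`; records left unlinked
    after a pass are re-blocked on the next set of fields."""
    links, linked_a, linked_b = [], set(), set()
    for key_fields in passes:
        rest_a = [r for r in file_a if r["id"] not in linked_a]
        rest_b = [r for r in file_b if r["id"] not in linked_b]
        for a, b in block_pairs(rest_a, rest_b, key_fields):
            if a["id"] in linked_a or b["id"] in linked_b:
                continue  # already linked earlier in this pass
            if is_match(a, b):
                links.append((a["id"], b["id"]))
                linked_a.add(a["id"])
                linked_b.add(b["id"])
    return links

crash = [{"id": 1, "dob": "1980-01-01", "zip": "84108", "sex": "F"},
         {"id": 2, "dob": "1975-05-05", "zip": "84101", "sex": "M"}]
hospital = [{"id": 10, "dob": "1980-01-01", "zip": "84108", "sex": "F"},
            {"id": 20, "dob": "1975-05-06", "zip": "84101", "sex": "M"}]  # dob error

same_sex = lambda a, b: a["sex"] == b["sex"]  # toy stand-in for weight scoring
print(multi_pass_link(crash, hospital, [("dob",), ("zip",)], same_sex))
```

In this toy data the second hospital record has an error in its birth date, so the first pass (blocked on birth date) never compares it with its true partner; the second pass, blocked on ZIP code, gives the pair another chance.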

Examples: Probabilistic record linkage has been used on a national level to look at the effects of seatbelts and motorcycle helmets on medical outcomes [1]. We have used probabilistic linkage in our center to study a variety of topics, including drivers with medical conditions [2], the effect of wearing only a shoulder strap in a motor vehicle crash [3], older and teenage drivers as well as children involved in motor vehicle crashes [4-7], pediatric utilization of pre-hospital emergency medical services [8], and injuries sustained in shop classes at public schools [9].

Bibliography:

  1. Johnson SW, Walker J. The Crash Outcome Data Evaluation System (CODES). Washington DC: National Highway Traffic Safety Administration; 1996.
  2. Diller E, Cook LJ, Leonard DR, Dean M, Reading JM, Vernon DD.  Evaluating Drivers Licensed with Medical Conditions in Utah, 1992 – 1996. National Highway Traffic Safety Administration 1999 June; Report No. DOT HS 809 023.
  3. Knight S, Cook LJ, Nechodom PJ, Olson LM, Reading JC, Dean JM. Improper Use of Shoulder Straps in Motor Vehicle Crashes: A Statewide Analysis of Restraint Efficacy. In Press; Accident Analysis and Prevention.
  4. Cook LJ, Knight S, Olson LM, Nechodom PJ, Dean JM.  Crash Characteristics and Medical Outcomes of Older Drivers in Motor Vehicle Crashes in Utah, 1992 – 1995. Annals of Emergency Medicine 2000;35(6):585-591.
  5. Cvijanovich NZ, Cook LJ, Nechodom PJ, Dean JM. A Population-Based Study of Teenage Drivers: 1992-1996.  43rd Annual Proceedings Association for the Advancement of Automotive Medicine 1999;175-186.
  6. Berg M, Cook LJ, Corneli H, Vernon D, Dean JM.  Effect of Seating Position and Restraint Use on Injuries to Children in Motor Vehicle Crashes. Pediatrics 2000;105(4):831-835.
  7. Corneli HM, Cook LJ, Dean JM. Adults and Children in Severe Motor Vehicle Crashes: A Matched-Pairs Study. In Press; Annals of Emergency Medicine.
  8. Suruda AJ, Vernon DD, Reading J, Cook LJ, Nechodom PJ, Leonard D, Dean JM.  Pre-Hospital Emergency Medical Services: A Population-Based Study of Pediatric Utilization.  Injury Prevention 1999;5(4):294-297.
  9. Knight S, Junkins EP, Lightfoot AC, Cazier C, Olson LM. Injuries in School Shop Classes. Pediatrics 2000;106(1):10-13.

 


Utah CODES (Crash Outcome Data Evaluation System)

 615 Arapeen Dr, Suite 202 Salt Lake City, UT 84108-1226 
Ph: (801) 581-6410, Fax: (801) 581-8686
General Information: larry.cook@hsc.utah.edu Website: IICRC Website