To confirm interrater reliability using blinded evaluation of a skills-assessment instrument to assess the surgical performance of resident and fellow trainees performing pediatric direct laryngoscopy and rigid bronchoscopy in simulated models.
Prospective, paired, blinded observational validation study.
Paired observers from multiple institutions simultaneously evaluated residents and fellows who were performing surgery in an animal laboratory or using high-fidelity manikins. The evaluators had no previous affiliation with the residents and fellows and did not know their year of training.
One- and 2-page versions of an objective structured assessment of technical skills (OSATS) assessment instrument composed of global and a task-specific surgical items were used to evaluate surgical performance.
Fifty-two evaluations were completed by 17 attending evaluators. The instrument agreement for the 2-page assessment was 71.4% when measured as a binary variable (ie, competent vs not competent) (κ = 0.38; P = .08). Evaluation as a continuous variable revealed a 42.9% percentage agreement (κ = 0.18; P = .14). The intraclass correlation was 0.53, considered substantial/good interrater reliability (69% reliable). For the 1-page instrument, agreement was 77.4% when measured as a binary variable (κ = 0.53, P = .0015). Agreement when evaluated as a continuous measure was 71.0% (κ = 0.54, P < .001). The intraclass correlation was 0.73, considered high interrater reliability (85% reliable).
The OSATS assessment instrument is an effective tool for evaluating surgical performance among trainees with acceptable interrater reliability in a simulator setting. Reliability was good for both the 1- and 2-page OSATS checklists, and both serve as excellent tools to provide immediate formative feedback on operational competency.