
Development and validation of a procedure-specific assessment tool for hands-on surgical training in congenital heart surgery

  • Nabil Hussein — Division of Cardiology, Department of Paediatrics, and Division of Cardiovascular Surgery, Department of Surgery, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
  • Andrew Lim — Center for Image-Guided Innovation and Therapeutic Intervention, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
  • Osami Honjo — Division of Cardiology, Department of Paediatrics, and Division of Cardiovascular Surgery, Department of Surgery, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
  • Christoph Haller — Division of Cardiology, Department of Paediatrics, and Division of Cardiovascular Surgery, Department of Surgery, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
  • John G. Coles — Division of Cardiology, Department of Paediatrics, and Division of Cardiovascular Surgery, Department of Surgery, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada
  • Glen Van Arsdell — Department of Surgery, David Geffen School of Medicine at UCLA and UCLA Mattel Children's Hospital, Los Angeles, Calif
  • Shi-Joon Yoo (corresponding author) — Division of Cardiology, Department of Paediatrics, and Department of Diagnostic Imaging, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, Canada. Address for reprints: Shi-Joon Yoo, MD, Department of Diagnostic Imaging and Division of Cardiology, Department of Paediatrics, The Hospital for Sick Children, University of Toronto, Toronto, Ontario, M5G 1X8 Canada.
Open Archive. Published: December 23, 2019. DOI: https://doi.org/10.1016/j.jtcvs.2019.11.130

      Abstract

      Background

      Hands-on surgical simulation has been sought to address training limitations within congenital heart surgery (CHS). However, there is a need for objective assessment methods to measure surgeons’ performance to justify its global adoption. This study aimed to validate a procedure-specific assessment tool for the simulation of the arterial switch operation on 3D-printed models and to evaluate the consistency of scoring among evaluators with different levels of experience in CHS.

      Methods

      Five “expert” and 5 “junior” surgeons performed the arterial switch procedure on 3D-printed models with transposition of the great arteries during 2 hands-on surgical training courses. Their performance was retrospectively assessed by 9 evaluators with varying experience in CHS (staff surgeons, resident surgeons, and non-MD raters). Assessments were done using 2 assessment tools: the Hands-On Surgical Training–Congenital Heart Surgery (HOST-CHS) assessment tool and the global rating scale (GRS).

      Results

      The HOST-CHS tool showed a higher interrater and intrarater reliability compared with the GRS. Total scores for expert surgeons were highly consistent across all evaluators. Non-MD raters’ total scores for junior surgeons were slightly higher than those of residents and staff evaluators. All grades of evaluator were able to discriminate between junior and expert surgeons.

      Conclusions

      This study demonstrates the development and validation of an objective, procedure-specific assessment tool for the arterial switch operation with consistency among evaluators with different experience. There is now a platform for quantifying and accurately evaluating performance, which will be highly beneficial in training and developing the next generation of congenital heart surgeons.

      Graphical abstract

      Key Words

      Abbreviations and Acronyms:

      3D (three-dimensional), CHS (congenital heart surgery), GRS (global rating scale), HOST-CHS (Hands-On Surgical Training–Congenital Heart Surgery)
      A 3D-printed model of transposition of the great arteries used for the validation study.
      Objective assessment within hands-on surgical simulation in congenital heart surgery is possible. As patient outcomes are scrutinized, such tools will be vital for surgeon development/progression.
      With the increasing support for hands-on surgical simulation in congenital heart surgery comes the need for objective assessments to evaluate surgeons’ performance. This work is fundamental before widespread curriculum implementation. This study aimed to validate an objective, procedure-specific assessment tool for the arterial switch operation, a technically challenging procedure in congenital heart surgery.
      See Commentaries on pages 240 and 242.
      There is a growing demand within congenital heart surgery (CHS) to evolve current training and address the future challenges facing the next generation of surgeons.
      • Husain SA. Does practice make perfect?
      • Hussein N, Honjo O, Haller C, Hickey E, Coles JG, Williams WG, et al. Hands-on surgical simulation in congenital heart surgery: literature review and future perspective.
      • Mavroudis CD, Mavroudis C, Jacobs JP, DeCampli WM, Tweddell JS. Simulation and deliberate practice in a porcine model for congenital heart surgery training.
      • Burkhart HM. Simulation in congenital cardiac surgical education: we have arrived.
      • Karl TR, Jacobs JP. Paediatric cardiac surgical education: which are the important elements?
      • Kogon B, Karamlou T, Baumgartner W, Merrill W, Backer C. Congenital cardiac surgery fellowship training: a status update.
      • Fraser CD. Becoming a congenital heart surgeon in the current era: realistic expectations.
      • Scanlan AB, Nguyen AV, Ilina A, Lasso A, Cripe L, Jegatheeswaran A, et al. Comparison of 3D echocardiogram-derived 3D printed valve models to molded models for simulated repair of pediatric atrioventricular valves.
      • Dearani JA. Invited commentary.
      Hands-on surgical simulation has sought to address this issue, but is currently not incorporated within training curriculums in CHS.
      • Hussein N, Honjo O, Haller C, Hickey E, Coles JG, Williams WG, et al. Hands-on surgical simulation in congenital heart surgery: literature review and future perspective.
      For simulation to be widely adopted there is a need for objective assessment methods to measure performance, which requires development and validation. One barrier to achieving this is the emphasis on experienced surgeons to perform assessments and provide constructive feedback, which is onerous for already busy professionals.
      This study aimed to validate a procedure-specific assessment tool for the simulation of the arterial switch operation on 3D-printed models compared with an existing validated assessment tool in surgical simulation. The goal was to determine the consistency of scores among evaluators with different levels of experience in CHS and to evaluate whether the tool could discriminate between different grades of surgeon.

      Methods

      Study Design

      Five “expert” surgeons (with >5 years' experience in CHS) and 5 “junior” surgeons (with <2 years' experience in CHS) performed the arterial switch procedure on 3D-printed models with transposition of the great arteries during 1 of 2 hands-on surgical training courses held at The Hospital for Sick Children, Toronto, Canada. Their performances were video recorded and evaluated retrospectively by 9 evaluators with varying experience in CHS: 3 established congenital heart surgeons with an in-depth understanding of the arterial switch procedure, 2 cardiac surgical residents with a good theoretical understanding, and 4 non-MDs with no previous knowledge of congenital heart disease or surgery.
      Video 1Clip from the gold standard video of the arterial switch procedure (medial trap door, closed technique) performed by an experienced staff surgeon. This video was used for training evaluators. Video available at: https://www.jtcvs.org/article/S0022-5223(19)40478-9/fulltext.
      The 10 procedures were scored independently by all the evaluators using 2 assessment tools: the Hands-On Surgical Training–Congenital Heart Surgery (HOST-CHS) assessment tool (Table 1) and the global rating scale (GRS), which was adapted from the scale originally described by Reznick and colleagues
      • Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative “bench station” examination.
      (Table 2). After a 3-month period, 2 evaluators repeated the assessments to measure the intrarater reliability of the tools.
      Table 1. Hands-On Surgical Training–Congenital Heart Surgery (HOST-CHS) assessment tool used to evaluate the arterial switch procedure on 3D-printed models
      Step | Yes/No | Weight of step (1–5) | Included in HOST-CHS holistic score

      1. Transection of aorta
      Is the cut in the aorta:
      1. i. Perpendicular to the vessel? | Y/N | 2 | RESPECT
      2. ii. Clean? (ie, not jagged or having sharp protruding points) | Y/N | 2 | RESPECT
      3. Is there enough distance on the proximal aorta (5-10 mm) for good-sized coronary buttons? | Y/N | 3 | KNOWLEDGE
      4. Is there enough distal length on the aorta for reconstruction of the neo-aorta? | Y/N | 3 | KNOWLEDGE

      2. Excision of coronary artery buttons
      5. Have the coronary buttons been excised with a liberal amount of aortic sinus wall with the coronary artery? | Y/N | 5 | RESPECT
      6. Is the coronary button rectangular-shaped? | Y/N | 3 | KNOWLEDGE
      7. Is the coronary orifice in the center of the button? | Y/N | 5 | KNOWLEDGE
      8. Is there enough aortic wall left for pulmonary artery reconstruction? (ie, oblique cut toward the anterior commissure) | Y/N | 3 | KNOWLEDGE
      9. Has there been any damage to the coronary arteries or aortic/neo-pulmonary valve during excision and mobilization? | N/Y | 5 | RESPECT

      3. Transection of ductus arteriosus and pulmonary trunk
      10. Has the ductus been suture ligated and transected? | Y/N | 1 |
      11. Is the proximal PDA suture a safe distance from the left pulmonary artery (>1-2 mm)? | Y/N | 4 |
      Is the cut in the pulmonary trunk:
      12. i. Perpendicular to the vessel? | Y/N | 3 | RESPECT
      13. ii. Clean? (ie, not jagged or having sharp protruding points) | Y/N | 3 | RESPECT
      14. iii. A safe distance away from the pulmonary bifurcation (2-5 mm) so that it does not compromise the branch PAs? | Y/N | 4 | KNOWLEDGE
      15. Have one or more commissures been marked with a pen or stitch? | Y/N | 4 | KNOWLEDGE

      4. Reconstruction of neo-aorta
      16. Has the length of the ascending aorta been adjusted in a new position if required? (ie, trimmed) | Y/N | 3 | KNOWLEDGE
      17. Has an end-to-end anastomosis been performed between the proximal neo-aorta and ascending aorta? | Y/N | 3 |
      18. Was the anastomosis commenced posteriorly? | Y/N | 3 |
      Suture/anastomosis assessment:
      19. i. Are all the sutures evenly spaced from one another with a gap of 2-3 mm between suture bites? | Y/N | 3 | FLUENCY
      20. ii. Are all the sutures an adequate distance from the edge (2-3 mm)? | Y/N | 3 | FLUENCY

      5. Implantation of coronary artery buttons to neo-aorta
      Left coronary button incision:
      21. i. In the correct position for the technique of choice? (ie, medially based trap door for closed technique vs trap door/rectangular for open technique) | Y/N | 5 | KNOWLEDGE
      22. ii. Adequate-sized incision for technique of choice? (eg, closed technique: incision is slightly smaller than button [4-6 mm] and edges of trap door are cut at right angles) | Y/N | 4 | RESPECT
      Is the left coronary artery:
      23. i. In the “best lie” position? (ie, lateral and superior, avoiding compression from PA, not stretching) | Y/N | 5 | FLUENCY
      24. ii. Kinked or twisted? | N/Y | 5 | FLUENCY
      iii. Suture/anastomosis assessment:
      25. a. Are all the sutures evenly spaced from one another with a gap of 1-2 mm between suture bites? | Y/N | 4 | FLUENCY
      26. b. Are all sutures an adequate distance from the edge (1-2 mm) and a safe distance from the neo-aortic valve and coronary ostium? | Y/N | 4 | FLUENCY
      27. iv. Has the coronary button been trimmed appropriately? (ie, leaving more tissue medially than laterally in the trap door technique/not too much tissue left over affecting lay/anastomosis) | Y/N | 3 |
      28. v. Is the coronary still intact by the end of anastomosis (ie, not avulsed)? | Y/N | 5 |
      Right coronary button incision:
      29. i. In the correct position for the technique of choice? (ie, medially based trap door for closed technique vs trap door/rectangular for open technique) | Y/N | 5 | KNOWLEDGE
      30. ii. Adequate-sized incision for technique of choice? (eg, closed technique: incision is slightly smaller than button [4-6 mm] and edges of trap door are cut at right angles) | Y/N | 4 | RESPECT
      Is the right coronary artery:
      31. i. In the “best lie” position? (ie, lateral and superior, avoiding compression from PA, not stretching) | Y/N | 5 | FLUENCY
      32. ii. Kinked or twisted? | N/Y | 5 | FLUENCY
      iii. Suture/anastomosis assessment:
      33. a. Are all the sutures evenly spaced from one another with a gap of 1-2 mm between suture bites? | Y/N | 4 | FLUENCY
      34. b. Are all sutures an adequate distance from the edge (1-2 mm) and a safe distance from the neo-aortic valve and coronary ostium? | Y/N | 4 | FLUENCY
      35. iv. Has the coronary button been trimmed appropriately? (ie, leaving more tissue medially than laterally in the trap door technique/not too much tissue left over affecting lay/anastomosis) | Y/N | 3 |
      36. v. Is the coronary still intact by the end of anastomosis (ie, not avulsed)? | Y/N | 5 |

      6. Reconstruction of neo-pulmonary trunk
      37. Has the candidate performed this procedure to completion? (ie, anastomosis of patch and then to branch PAs) | Y/N | 4 | FLUENCY
      38. Is the height of the patch level with the native tissue left following transection/coronary button excision? | Y/N | 2 | FLUENCY
      39. Is the diameter of the patch slightly larger than the native lumen size? | Y/N | 2 | KNOWLEDGE
      40. Has an end-to-end anastomosis been performed between the neo-pulmonary trunk and the distal pulmonary artery? | Y/N | 2 |
      41. Was the anastomosis commenced posteriorly? | Y/N | 2 |
      Suture/anastomosis assessment:
      42. i. Are all the sutures evenly spaced from one another with a gap of 2-3 mm between suture bites? | Y/N | 3 | FLUENCY
      43. ii. Are all the sutures an adequate distance from the edge (2-3 mm)? | Y/N | 3 | FLUENCY

      Total score | | 153 |

      The left part of the table shows the scoresheet with 43 questions within 6 categories. The 2 columns on the right show the predetermined weight of each question (1-5) and highlight the questions used to calculate the holistic HOST-CHS scores. These 2 columns were excluded from the evaluators' scoresheet. For reverse-scored items, N/Y indicates that “No” is the desirable answer. HOST-CHS, Hands-On Surgical Training–Congenital Heart Surgery; PDA, patent ductus arteriosus; PA, pulmonary artery.
      Table 2. Modified global rating scale based on the work by Reznick and colleagues
      • Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative “bench station” examination.
      Task (each scored 1–5; N/A if not observed) | Anchor at 1 | Anchor at 3 | Anchor at 5
      Respect for tissue | Frequently used unnecessary force on tissue or caused damage by inappropriate use of instruments | Careful handling of tissue but occasionally caused inadvertent damage | Consistently handled tissue appropriately with minimal damage
      Time and motion | Many unnecessary moves | Efficient time/motion but some unnecessary moves | Clear economy of movement and maximum efficiency
      Instrument handling | Repeatedly made tentative or awkward moves through inappropriate use of instruments | Competent use of instruments but occasionally appeared stiff or awkward | Fluid moves with instruments and no awkwardness
      Flow of operation | Frequently stopped operating and seemed unsure of next move | Demonstrated some forward planning with reasonable progression of procedure | Obviously planned course of operation with effortless flow from one move to the next
      This was used as the gold standard assessment tool against which the HOST-CHS assessment tool was compared.
      • Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative “bench station” examination.
      Three items—knowledge of instruments, use of assistants, and knowledge of specific procedure—were removed from the original rating scale because they could not be assessed retrospectively with video analysis. N/A, Not applicable.

      HOST-CHS Assessment Tool Development and the GRS

      The HOST-CHS assessment tool is a procedure-specific checklist that has been designed to objectively assess the technical performance of each step involved in the arterial switch operation. The tool was developed using a combination of the fundamental principles of the nominal and Delphi methods of achieving consensus.
      • Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use.
      Four staff surgeons experienced in the procedure were independently asked to list the steps involved in the arterial switch operation. This information was collated, and a hierarchical task analysis was performed to deconstruct the operation into its essential components.
      • Zevin B, Bonrath EM, Aggarwal R, Dedy NJ, Ahmed N, Grantcharov TP. Development, feasibility, validity, and reliability of a scale for objective assessment of operative performance in laparoscopic gastric bypass surgery.
      A binary method of assessment was used for each step (ie, yes/no). The surgeons then re-reviewed the checklist and recommended which steps to include. After several rounds of review, a consensus was achieved. The tool was adapted for simulation by excluding steps that could not be performed on the 3D-printed model (eg, median sternotomy, establishing cardiopulmonary bypass). The surgeons then weighted each step based on its overall importance in the operation using a Likert scale of 1 to 5, with 5 signifying highest importance and 1 representing lowest importance. In total, there are 43 steps under 6 broad sections, with a maximum HOST-CHS score of 153 (Table 1).
      A holistic HOST-CHS score was developed to incorporate the general aspects of the surgical procedure. Each question of the HOST-CHS assessment tool was evaluated and placed into 1 of 3 categories as applicable: (1) fluency of the procedure (eg, suture placement, position of the reimplanted coronary artery); (2) knowledge of the technical aspects of the procedure (eg, correct shape and size of the coronary button); and (3) respect for tissue (eg, clean incisions, avoidance of collateral damage).
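      The scoring arithmetic described above (43 binary items, each weighted 1 to 5, with an optional holistic category) can be sketched in code. This is an illustrative sketch with a hypothetical data structure, not the published scoresheet; item wording and weights below are toy values.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChecklistItem:
    question: str
    achieved: bool                  # evaluator's answer, oriented so True = desirable outcome
    weight: int                     # predetermined importance, 1 (lowest) to 5 (highest)
    category: Optional[str] = None  # "FLUENCY", "KNOWLEDGE", "RESPECT", or None

def score_host_chs(items):
    """Return the total score and the per-category holistic subscores."""
    total = sum(i.weight for i in items if i.achieved)
    holistic = {}
    for i in items:
        if i.category is not None:
            holistic.setdefault(i.category, 0)
            if i.achieved:
                holistic[i.category] += i.weight
    return total, holistic

# Toy example with three items (the real tool has 43, summing to a maximum of 153):
items = [
    ChecklistItem("Cut perpendicular to vessel?", True, 2, "RESPECT"),
    ChecklistItem("Coronary orifice centred in button?", False, 5, "KNOWLEDGE"),
    ChecklistItem("Anastomosis commenced posteriorly?", True, 3, None),
]
total, holistic = score_host_chs(items)
print(total)     # 5 (weights 2 + 3 for the achieved items)
print(holistic)  # {'RESPECT': 2, 'KNOWLEDGE': 0}
```

      Note that items without a holistic category (eg, "Anastomosis commenced posteriorly?") contribute to the total score but not to any holistic subscore, mirroring the blank entries in the rightmost column of Table 1.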
      The global rating scale (GRS) is a Likert scale–based validated assessment tool.
      • Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative “bench station” examination.
      The scale covers the fundamental characteristics that apply to all steps of a surgical procedure. The items “knowledge of instruments,” “use of assistants,” and “knowledge of specific procedure” were removed because they could not be assessed retrospectively via video analysis (Table 2).
      The results of each assessment tool were analyzed to assess the following: (1) the consistency of total score across all evaluators (interrater reliability) and individual scores for each question (intraclass correlation); (2) the consistency of scores for the same rater following a 3-month delay between assessments (intrarater reliability); (3) if there is a statistically significant difference in score between different levels of evaluators; and (4) discriminatory power: whether the assessment tool can differentiate between 2 grades of surgeon among all evaluators.
      Ethics approval was obtained from the appropriate Institutional Research Ethics Board.

      Statistical Methods

      Interrater and intrarater reliability of the assessment tools were evaluated using the intraclass correlation and the correlation coefficient, respectively. The Kruskal–Wallis test was used to determine differences in overall scores and the assessment tools' discriminatory power, with significance assessed at the 95% confidence level. To determine rater consistency for each of the 43 HOST-CHS questions, a joint probability of agreement coefficient was averaged across the 9 raters and 10 videos and reported for each question.
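      The joint probability of agreement is, at its core, the fraction of rater comparisons yielding the same binary answer. The exact pairing scheme used in the study is not specified; the sketch below assumes simple pairwise agreement among raters, averaged across videos, with hypothetical toy data.

```python
from itertools import combinations

def pairwise_agreement(ratings):
    """Fraction of rater pairs giving the same binary answer for one question on one video."""
    pairs = list(combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def question_agreement(answers_by_video):
    """Average pairwise agreement for one question across videos.
    `answers_by_video` is a list of per-video rater answer lists (Y=1, N=0)."""
    return sum(pairwise_agreement(v) for v in answers_by_video) / len(answers_by_video)

# Toy data: one question, two videos, three raters each
videos = [
    [1, 1, 1],  # all three raters agree -> agreement 1.0
    [1, 1, 0],  # one dissent: 1 of 3 rater pairs agree -> agreement 1/3
]
print(question_agreement(videos))  # (1.0 + 1/3) / 2 ≈ 0.667
```

      Under this scheme, a coefficient of 1.0 means every pair of raters answered the question identically on every video; the 0.7 threshold used in Figure 2 corresponds to roughly two thirds of rater pairs agreeing.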

      Results

      Reliability of the HOST-CHS and GRS Assessment Tools

      A total of 10 videos were assessed by 9 evaluators with different levels of experience in the arterial switch procedure. The interrater and intrarater reliability were higher for the HOST-CHS compared with the GRS, demonstrating a high level of consistency (Figure 1). The joint probability of agreement coefficient measures the fractional absolute agreement among raters and was high (0.81) for the HOST-CHS assessment tool. Nine questions were below the 0.7 threshold, showing a greater degree of variability among evaluators (Figure 2). These primarily involved the position, anastomosis, and trimming of the coronary buttons.
      Figure 1Interrater (A) and intrarater (B) reliability/agreement for each question in the scoresheet among all evaluators for the Hands-on Surgical Training-Congenital Heart Surgery (HOST-CHS; blue) and the modified global rating scale (GRS; red) assessment tools across all 10 videos. The intraclass correlation (A) and correlation coefficient (B) were used to evaluate reliability. The HOST-CHS assessment tool demonstrated greater interrater and intrarater reliability than the GRS. Coefficients >0.7 are arbitrarily defined as “high reliability”. Averages: (A) HOST-CHS, 0.89; GRS, 0.08. (B) HOST-CHS, 0.76; GRS, 0.32.
      Figure 2The joint probability of agreement coefficient for each question in the Hands-on Surgical Training–Congenital Heart Surgery (HOST-CHS) assessment tool (43 questions). The blue line indicates an arbitrary agreement coefficient threshold of 0.7, which indicates high reliability. Nine of the 43 questions fell below this threshold, showing a greater degree of variability among evaluators. The average agreement coefficient among all questions was 0.81 (highly reliable).

      Difference in Total Score Between Level of Evaluator and Discriminatory Power

      Total scores for expert surgeons were highly consistent across all evaluators, with no statistically significant difference. Non-MD raters' total scores for junior surgeons were slightly higher than those of resident and staff evaluators (Figure 3). However, all grades of evaluator were able to discriminate clearly between junior and expert surgeons in total score and in all holistic HOST-CHS scores (Figure 4). Table E1 summarizes the expected and observed outcomes for the total score from the 2 assessment tools.
      Figure 3Comparison of the “expert” surgeons' (A) and “junior” surgeons' (B) total Hands-on Surgical Training–Congenital Heart Surgery (HOST-CHS) scores for the arterial switch operation between the different grades of evaluators determined by the Kruskal–Wallis tests. Total scores for the expert surgeons were highly consistent across all evaluators, with no statistically significant difference. Non-MD raters' total scores for the junior surgeons (left) were slightly higher than those of residents (middle) and staff surgeons (right). Non-MD, Evaluators with no prior understanding of congenital heart disease or surgery.
      Figure 4(Top) Difference in total Hands-On Surgical Training–Congenital Heart Surgery (HOST-CHS) score between “expert” and “junior” surgeons for each grade of evaluator. All evaluators were able to discriminate between both levels of surgeon, demonstrating construct validity. A, Non-MD rater (P = .00004). B, Resident rater (P = .0002). C, Staff rater (P = .000003). (Bottom) Difference in the holistic HOST-CHS scores between expert and junior surgeons for MD raters (ie, resident + staff). Again, the assessment tool was able to discriminate between different levels of surgeon. D, Fluency of procedure (P = .000000006). E, Knowledge of procedure (P = .0000004). F, Respect for tissue (P = .008).

      Discussion

      Technical excellence is not automatically achieved through extensive experience, which is typically the mantra of surgical training in CHS, but rather through active engagement in deliberate practice. This concept focuses training on the improvement of a particular task through repetition and immediate feedback.
      • Ericsson KA. Deliberate practice and acquisition of expert performance: a general overview.
      Although this approach is prevalent in other surgical specialties, objective standardized tests are needed to reap these benefits within CHS.
      • Hussein N, Honjo O, Haller C, Hickey E, Coles JG, Williams WG, et al. Hands-on surgical simulation in congenital heart surgery: literature review and future perspective.
      The benefits of standardization are multiple: objective feedback for residents, early identification of resident deficiencies, program development and interinstitutional comparisons, and a potential tool for certification/training progression.
      • Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative “bench station” examination.
      Although effective, existing assessment tools in surgical simulation are primarily generalized and do not focus on the specific aspects of a procedure. There is also a heavy reliance on experienced surgeons to perform the assessments, which might not be feasible because of time limitations. This may inadvertently have a negative effect on their participation in the simulation process, which is crucial.
      The aim of this study was to develop an assessment tool that incorporates the benefits of expert surgeons' experience while being suitable for use by less experienced personnel. This would maximize the efficiency of surgeons' teaching time and increase the likelihood of simulation adoption. To be useful, an assessment tool needs to be both reliable and valid. Reliability refers to the precision of the assessment (ie, if the assessment is repeated on 2 successive occasions, without additional learning, it will produce the same result). A score/coefficient of >0.8 is deemed highly reliable, with 0.5 to 0.8 considered moderately reliable and <0.5 demonstrating low reliability. Validity refers to whether a test measures what it is intended to measure.
      • Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative “bench station” examination.
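      The reliability bands just described can be expressed as a small helper function; this is purely illustrative and not part of the published tool (the band labels and the treatment of the 0.8 boundary are assumptions from the text).

```python
def reliability_band(coefficient: float) -> str:
    """Classify a reliability coefficient using the bands given in the text:
    >0.8 high, 0.5-0.8 moderate, <0.5 low."""
    if coefficient > 0.8:
        return "high"
    if coefficient >= 0.5:
        return "moderate"
    return "low"

# The coefficients reported in this study fall into these bands:
print(reliability_band(0.89))  # high (HOST-CHS interrater)
print(reliability_band(0.76))  # moderate (HOST-CHS intrarater)
print(reliability_band(0.08))  # low (GRS interrater)
```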
      The HOST-CHS tool demonstrated higher interrater reliability than the GRS (0.89 vs 0.08), with evaluators consistently scoring the same on each of the 43 items listed in the score sheet (Figure 1, A). In addition, after a 3-month interval, 2 evaluators were able to repeat the evaluation with results highly consistent with their first attempts (HOST-CHS, 0.76 vs GRS, 0.32). The GRS showed a greater degree of variability, whereas the HOST-CHS tool was consistent throughout all videos (Figure 1, B). Table 3 lists the different types of validity and whether they were addressed in the HOST-CHS tool.
      Table 3. Types of validity as described by Gallagher and colleagues and whether they were achieved in the HOST-CHS assessment tool
      • Gallagher AG, Ritter EM, Satava RM. Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training.
      Type of validity | Description | Achieved in HOST-CHS assessment tool | Rationale
      Face validity | Contents of assessment tool are reviewed by experts to deem if it will assess what it intends to | Yes | Tool developed with input from experienced surgeons in various techniques of the arterial switch procedure
      Content validity | Each item of the assessment is reviewed to determine appropriateness | Yes | Multiple rounds of review to achieve consensus of tasks
      Construct validity | The ability to differentiate between surgeons of different ability (ie, expert surgeon vs junior surgeon) | Yes | Statistically significant difference in score between expert and junior surgeons among all evaluators (Figure 4)
      Concurrent validity | Are the new assessment tool's results consistent with those of a gold standard assessment tool | Yes | Global rating scale scores consistent with HOST-CHS total scores
      Discriminate validity | The ability to differentiate ability levels within a group with similar experience (ie, CHS fellows) | No | Study methodology did not assess this
      Predictive validity | Are the scores in the assessment tool predictive of actual performance | No | Study methodology did not assess this
      HOST-CHS, Hands-On Surgical Training–Congenital Heart Surgery.
      Simulation assessments are based on observations and thus are at risk of rater error affecting reliability and validity. Rater training improves accuracy, allowing evaluators with no clinical expertise to be as effective as expert assessors.
      • Feldman M, Lazzara EH, Vanderbilt AA, DiazGranados D. Rater training to support high-stakes simulation-based assessments.
      In our study, non-MD raters successfully evaluated all videos and distinguished between junior and expert surgeons, with total scores comparable to those of MD raters. However, their scores for junior surgeons were higher than those of MD raters, suggesting a potential limitation. These results were consistent with those obtained using the gold standard tool. Table 4 lists the common rater errors and whether they were avoided in the HOST-CHS assessment tool.
      Table 4. Common rater errors as described by Feldman and colleagues and whether they are avoided in the HOST-CHS assessment tool
      • Feldman M, Lazzara EH, Vanderbilt AA, DiazGranados D. Rater training to support high-stakes simulation-based assessments.
      Common rater error | Description | Avoided in HOST-CHS assessment tool | Rationale
      Central tendency | Avoidance of extreme positive or negative ratings | Yes | Binary assessment method; outcome either positive or negative
      Halo effect | All ratings based on one positive or negative observation | Yes | Specificity of steps requires each question to be answered on its own merit
      Leniency | Avoiding poor performance scale items | Yes |
      Primacy/recency effect | Ratings based on observations made early or late in the assessment | Yes |
      Contrast effect | Ratings made relative to the performance of a previous group | Yes | Evaluators unaware of grade of operating surgeon as videos are randomized; all evaluators given a reference gold standard video for calibration
      Stereotype effect | Ratings based on group inclusion rather than individual differences | Yes | Evaluators blinded and videos randomized; videos were evaluated in one sitting
      Similar-to-me effect | Ratings based on degree of similarity to rater | Yes | Binary assessment method and specificity of questions avoid this effect
      HOST-CHS, Hands-On Surgical Training–Congenital Heart Surgery.

      Development of the HOST-CHS Assessment Tool

Figure 5 summarizes the development and application of the HOST-CHS assessment tool, a procedure-specific checklist with items weighted by their importance on a scale of 1 to 5. The HOST-CHS was used by 9 evaluators to assess 10 videos of the arterial switch operation and showed high interrater and intrarater reliability and total score consistency across all evaluators compared with the GRS. All evaluators were able to discriminate between "expert" and "junior" surgeons.
Figure 5. The Hands-On Surgical Training–Congenital Heart Surgery (HOST-CHS) assessment tool was developed for simulating the arterial switch operation on 3D-printed models. The tool is a procedure-specific checklist with items weighted by their importance on a scale of 1 to 5. Videos of 5 "junior" and 5 "expert" surgeons performing the operation were assessed blindly by 9 evaluators who were grouped into 3 categories based on their experience (4 non-MDs, 2 cardiac residents, and 3 congenital cardiac surgeons). The HOST-CHS assessment tool showed high interrater and intrarater reliability and total score consistency across all evaluators. All evaluators were able to discriminate between expert and junior surgeons.
The GRS is a widely used, validated assessment tool developed for evaluating surgical tasks in simulation (Reznick et al.; Martin et al.; Szasz et al.). It can be used across multiple procedures and specialties, provides a global assessment of operative performance, and can differentiate between surgeons of varying abilities. However, its generalized nature limits its ability to focus on specific parts of an operation and to provide constructive feedback (Szasz et al.). A Likert scale-based assessment is also at risk for common rater errors and usually requires an experienced evaluator to perform the assessment. Furthermore, the differences among the 5 grades are often arbitrary and indistinct.
The binary system is an alternative method that can be used either to assess a trainee's overall competence in performing a procedure (ie, pass/fail) or to record whether they completed each of the tasks that compose the full procedure (ie, checklist) (Szasz et al.). Checklists provide trainees with structured feedback, but their use elsewhere is limited by their procedure-specific nature (Szasz et al.).
This rigidity has led to findings of poor validity and reliability when checklists are used by experienced or less experienced evaluators (Reznick et al.; Szasz et al.; Regehr et al.); however, this was not observed in our study, likely owing to the combination of a highly specific checklist, a reference gold standard video for calibration, and rater training.
      Both binary and Likert scale methods were incorporated into the HOST-CHS tool to make use of their benefits, while minimizing threats to reliability and validity. A binary method was found to provide the most objective assessment and to limit potential bias. In addition, the steps were developed in a very specific manner whereby a video could be evaluated by individuals with minimal experience in CHS.
      Within a surgical procedure, different tasks have varying levels of importance and consequences of incorrect performance. A weighting system using the Likert scale principles was developed and incorporated into the assessment tool to address this. Using methods of reaching consensus, we used staff members' experiences to predefine the importance of each step of the assessment. This predetermined weighting removed a significant proportion of the threats to validity of the assessment tool while incorporating the benefits of a Likert scale-based assessment.
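The weighted binary scoring described above can be sketched in a few lines. This is a minimal illustration only: the item descriptions, weights, and percentage normalization below are assumptions for the example, not the published HOST-CHS checklist.

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    description: str
    weight: int  # predefined importance, 1 (minor) to 5 (critical)

# Hypothetical excerpt; the real checklist is procedure specific.
ITEMS = [
    ChecklistItem("Aorta transected at the correct level", 3),
    ChecklistItem("Coronary buttons excised without injury", 5),
    ChecklistItem("Suture line evenly spaced", 2),
]

def host_chs_score(completed: list[bool]) -> float:
    """Weighted binary score: each correctly performed step earns its
    full weight; an incorrect or omitted step earns zero. Returned here
    as a percentage of the maximum achievable score."""
    if len(completed) != len(ITEMS):
        raise ValueError("one binary judgment per checklist item")
    earned = sum(item.weight for item, done in zip(ITEMS, completed) if done)
    maximum = sum(item.weight for item in ITEMS)
    return 100.0 * earned / maximum

print(host_chs_score([True, True, False]))  # 80.0
```

Because each judgment is binary, the evaluator never chooses among indistinct grades; the predefined weights carry all the graded importance.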
      The inclusion of the holistic HOST-CHS score subcategorized the surgeons' performance into general aspects of the procedure. Using the HOST-CHS tool generated an objective score based on the original assessment, which was advantageous. This information can be used by the trainee to focus on aspects of the procedure they need to improve on (ie, fluency), in addition to the specific steps they performed incorrectly. Again, there was a clear difference in score between expert and junior surgeons.
Achieving consensus is key to developing a reproducible assessment tool. The methods used in this study collate staff opinion in a systematic manner, enabling each participant to express their views impersonally while providing information to the whole group (Zevin et al.).
      Congenital heart surgery presents a unique challenge in creating standardized procedure-specific assessments, because there are multiple ways to perform an operation with excellent outcomes. Therefore, we developed the assessment tool to incorporate all techniques without compromising specificity.
Overall, the HOST-CHS assessment tool is effective in evaluating the performance of the arterial switch procedure by surgeons of different technical abilities. It provides surgeons with an objective score and highlights the specific areas and tasks needing improvement, helping to focus future training objectives and maximize efficiency. Furthermore, evaluators with no previous knowledge of CHS can be trained to be as effective as experienced evaluators. Evaluations performed by unrelated evaluators (ie, non-MDs) may also eliminate the potential for biased ratings by the trainee surgeons' supervisors. These strengths increase the likelihood that such assessment tools will be used and will aid the incorporation of such simulation methods within future CHS curriculums.

      Limitations/Future Directions

Although the HOST-CHS assessment tool has proven effective in evaluating the arterial switch operation, its specific nature makes it unsuitable for other CHS procedures. The time to perform the assessment is acceptable (5-10 minutes), but the time to produce these assessment tools can be excessive. However, once generated, these assessment tools are reproducible and potentially can be used globally. This will be a significant step toward incorporating HOST simulation within curriculums and evolving training internationally. In addition, these checklists can be used by trainees while rehearsing and as a technique for self-assessment. In our institution, we have established a year-long curriculum and have committed to producing HOST-CHS assessment tools for all procedures performed.
Despite the HOST-CHS assessment tool's high interrater and intrarater reliability, questions 8 and 22 to 35 showed the greatest degree of variability among evaluators. This result was consistent when both MD raters and staff raters were analyzed independently. These questions primarily cover implantation of the coronary button, technically the most challenging part of the procedure, so a degree of variability is expected. This could potentially be improved by breaking the section down into further questions, but doing so would increase the assessment tool's complexity and compromise utilization. In addition, the videos are recorded from a single angle, which limits accurate assessment of the most complex steps. Early in our experience, we found that procedures involving primarily extracardiac repairs (eg, arterial switch, Norwood, interrupted aortic arch repair, supravalvular aortic stenosis repair) are excellent operations to capture on video and assess, but intracardiac procedures (ie, ventricular septal defect and atrioventricular septal defect repairs) can be difficult to record and assess due to the limited exposure. To address this, we are currently working on novel methods to allow for more accurate assessment of these procedures.
Although promising, the assessment tool still requires standard setting before it can be considered for trainee progression or certification purposes. Further research is needed to establish learning curves for each procedure, which will involve multiple attempts by trainees and evaluation of their progression (Ericsson; Valsamis et al.).
One method would be to generate a cutoff score (ie, "pass/fail") as used in other assessment methods (Reznick et al.; Martin et al.; de Montbrun et al.; Lou et al.).
However, within CHS, some surgical steps are significantly more important than others, and if performed incorrectly will lead to significant consequences. Setting a fixed pass score could allow a surgeon to pass a procedure despite making a mistake that would be catastrophic in actual practice. For example, a surgeon could perform the entire arterial switch operation perfectly but leave one of the coronaries kinked; the total score could be high enough to pass, yet the consequences of this mistake would be fatal for the patient. We would prefer a method similar to an automobile driving test, in which each mistake is classified as major or minor, with a major mistake leading to overall failure regardless of total score. In the HOST-CHS assessment tool, these major mistakes correspond to the steps carrying the greatest weight (5) and are thus easily identifiable.
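The driving-test rule described above can be sketched as a simple override on top of a weighted score. This is an illustrative sketch only: the study defines no numeric pass threshold, so the cutoff below is an assumed placeholder, and the step weights are hypothetical.

```python
# Each step is a (weight, performed_correctly) pair. Weight-5 steps mark
# tasks whose failure would be catastrophic in practice (eg, a kinked
# coronary), so a single failed weight-5 step fails the whole attempt.
CRITICAL_WEIGHT = 5
CUTOFF_PERCENT = 75.0  # assumed threshold; not set by the study

def pass_fail(steps: list[tuple[int, bool]]) -> bool:
    # Major-mistake override: any failed critical step fails outright,
    # regardless of how high the total weighted score is.
    if any(weight == CRITICAL_WEIGHT and not done for weight, done in steps):
        return False
    # Otherwise fall back to the weighted-percentage cutoff.
    earned = sum(weight for weight, done in steps if done)
    maximum = sum(weight for weight, _ in steps)
    return 100.0 * earned / maximum >= CUTOFF_PERCENT

# A near-perfect run with one failed critical step still fails:
print(pass_fail([(2, True), (3, True), (5, False)]))  # False
print(pass_fail([(2, True), (3, True), (5, True)]))   # True
```

The override mirrors the driving-test analogy: minor mistakes accumulate against a threshold, whereas a single major mistake is disqualifying on its own.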

      Conclusions

      This study describes the development and validation of an objective, procedure-specific assessment tool for the arterial switch operation that is comparable to an existing validated assessment tool used in surgical simulation. This new tool shows high consistency in scores between evaluators (interrater/intrarater reliability) and is able to discriminate accurately between different grades of surgeon. More importantly, this tool provides a high degree of objective feedback for the performing surgeon, which is fundamental for deliberate practice.
Although further work is ongoing to produce assessment tools for the plethora of operations covered within CHS, this study describes a methodology whereby evaluations can be performed accurately by less experienced evaluators. This will be fundamental to the global adoption of surgical simulation within CHS. With a higher degree of rater training, potentially all assessments could be performed accurately by evaluators with no experience in CHS.
The use of HOST and other simulation methods is considered an excellent way to address the growing concerns within CHS training (Hussein et al.).
      There is now a platform for quantifying and accurately evaluating performance. This will be highly beneficial in the training and development of the next generation of congenital heart surgeons worldwide.

      Conflict of Interest Statement

      Authors have nothing to disclose with regard to commercial support.

      Appendix

Table E1. Comparison of expected and observed outcomes for the total score from the 2 assessment tools

Comparison | Expected outcome | HOST-CHS | GRS
1. Non-MD raters vs staff raters on junior surgeons | No difference | Difference (P = .0002) | Difference (P = .02)
2. Non-MD raters vs resident raters on junior surgeons | No difference | Difference (P = .0005) | Difference (P = .004)
3. Resident raters vs staff raters on junior surgeons | No difference | No difference (P = .45) | Difference (P = .06)
4. Non-MD raters vs staff raters on expert surgeons | No difference | Difference (P = .03) | No difference (P = .56)
5. Non-MD raters vs resident raters on expert surgeons | No difference | No difference (P = .31) | No difference (P = .36)
6. Resident raters vs staff raters on expert surgeons | No difference | No difference (P = .24) | No difference (P = .65)
7. Junior surgeons vs expert surgeons according to non-MD raters | Difference | Difference (P = .00004) | Difference (P = .045)
8. Junior surgeons vs expert surgeons according to resident raters | Difference | Difference (P = .0002) | Difference (P = .0003)
9. Junior surgeons vs expert surgeons according to staff raters | Difference | Difference (P = .000003) | Difference (P = .0016)

HOST-CHS, Hands-On Surgical Training–Congenital Heart Surgery; GRS, global rating scale; non-MD, evaluators with no prior understanding of congenital heart disease or surgery.

      References

1. Husain SA. Does practice make perfect? Semin Thorac Cardiovasc Surg. August 7, 2019 [Epub ahead of print].
2. Hussein N, Honjo O, Haller C, Hickey E, Coles JG, Williams WG, et al. Hands-on surgical simulation in congenital heart surgery: literature review and future perspective. Semin Thorac Cardiovasc Surg. June 17, 2019 [Epub ahead of print].
3. Mavroudis CD, Mavroudis C, Jacobs JP, DeCampli WM, Tweddell JS. Simulation and deliberate practice in a porcine model for congenital heart surgery training. Ann Thorac Surg. 2018;105:637-643.
4. Burkhart HM. Simulation in congenital cardiac surgical education: we have arrived. J Thorac Cardiovasc Surg. 2017;153:1528-1529.
5. Karl TR, Jacobs JP. Paediatric cardiac surgical education: which are the important elements? Cardiol Young. 2016;26:1465-1470.
6. Kogon B, Karamlou T, Baumgartner W, Merrill W, Backer C. Congenital cardiac surgery fellowship training: a status update. J Thorac Cardiovasc Surg. 2016;151:1488-1495.
7. Fraser CD. Becoming a congenital heart surgeon in the current era: realistic expectations. J Thorac Cardiovasc Surg. 2016;151:1496-1497.
8. Scanlan AB, Nguyen AV, Ilina A, Lasso A, Cripe L, Jegatheeswaran A, et al. Comparison of 3D echocardiogram-derived 3D printed valve models to molded models for simulated repair of pediatric atrioventricular valves. Pediatr Cardiol. 2018;39:538-547.
9. Dearani JA. Invited commentary. Ann Thorac Surg. 2018;105:643-644.
10. Feldman M, Lazzara EH, Vanderbilt AA, DiazGranados D. Rater training to support high-stakes simulation-based assessments. J Contin Educ Health Prof. 2012;32:279-286.
11. Reznick R, Regehr G, MacRae H, Martin J, McCulloch W. Testing technical skills via an innovative "bench station" examination. Am J Surg. 1997;173:226-230.
12. Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Public Health. 1984;74:979-983.
13. Zevin B, Bonrath EM, Aggarwal R, Dedy NJ, Ahmed N, Grantcharov TP. Development, feasibility, validity, and reliability of a scale for objective assessment of operative performance in laparoscopic gastric bypass surgery. J Am Coll Surg. 2013;216:955-965.e8.
14. Ericsson KA. Deliberate practice and acquisition of expert performance: a general overview. Acad Emerg Med. 2008;15:988-994.
15. Gallagher AG, Ritter EM, Satava RM. Fundamental principles of validation, and reliability: rigorous science for the assessment of surgical education and training. Surg Endosc. 2003;17:1525-1529.
16. Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, et al. Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg. 1997;84:273-278.
17. Szasz P, Louridas M, Harris KA, Aggarwal R, Grantcharov TP. Assessing technical competence in surgical trainees: a systematic review. Ann Surg. 2015;261:1046-1055.
18. Regehr G, MacRae H, Reznick RK, Szalay D. Comparing the psychometric properties of checklists and global rating scales for assessing performance on an OSCE-format examination. Acad Med. 1998;73:993-997.
19. Valsamis EM, Golubic R, Glover TE, Husband H, Hussain A, Jenabzadeh AR. Modeling learning in surgical practice. J Surg Educ. 2018;75:78-87.
20. de Montbrun S, Satterthwaite L, Grantcharov TP. Setting pass scores for assessment of technical performance by surgical trainees. Br J Surg. 2016;103:300-306.
21. Lou X, Lee R, Feins RH, Enter D, Hicks GL Jr, Verrier ED, et al. Training less-experienced faculty improves reliability of skills assessment in cardiac surgery. J Thorac Cardiovasc Surg. 2014;148:2491-2496.e1-2.
