Congenital heart disease | Volume 138, Issue 5, P1139-1153, November 2009

# An empirically based tool for analyzing mortality associated with congenital heart surgery

### Objective

Analysis of congenital heart surgery results requires a reliable method of estimating the risk of adverse outcomes. Two major systems in current use are based on projections of risk or complexity that were predominantly subjectively derived. Our goal was to create an objective, empirically based index that can be used to identify the statistically estimated risk of in-hospital mortality by procedure and to group procedures into risk categories.

### Methods

Mortality risk was estimated for 148 types of operative procedures using data from 77,294 operations entered into the European Association for Cardiothoracic Surgery (EACTS) Congenital Heart Surgery Database (33,360 operations) and the Society of Thoracic Surgeons (STS) Congenital Heart Surgery Database (43,934 operations) between 2002 and 2007. Procedure-specific mortality rate estimates were calculated using a Bayesian model that adjusted for small denominators. Each procedure was assigned a numeric score (the STS–EACTS Congenital Heart Surgery Mortality Score [2009]) ranging from 0.1 to 5.0 based on the estimated mortality rate. Procedures were also sorted by increasing risk and grouped into 5 categories (the STS–EACTS Congenital Heart Surgery Mortality Categories [2009]) that were chosen to be optimal with respect to minimizing within-category variation and maximizing between-category variation. Model performance was subsequently assessed in an independent validation sample (n = 27,700) and compared with 2 existing methods: Risk Adjustment for Congenital Heart Surgery (RACHS-1) categories and Aristotle Basic Complexity scores.

### Results

Estimated mortality rates ranged across procedure types from 0.3% (atrial septal defect repair with patch) to 29.8% (truncus plus interrupted aortic arch repair). The proposed STS–EACTS score and STS–EACTS categories demonstrated good discrimination for predicting mortality in the validation sample (C-index = 0.784 and 0.773, respectively). For procedures with more than 40 occurrences, the Pearson correlation coefficient between a procedure's STS–EACTS score and its actual mortality rate in the validation sample was 0.80. In the subset of procedures for which RACHS-1 and Aristotle Basic Complexity scores are defined, discrimination was highest for the STS–EACTS score (C-index = 0.787), followed by STS–EACTS categories (C-index = 0.778), RACHS-1 categories (C-index = 0.745), and Aristotle Basic Complexity scores (C-index = 0.687). When patient covariates were added to each model, the C-index improved: STS–EACTS score (C-index = 0.816), STS–EACTS categories (C-index = 0.812), RACHS-1 categories (C-index = 0.802), and Aristotle Basic Complexity scores (C-index = 0.795).

### Conclusion

The proposed risk scores and categories have a high degree of discrimination for predicting mortality and represent an improvement over existing consensus-based methods. Risk models incorporating these measures may be used to compare mortality outcomes across institutions with differing case mixes.

#### Abbreviations and Acronyms:

ABC (Aristotle Basic Complexity), EACTS (European Association for Cardiothoracic Surgery), RACHS-1 (Risk Adjustment for Congenital Heart Surgery), STS (Society of Thoracic Surgeons)
Cardiac surgeons have recognized and emphasized the need to establish clinical registries and quantitative tools for responsible reporting of outcomes. Large multi-institutional databases, such as the Society of Thoracic Surgeons (STS) Adult Cardiac Surgery Database, among others, have developed, applied, and validated methods of risk adjustment in reporting outcomes. This has addressed appropriate concerns that the reporting of raw, unadjusted mortality data is misleading and potentially penalizes surgeons and centers that manage high-risk patients and complex procedures because observed mortality rates might be higher than in centers dealing with less challenging cases. The kinds of statistical tools and risk models that have been developed to address these issues when the clinical substrate is adult patients with acquired cardiovascular disease cannot simply be applied to the population of pediatric and adult patients with congenital heart disease. Here the problem is considerably more complex, in large part because the individual diagnoses and distinct types of surgical procedures number in the hundreds, despite the fact that the universe of patients with congenital heart disease is considerably smaller than that of adult patients with ischemic and valvular heart disease. As a result, the number of patients in some diagnostic and procedural groups is quite small. Nonetheless, it is recognized that the need to establish tools for case-mix adjustment is fundamental to any systematic attempt to measure outcomes, compare performance, and sustain a program of continual quality improvement.
As a response to the need for case-mix adjustment of outcome data but in the absence of significant amounts of registry data in 2000, the Aristotle Complexity score was developed.
[Lacour-Gayet F., Clarke D., Jacobs J., Comas J., Daebritz S., Daenen W., et al. The Aristotle score: a complexity-adjusted method to evaluate surgical results.]
[Lacour-Gayet F., Clarke D., Jacobs J., Gaynor W., Hamilton L., Jacobs M., et al. The Aristotle score for congenital heart surgery.]
Using the expert opinions of 50 internationally based surgeons, the Aristotle Basic Complexity (ABC) score was constructed for 145 distinct congenital heart surgery procedures. Three components (potential for mortality, potential for morbidity, and technical difficulty) were subjectively scored, and the sum became the ABC score.
Separately, another group of researchers developed the Risk Adjustment for Congenital Heart Surgery (RACHS-1) system, also using an expert panel.
[Jenkins K.J. Risk adjustment for congenital heart surgery: the RACHS-1 method.]
[Jenkins K.J., Gauvreau K. Center-specific differences in mortality: preliminary analyses using the Risk Adjustment in Congenital Heart Surgery (RACHS-1) method.]
RACHS-1 groups procedures into 6 levels of increasing risk of mortality. This allocation of procedures was subsequently refined using empirical data from 2 multi-institutional registries. When compared with the ABC score, the RACHS-1 categories appear to have better discrimination for predicting mortality, whereas the ABC score covers a larger proportion of congenital heart surgery case volume.
[Harrell Jr., F.E., Caldarone C.A., McCrindle B.W., Jacobs J.P., Williams M.G., et al. Case complexity scores in congenital heart surgery: a comparative study of the Aristotle Basic Complexity score and the Risk Adjustment in Congenital Heart Surgery (RACHS-1) system.]
[Kang N., Tsang V.T., Elliott M.J., de Leval M.R., Cole T.J. Does the Aristotle score predict outcome in congenital heart surgery?]
[O'Brien S.M., Jacobs J.P., Clarke D.R., Maruszewski B., Jacobs M.L., Walters 3rd, H.L., et al. Accuracy of the Aristotle Basic Complexity score for classifying the mortality and morbidity potential of congenital heart surgery operations.]
The largest validation study of the ABC score was recently conducted by using a combined sample of nearly 36,000 patients from the STS Congenital Heart Surgery Database and the European Association for Cardiothoracic Surgery (EACTS) Congenital Heart Surgery Database.
[O'Brien S.M., Jacobs J.P., Clarke D.R., Maruszewski B., Jacobs M.L., Walters 3rd, H.L., et al. Accuracy of the Aristotle Basic Complexity score for classifying the mortality and morbidity potential of congenital heart surgery operations.]
In that study there was a significant increasing association between the ABC score and in-hospital mortality, with an overall C-index of 0.70. Although it was clear that the ABC score generally discriminated between low-risk and high-risk procedures, it was also clear that for a relatively small number of individual procedures, the initial estimation of mortality risk by the Aristotle international panel of surgical experts did not accurately predict the actual empirical estimates observed over the ensuing decade.
The goal of the present study was to derive a new system for classifying congenital heart surgery procedures based on their potential for in-hospital mortality using empirical data from the STS and EACTS databases. There were 3 specific objectives.
First, we sought to estimate procedure-specific relative risks of in-hospital mortality using a statistical model that accounts for uncertainty in procedures with small sample sizes.
Second, we sought to convert these procedure-specific mortality estimates into a scale ranging from 0.1 to 5.0. The range of this scale was chosen for consistency with the Aristotle method. The resulting score has been named the STS–EACTS Congenital Heart Surgery Mortality Score (2009) (or, briefly, the STS–EACTS score).
Third, we sought to group procedures with similar estimated mortality risk into a small number of relatively homogeneous categories (the STS–EACTS Congenital Heart Surgery Mortality Categories [2009] or, briefly, the STS–EACTS categories). These categories are intended to serve as a stratification variable that can be used to adjust for case mix when analyzing outcomes and comparing institutions.

## Materials and Methods

### Study Population

The STS Congenital Heart Surgery Database and the EACTS Database are described elsewhere.
[Jacobs J.P., Jacobs M.L., Maruszewski B., Lacour-Gayet F.G., Clarke D.R., Tchervenkov C.I., et al. Current status of the European Association for Cardio-Thoracic Surgery and the Society of Thoracic Surgeons Congenital Heart Surgery Database.]
The study population consisted of patients who underwent a congenital cardiovascular operation at an STS-participating hospital between January 1, 2002, and December 31, 2006, or at an EACTS-participating hospital between January 1, 2002, and April 4, 2007. Data from 1 STS center were excluded because this participant did not consistently report outcomes during the study period. Only the first operation of each hospital admission was analyzed. Operations were included if they involved one of the 148 cardiovascular procedures listed in Table 1. This list includes all cardiovascular procedures that were included in the short-list nomenclature of the STS and EACTS databases and appeared at least once as the primary procedure of an operation in the STS–EACTS dataset. Patients weighing less than or equal to 2500 g undergoing patent ductus arteriosus ligation as their primary procedure were excluded from the analysis because they are not included in mortality calculations in the EACTS and STS Congenital Database reports. In addition, 244 (0.3%) patients with missing in-hospital mortality status were excluded. The final study population consisted of 43,934 operations from 57 centers in the STS database and 33,360 operations from 91 centers in the EACTS database for a total of 77,294 operations.
Table 1. Procedure names, proposed scores and categories, and data for model development

| Procedure name | Difficulty ranking | Mortality score | Mortality category | No. of operations (all) | No. of operations with nonmissing mortality | Unadjusted mortality, % (95% exact binomial confidence interval) | Model-based mortality, % (95% Bayesian credible interval) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| ASD repair, patch | 8 | 0.1 | 1 | 4035 | 4028 | 0.2% (0.1%–0.4%) | 0.3% (0.1%–0.5%) |
| AVC (AVSD) repair, partial (incomplete) (PAVSD) | 31 | 0.1 | 1 | 1064 | 1062 | 0.3% (0.1%–0.8%) | 0.5% (0.2%–0.9%) |
| ASD repair, patch + PAPVC repair | 28 | 0.2 | 1 | 438 | 438 | 0.2% (0.0%–1.3%) | 0.6% (0.2%–1.4%) |
| Aortic stenosis, subvalvar, repair | 42 | 0.2 | 1 | 1834 | 1828 | 0.5% (0.3%–1.0%) | 0.6% (0.3%–1.0%) |
| ICD (AICD) implantation | 14 | 0.2 | 1 | 391 | 384 | 0.3% (0.0%–1.4%) | 0.7% (0.2%–1.6%) |
| DCRV repair | 48 | 0.2 | 1 | 467 | 467 | 0.4% (0.1%–1.5%) | 0.8% (0.2%–1.6%) |
| ASD repair, primary closure | 7 | 0.2 | 1 | 2230 | 2229 | 0.8% (0.5%–1.3%) | 0.9% (0.5%–1.3%) |
| VSD repair, patch | 32 | 0.2 | 1 | 6717 | 6702 | 0.9% (0.7%–1.1%) | 0.9% (0.7%–1.1%) |
| Vascular ring repair | 19 | 0.2 | 1 | 899 | 895 | 0.8% (0.3%–1.6%) | 0.9% (0.4%–1.6%) |
| Coarctation repair, end to end | 24 | 0.2 | 1 | 1703 | 1702 | 0.9% (0.5%–1.5%) | 1.0% (0.6%–1.5%) |
| ICD (AICD) procedure | 15 | 0.2 | 1 | 127 | 126 | 0.0% (0.0%–2.9%) | 1.0% (0.2%–2.9%) |
| PFO, primary closure | 6 | 0.2 | 1 | 217 | 216 | 0.5% (0.0%–2.6%) | 1.1% (0.3%–2.5%) |
| AVR, bioprosthetic | 55 | 0.3 | 1 | 101 | 101 | 0.0% (0.0%–3.6%) | 1.2% (0.2%–3.4%) |
| VSD repair, primary closure | 30 | 0.3 | 1 | 754 | 752 | 1.1% (0.5%–2.1%) | 1.2% (0.6%–2.1%) |
| PVR | 44 | 0.3 | 1 | 682 | 680 | 1.2% (0.5%–2.3%) | 1.3% (0.6%–2.3%) |
| Conduit reoperation | 77 | 0.3 | 1 | 1303 | 1299 | 1.3% (0.8%–2.1%) | 1.4% (0.8%–2.1%) |
| Pacemaker procedure | 3 | 0.3 | 1 | 1411 | 1408 | 1.3% (0.8%–2.1%) | 1.4% (0.9%–2.1%) |
| PAPVC repair | 27 | 0.3 | 1 | 481 | 481 | 1.2% (0.5%–2.7%) | 1.5% (0.7%–2.7%) |
| TOF repair, ventriculotomy, nontransannular patch | 62 | 0.3 | 1 | 930 | 928 | 1.4% (0.7%–2.4%) | 1.5% (0.8%–2.4%) |
| TOF repair, no ventriculotomy | 81 | 0.3 | 1 | 862 | 860 | 1.4% (0.7%–2.4%) | 1.5% (0.8%–2.3%) |
| Glenn (unidirectional cavopulmonary anastomosis; unidirectional Glenn procedure) | 41 | 0.3 | 1 | 65 | 65 | 0.0% (0.0%–5.5%) | 1.5% (0.2%–4.3%) |
| AVC (AVSD) repair, intermediate (transitional) | 33 | 0.3 | 1 | 421 | 420 | 1.4% (0.5%–3.1%) | 1.6% (0.7%–3.0%) |
| Coarctation repair, interposition graft | 49 | 0.3 | 1 | 114 | 114 | 0.9% (0.0%–4.8%) | 1.7% (0.4%–4.1%) |
| Fontan, TCPC, lateral tunnel, fenestrated | 101 | 0.3 | 1 | 743 | 742 | 1.6% (0.8%–2.8%) | 1.7% (0.9%–2.7%) |
| Sinus of Valsalva, aneurysm repair | 61 | 0.3 | 1 | 53 | 53 | 0.0% (0.0%–6.7%) | 1.7% (0.3%–5.2%) |
| AVR, mechanical | 52 | 0.3 | 1 | 384 | 383 | 1.6% (0.6%–3.4%) | 1.7% (0.7%–3.2%) |
| PDA closure, surgical | 5 | 0.4 | 2 | 1922 | 1910 | 1.8% (1.3%–2.5%) | 1.9% (1.3%–2.5%) |
| PA, reconstruction (plasty), main (trunk) | 25 | 0.4 | 2 | 192 | 191 | 1.6% (0.3%–4.5%) | 1.9% (0.6%–4.0%) |
| LV to aorta tunnel repair | 90 | 0.4 | 2 | 42 | 42 | 0.0% (0.0%–8.4%) | 1.9% (0.3%–5.9%) |
| Valvuloplasty, mitral | 76 | 0.4 | 2 | 1751 | 1747 | 1.9% (1.3%–2.6%) | 1.9% (1.3%–2.6%) |
| Valvuloplasty, aortic | 72 | 0.4 | 2 | 861 | 861 | 1.9% (1.1%–3.0%) | 1.9% (1.1%–2.9%) |
| 1 1/2 Ventricular repair | 58 | 0.4 | 2 | 39 | 39 | 0.0% (0.0%–9.0%) | 2.0% (0.3%–6.2%) |
| Arrhythmia surgery–ventricular, surgical ablation | 85 | 0.4 | 2 | 33 | 33 | 0.0% (0.0%–10.6%) | 2.2% (0.3%–6.8%) |
| Pacemaker implantation, permanent | 2 | 0.4 | 2 | 1086 | 1077 | 2.1% (1.4%–3.2%) | 2.2% (1.4%–3.1%) |
| Ross procedure | 127 | 0.4 | 2 | 620 | 617 | 2.1% (1.1%–3.6%) | 2.2% (1.3%–3.4%) |
| Glenn + PA reconstruction | 71 | 0.4 | 2 | 428 | 426 | 2.1% (1.0%–4.0%) | 2.2% (1.1%–3.8%) |
| Aortopexy | 4 | 0.4 | 2 | 30 | 30 | 0.0% (0.0%–11.6%) | 2.3% (0.3%–7.3%) |
| Fontan, atriopulmonary connection | 94 | 0.4 | 2 | 30 | 30 | 0.0% (0.0%–11.6%) | 2.3% (0.3%–6.9%) |
| Bilateral bidirectional cavopulmonary anastomosis (bilateral bidirectional Glenn procedure) | 63 | 0.4 | 2 | 449 | 449 | 2.2% (1.1%–4.1%) | 2.4% (1.2%–3.8%) |
| Aortic root replacement, mechanical | 111 | 0.5 | 2 | 145 | 145 | 2.1% (0.4%–5.9%) | 2.4% (0.7%–5.1%) |
| Conduit placement, LV to PA | 73 | 0.5 | 2 | 25 | 25 | 0.0% (0.0%–13.7%) | 2.4% (0.3%–7.9%) |
| Coarctation repair, end to end, extended | 50 | 0.5 | 2 | 1965 | 1961 | 2.5% (1.9%–3.3%) | 2.5% (1.9%–3.3%) |
| Anomalous origin of coronary artery repair | 119 | 0.5 | 2 | 327 | 326 | 2.5% (1.1%–4.8%) | 2.6% (1.2%–4.4%) |
| RVOT procedure | 40 | 0.5 | 2 | 1591 | 1583 | 2.6% (1.9%–3.5%) | 2.6% (1.9%–3.5%) |
| Aortic aneurysm repair | 93 | 0.5 | 2 | 322 | 321 | 2.5% (1.1%–4.9%) | 2.6% (1.3%–4.5%) |
| Congenitally corrected TGA repair, VSD closure | 106 | 0.5 | 2 | 21 | 21 | 0.0% (0.0%–16.1%) | 2.6% (0.3%–8.8%) |
| AP window repair | 35 | 0.5 | 2 | 125 | 125 | 2.4% (0.5%–6.9%) | 2.7% (0.9%–5.6%) |
| Valvuloplasty, pulmonic | 26 | 0.5 | 2 | 307 | 307 | 2.6% (1.1%–5.1%) | 2.7% (1.3%–4.7%) |
| TOF repair, ventriculotomy, transannular patch | 79 | 0.5 | 2 | 2541 | 2535 | 2.7% (2.1%–3.4%) | 2.7% (2.1%–3.4%) |
| Aortic root replacement, bioprosthetic | 120 | 0.5 | 2 | 20 | 20 | 0.0% (0.0%–16.8%) | 2.7% (0.3%–9.3%) |
| Bidirectional cavopulmonary anastomosis (bidirectional Glenn procedure) | 43 | 0.5 | 2 | 2502 | 2492 | 2.7% (2.1%–3.4%) | 2.7% (2.1%–3.4%) |
| Aortic stenosis, supravalvar, repair | 64 | 0.5 | 2 | 336 | 335 | 2.7% (1.2%–5.0%) | 2.8% (1.4%–4.6%) |
| Pericardiectomy | 20 | 0.5 | 2 | 48 | 48 | 2.1% (0.1%–11.1%) | 2.9% (0.5%–7.5%) |
| Conduit placement, other | 75 | 0.5 | 2 | 16 | 16 | 0.0% (0.0%–20.6%) | 2.9% (0.3%–9.8%) |
| Aneurysm, ventricular, left, repair | 107 | 0.5 | 2 | 47 | 46 | 2.2% (0.1%–11.5%) | 3.0% (0.5%–7.8%) |
| Fontan, TCPC, external conduit, fenestrated | 96 | 0.6 | 2 | 1241 | 1238 | 3.0% (2.1%–4.1%) | 3.0% (2.1%–4.0%) |
| Pulmonary artery origin from ascending aorta (hemitruncus) repair | 89 | 0.6 | 2 | 43 | 43 | 2.3% (0.1%–12.3%) | 3.1% (0.6%–8.2%) |
| ASD, common atrium (single atrium), septation | 18 | 0.6 | 2 | 44 | 44 | 2.3% (0.1%–12.0%) | 3.1% (0.5%–8.3%) |
| PAPVC, scimitar, repair | 91 | 0.6 | 2 | 72 | 72 | 2.8% (0.3%–9.7%) | 3.2% (0.8%–7.7%) |
| Fontan, TCPC, external conduit, nonfenestrated | 97 | 0.6 | 2 | 809 | 807 | 3.2% (2.1%–4.7%) | 3.2% (2.1%–4.6%) |
| Ligation, pulmonary artery | 16 | 0.6 | 2 | 11 | 11 | 0.0% (0.0%–28.5%) | 3.4% (0.4%–12.1%) |
| Coronary artery fistula ligation | 17 | 0.6 | 2 | 39 | 38 | 2.6% (0.1%–13.8%) | 3.4% (0.6%–9.2%) |
| Aortic root replacement, valve sparing | 142 | 0.6 | 2 | 37 | 37 | 2.7% (0.1%–14.2%) | 3.4% (0.6%–9.2%) |
| Mitral stenosis, supravalvar mitral ring repair | 74 | 0.6 | 2 | 86 | 86 | 3.5% (0.7%–9.9%) | 3.6% (1.0%–7.7%) |
| Arrhythmia surgery–atrial, surgical ablation | 84 | 0.7 | 2 | 273 | 272 | 3.7% (1.8%–6.7%) | 3.6% (1.9%–5.9%) |
| Systemic venous stenosis repair | 56 | 0.7 | 2 | 59 | 59 | 3.4% (0.4%–11.7%) | 3.7% (0.9%–8.6%) |
| PA, reconstruction (plasty), branch, peripheral (at or beyond the hilar bifurcation) | 70 | 0.7 | 2 | 189 | 189 | 3.7% (1.5%–7.5%) | 3.7% (1.6%–6.5%) |
| Valvuloplasty, tricuspid | 57 | 0.7 | 2 | 1182 | 1178 | 3.7% (2.7%–5.0%) | 3.7% (2.8%–4.9%) |
| TVR | 65 | 0.7 | 2 | 133 | 133 | 3.8% (1.2%–8.6%) | 3.8% (1.5%–7.3%) |
| Valve replacement, truncal valve | 46 | 0.7 | 2 | 8 | 8 | 0.0% (0.0%–36.9%) | 3.8% (0.4%–13.8%) |
| Fontan, TCPC, lateral tunnel, nonfenestrated | 99 | 0.7 | 2 | 104 | 104 | 3.8% (1.1%–9.6%) | 3.9% (1.3%–7.9%) |
| Atrial fenestration closure | 38 | 0.7 | 2 | 29 | 29 | 3.4% (0.1%–17.8%) | 3.9% (0.7%–11.3%) |
| Cor triatriatum repair | 60 | 0.7 | 2 | 177 | 176 | 4.0% (1.6%–8.0%) | 4.0% (1.8%–7.2%) |
| VSD, multiple, repair | 113 | 0.7 | 2 | 325 | 324 | 4.0% (2.2%–6.8%) | 4.0% (2.2%–6.3%) |
| Atrial baffle procedure (non-Mustard, non-Senning) | 67 | 0.7 | 2 | 26 | 26 | 3.8% (0.1%–19.6%) | 4.0% (0.7%–11.0%) |
| Coarctation repair, subclavian flap | 23 | 0.7 | 2 | 219 | 219 | 4.1% (1.9%–7.7%) | 4.1% (2.0%–6.9%) |
| Partial left ventriculectomy (LV volume reduction surgery; Batista) | 133 | 0.7 | 2 | 26 | 26 | 3.8% (0.1%–19.6%) | 4.1% (0.7%–11.3%) |
| TOF repair, RV–PA conduit | 80 | 0.7 | 2 | 362 | 358 | 4.2% (2.4%–6.8%) | 4.2% (2.4%–6.4%) |
| Transplantation, lung(s) | 129 | 0.8 | 3 | 94 | 93 | 4.3% (1.2%–10.6%) | 4.2% (1.4%–8.6%) |
| Occlusion MAPCA(s) | 51 | 0.8 | 3 | 26 | 26 | 3.8% (0.1%–19.6%) | 4.2% (0.7%–12.1%) |
| Coarctation repair + VSD repair | 112 | 0.8 | 3 | 329 | 327 | 4.3% (2.4%–7.1%) | 4.2% (2.4%–6.6%) |
| Konno procedure | 131 | 0.8 | 3 | 162 | 162 | 4.3% (1.8%–8.7%) | 4.3% (1.9%–7.6%) |
| Coarctation repair, patch aortoplasty | 22 | 0.8 | 3 | 395 | 393 | 4.3% (2.5%–6.8%) | 4.3% (2.6%–6.5%) |
| PA, reconstruction (plasty), branch, central (within the hilar bifurcation) | 68 | 0.8 | 3 | 646 | 644 | 4.3% (2.9%–6.2%) | 4.3% (2.9%–5.9%) |
| Aneurysm, pulmonary artery, repair | 53 | 0.8 | 3 | 23 | 23 | 4.3% (0.1%–21.9%) | 4.3% (0.8%–12.2%) |
| Aneurysm, ventricular, right, repair | 86 | 0.8 | 3 | 91 | 91 | 4.4% (1.2%–10.9%) | 4.3% (1.4%–8.8%) |
| Ventricular septal fenestration | 45 | 0.8 | 3 | 24 | 24 | 4.2% (0.1%–21.1%) | 4.4% (0.8%–12.4%) |
| Shunt, ligation and takedown | 11 | 0.8 | 3 | 65 | 65 | 4.6% (1.0%–12.9%) | 4.5% (1.3%–9.9%) |
| Hemi-Fontan procedure | 78 | 0.8 | 3 | 262 | 260 | 4.6% (2.4%–7.9%) | 4.5% (2.4%–7.1%) |
| AVC (AVSD) repair, complete | 87 | 0.8 | 3 | 2869 | 2860 | 4.6% (3.9%–5.4%) | 4.6% (3.9%–5.4%) |
| Anomalous systemic venous connection repair | 54 | 0.8 | 3 | 166 | 166 | 4.8% (2.1%–9.3%) | 4.8% (2.2%–8.2%) |
| ASO | 115 | 0.8 | 3 | 2069 | 2068 | 4.8% (3.9%–5.8%) | 4.8% (3.9%–5.7%) |
| Valvuloplasty, truncal valve | 59 | 0.8 | 3 | 20 | 20 | 5.0% (0.1%–24.9%) | 4.8% (0.8%–13.5%) |
| Fontan, atrioventricular connection | 102 | 0.9 | 3 | 2 | 2 | 0.0% (0.0%–84.2%) | 4.9% (0.4%–20.1%) |
| Pulmonary embolectomy, acute pulmonary embolus | 34 | 0.9 | 3 | 2 | 2 | 0.0% (0.0%–84.2%) | 5.0% (0.4%–19.7%) |
| ASD partial closure | 10 | 0.9 | 3 | 37 | 37 | 5.4% (0.7%–18.2%) | 5.1% (1.1%–12.7%) |
| Rastelli operation | 125 | 0.9 | 3 | 333 | 333 | 5.4% (3.2%–8.4%) | 5.3% (3.2%–7.8%) |
| Conduit placement, ventricle to aorta | 95 | 0.9 | 3 | 1 | 1 | 0.0% (0.0%–97.5%) | 5.3% (0.5%–21.4%) |
| AVR, homograft | 110 | 1.0 | 3 | 30 | 30 | 6.7% (0.8%–22.1%) | 5.8% (1.3%–13.8%) |
| REV | 126 | 1.1 | 3 | 26 | 26 | 7.7% (0.9%–25.1%) | 6.3% (1.3%–15.5%) |
| Pulmonary artery sling repair | 105 | 1.1 | 3 | 88 | 86 | 7.0% (2.6%–14.6%) | 6.4% (2.5%–11.9%) |
| Mustard procedure | 100 | 1.1 | 3 | 25 | 25 | 8.0% (1.0%–26.0%) | 6.4% (1.4%–15.9%) |
| Pulmonary atresia–VSD (including TOF, PA) repair | 92 | 1.1 | 3 | 289 | 289 | 6.6% (4.0%–10.1%) | 6.4% (4.0%–9.3%) |
| Conduit placement, RV to PA | 66 | 1.2 | 3 | 965 | 964 | 6.7% (5.2%–8.5%) | 6.7% (5.2%–8.4%) |
| Pulmonary embolectomy | 37 | 1.2 | 3 | 9 | 9 | 11.1% (0.3%–48.2%) | 7.1% (1.0%–22.1%) |
| MVR | 69 | 1.3 | 4 | 637 | 636 | 7.4% (5.5%–9.7%) | 7.3% (5.4%–9.4%) |
| Pericardial drainage procedure | 1 | 1.3 | 4 | 258 | 256 | 7.8% (4.8%–11.8%) | 7.5% (4.7%–11.0%) |
| Aortic arch repair | 82 | 1.4 | 4 | 787 | 782 | 7.9% (6.1%–10.0%) | 7.8% (6.1%–9.8%) |
| Fontan revision or conversion (redo Fontan procedure) | 143 | 1.4 | 4 | 68 | 68 | 8.8% (3.3%–18.2%) | 7.9% (3.1%–14.6%) |
| DOLV repair | 130 | 1.4 | 4 | 7 | 7 | 14.3% (0.4%–57.9%) | 7.9% (1.0%–24.0%) |
| DORV, intraventricular tunnel repair | 132 | 1.4 | 4 | 583 | 582 | 8.1% (6.0%–10.6%) | 8.0% (6.0%–10.3%) |
| Arterial switch procedure + aortic arch repair | 136 | 1.4 | 4 | 18 | 18 | 11.1% (1.4%–34.7%) | 8.0% (1.7%–20.6%) |
| PA debanding | 29 | 1.4 | 4 | 104 | 104 | 8.7% (4.0%–15.8%) | 8.0% (3.7%–13.7%) |
| ASO and VSD repair | 138 | 1.4 | 4 | 987 | 985 | 8.3% (6.7%–10.2%) | 8.2% (6.6%–10.0%) |
| Cardiac tumor resection | 88 | 1.4 | 4 | 221 | 220 | 8.6% (5.3%–13.2%) | 8.3% (5.1%–12.2%) |
| Transplantation, heart | 103 | 1.4 | 4 | 626 | 625 | 8.5% (6.4%–10.9%) | 8.4% (6.3%–10.6%) |
| Coronary artery bypass | 98 | 1.5 | 4 | 62 | 62 | 9.7% (3.6%–19.9%) | 8.5% (3.5%–16.0%) |
| TOF–absent pulmonary valve repair | 109 | 1.5 | 4 | 166 | 165 | 9.1% (5.2%–14.6%) | 8.6% (5.0%–13.1%) |
| Valve excision, tricuspid (without replacement) | 13 | 1.5 | 4 | 5 | 5 | 20.0% (0.5%–71.6%) | 8.8% (1.2%–28.1%) |
| Shunt, systemic to pulmonary, MBTS | 39 | 1.5 | 4 | 2793 | 2785 | 8.9% (7.9%–10.1%) | 8.9% (7.9%–10.0%) |
| TOF–AVC (AVSD) repair | 122 | 1.6 | 4 | 145 | 144 | 9.7% (5.4%–15.8%) | 9.1% (5.0%–14.1%) |
| Ross–Konno procedure | 146 | 1.6 | 4 | 205 | 205 | 9.8% (6.1%–14.7%) | 9.4% (5.8%–13.9%) |
| Senning procedure | 108 | 1.6 | 4 | 45 | 45 | 11.1% (3.7%–24.1%) | 9.4% (3.5%–18.6%) |
| Ebstein's repair | 124 | 1.6 | 4 | 65 | 65 | 10.8% (4.4%–20.9%) | 9.5% (4.0%–17.6%) |
| Aortic arch repair + VSD repair | 123 | 1.7 | 4 | 339 | 338 | 10.1% (7.1%–13.8%) | 9.8% (6.9%–13.1%) |
| PA banding | 21 | 1.7 | 4 | 1298 | 1292 | 9.9% (8.3%–11.7%) | 9.8% (8.3%–11.5%) |
| Aortic root replacement, homograft | 121 | 1.7 | 4 | 104 | 102 | 10.8% (5.5%–18.5%) | 9.9% (5.1%–16.2%) |
| Unifocalization MAPCA(s) | 116 | 1.7 | 4 | 319 | 319 | 10.3% (7.2%–14.2%) | 10.0% (7.1%–13.4%) |
| Aortic dissection repair | 128 | 1.7 | 4 | 32 | 31 | 12.9% (3.6%–29.8%) | 10.0% (3.0%–21.1%) |
| Congenitally corrected TGA repair, VSD closure and LV to PA conduit | 135 | 1.7 | 4 | 12 | 12 | 16.7% (2.1%–48.4%) | 10.1% (2.0%–25.9%) |
| Pulmonary atresia–VSD–MAPCA (pseudotruncus) repair | 137 | 1.7 | 4 | 160 | 158 | 10.8% (6.4%–16.7%) | 10.2% (6.1%–15.3%) |
| VSD creation/enlargement | 83 | 1.8 | 4 | 107 | 106 | 11.3% (6.0%–18.9%) | 10.4% (5.6%–16.6%) |
| HLHS biventricular repair | 145 | 1.9 | 4 | 64 | 64 | 12.5% (5.6%–23.2%) | 10.9% (4.8%–18.8%) |
| TAPVC repair | 104 | 1.9 | 4 | 1381 | 1379 | 11.2% (9.6%–13.0%) | 11.2% (9.5%–12.8%) |
| Pulmonary venous stenosis repair | 117 | 2.0 | 4 | 270 | 268 | 11.9% (8.3%–16.4%) | 11.4% (8.0%–15.3%) |
| Shunt, systemic to pulmonary, central (from aorta or to main pulmonary artery) | 47 | 2.1 | 4 | 663 | 661 | 12.3% (9.9%–15.0%) | 12.1% (9.7%–14.6%) |
| Interrupted aortic arch repair | 118 | 2.1 | 4 | 519 | 515 | 12.4% (9.7%–15.6%) | 12.2% (9.6%–15.1%) |
| Arterial switch procedure and VSD repair + aortic arch repair | 144 | 2.4 | 4 | 113 | 113 | 15.0% (9.0%–23.0%) | 14.0% (8.5%–20.5%) |
| Truncus arteriosus repair | 134 | 2.4 | 4 | 592 | 586 | 14.3% (11.6%–17.4%) | 14.1% (11.4%–16.8%) |
| ASD creation/enlargement | 9 | 2.5 | 4 | 138 | 136 | 15.4% (9.8%–22.6%) | 14.5% (9.4%–20.9%) |
| Atrial septal fenestration | 12 | 2.6 | 4 | 18 | 18 | 22.2% (6.4%–47.6%) | 15.1% (4.5%–30.8%) |
| Valve closure, tricuspid (exclusion, univentricular approach) | 36 | 2.6 | 4 | 5 | 5 | 40.0% (5.3%–85.3%) | 15.6% (2.7%–41.6%) |
| Damus–Kaye–Stansel procedure (creation of AP anastomosis without arch reconstruction) | 114 | 2.9 | 5 | 344 | 343 | 17.5% (13.6%–21.9%) | 17.1% (13.2%–21.5%) |
| Transplantation, heart and lung | 141 | 3.2 | 5 | 13 | 13 | 30.8% (9.1%–61.4%) | 18.7% (5.4%–39.8%) |
| Congenitally corrected TGA repair, atrial switch and Rastelli operation | 139 | 3.2 | 5 | 18 | 18 | 27.8% (9.7%–53.5%) | 18.9% (6.3%–37.2%) |
| Congenitally corrected TGA repair, atrial switch and ASO (double switch) | 148 | 3.4 | 5 | 32 | 32 | 25.0% (11.5%–43.4%) | 20.0% (9.1%–34.7%) |
| Norwood procedure | 147 | 4.0 | 5 | 2383 | 2359 | 23.7% (22.0%–25.4%) | 23.6% (21.9%–25.3%) |
| Truncus + IAA repair | 140 | 5.0 | 5 | 43 | 43 | 34.9% (21.0%–50.9%) | 29.8% (17.7%–44.3%) |

ASD, Atrial septal defect; AVC, atrioventricular canal; AVSD, atrioventricular septal defect; PAVSD, partial atrioventricular septal defect; PAPVC, partial anomalous pulmonary venous connection; ICD, implantable cardioverter defibrillator; AICD, automatic implantable cardioverter defibrillator; DCRV, double-chambered right ventricle; VSD, ventricular septal defect; PFO, patent foramen ovale; AVR, aortic valve replacement; PVR, pulmonary valve replacement; TOF, tetralogy of Fallot; TCPC, total cavopulmonary connection; PDA, patent ductus arteriosus; PA, pulmonary artery; LV, left ventricle; RVOT, right ventricular outflow tract; TGA, transposition of the great arteries; AP, aortopulmonary; TVR, tricuspid valve replacement; RV, right ventricle; MAPCA, major aortopulmonary collateral artery; ASO, arterial switch operation; REV, réparation à l'étage ventriculaire (REV procedure); MVR, mitral valve replacement; DOLV, double-outlet left ventricle; MBTS, modified Blalock–Taussig shunt; HLHS, hypoplastic left heart syndrome; TAPVC, total anomalous pulmonary venous connection; IAA, interrupted aortic arch.

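The unadjusted intervals in Table 1 are exact binomial confidence intervals, conventionally computed by the Clopper–Pearson method. As an illustration only, the sketch below finds such an interval by bisection on the binomial tail probabilities using the Python standard library. The function names are hypothetical and this is not the software used in the study; the direct summation is practical only for small death counts such as those in the low-volume rows.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for a function f that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # p is still too small
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(deaths, n, alpha=0.05):
    """Two-sided exact (Clopper-Pearson) interval for a proportion."""
    # Lower limit solves P(X >= deaths | p) = alpha / 2.
    lower = 0.0 if deaths == 0 else _solve_decreasing(
        lambda p: binom_cdf(deaths - 1, n, p), 1 - alpha / 2)
    # Upper limit solves P(X <= deaths | p) = alpha / 2.
    upper = 1.0 if deaths == n else _solve_decreasing(
        lambda p: binom_cdf(deaths, n, p), alpha / 2)
    return lower, upper


# Aortopexy row of Table 1: 0 deaths among 30 operations.
low, high = exact_binomial_ci(0, 30)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

For 0 deaths among 30 operations this gives an upper limit of about 11.6%, which agrees with the Aortopexy row of Table 1.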
The risk tool developed using this dataset was subsequently validated in a separate sample of STS and EACTS patients meeting the same inclusion criteria described above. This validation sample consisted of 20,042 operations performed between January 1, 2007, and June 30, 2008, in the STS database and 7658 operations performed between April 5, 2007, and April 8, 2008, in the EACTS database.
Hospitals participating in the STS and EACTS registries are required to comply with local regulatory and privacy guidelines. The Duke Clinical Research Institute serves as the data analysis center for the STS database and has an agreement, as well as institutional review board approval, to analyze the aggregate deidentified data for research purposes.

### Classification of Multiple-Procedure Operations

Several procedures listed in Table 1 are actually combinations of 2 or more procedures. These combinations were identified by the Aristotle expert panel because they occur frequently in the STS and EACTS databases and because the complexity of the combination is regarded as being different from the complexity of the component procedures when performed in isolation. For all other operations involving combinations of procedures, the operation was classified according to the most technically complex procedure, as determined by the difficulty component of the 2007 update of the ABC score. The ABC score contains some ties and is not defined for 3 of the procedures listed in Table 1. To deal with undefined or tied Aristotle scores, 6 of the study authors independently ranked the difficulty of each procedure listed in Table 1. Undefined or tied Aristotle scores were adjudicated by assigning the operation to the procedure with the highest average ranking determined by the 6 graders. The difficulty rankings are included in Table 1 so that users of the risk tool will be able to replicate our method of classifying multiple-procedure operations.
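The classification rule for operations that are not one of the predefined combinations can be sketched as follows. This is a minimal illustration, assuming that any predefined combination has already been matched and that a higher difficulty ranking means a more technically complex procedure (as the rankings in Table 1 suggest). The dictionary holds only a few rankings copied from Table 1, and the function name is hypothetical.

```python
# Difficulty rankings copied from Table 1 for a handful of procedures.
DIFFICULTY_RANK = {
    "PDA closure, surgical": 5,
    "ASD repair, patch": 8,
    "VSD repair, patch": 32,
    "Norwood procedure": 147,
}


def primary_procedure(procedures):
    """Return the component procedure with the highest difficulty ranking."""
    return max(procedures, key=DIFFICULTY_RANK.__getitem__)


# An operation recorded with two component procedures.
print(primary_procedure(["PDA closure, surgical", "VSD repair, patch"]))
```

Here the operation would be classified as "VSD repair, patch" because its ranking (32) exceeds that of "PDA closure, surgical" (5).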

### End Point

The study end point was in-hospital mortality, which was defined as death during the same hospitalization as surgery regardless of cause.

### Estimation of Procedure-Specific Mortality Rates

Mortality estimates were calculated by using a Bayesian random effects model that adjusted each procedure's mortality rate based on the size of the denominator. Using a statistical model was considered advantageous because several individual procedures had small denominators, and hence their unadjusted mortality rates were susceptible to chance fluctuations. Unlike conventional methods, random effects models use data from all of the procedures in the database when estimating the probability of mortality for any single procedure. This “borrowing of information” across procedures produces estimates with good statistical properties, including smaller standard errors than conventional estimates. Heuristically, the model-based estimate is a weighted average of a procedure's actual observed mortality rate and the overall average mortality rate for all procedures in the database. The model weights an individual procedure's own data more heavily when the denominator is large enough to be reliable and weights the overall average mortality rate more heavily when the denominator is too small to support a reliable mortality estimate. For procedures with more than 200 occurrences, the model-based estimates were virtually identical to the usual unadjusted (raw) mortality percentages (Appendix 1).
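The weighted-average heuristic can be made concrete with a simple shrinkage formula. The sketch below is not the Bayesian random effects model used in the study (that model is specified in Appendix 1); it only illustrates how the weight on a procedure's own data grows with its denominator. The `prior_strength` parameter is a hypothetical tuning constant, and the overall rate of 3308/77,294 is taken from the Results.

```python
def shrunken_rate(deaths, n, overall_rate, prior_strength):
    """Weighted average of a procedure's observed rate and the overall rate.

    The weight on the procedure's own data is n / (n + prior_strength), so
    small denominators are pulled strongly toward the overall rate.
    """
    weight = n / (n + prior_strength)
    observed = deaths / n if n else 0.0
    return weight * observed + (1 - weight) * overall_rate


overall = 3308 / 77294  # overall in-hospital mortality in the development data

# Small denominator (0 deaths in 30 operations): pulled toward the overall rate.
print(shrunken_rate(0, 30, overall, prior_strength=30.0))

# Large denominator (illustrative counts, about 0.9% observed): nearly unchanged.
print(shrunken_rate(60, 6702, overall, prior_strength=30.0))
```

With a large denominator the weight approaches 1 and the estimate is essentially the raw rate, which mirrors the behavior described for procedures with more than 200 occurrences.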

### Creation of the Mortality Score

Each procedure was assigned a numeric score (STS–EACTS score) ranging from 0.1 to 5.0. The scores were assigned by shifting and rescaling the estimated procedure-specific mortality rates to lie in the interval from 0.1 to 5.0 and then rounding to one decimal place. The following formula was used:
$\text{Mortality score of the } j\text{-th procedure} = 0.1 + 4.9 \times \frac{p_j - \min}{\max - \min},$

where $p_j$ denotes the estimated risk of the j-th procedure, and max and min denote the maximum and minimum values of $p_j$ across the 148 procedures.
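As a worked illustration of the rescaling, the sketch below applies the formula to model-based estimates from Table 1, using the smallest (0.3%) and largest (29.8%) values in the table as min and max. The function name is hypothetical. Because Table 1 reports estimates rounded to one decimal place, a score computed this way can differ by 0.1 from the published score for procedures near a rounding boundary.

```python
def mortality_score(p, p_min, p_max):
    """Shift and rescale an estimated mortality rate to the 0.1-5.0 scale."""
    return round(0.1 + 4.9 * (p - p_min) / (p_max - p_min), 1)


P_MIN, P_MAX = 0.3, 29.8  # model-based estimates (%), Table 1

print(mortality_score(0.3, P_MIN, P_MAX))   # ASD repair, patch -> 0.1
print(mortality_score(23.6, P_MIN, P_MAX))  # Norwood procedure -> 4.0
print(mortality_score(29.8, P_MIN, P_MAX))  # Truncus + IAA repair -> 5.0
```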

### Creation of Mortality Categories

Procedures were sorted by increasing estimated risk and partitioned into 5 relatively homogeneous categories (STS–EACTS categories). Five categories was the smallest number that did not result in excessive within-category heterogeneity. Within-category homogeneity was measured objectively using a weighted sum of squares criterion (Appendix 2).
[O'Brien S.M. Cutpoint selection for categorizing a continuous predictor.]
A dynamic programming algorithm was then used to find the categorization that maximizes the homogeneity criterion. This data-driven approach ensures that procedures in the same category will be as similar as possible with respect to their estimated mortality risk.
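The idea of finding an optimal grouping of sorted procedures by dynamic programming can be sketched as follows. The exact criterion is defined in Appendix 2 and is not reproduced here; this sketch assumes that maximizing homogeneity is equivalent to minimizing a weighted within-category sum of squares, with weights such as the number of operations per procedure. The function name and the toy inputs are hypothetical.

```python
def optimal_partition(values, weights, k):
    """Split sorted `values` into k contiguous groups that minimize the
    weighted within-group sum of squares. Returns (groups, total_cost)."""
    n = len(values)
    # Prefix sums give the cost of any contiguous block in constant time.
    W, S, Q = [0.0], [0.0], [0.0]
    for v, w in zip(values, weights):
        W.append(W[-1] + w)
        S.append(S[-1] + w * v)
        Q.append(Q[-1] + w * v * v)

    def cost(i, j):
        """Weighted sum of squares around the weighted mean of values[i:j]."""
        w, s, q = W[j] - W[i], S[j] - S[i], Q[j] - Q[i]
        return q - s * s / w

    inf = float("inf")
    # best[g][j]: minimal cost of splitting the first j values into g groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j], cut[g][j] = c, i

    # Walk back through the stored cut points to recover the groups.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(values[i:j])
        j = i
    groups.reverse()
    return groups, best[k][n]


rates = [0.3, 0.5, 0.6, 4.2, 4.3, 14.1, 14.5]  # toy estimated risks (%)
groups, total = optimal_partition(rates, [1.0] * len(rates), k=3)
print(groups)
```

Because the procedures are sorted first, only contiguous groupings need to be searched, which is what makes the dynamic programming formulation exact rather than heuristic.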
To determine the number of categories, we evaluated the performance of different categorizations consisting of 2 to 20 categories. Performance was assessed internally based on 2 criteria. First, we evaluated the internal homogeneity of the categories using the criterion described in Appendix 2. Second, we assessed the discrimination of the categories as predictors of mortality. Discrimination was quantified by the area under the receiver operating characteristic curve (also known as the C-index).
[Hanley J.A., McNeil B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve.]
The C-index is interpreted as the probability that a randomly selected patient who died was considered to be higher risk than a randomly selected patient who survived. The C-index generally ranges from 0.5 to 1.0, with 0.5 representing no discrimination (ie, a coin flip) and 1.0 representing perfect discrimination.
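The pairwise interpretation of the C-index translates directly into code. The sketch below compares every patient who died with every patient who survived and counts ties in predicted risk as one half, which is the usual convention for the area under the ROC curve. The function name and the toy data are hypothetical, and the quadratic loop is meant for illustration rather than for samples of the size analyzed here.

```python
def c_index(risk, died):
    """Probability that a randomly chosen death was assigned a higher risk
    than a randomly chosen survivor (ties count as 1/2)."""
    deaths = [r for r, d in zip(risk, died) if d]
    survivors = [r for r, d in zip(risk, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for rd in deaths:
        for rs in survivors:
            if rd > rs:
                concordant += 1.0
            elif rd == rs:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))


# Toy example: 3 of the 4 death-survivor pairs are correctly ordered.
print(c_index([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```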

### Models Combining Scores and Categories With Patient-Level Risk Factors

Two logistic regression models were developed to illustrate the utility of modeling the proposed scores and categories together with patient-level risk factors. The first model included the STS–EACTS score (modeled as a continuous variable) plus 3 patient-level factors: age, weight, and preoperative length of stay. To allow for possible nonlinear effects, the score and the square of the score were both entered in the model. Age and weight were modeled jointly by converting them into a single categorical variable with 7 levels (see Results). Preoperative length of stay was dichotomized as less than or equal to 2 days versus more than 2 days. The second model was identical but used the STS–EACTS categories (modeled as a set of category indicators) instead of the STS–EACTS score. Additional patient factors, such as comorbidities, were not included because these data were not available to us for the EACTS subset at the time of analysis.
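The covariate structure of the first model can be illustrated by constructing one row of its design matrix. This is only a sketch: the definitions of the 7 age-weight levels are given in the Results and are not reproduced here, so the level is passed as a placeholder index from 0 to 6, and coding level 0 as the reference category is an assumption. The function name is hypothetical.

```python
def design_row(score, age_weight_level, preop_los_days, n_levels=7):
    """Covariates for one operation in the score-based model.

    Columns: score, score squared, indicators for age-weight levels 1..6
    (level 0 is the assumed reference), and an indicator for a
    preoperative length of stay of more than 2 days.
    """
    if not 0 <= age_weight_level < n_levels:
        raise ValueError("age_weight_level out of range")
    dummies = [1.0 if age_weight_level == k else 0.0 for k in range(1, n_levels)]
    long_stay = 1.0 if preop_los_days > 2 else 0.0
    return [score, score**2] + dummies + [long_stay]


# Operation with score 1.4, age-weight level 3, and a 5-day preoperative stay.
print(design_row(1.4, 3, 5))
```

The category-based model would replace the first two columns with indicator variables for the STS–EACTS categories.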

### Comparisons With RACHS-1 Categories and ABC Scores

The models described above were also estimated with RACHS-1 categories in place of the STS–EACTS categories and with the ABC score in place of the STS–EACTS score to facilitate comparisons with existing methods. Briefly, the ABC score of a procedure is a number ranging from 1.5 to 15 points that reflects the Aristotle expert panel's assessment of that type of procedure's potential for mortality, morbidity, and technical difficulty. When analyzing operations with multiple procedures, the ABC score was defined as the maximum ABC score across all procedures in the operation. The RACHS-1 methodology divides procedures into 6 categories based on an expert panel's assessment of the procedure's average mortality risk, where category 1 has the lowest risk of mortality and category 6 has the highest. Unlike the ABC method, the classification of some procedures is allowed to depend on the patient's age. When analyzing operations with multiple procedures, the operation is assigned to the procedure with the highest RACHS-1 category. Because very few data points were available in RACHS-1 category 5, it was combined with category 6 for analysis. The “full” RACHS-1 methodology involves fitting a logistic regression model that includes indicator variables for the RACHS-1 categories together with an indicator variable for single versus multiple cardiac procedures, plus additional adjustment for 3 patient-level risk factors: age, prematurity, and presence of a major noncardiac structural anomaly. Because the required patient-level risk factors were not available in our dataset, we did not implement the full RACHS-1 methodology but instead focused on evaluating the discrimination of the RACHS-1 categories with and without adjustment for patient age, weight, and preoperative length of stay.

### Independent Validation Using 2007–2008 Data

The performance of each model was assessed in a separate, more contemporary sample of STS and EACTS data. Overall discrimination was quantified by the C-index. The ability of the proposed score to predict the risk of individual procedures was quantified by calculating the Pearson correlation coefficient between the score and the actual calculated procedure-specific mortality rate in the validation sample. Because sampling variation in the validation sample might artificially increase or decrease the Pearson correlation coefficient, procedures with fewer than 40 occurrences in the validation sample were excluded when calculating the Pearson correlation coefficient. For graphing the association between the proposed score and observed mortality, data from procedures with the same score were aggregated, and the mortality rate of each group of procedures was plotted as a function of the score, excluding groups with fewer than 40 cases. The entire validation was also repeated in the subset of procedures having at least 200 cases in the development sample. Finally, to permit a fair comparison with RACHS-1 and ABC scores, the performance of each model was assessed in the subset of procedures for which both RACHS-1 categories and ABC scores are defined (n = 25,106 patient operations). Statistical comparisons of the C-index for different models were performed using the method of DeLong and colleagues.
(DeLong E.R., DeLong D.M., Clarke-Pearson D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach.)
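
The two validation metrics can be illustrated with a minimal sketch. The function names and input layout are assumptions; the C-index is computed here by direct pairwise comparison, which is adequate for illustration but slow for samples as large as those in this study. The DeLong comparison of C-indexes is not implemented.

```python
from math import sqrt


def c_index(risk, died):
    """Share of (death, survivor) pairs in which the death had the higher
    predicted risk; tied predictions count as one half."""
    concordant = 0.0
    pairs = 0
    for r_dead, d_i in zip(risk, died):
        if not d_i:
            continue
        for r_alive, d_j in zip(risk, died):
            if d_j:
                continue
            pairs += 1
            if r_dead > r_alive:
                concordant += 1.0
            elif r_dead == r_alive:
                concordant += 0.5
    return concordant / pairs


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def score_vs_observed_correlation(procedures, min_cases=40):
    """procedures: iterable of (score, n_cases, n_deaths) per procedure.
    Procedures with fewer than min_cases occurrences are excluded."""
    kept = [(s, d / n) for s, n, d in procedures if n >= min_cases]
    return pearson([s for s, _ in kept], [m for _, m in kept])
```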

## Results

A total of 77,294 patient operations were analyzed, including 3308 (4.3%) in-hospital deaths. There were 71 procedures with at least 200 occurrences, 104 procedures with at least 50 occurrences, and 133 procedures with at least 20 occurrences. Procedures with at least 200 occurrences accounted for 94% of the total patients and 91% of the deaths.

### Mortality Rates for Individual Procedures

The frequency of in-hospital mortality for individual procedures ranged from 0% to 40.0%. There were 18 procedures with zero deaths; all of these had sample sizes smaller than 200. When Bayesian modeling was used to estimate mortality risk for individual procedures, the estimates ranged from 0.3% (atrial septal defect repair with patch) to 29.8% (truncus plus interrupted aortic arch repair, Figure 1). For the procedures with more than 200 cases, the raw and model-based estimates were virtually identical (Pearson correlation coefficient > 0.999, Appendix 1).

### Mortality Scores and Categories

Names of the procedures analyzed in this study are listed in Table 1, along with their raw and model-based mortality estimates and their proposed scores and categories. The STS–EACTS score takes on values between 0.1 and 5.0 and has 29 unique values. The STS–EACTS categories consist of 5 groups labeled 1 to 5, with higher numbers implying higher mortality risk. The number of patients and procedures per category and their aggregated mortality rates are summarized in Table 2.

Table 2. Characteristics of proposed risk categories in 2002–2007 STS and EACTS data

| STS–EACTS mortality category | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Range of scores | 0.1–0.3 | 0.4–0.7 | 0.8–1.2 | 1.3–2.6 | 2.7–5.0 |
| No. of procedures | 26 | 52 | 27 | 37 | 6 |
| No. of patients | 28,363 | 23,235 | 9026 | 13,862 | 2808 |
| No. of deaths | 234 | 601 | 449 | 1374 | 650 |
| Mortality | 0.8% | 2.6% | 5.0% | 9.9% | 23.1% |

STS–EACTS, Society of Thoracic Surgeons–European Association for Cardiothoracic Surgery.

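
For readers who wish to apply Table 2 programmatically, the following sketch maps a score to its category and recomputes the category-level mortality from the tabulated counts. The names are hypothetical, and scores are assumed to be reported to 1 decimal place.

```python
# (upper bound of score range, category), taken from Table 2.
CATEGORY_UPPER_BOUNDS = [(0.3, 1), (0.7, 2), (1.2, 3), (2.6, 4), (5.0, 5)]

# category -> (number of patients, number of deaths), taken from Table 2.
TABLE2_COUNTS = {
    1: (28363, 234),
    2: (23235, 601),
    3: (9026, 449),
    4: (13862, 1374),
    5: (2808, 650),
}


def category_from_score(score: float) -> int:
    """Return the STS-EACTS mortality category for a score in 0.1-5.0."""
    s = round(score, 1)
    if not 0.1 <= s <= 5.0:
        raise ValueError("score must lie between 0.1 and 5.0")
    for upper, category in CATEGORY_UPPER_BOUNDS:
        if s <= upper:
            return category
    raise AssertionError("unreachable for scores within range")


def category_mortality_percent(category: int) -> float:
    """Aggregate mortality (%) of a category, rounded to 1 decimal place."""
    patients, deaths = TABLE2_COUNTS[category]
    return round(100.0 * deaths / patients, 1)
```

The patient and death counts sum to the 77,294 operations and 3308 deaths reported for the development sample.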
The within-category homogeneity criterion and the C-index were plotted as functions of the number of categories to help us determine the optimal number of mortality categories. As shown in Figure 2, A, within-category homogeneity increases rapidly with the number of categories when the number of categories is small. With more than 4 or 5 categories, the homogeneity continues to increase, but the marginal improvement per additional category approaches zero. Similarly, Figure 2, B, shows that the estimated discrimination of the categories changes dramatically when the number of groups is varied between 2 and 5, but using more than 5 categories has a relatively modest effect on the C-index. Five categories were chosen as the smallest number that produces both acceptable within-category homogeneity and good discrimination.
Examples of regression models using the proposed scores and categories are summarized in Table 3. The C-index was 0.814 for the model that combined patient factors with the STS–EACTS score and 0.810 for the model that combined patient factors with the STS–EACTS categories. For comparison, when age, weight, and preoperative length of stay were analyzed in a logistic regression model without adjustment for the STS–EACTS scores or categories, the C-index was 0.755.

Table 3. Summary of logistic regression models combining the proposed STS–EACTS scores and categories with patient-level risk factors. Values are odds ratios (95% confidence interval).

| Variable | Model 1: STS–EACTS score + patient factors | Model 2: STS–EACTS categories + patient factors |
|---|---|---|
| STS–EACTS mortality score | | |
| 0.5 vs 0.25 | 1.4 (1.4–1.5) | |
| 1.0 vs 0.25 | 2.6 (2.4–2.8) | |
| 2.0 vs 0.25 | 6.3 (5.6–7.1) | |
| 4.0 vs 0.25 | 9.4 (8.2–10.8) | |
| STS–EACTS mortality category | | |
| Category 1 | | Reference |
| Category 2 | | 2.9 (2.4–3.3) |
| Category 3 | | 4.3 (3.6–5.0) |
| Category 4 | | 7.5 (6.5–8.7) |
| Category 5 | | 15.9 (13.3–18.9) |
| Age and weight category | | |
| Age ≥1 y | Reference | Reference |
| Age 1–11 mo, weight ≥6.0 kg | 1.0 (0.8–1.2) | 0.9 (0.8–1.1) |
| Age 1–11 mo, weight 4.0–5.9 kg | 1.4 (1.2–1.6) | 1.3 (1.2–1.5) |
| Age 1–11 mo, weight <4.0 kg | 2.6 (2.2–3.0) | 2.6 (2.3–3.0) |
| Age <1 mo, weight ≥3.0 kg | 2.0 (1.8–2.2) | 1.9 (1.7–2.2) |
| Age <1 mo, weight 2.0–2.9 kg | 3.3 (2.8–3.8) | 3.2 (2.8–3.7) |
| Age <1 mo, weight <2.0 kg | 4.9 (4.2–5.8) | 4.9 (4.2–5.7) |
| Preoperative LOS | | |
| ≤2 d | Reference | Reference |
| >2 d | 1.4 (1.3–1.6) | 1.4 (1.3–1.5) |

STS–EACTS, Society of Thoracic Surgeons–European Association for Cardiothoracic Surgery; LOS, length of stay.
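
To show how the entries of Table 3 combine, the sketch below multiplies the Model 2 point estimates for a given patient profile. It assumes the model contains no interaction terms, which is consistent with the description in Methods but is not stated explicitly. The dictionary keys and function name are hypothetical, and confidence intervals are ignored.

```python
# Point estimates from the Model 2 column of Table 3 (reference levels = 1.0).
MODEL2_ODDS_RATIOS = {
    "category": {1: 1.0, 2: 2.9, 3: 4.3, 4: 7.5, 5: 15.9},
    "age_weight": {
        "age>=1y": 1.0,
        "1-11mo,>=6.0kg": 0.9,
        "1-11mo,4.0-5.9kg": 1.3,
        "1-11mo,<4.0kg": 2.6,
        "<1mo,>=3.0kg": 1.9,
        "<1mo,2.0-2.9kg": 3.2,
        "<1mo,<2.0kg": 4.9,
    },
    "preop_los_gt_2d": {False: 1.0, True: 1.4},
}


def combined_odds_ratio(category, age_weight, preop_los_gt_2d):
    """Odds of death relative to the all-reference profile (category 1,
    age >= 1 y, preoperative stay <= 2 d), assuming a main-effects model."""
    return (MODEL2_ODDS_RATIOS["category"][category]
            * MODEL2_ODDS_RATIOS["age_weight"][age_weight]
            * MODEL2_ODDS_RATIOS["preop_los_gt_2d"][preop_los_gt_2d])
```

For example, a neonate weighing 2.0 to 2.9 kg who undergoes a category 4 procedure after more than 2 preoperative days has estimated odds of in-hospital death about 7.5 × 3.2 × 1.4 ≈ 33.6 times those of the reference profile.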

### Validation Using 2007–2008 Data

There was a strong positive association between the proposed STS–EACTS score and actual observed mortality in the validation sample (C-index = 0.784). For the 82 procedures with at least 40 occurrences in the validation sample, the Pearson correlation coefficient between the score of a procedure and its actual observed mortality rate in the validation sample was 0.80. An increasing association between the score and mortality was observed across the range of scores, although several groups of procedures had lower than expected mortality (Figure 3).
The observed mortality rate in the validation sample was slightly lower than in the development sample (3.9% vs 4.3%, P = .004), reflecting a trend toward lower mortality in a more contemporary sample. This lower mortality was seen in each of the 5 STS–EACTS categories (Figure 4). Despite the trend toward lower absolute mortality in 2007–2008, the chosen categories continued to perform well at discriminating between high-risk and low-risk procedures (C-index = 0.773). Receiver operating characteristic curves for the proposed scores and categories are displayed in Figure 5. When the validation was repeated in the subset of 73 procedures with at least 200 cases in the development sample, there was a similarly high level of discrimination (C-index = 0.790 for STS–EACTS scores; C-index = 0.782 for STS–EACTS categories) and high correlation between the STS–EACTS score and procedure-specific mortality rates (Pearson correlation coefficient = 0.87).
To assess whether the proposed method discriminates mortality better than the existing RACHS-1 categories and Aristotle scores, each of these was evaluated in the validation sample using the subset of procedures for which both RACHS-1 categories and ABC scores are defined. As summarized in Table 4, discrimination was highest for the STS–EACTS score (C-index = 0.787), followed by the STS–EACTS categories (C-index = 0.778), RACHS-1 categories (C-index = 0.745), and ABC scores (C-index = 0.687, all differences P < .0001). Adding patient-level covariates substantially improved each model's discrimination. With the addition of these patient variables, discrimination was highest for the STS–EACTS score (C-index = 0.816), followed by STS–EACTS categories (C-index = 0.812; comparison with STS–EACTS score, P = .035), RACHS-1 categories (C-index = 0.802; comparison vs STS–EACTS categories, P = .008), and ABC scores (C-index = 0.795; comparison vs STS–EACTS score, P < .0001).

Table 4. Comparison of C-index for models using the STS–EACTS score, STS–EACTS categories, RACHS-1 categories, and ABC scores

| Method of modeling procedures | Model without patient covariates (C-index) | Model with patient covariates (C-index) |
|---|---|---|
| STS–EACTS score | 0.787 | 0.816 |
| STS–EACTS categories | 0.778 | 0.812 |
| RACHS-1 categories | 0.745 | 0.802 |
| ABC score | 0.687 | 0.795 |

STS–EACTS, Society of Thoracic Surgeons–European Association for Cardiothoracic Surgery; RACHS-1, Risk Adjustment for Congenital Heart Surgery; ABC, Aristotle Basic Complexity.
Validation sample, subset of procedures for which both RACHS-1 categories and ABC scores are defined.

## Discussion

The goal of this study was to derive a valid tool that can be used to stratify congenital heart surgery procedures based on their relative risk of in-hospital mortality. Using the combined resources of the STS and EACTS databases, we estimated the average mortality rate of 148 procedures and then applied a data-driven algorithm to determine the grouping of procedures that was optimal in the sense of creating internally homogeneous strata. The resulting scores and categories are intended to serve as tools for case-mix adjustment when comparing outcomes of hospitals that perform congenital heart surgery. These measures can be used to perform a stratified analysis that adjusts for type of procedure or they can be included along with patient-level variables in a comprehensive risk adjustment model.
Previous investigators have used a combination of expert opinion and empirical data to group procedures with a similar risk of in-hospital mortality. Experts initially used clinical judgment to group procedures with a similar potential for in-hospital mortality to create the RACHS-1 risk categories. This allocation of procedures was subsequently refined by using empirical data from 2 multi-institutional registries. The goals of the present study were similar to those of RACHS-1 in that we also sought to create internally homogeneous procedure categories using the end point of discharge mortality. A major difference between our approach and the derivation of RACHS-1 categories is that our procedure categories were determined empirically without the input of an expert panel. When the proposed methodology was assessed in an independent validation sample, models based on the STS–EACTS score and categories had substantially better discrimination than comparable models based on RACHS-1 categories and ABC scores.
Despite the advantages of an empirically based risk stratification system, there are several limitations and caveats.
First, our study focused on estimating procedural mortality and determining homogeneous procedure categories. Additional research is needed to determine the best method of combining these procedural variables with adjustment for patient-specific risk factors.
Second, despite the large database, several individual procedures had small sample sizes, and the true mortality of these procedures may have been estimated with error. We attempted to minimize this error by using a statistical model that accounted for small denominators.
Third, because the EACTS and STS registries are voluntary, it is possible that the results observed in this database will differ from those of other nonparticipating institutions.
Fourth, because auditing of the STS and EACTS databases has been limited to a small number of sites, the completeness and accuracy of the data are largely unknown. In an audit of 200 patient records from 10 different STS centers, there was 99.0% agreement in the reporting of discharge mortality by STS sites versus independent auditors and no evidence of selective reporting based on discharge mortality status (personal communication, unpublished STS data).
Another potential limitation rests in the fact that mortality was determined only on the basis of status at the time of discharge. Operative mortality has been defined by the STS Congenital Database Taskforce and the Joint STS–EACTS Congenital Database Committee.
(Jacobs J.P., Mavroudis C., Jacobs M.L., Maruszewski B., Tchervenkov C.I., Lacour-Gayet F.G., et al. What is operative mortality? Defining death in a surgical registry database: a report of the STS Congenital Database Taskforce and the Joint EACTS-STS Congenital Database Committee.)
It requires knowledge not only of status at discharge but of patient status at 30 days after the operation. Going forward, validation of the STS–EACTS scores and categories using this definition will be possible as the completeness of these data fields in the STS and EACTS databases improves (Appendix 3).
In summary, we have developed a new tool for grouping procedures with a similar empirically estimated risk of in-hospital mortality. Empirically based mortality stratification was possible to a considerable extent because of the large sample sizes of the STS and EACTS congenital databases. The resulting scores and categories can be incorporated into case-mix adjustment methods, such as stratification and regression analysis, to compare institutions on a level playing field.

## Appendix 1. Statistical Model for Estimating Procedure-Specific Mortality Rates

Procedure-specific mortality rates were estimated by using a hierarchical (random effects) model. For each of the 148 procedures in the analysis, the number of deaths was modeled by using the following binomial distribution:
$y_j \sim \mathrm{Binomial}(n_j, \pi_j), \quad j = 1, 2, \ldots, 148,$

where $\pi_j$ denotes the unknown theoretical probability of mortality for the j-th procedure, $n_j$ denotes the number of patients undergoing the procedure in the database (denominator), and $y_j$ denotes the actual observed number of mortalities in the database (numerator). Variation in the theoretical probability of mortality was modeled by assuming the log odds were normally distributed. Thus the model is as follows:
$\log(\pi_j / [1 - \pi_j]) = \eta_j;$

$\eta_j \overset{\text{ind}}{\sim} N(\mu, \sigma^2),$

where $\mu$ and $\sigma^2$ denote the unknown mean and variance, respectively, of the assumed normal random effects distribution. Parameters of the model were estimated in a Bayesian framework using WinBUGS software. A vague (noninformative) prior distribution was chosen for the parameters $\mu$ and $\sigma^2$. The WinBUGS code for this model is available from the authors on request.
As shown in Figure 6, A, there was a high degree of correlation between the Bayesian model–based estimate of a procedure's risk and the simple raw unadjusted mortality percentage; however, several procedures had large discrepancies. The difference between the model-based versus raw estimates decreased with increasing sample size. For procedures with more than 200 cases, the raw and model-based estimates were virtually identical (Pearson correlation coefficient > 0.999; Figure 6).
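
The shrinkage behavior described above can be illustrated with a deliberately simplified sketch. It is not the WinBUGS model used in this study: here the hyperparameters are held fixed at hypothetical values (`mu = -3.2`, `sigma = 1.2` in the usage note) rather than given a prior, and the posterior mean of a single procedure's mortality is obtained by numerical integration over a grid of log-odds values. The function name and grid settings are assumptions.

```python
from math import exp, log1p


def posterior_mean_mortality(deaths, cases, mu, sigma,
                             lo=-15.0, hi=5.0, steps=4000):
    """Posterior mean of pi_j for one procedure, conditional on fixed mu and
    sigma, using y_j ~ Binomial(n_j, pi_j) and logit(pi_j) ~ N(mu, sigma^2)."""
    width = (hi - lo) / steps
    etas, log_weights = [], []
    for s in range(steps + 1):
        eta = lo + s * width
        log_pi = -log1p(exp(-eta))          # log of expit(eta)
        log_one_minus_pi = -log1p(exp(eta))  # log of 1 - expit(eta)
        log_lik = deaths * log_pi + (cases - deaths) * log_one_minus_pi
        log_prior = -0.5 * ((eta - mu) / sigma) ** 2
        etas.append(eta)
        log_weights.append(log_lik + log_prior)
    peak = max(log_weights)                  # subtract the peak for stability
    weights = [exp(v - peak) for v in log_weights]
    total = sum(weights)
    return sum(w / (1.0 + exp(-eta)) for w, eta in zip(weights, etas)) / total
```

With these hypothetical hyperparameters, `posterior_mean_mortality(0, 10, -3.2, 1.2)` returns a small positive value rather than the raw estimate of 0%, whereas `posterior_mean_mortality(50, 1000, -3.2, 1.2)` stays close to the raw estimate of 5%, in line with the pattern in Figure 6.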

## Appendix 2. Methodology for Creating Internally Homogeneous Risk Categories

Procedures were first sorted in order of increasing estimated risk (based on the model in Appendix 1) and then grouped into homogeneous categories to create the risk categories. Let $\pi_i$ denote the true unknown mortality for the i-th procedure, and let $\hat{\pi}_i$ denote the corresponding estimate. We first sorted procedures so that $\hat{\pi}_1 < \hat{\pi}_2 < \cdots < \hat{\pi}_{148}$. Let k denote the number of categories and let $\mathbf{c}_k = \{c_1, \ldots, c_{k-1}\}$ denote a set of category cut points that partition the procedures into k groups. The symbol $c_j$ denotes a number between 1 and 148 and represents the index of the highest-risk procedure in the j-th category. Also, define $c_0 = 0$ and $c_k = 149$. For any particular choice of k and $\mathbf{c}_k$, within-category homogeneity is measured by the weighted sum-of-squares criterion:
$\mathrm{WSS}(\mathbf{c}_k; \pi) = \sum_{j=1}^{k} \sum_{i=c_{j-1}+1}^{c_j} \frac{n_i (\pi_i - \bar{\pi}_j)^2}{\pi_i (1 - \pi_i)},$

where $\bar{\pi}_j = \sum_{i=c_{j-1}+1}^{c_j} n_i \pi_i \big/ \sum_{i=c_{j-1}+1}^{c_j} n_i$ denotes the average risk of mortality among all procedures in the j-th category. This criterion is similar to one that has been used previously for defining optimum cut points for categorizing a continuous explanatory variable.
(O'Brien S.M. Cutpoint selection for categorizing a continuous predictor.)
The notation $\mathrm{WSS}(\mathbf{c}_k; \pi)$ is intended to emphasize that WSS is a function of the chosen cut points $\mathbf{c}_k$ and also depends on the unknown procedure-specific probabilities $\pi_i$. If the $\pi_i$ were known instead of unknown, then the “optimal” cut points could (in theory) be determined by enumerating all possible choices for the $c_j$ and choosing the one that minimizes the WSS. Because the $\pi_i$ are unknown, we instead choose cut points that minimize the Bayesian estimate of $\mathrm{WSS}(\mathbf{c}_k; \pi)$. Specifically, we chose the cut points that minimize the estimated Bayesian posterior mean as follows:
$\widehat{\mathrm{WSS}}(\mathbf{c}_k) = \frac{1}{3000} \sum_{h=1}^{3000} \mathrm{WSS}(\mathbf{c}_k; \pi^{(h)}),$

where $\pi^{(h)}$ denotes a random draw from the joint posterior distribution of the $\pi_i$'s. Finding the set of cut points that minimizes this quantity exactly is technically challenging and required the use of a novel dynamic programming algorithm (unpublished).
The criterion described above gets smaller as the within-category homogeneity improves. For plotting the change in homogeneity versus k, it is intuitively appealing to use a criterion that increases rather than decreases. The criterion used in Figure 2 (and throughout the article) is defined as follows:
$\mathrm{Homogeneity} = 1 - \widehat{\mathrm{WSS}}(\mathbf{c}_k) / \widehat{\mathrm{WSS}}(\mathbf{c}_1).$

This criterion ranges from 0.0 to 1.0 and increases as the categories become more homogeneous.
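
The following sketch illustrates how cut points minimizing the averaged criterion could be found. It is a textbook dynamic program for partitioning an ordered sequence into contiguous groups, offered as an assumption-laden illustration and not as the unpublished algorithm used in this study. It relies on the averaged WSS being a sum of per-category terms. The function names and the `draws` layout (one list of procedure-level probabilities per posterior draw, procedures already sorted by estimated risk) are hypothetical, and the returned boundaries use the convention that the last one equals the number of procedures.

```python
def segment_cost(draws, n, lo, hi):
    """Averaged WSS contribution of procedures lo..hi-1 (0-based, hi exclusive)."""
    total = 0.0
    for pi in draws:
        weight = sum(n[i] for i in range(lo, hi))
        mean = sum(n[i] * pi[i] for i in range(lo, hi)) / weight
        total += sum(n[i] * (pi[i] - mean) ** 2 / (pi[i] * (1.0 - pi[i]))
                     for i in range(lo, hi))
    return total / len(draws)


def best_cut_points(draws, n, k):
    """Return (boundaries, minimized averaged WSS) for k contiguous categories.
    boundaries[j-1] is the 1-based index of the last procedure in category j."""
    m = len(n)
    cost = [[0.0] * (m + 1) for _ in range(m + 1)]
    for lo in range(m):
        for hi in range(lo + 1, m + 1):
            cost[lo][hi] = segment_cost(draws, n, lo, hi)
    inf = float("inf")
    # best[j][hi]: minimal cost of splitting the first hi procedures into j groups
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for hi in range(j, m + 1):
            for lo in range(j - 1, hi):
                candidate = best[j - 1][lo] + cost[lo][hi]
                if candidate < best[j][hi]:
                    best[j][hi] = candidate
                    back[j][hi] = lo
    boundaries, hi = [], m
    for j in range(k, 0, -1):   # walk the back-pointers from the last group
        boundaries.append(hi)
        hi = back[j][hi]
    boundaries.reverse()
    return boundaries, best[k][m]


def homogeneity(draws, n, k):
    """1 - WSS(k categories) / WSS(1 category), as plotted in Figure 2, A."""
    _, wss_k = best_cut_points(draws, n, k)
    _, wss_1 = best_cut_points(draws, n, 1)
    return 1.0 - wss_k / wss_1
```

Passing a single "draw" that contains the point estimates gives a plug-in version of the criterion; passing many posterior draws gives the averaged version defined above.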

## Appendix 3. Completeness of STS Mortality Data

The mortality end point for this study was mortality status at the time of discharge, ie, in-hospital mortality. It was chosen over operative mortality (ie, death prior to discharge or after discharge but within 30 days of surgery) or 30-day mortality status in large part because 30-day status is frequently missing whereas discharge mortality is rarely missing. As shown in Figure 7, the completeness of 30-day mortality status has improved over time. In the future, it may be feasible to adapt the STS–EACTS methodology (or develop a new methodology) to predict the end point of operative mortality or 30-day mortality, assuming the completeness of 30-day mortality reporting continues to improve.

## References

• Lacour-Gayet F., Clarke D., Jacobs J., Comas J., Daebritz S., Daenen W., et al. The Aristotle score: a complexity-adjusted method to evaluate surgical results. Eur J Cardiothorac Surg. 2004; 25: 911-924
• Lacour-Gayet F., Clarke D., Jacobs J., Gaynor W., Hamilton L., Jacobs M., et al. The Aristotle score for congenital heart surgery. Semin Thorac Cardiovasc Surg Pediatr Card Surg Annu. 2004; 7: 185-191
• Jenkins K.J. Risk adjustment for congenital heart surgery: the RACHS-1 method. Semin Thorac Cardiovasc Surg Pediatr Card Surg Annu. 2004; 7: 180-184
• Jenkins K.J., Gauvreau K. Center-specific differences in mortality: preliminary analyses using the Risk Adjustment in Congenital Heart Surgery (RACHS-1) method. J Thorac Cardiovasc Surg. 2002; 124: 97-104
• Harrell F.E. Jr., Caldarone C.A., McCrindle B.W., Jacobs J.P., Williams M.G., et al. Case complexity scores in congenital heart surgery: a comparative study of the Aristotle Basic Complexity score and the Risk Adjustment in Congenital Heart Surgery (RACHS-1) system. J Thorac Cardiovasc Surg. 2007; 133: 865-875
• Kang N., Tsang V.T., Elliott M.J., de Leval M.R., Cole T.J. Does the Aristotle score predict outcome in congenital heart surgery? Eur J Cardiothorac Surg. 2006; 29: 986-988
• O'Brien S.M., Jacobs J.P., Clarke D.R., Maruszewski B., Jacobs M.L., Walters H.L. 3rd, et al. Accuracy of the Aristotle Basic Complexity score for classifying the mortality and morbidity potential of congenital heart surgery operations. Ann Thorac Surg. 2007; 84: 2027-2037
• Jacobs J.P., Jacobs M.L., Maruszewski B., Lacour-Gayet F.G., Clarke D.R., Tchervenkov C.I., et al. Current status of the European Association for Cardio-Thoracic Surgery and the Society of Thoracic Surgeons Congenital Heart Surgery Database. Ann Thorac Surg. 2005; 80: 2278-2284
• O'Brien S.M. Cutpoint selection for categorizing a continuous predictor. Biometrics. 2004; 60: 504-509
• Hanley J.A., McNeil B.J. The meaning and use of the area under a receiver operating characteristic (ROC) curve.
• DeLong E.R., DeLong D.M., Clarke-Pearson D.L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988; 44: 837-845
• Jacobs J.P., Mavroudis C., Jacobs M.L., Maruszewski B., Tchervenkov C.I., Lacour-Gayet F.G., et al. What is operative mortality? Defining death in a surgical registry database: a report of the STS Congenital Database Taskforce and the Joint EACTS-STS Congenital Database Committee. Ann Thorac Surg. 2006; 81: 1937-1941