By: Dr. Geoffrey Modest
One factor often not considered in the breast cancer screening algorithm is the accuracy of the pathology report after biopsy. The widespread use of mammograms has led to a large number of biopsies (1.6 million women in the US each year), of which approximately 25% are reportedly positive for cancer. From 2011 to 2014, highly experienced pathologists from 8 states reviewed slides, and the concordance of their readings was assessed (see JAMA. 2015;313(11):1122-1132).
Results:
–65% of the invited pathologists consented to participate (n=115), and each was given 60 slides to review from a total of 240 cases (1 slide/case), including 23 cases of invasive cancer, 73 of ductal carcinoma in situ (DCIS), 72 with atypical hyperplasia (atypia), and 72 benign cases without atypia
–49% of cases were in women 40-49 yo, 50.8% had either heterogeneously dense or extremely dense breast tissue on mammogram, and 57.5% were from core biopsies vs 42.5% from excisional biopsies
–Pathologists rated their impressions of the slides on 1-5 scales: for confidence in their assessment (1 being very confident), 81% of ratings were 2-3; for difficulty of interpretation (1 being very easy), 78% were 3-4
–The overall concordance rate of diagnostic interpretations between the pathologists was 75.3% (95% CI 73.4%-77.0%), or 5194 of 6900 interpretations
–Concordance was lower for women with higher breast density on mammogram (73% vs 77% for lower density, p<0.001), and for pathologists who interpreted lower weekly case volumes (p<0.001), worked in smaller practices (p=0.034), or worked in nonacademic settings (p=0.007)
–There was also variation depending on the diagnosis:
–benign without atypia (number of interpretations = 2070): concordance rate was 87%, overinterpretation rate 13% (most of the incorrect ones were read as atypia, some DCIS, but several read as invasive carcinoma)
–atypia (number of interpretations = 2070): concordance rate was 48%, overinterpretation rate 17%, underinterpretation rate 35% (most of the incorrect ones were read as DCIS or benign, a few as invasive carcinoma)
–DCIS (number of interpretations = 2097): concordance rate was 84%, overinterpretation rate 3%, underinterpretation rate 13% (most of the incorrect ones were read as benign or atypia, but quite a few as invasive carcinoma)
–invasive ca (number of interpretations = 663): concordance rate was 96%, underinterpretation rate 4% (most of the incorrect ones were read as DCIS, but there were a few read as benign)
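As a rough sanity check, the overall 75.3% concordance can be reconstructed by weighting each per-diagnosis concordance rate above by its number of interpretations (a back-of-the-envelope sketch using the rounded figures reported here, so it agrees with the exact 5194/6900 only approximately):

```python
# Per-diagnosis figures from the study: (number of interpretations, concordance rate)
cases = {
    "benign without atypia": (2070, 0.87),
    "atypia":                (2070, 0.48),
    "DCIS":                  (2097, 0.84),
    "invasive carcinoma":    (663,  0.96),
}

total = sum(n for n, _ in cases.values())                 # 6900 interpretations
concordant = sum(n * rate for n, rate in cases.values())  # ~5192 concordant reads
overall = concordant / total

print(f"{overall:.1%}")  # ~75.3%, matching the reported 5194/6900
```

Note how heavily the 48% concordance for atypia drags down the overall figure: atypia accounts for almost a third of the interpretations but barely half of them agreed with the reference diagnosis.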
So, there are obvious issues here. The degree of variability is quite disturbing, with even totally benign tissue occasionally read as invasive carcinoma and vice versa. There are a few caveats, though. Only one slide per case was available, and in real practice, if a question arose, the pathologist would likely check other cuttings/slides. Also, in real life the pathologist might have asked a colleague for their opinion. But this study shows that there is significant potential for error here, creating unnecessary extreme anxiety, incorrect and potentially devastating therapy (chemo, radiation, and surgery), or, on the other hand, missing a potentially curable lesion. I think the lessons here are:
–It is reasonable and appropriate for us in primary care to question pathologist interpretations (ie, they are not always accurate, and I think many of us incorrectly consider them a gold standard)
–And perhaps there should be structural changes to how slides are read:
–it seems reasonable to ask pathologists to rate the certainty of their interpretation when they submit a diagnosis (ie, my guess is that accuracy was higher when the pathologist looked at a slide and felt certain of the interpretation, which happened rarely in the above study)
–perhaps pathologists should regularly look at more than one slide per biopsy, to potentially give them a better ability to self-monitor and self-correct their first interpretation
–perhaps there should be a routine system of getting second opinions on the slide reading