When should good papers be retired from the scientific record?
This clinical study from the 90s continues to influence knee surgery today: Why we should retire it gracefully.
In 1999, this study of anterior cruciate ligament reconstruction in the knee was cutting edge, offering surgeons the latest evidence for deciding which material to use as a graft. At the time, the question was topical, the methodology was reasonable and the findings were influential across the globe.
In a first attempt, hopefully of many, to bring more rigour to the orthopaedic and musculoskeletal literature, I took one of the most cited papers from my own neck of the woods (Australia) and put it under the microscope. Here is the citation for those playing at home:
Pinczewski, L. A., Lyman, J., Salmon, L. J., Russell, V. J., Roe, J., & Linklater, J. (2007). A 10-Year Comparison of Anterior Cruciate Ligament Reconstructions with Hamstring Tendon and Patellar Tendon Autograft. The American Journal of Sports Medicine, 35(4), 564–574. https://doi.org/10.1177/0363546506296042
Inspired in part by the discourse around forensic metascience on social media and the release of an online textbook outlining the basics, I extended the approach to be more holistic with respect to quality, short of performing a formal methodology assessment (luckily, someone had already done one, of sorts).
Good for the time
In the 1990s, the research landscape in clinical orthopaedics was quite different from what is available (but often not used) today. Prospective registration of trial methods and protocols was non-existent, reporting guidelines had yet to be written, and the statistical methods needed to handle many of the complexities of clinical data either did not exist or were not broadly available.
The gold standard is a rolling stone
Using contemporary reporting methodology as a guide, I re-examined this influential study, not to discredit the authors, but to demonstrate how the study should be interpreted today, given how quality standards have evolved in the two or three decades since its inception.
You can have a look at the full report here, but in short I performed the following (a code sketch of the first couple of steps appears after this list):
Extracted the (official) citation in full
Retrieved the list of articles citing it from CrossRef
Pinged the RetractionWatch database for:
The citation identifier (doi)
Each author surname
Conducted a (quick) review of citation sentiment from articles citing this paper in the last year or two
Worked through each element of the relevant reporting checklist (recently updated) as they applied to the paper and its reporting
Attempted to verify elements of the reported data using forensic techniques
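To make the workflow concrete, here is a minimal Python sketch of the first couple of steps. It assumes the public CrossRef REST API (which exposes a citation count; the full citing-article list comes via CrossRef’s Cited-by service) and a locally downloaded CSV export of the RetractionWatch database. The file name and column names are my assumptions, not a guaranteed schema, so check them against your copy.

```python
import csv

import requests

DOI = "10.1177/0363546506296042"  # the Pinczewski et al. (2007) paper
SURNAMES = {"Pinczewski", "Lyman", "Salmon", "Russell", "Roe", "Linklater"}

# Step 1: fetch the paper's CrossRef metadata, including its citation count.
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]
print("Title:", work["title"][0])
print("Times cited (CrossRef):", work["is-referenced-by-count"])

# Step 2: ping a local copy of the RetractionWatch database for the DOI and
# each author surname. Column names ("OriginalPaperDOI", "Author") and the
# file's encoding are assumptions; verify against the actual export.
with open("retraction_watch.csv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh):
        if row.get("OriginalPaperDOI", "").lower() == DOI.lower():
            print("DOI found in RetractionWatch:", row)
        if any(name in row.get("Author", "") for name in SURNAMES):
            print("Author surname match:", row.get("Author"))
```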
It’s important to keep in mind that I reviewed the article in question through the lens of what we know now about trial methodology, statistics and even the outcomes of ACL (anterior cruciate ligament) reconstruction, which, in its defence, this article pioneered. It’s also important to note that when patient recruitment for this trial began, the Russian Federation was experiencing a crisis of government under Boris Yeltsin, the UN mission in Somalia was having a day of days and Michael Jordan was retiring from the NBA for the first time.
In the follow-up to this post and analysis, I will sketch out what a research strategy could look like for the same question, given the methods guidance and good science practices available to us today. Stay tuned.
The verdict
Stacked against contemporary recommendations, the selected paper (and its related reports) falls short of what would be considered a solid trial today. While it’s always easy to flag issues with any piece of research in hindsight, this particular body of work displays weaknesses in the following areas that, if addressed, could change our understanding of the problem. Apologies in advance for the more technical language; if you would like more information, jump into the report linked above:
Lack of clearly structured question with a specified estimand and pre-specified clinically relevant margins for superiority (or inferiority)
No discussion regarding sample size | power calculation (a sketch of such a calculation follows this list)
A lack of clinical equipoise in comparing the two available graft options (uncertainty about which graft is better)
Failure of the randomisation process: patients were assigned to treatment groups based on when they presented for review, rather than by true randomisation, which could bias results toward the surgeon's preferred (new) technique
Inconsistencies in the overall inclusion/exclusion criteria
Questionable validity of key instruments for all patients in the sample (the patient-reported outcomes used had not been validated for children)
The evolution of inclusion/exclusion criteria for analyses across multiple follow-up time points
The analytical approach would be drastically revised given contemporary recommendations
Level of reporting for certain methodological elements
The lack of open science practices
Pre-registration
Protocol publication
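To make the first two bullets concrete, here is a minimal sketch of the kind of sample size calculation the paper never reported, using statsmodels for a two-arm comparison of a continuous outcome. The effect size, alpha and power values are illustrative assumptions on my part, not figures drawn from the trial.

```python
from statsmodels.stats.power import TTestIndPower

# Illustrative assumptions only: a standardised effect size (Cohen's d)
# treated as the smallest clinically relevant difference between grafts,
# a 5% two-sided alpha, and 80% power.
effect_size = 0.5
alpha = 0.05
power = 0.80

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power
)
print(f"Patients required per graft group: {n_per_group:.0f}")  # ~64
```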
Overall, in today’s context the paper provides an uncertain answer to its original question, while the sentiment of contemporary citations shows its diminishing influence as a key article in this space. Nevertheless, the issues this exercise raises may need to be considered more broadly across the field.
Graceful Evolution
If you’re in any way connected to academia or consume medical research, you may be aware of the issues facing the area, namely the increased rate of retractions from the scientific record and the threat of paper mills polluting databases worldwide. If you’re not familiar, maybe jump off from here:
But to be very clear, that’s not what is happening here. Retraction corrects the scientific record by removing inputs that should never have been part of the conversation. What I’m suggesting instead is a graceful retirement of meaningful contributions that have since been superseded by improved methods, as they become less reliable or less relevant to contemporary knowledge.

Part of the cycle of science is that newcomers expend a lot of time (and energy) establishing their credentials by critiquing what has come before and building on existing knowledge with better ways of doing things, sharper analyses and more accurate instruments. However, when we discuss or review existing articles, there isn’t really an accessible way of determining where a given paper sits in the broader landscape without intensively reviewing it and conducting formal quality assessments.

If we are concerned about science being largely ignored by the clinicians we expect to act on our findings, we need to make it much more accessible to flag what remains useful and what has fallen out of step with good practice. By being upfront and open about the evolution of methods and standards over time, we can help consumers of research appraise more efficiently which articles can adequately contribute to answering their questions on a given day. The added advantage is a clear roadmap for targeting areas that require replication or reproduction.1
Nice idea…but how?
What I’m not suggesting is that papers should disappear from the record. When we consume research, whether in the research process, clinical guidelines or organisational policy, the last thing the system can manage (or our sanity can handle) is articles being removed from the searchable record. It would also be extremely difficult to define the point at which an article becomes “out of touch” enough to remove.
So what instead? Some of these things are already in place, mostly within existing clinical research reporting guidelines like CONSORT, SPIRIT or STROBE. Reporting key dates (trial protocol, recruitment commencement and close, analysis) within the study record is an important element in establishing where a body of work sits relative to when the relevant reporting guidelines were published. By adding such fields to the databases that deliver literature to the public, papers could be placed in context at a glance: research clearly superseded by new methods could be flagged, as could articles whose key dates pre-date currently accepted standards (a toy sketch of such a flag follows below).
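As a toy illustration of the kind of flag such database fields could drive, here is a hedged Python sketch that compares a study’s key dates against guideline publication dates. The record structure is entirely hypothetical, and the guideline years are approximate first-publication dates rather than authoritative values.

```python
from datetime import date

# Approximate first-publication dates of key reporting guidelines
# (assumptions for illustration; check the guideline sites for exact dates).
GUIDELINES = {
    "CONSORT": date(1996, 1, 1),
    "STROBE": date(2007, 1, 1),
    "SPIRIT": date(2013, 1, 1),
}

# A hypothetical study record holding the key dates this post argues
# bibliographic databases should capture. Dates below are illustrative.
study = {
    "doi": "10.1177/0363546506296042",
    "recruitment_start": date(1993, 1, 1),
    "analysis": date(2006, 1, 1),
}

# Flag, at a glance, which guidelines post-date the study's key dates.
for name, published in GUIDELINES.items():
    if study["recruitment_start"] < published:
        print(f"Flag: recruitment pre-dates {name} ({published.year}).")
```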
Researchers could be more mindful in reporting certain methods, dating and versioning both the instruments used and the analytical frameworks employed. Better funding for replication and reproducibility efforts in general, and particularly in the musculoskeletal and surgical space, would help maintain bridges between historical references and contemporary knowledge, while building trust amongst research consumers that the available information is of sufficient quality for clinical use.
These steps could feed into further cultural change, making it more acceptable to question methods while preserving the standing of the knowledge that has come before. Research performed in good faith with the best methods available at the time is not necessarily good or bad; research conducted in a way that ignores best practice is another topic entirely. By acknowledging that work may have a shelf life, we can encourage a research culture that aims to publish lasting bodies of work that will stand the test of time, and that seeks out gaps to be revisited and improved upon.
What next
Following the free-flowing thoughts through this process, and given I’m trying to appeal to as broad a base as possible, there are a few items now on the to-do list:
Sketch out a hypothetical program to address the question of graft material in knee surgery based on contemporary frameworks
Expand and define more of the technical concepts touched on here
Talk more about publishing, funding and the different interest groups in this space
Dive into some of the analysis aspects - applying good methods to this field specifically
Examine the methodological literacy amongst clinicians and clinician-researchers in the orthopaedics | musculoskeletal space, with particular attention to recent papers on this topic2
If any of this seems of interest - let me know in the comments what you would like to see next.

