TOMAS-R: A template to identify and plan analysis for clinically important variation and multiplicity in diagnostic test accuracy systematic reviews
Diagnostic and Prognostic Research volume 6, Article number: 18 (2022)
The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (DTA) provides guidance on important aspects of conducting a test accuracy systematic review. In this paper we present TOMAS-R (Template of Multiplicity and Analysis in Systematic Reviews), a structured template to use in conjunction with current Cochrane DTA guidance, to help identify complexities in the review question and to assist planning of data extraction and analysis when clinically important variation and multiplicity are present. Examples of clinically important variation and multiplicity include differences in participants, index tests and test methods, target conditions and the reference standards used to define them, study design and methodological quality. Our TOMAS-R template goes beyond the broad topic headings in current guidance that are sources of potential variation and multiplicity, by providing prompts for common sources of heterogeneity encountered in our experience of authoring over 100 reviews. We provide examples from two reviews to assist users. The TOMAS-R template adds value by supplementing available guidance for DTA reviews with a tool that facilitates discussions between methodologists, clinicians, statisticians and patient/public team members to identify the full breadth of review question complexities early in the process. The use of a structured set of prompting questions at the important stage of writing the protocol ensures clinical relevance as a main focus of the review, while allowing identification of key clinical components for data extraction and later analysis, thereby facilitating a more efficient review process.
Systematic reviews are widely recognised as the best way of summarising current evidence on a particular research question. To be clinically relevant, systematic reviews need to have a clear research question and pre-specified review methods based on a detailed understanding of both the clinical pathway and the clinically important issues within the review question [2, 3]. Diagnostic test accuracy (DTA) systematic reviews can have additional complexities compared to intervention systematic reviews. These arise from all parts of the review but frequently occur due to the inclusion of multiple index tests, reference standards and sometimes multiple test thresholds. In addition, the variation in participants and their disease state may be greater than is found in intervention reviews, as DTA reviews can include a range of disease severities and participant groups.
Most DTA systematic reviews have clinical and statistical complexities that require careful and robust planning to allow pre-specification of analysis and to avoid additional data extraction at a late stage in the review because data and analysis complexities were not identified during protocol development. In particular, taking proper account of complexity in the data structure is important for appropriate statistical analysis. The Cochrane Screening and Diagnostic Tests Methods Group provides a range of resources to assist in the preparation of a DTA systematic review. The Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, supported by Cochrane DTA online learning modules (https://methods.cochrane.org/sdt/dta-author-training-online-learning), includes help on how to develop a systematic review question, in terms of population, target condition, index test(s) and reference standard(s) within the context of a clinical pathway. The Cochrane Review Manager software is a review authoring tool that includes a template for writing a DTA systematic review protocol, with prompts for the key areas that need defining in a review.
The TOMAS-R template (Template of Multiplicity and Analysis in Systematic Reviews) is intended to ensure the clinical relevance of a systematic review and to enable a more efficient review process. TOMAS-R goes beyond the broad topic headings provided in current guidance, providing a structured format with prompting questions, to help identify complexities in the review question and to assist planning of data extraction and analysis when clinically important variation and multiplicity are present. A thorough understanding of any inherent complexities and a clear plan for dealing with them are important to maintain the clinical relevance of the review and to understand heterogeneity in the evidence base. The template is intended to be used at the important stage of writing the protocol, with the aim of increasing reliability and efficiency at later stages of the review process. In our experience, failure to identify the full breadth of review question complexities early in the process causes considerable additional work, compromising efficiency.
We provide this template and guidance with examples from two DTA systematic reviews with an aim to enhance the quality and consistency of DTA protocols and reviews. The TOMAS-R template is intended to be used alongside existing guidance for DTA reviews provided in the Cochrane Handbook.
Our objective is to provide a template to help review authors identify the critical sources of clinically important variation and multiplicity in a DTA review question, and to consider the implications for data extraction and analysis. This template aims to facilitate communication between methodological, clinical and patient/public review team members, to ensure important clinical complexities are identified at the start of a review, ideally during protocol development.
We developed this template based on our experience as authors and reviewers of more than 100 DTA reviews. The tool was piloted by colleagues with expertise in methodology, statistics and systematic reviews, both individually and in seminars (e.g. Test Evaluation Research Group seminar at University of Birmingham 2016) and workshops (Systematic reviews and meta-analyses of diagnostic test accuracy in 2017, 2018, 2019), and using examples of reviews in different clinical areas. The template was piloted and adapted using two reviews (SM) and was also employed as a peer review tool for a series of Cochrane DTA reviews (SM 2017 to 2019). We elicited feedback as edits on the template, and verbally and by email on our proposed uses for the template and ways to improve its usefulness. The author group refined the template and elaboration during article preparation, and additional suggestions raised by article reviewers during peer review were incorporated.
Feedback on the first draft related to: (a) content: addition of variation by risk of bias (QUADAS-2); addition of prior symptoms and prior treatment as part of participant characteristics; adaptation to allow multiple diseases; (b) presentation: headings; ordering of sections; (c) explanation to new users: presentation of example templates; explanations; modification of examples; improvements to wording; (d) intended use: consideration of patient involvement; consideration of how the template might be used for different types of review (intervention, prognostic, exploratory, scoping); how the template could be used in peer review; and how it could record differences between the protocol and the final review.
The TOMAS-R template is based on the recognition that although every review is different, there are common issues that often underlie important clinical differences and variations affecting the clinical applicability of a systematic review. We recommend TOMAS-R should be used for protocol development during and after initial discussions with clinical colleagues to identify the complexities of the review question, objectives and study eligibility criteria, and after some scoping searches of the literature have been performed. We recommend that one or two example primary studies likely to be included in the review are used alongside TOMAS-R to generate and guide discussion.
The tool sets out five steps to be followed across four key domains for any DTA review: participants, index test(s), target condition and study design. We illustrate each domain using worked examples from two DTA systematic reviews, one on rapid tests for diagnosis of typhoid, and one on biomarker tests in ovarian cancer, supplemented with reference to other reviews to illustrate specific points. The five steps are set out in Table 1, and Table 2 presents a full template example for the review of rapid diagnostic tests for detection of typhoid. A blank template table is provided in supplementary materials (Table S1).
Step 1: Summary and review objectives
In step 1 of TOMAS-R, a summary section lists the main review question headings, which allows the title, primary and secondary objectives of a review to be recorded, including a broad outline of participants, index test(s), target condition and study design.
Step 2: Scoping potential complexities
At step 2 each of the four domains is considered in turn, in order to identify and record sources of complexity. A number of subsections representing key sources of possible variation and multiplicity are suggested for each domain, each featuring a prompt to discuss whether it applies to the current review question.
This template could also be used to identify and record how the scoping of the review is affected by the purpose of a review and the funder. For example, reviews commissioned by the National Institute for Health and Care Excellence, the World Health Organisation, the National Institute for Health Research (NIHR) or published by Cochrane, may have a different focus.
Domain 1: Participants
For participants in a study, the template in Table 2 highlights three important components for review scoping: (1) the clinical pathway and setting, including prior tests, comorbidities and geographical region; (2) the severity of disease; and (3) participant demographics.
The point on the clinical pathway at which a test is used in patient management affects the composition of the participant group receiving the test, largely because the quantity and type of tests a person receives before the index test modify the likelihood of having the target condition. For example, tests to detect typhoid can be used both in people with an a priori clinical suspicion of enteric fever and in those with fever but without any clear suspicion of typhoid. Similarly, geographical location and the level of disease endemicity of participants were identified as potentially important to understand the applicability of study results. Geographical region and the level of endemicity influence the background level of typhoid amongst competing infectious agents potentially causing fever and can also distinguish the type of bacterial infection underlying typhoid.
In the review of tests for ovarian cancer, scoping suggested that the CA125 test was likely to perform differently in pre- and post-menopausal women. As menopausal status can be established in a simple patient history or approximated by age, it was important to provide separate estimates of accuracy by menopausal status. This required separate data extraction of results by menopausal status, and in this review, exclusion of studies where separate results were not available. Analysing results separately according to disease severity, corresponding to cancer stage, was also considered likely to be clinically relevant; however, few studies identified during scoping provided separate results by stage, so separate data extraction and analysis was not attempted.
Domain 2: Index test(s)
The potential for variation in the index test is common. While a review question is usually focussed on the accuracy of a generic diagnostic test type (for example ‘rapid diagnostic tests’ for detecting enteric fever), in reality many different tests may exist within a generic class of tests for a specific purpose. How we define what constitutes a similar enough test to allow clinically meaningful grouping, and which variations in the test should be analysed separately, are integral to producing aggregate estimates of test accuracy that answer the systematic review objective and are clinically useful, generalisable and methodologically valid.
TOMAS-R highlights and provides prompts (Table 2) for three common potential causes of variation and complexity in the review index test(s): (1) different types of index tests; (2) different methods (including differences in test versions, manufacturers, sampling methods, staff training, treatment of inconclusive test results or methods used to assist test interpretation); and (3) different thresholds to define a positive index test result.
In the typhoid review, scoping identified several different rapid tests in use including three main commercial tests. Test methods differed between studies, including variations in: manufacturer test versions; samples used; and index test thresholds for one test. For the purpose of the review, it was considered important to summarise each commercial test separately because the assay formats were different (ELISA, lateral flow, magnetic bead), and differences in the type of antibody detected meant that tests would have different time spans of detection post infection (IgM or both IgM and IgG); however, variations within the same commercial brand of test were grouped together. For the KIT test, the most clinically relevant threshold was identified as greater than 1. In the protocol, it was recognised that rapid tests could use either blood or urine samples; data extraction was planned to record the type of test sample to allow separate presentation, if sufficient results were available.
In the ovarian cancer review, tests were grouped by the biomarker type, for example HE4 biomarker, with different commercial tests analysed together in the same group as the review focussed on identifying which biomarkers were potentially useful, rather than which specific test brand was the most accurate. Tests used different biomarker thresholds to define a positive test result, so the review focussed on a small number of pre-specified commonly used test thresholds for each biomarker. Data extraction was limited to results using thresholds within a small range of values around the pre-specified test thresholds. At these pre-specified test thresholds, average sensitivity and specificity were estimated using meta-analysis methods based on a single result per study. In future reviews, if newer meta-analysis methods that allow multiple thresholds from each study to be combined in a single analysis are planned [9, 10], then all thresholds would need to be extracted.
Domain 3: Target condition
How the presence of the target condition (or disease) is defined can vary between studies, affecting measurement of diagnostic accuracy. The potential for variation and complexity in the review target condition is influenced by four components: (1) different types of target condition; (2) different reference standards; (3) different severities of the target condition (reference standard thresholds); and (4) differences in the time interval between the index test and reference standard.
In the systematic review of typhoid tests, two different organisms can cause enteric fever: typhoid, caused by Salmonella Typhi, and paratyphoid, caused by Salmonella Paratyphi A. Ideally, test accuracy would be examined separately for each type of enteric fever; however, this was not expected to be possible due to the small number of studies examining these forms of typhoid separately. From scoping, three main reference standards were identified and preferentially ranked for analysis: (1) bacterial culture of bone marrow samples; (2) bacterial culture of blood samples; or (3) PCR assays of blood samples, which in some studies were interpreted in combination with bacterial culture from blood samples. If a study reported data for an index test against more than one reference standard, all data were extracted. This enabled comparisons between index tests to be restricted to studies using the same reference standard.
In the ovarian cancer review, two target conditions are recognised: malignant and borderline disease. The focus of the review question was to identify women as having disease, defined as either malignant or borderline, compared to no disease, defined as benign. Scoping identified that primary studies considered borderline disease in different ways; some studies grouped borderline with malignant disease, others grouped borderline with benign, and some studies excluded women with borderline disease. Consequently, separate data extraction for different definitions of target condition was planned, allowing a focus on studies that were most applicable to the review, and investigation of how study results were affected by different reference standard choices.
The ovarian cancer review included studies with different time intervals between the index test and reference standard, affecting results for women for whom clinical follow-up was the reference standard. Clinical follow-up is the reference standard for women who do not undergo surgery for ovarian disease, and therefore do not have a histology reference standard. Differences in clinical follow-up need consideration as they can affect test accuracy.
Domain 4: Study design and methodological quality
Study design and quality can affect which study results are considered appropriate to combine in a systematic review. The QUADAS-2 tool is the internationally recognised tool to assess the methodological quality (both risk of bias and applicability) of DTA studies. QUADAS-2 suggests study quality information can be integrated directly into the review analysis, by including a meta-analysis of studies providing the strongest evidence (lowest risk of bias, highest applicability). QUADAS-C, the risk of bias tool for comparative diagnostic accuracy studies, can be used similarly.
For the study design domain, Table 2 identifies three types of variation between studies: (1) unit of analysis; (2) risk of bias ratings from QUADAS-2/QUADAS-C or individual sources of bias; and (3) applicability ratings from QUADAS-2.
Study results in a systematic review can refer to participants, samples, lesions, organs, images or hospital visits. The unit of analysis identifies who/what the results refer to, for example whether the test accuracy results are reported using the number of participants or, if a participant can have more than one image, the number of images. Sometimes a systematic review will include studies with results using more than one unit. For example, an imaging test to identify polyps in the colon could report the accuracy to identify a person with polyps, or the accuracy to identify a polyp.
Accuracy per participant is important if the aim of the test is to identify the right patients for further tests and interventions. Accuracy per polyp is important for tests such as colonoscopy, which aim to identify and at the same time treat polyps, in order to understand whether all relevant polyps within a patient would be treated. In a review estimating the accuracy of CT colonography, per-polyp analyses were based on polyp size (large, medium, all sizes), pre-specified from clinical guidelines according to treatment recommendations, so data were extracted and reported by polyp size.
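The difference between the two units of analysis can be sketched in a few lines of code. The detection results below are hypothetical, not data from the cited review: a diseased participant counts as test positive if at least one of their polyps is detected, so per-participant and per-polyp sensitivity differ for the same underlying data.

```python
# Hypothetical per-polyp detection results: each inner list holds True/False
# for whether each polyp in a diseased participant was detected by the test.
participants = [
    [True, False],  # participant 1: two polyps, one detected
    [True],         # participant 2: one polyp, detected
    [False],        # participant 3: one polyp, missed
]

# Per-polyp sensitivity: detected polyps out of all polyps.
all_polyps = [found for person in participants for found in person]
per_polyp_sens = sum(all_polyps) / len(all_polyps)

# Per-participant sensitivity: a participant counts as test positive
# if at least one of their polyps is detected.
per_participant_sens = sum(any(person) for person in participants) / len(participants)

print(per_polyp_sens, per_participant_sens)  # 0.5 vs about 0.67
```

Here half the polyps but two thirds of the diseased participants are detected, which is why the unit of analysis must be recorded at data extraction.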
In the two example reviews of typhoid and ovarian cancer, it is only clinically relevant to consider test accuracy based on participants, as blood tests can only provide results across all potential disease sites within a patient.
In both the typhoid and ovarian cancer reviews, the QUADAS-2 signalling question about study design was used to understand how a key potential source of bias might affect results, with heterogeneity analysis and presentation planned according to whether the study design was case-control or not.
Step 3: Simplifying a review
The aims of step 3 are to simplify a review, by combining complexities within an analysis where possible without compromising clinical relevance, and to enable more efficient planning of the review. Decisions and the reasons underlying them are recorded in the column ‘step 3’ of Table 2. Identifying groups of participants, index tests or target conditions where it is essential to have separate analysis requires good communication between members of the review team with clinical and methodological expertise.
At the same time, it is important to minimise the number of separate main analyses, or the review can quickly become a descriptive analysis of individual studies. Investigations of heterogeneity, sensitivity analyses and graphical presentation of data are other useful ways of exploring and understanding the effects of different aspects of the complexity of a review. Some elements of complexity may not be considered clinically relevant to a particular review so that it is not necessary to present data separately in graphs or analyses, while other sources of clinical variability may be important to retain.
We recommend a flowchart of studies is used to identify how different review questions are answered depending on how complexity is combined or separated in subgroups. As studies are subdivided into separate subgroups for meta-analysis, the question answered by the meta-analysis is different. We present flowcharts for our two example reviews (Figs. 1 and 2).
For the typhoid review, the clinical members of the review team deemed it important to investigate diagnostic test accuracy separately for each of the main commercial tests, simplifying the review by combining different versions of the same test. For the Test-It typhoid test, where there were two thresholds, separate analysis was required at each test threshold, so no simplification of thresholds was possible. Other sources of variation were presented graphically or, where there were sufficient studies, explored in heterogeneity or sensitivity analyses.
Step 4: Planning data extraction
The aim of step 4 is to identify whether any complexity in the data affects data extraction. Using a separate column ensures that discussions between methodologists and clinical experts consider and record all these decisions during review planning.
We recommend researchers design and pilot a standardised data extraction sheet with explanations of what should be extracted and how missing information and inconclusive test results will be handled.
Data extraction can be speeded up and made more consistent if all team members understand and use the same methods so only clinically important categories are extracted separately. Common data extraction issues arise when studies report test results at multiple time points or at multiple test thresholds. If a study reports 20 different results, it is possible that not all thresholds or time points are relevant to the review question.
In the ovarian cancer review, studies used different reference standard thresholds. Data extraction was completed using a priority order to reflect the most important results for achieving the review aims. We also speeded up our data extraction by deciding to extract results only for commonly used and clinically relevant index test thresholds. Data were not extracted where the index test threshold was not reported as these results cannot inform clinical practice. In the typhoid review, data extraction was simplified based on pre-specification of the reference standard according to standard definitions in the typhoid literature, as grade 1 or grade 2 and included pre-specified rules on how multiple tests within the reference standard would be considered.
Step 5: Planning presentation and analysis of data
Once decisions have been made on simplifying a review (step 3, Fig. 1 and Table 2) and which data to extract (step 4, Table 2), planning the analysis follows from these decisions, and the practical realities of the number of studies in analysis groups and subgroups.
The TOMAS-R template includes a column to record the planned presentation and analyses for each issue raised in the review, using the column 'step 5' of the TOMAS-R template (Table 2). As in other statistical analyses, primary and secondary outcomes need to be identified. Some of these will include analysis if there are sufficient data, but some outcomes may focus on graphical display of data. The Cochrane DTA Handbook includes details of methods for meta-analysis and how to present the results (e.g. displaying summary points with confidence and prediction regions on SROC plots) and investigation of heterogeneity. Example software code is provided for different statistical software packages [5, 14].
Choosing presentation and analysis
There are three main types of analysis in DTA reviews: (1) meta-analysis of a single index test; (2) meta-analysis to compare the accuracy of two or more index tests; and (3) investigation of heterogeneity. The first two types of analysis are usually the primary analyses. Presentation of data alongside the analysis facilitates clarity and transparency and this may be done graphically or in a tabular format as appropriate. Network meta-analysis methods are available but currently not widely used [15, 16].
Where index tests are compared, the strongest evidence is based on a direct comparison within the same study, either where both tests are completed on the same participants (paired study data), or where the participants receiving the two tests are as similar as possible, i.e. participants are randomised to each test [17, 18]. It is important that an analysis plan states whether comparisons of tests will be based on direct comparisons using comparative accuracy studies or on all available data, including data from studies that assessed only one of the index tests (indirect comparisons).
Presentation of data in a review typically includes graphical display of results using SROC and forest plots. Both graphs allow both sensitivity and specificity results to be displayed, with forest plots providing a clearer display of 95% confidence intervals (CIs) when there are a large number of studies. Although 95% CIs can be displayed on SROC plots, once there are several overlapping studies, the plot becomes overcrowded and unclear. Paired study results can be displayed in an SROC plot with a line linking results from the same study.
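The quantities displayed on forest and SROC plots are the per-study sensitivities and specificities with their 95% CIs. As a minimal sketch with hypothetical 2x2 counts, the following computes these using Wilson score intervals; this is one of several interval methods in use, and review software may use a different one.

```python
import math

def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical per-study 2x2 counts: (TP, FP, FN, TN).
studies = {"Study A": (45, 5, 5, 45), "Study B": (30, 10, 20, 40)}

for name, (tp, fp, fn, tn) in studies.items():
    sens, sens_ci = tp / (tp + fn), wilson_ci(tp, tp + fn)
    spec, spec_ci = tn / (tn + fp), wilson_ci(tn, tn + fp)
    print(f"{name}: sensitivity {sens:.2f} "
          f"({sens_ci[0]:.2f} to {sens_ci[1]:.2f}), "
          f"specificity {spec:.2f} ({spec_ci[0]:.2f} to {spec_ci[1]:.2f})")
```

Each study contributes one (specificity, sensitivity) point to the SROC plot and one row per measure to a forest plot.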
Guidance on investigation of heterogeneity and sensitivity analysis
Investigation of heterogeneity is used to determine how test accuracy varies with clinical and methodological characteristics, whereas sensitivity analysis is used to understand how robust the main study results are to decisions made during the review process.
To understand whether study characteristics affect study results, investigations of heterogeneity can be performed. Graphical displays of subgroups in SROC or forest plots allow visual inspection for potential heterogeneity. This is particularly important when it is not possible to statistically investigate heterogeneity due to the inclusion of a small number of studies in the primary meta-analyses. In a heterogeneity analysis involving a categorical variable, the dataset will consist of non-overlapping subgroups, which may be statistically compared in a meta-analysis (meta-regression) or analysed separately (subgroup analyses). This contrasts with sensitivity analyses, where meta-analysis is repeated using a subset of studies in order to assess the robustness of the findings to assumptions made during the review process. Both heterogeneity and sensitivity analyses should be pre-planned in the review protocol.
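As an illustration of a subgroup analysis, the following sketch pools logit sensitivity within each subgroup using simple inverse-variance weighting over hypothetical counts. This is a deliberate simplification: Cochrane DTA reviews use bivariate random-effects models that analyse sensitivity and specificity jointly, as described in the Handbook, and the subgroup labels and data here are invented for illustration.

```python
import math

def pooled_sens(studies):
    """Simplified fixed-effect inverse-variance pooling of logit(sensitivity).
    studies: list of (TP, FN) counts; a 0.5 continuity correction is applied.
    Real DTA analyses would use bivariate random-effects models instead."""
    w_sum = wl_sum = 0.0
    for tp, fn in studies:
        tp, fn = tp + 0.5, fn + 0.5          # continuity correction
        logit = math.log(tp / fn)            # logit of sensitivity
        w = 1.0 / (1.0 / tp + 1.0 / fn)      # inverse-variance weight
        w_sum += w
        wl_sum += w * logit
    pooled_logit = wl_sum / w_sum
    return 1.0 / (1.0 + math.exp(-pooled_logit))  # back-transform

# Hypothetical subgroups defined by a study-design characteristic.
subgroups = {
    "case-control": [(40, 10), (35, 15)],
    "cohort":       [(25, 25), (30, 20)],
}
for design, data in subgroups.items():
    print(f"{design}: pooled sensitivity {pooled_sens(data):.2f}")
```

Comparing the pooled estimates across subgroups (here, higher sensitivity in the case-control subgroup) is the question a subgroup or meta-regression analysis formalises; a sensitivity analysis would instead rerun one pooled analysis on a restricted study subset.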
In the typhoid review, investigation of heterogeneity was planned to examine the role of nine study characteristics including disease endemicity of typhoid, geographical region and index test format. However, there were insufficient studies to examine any of these in a statistical analysis, although an SROC graph was used to display studies according to type of reference standard and study design.
By contrast, the typhoid review included a sensitivity analysis restricted to studies of the rapid test Typhidot where low bias was expected from inconclusive test results, caused by conflicting results from IgG and IgM antibodies. The ovarian cancer review was only able to complete planned heterogeneity analyses comparing studies that included borderline results as part of the reference standard with studies that were unclear about, or specifically excluded, borderline results.
Guidance on index test thresholds in meta-analysis
A common mistake in DTA reviews that compromises clinical relevance is to combine test results across very different thresholds for defining a positive test result, using methods that allow only one threshold per study for the estimation of an average sensitivity and specificity. Results combined across very different thresholds in this way do not give a result that can be interpreted at any clinically relevant threshold, but instead correspond to an average reflecting how often different thresholds are reported. For example, in the typhoid review, it was important not to combine results from the two thresholds of the Test-It test.
Therefore, the choice of a meta-analysis method depends on the type of data available and the focus of interest. If studies report a common threshold, estimating an average sensitivity and specificity (summary point) at that threshold is appropriate. However, if studies report different thresholds, estimating a SROC curve across different thresholds by including one threshold per study is more appropriate. If some or all of the studies report more than one threshold, more complex methods that produce SROC curves across the thresholds as well as estimates of average sensitivity and specificity at specific thresholds can be used to make the most of the available data as well as to identify a relevant threshold that meets a desired level of test performance [9, 10]. The Cochrane DTA Handbook provides guidance on data extraction and meta-analysis with multiple test thresholds.
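When a summary point at a common threshold is the goal, selecting one result per study at the pre-specified threshold is a simple data-extraction step, sketched below. The study names, thresholds and accuracy values are hypothetical; studies not reporting the target threshold would feed into the multiple-threshold SROC methods mentioned above rather than being silently discarded.

```python
# Hypothetical extracted results: for each study, (sensitivity, specificity)
# keyed by the index test threshold at which they were reported.
extracted = {
    "Study A": {35: (0.90, 0.70), 65: (0.78, 0.85)},
    "Study B": {65: (0.81, 0.80)},
    "Study C": {100: (0.60, 0.95)},
}

TARGET = 65  # pre-specified, clinically relevant threshold (hypothetical units)

def select_at_threshold(extracted, target):
    """Keep one result per study at the pre-specified threshold.
    Studies not reporting it are flagged rather than silently dropped."""
    included, missing = {}, []
    for study, results in extracted.items():
        if target in results:
            included[study] = results[target]
        else:
            missing.append(study)
    return included, missing

included, missing = select_at_threshold(extracted, TARGET)
print(included, missing)
```

Restricting the summary-point meta-analysis to `included` keeps the pooled estimate interpretable at one clinical threshold, which is the point of the guidance above.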
Including TOMAS-R in systematic review protocols
TOMAS-R is suitable as a tool to guide planning in a review and to maintain communication within a team, but also to provide a clear summary table of review planning for inclusion in a systematic review protocol. Clearly, it is not possible to plan for all eventualities in a review protocol, and TOMAS-R could also be used to report changes between the protocol and final review.
DTA systematic reviews require careful planning to enable them to address clinical objectives in an informative way. Careful planning is facilitated by a structured approach, particularly in DTA reviews where there is often considerable complexity due to variations between studies.
TOMAS-R is a template to allow structured planning, with prompts to identify sources of complexity identified as common in DTA systematic reviews. In this article we have described how this template can be used during protocol development for planning DTA reviews. We anticipate this template will enhance the quality and consistency of protocols by providing a structured approach, similar to tools and checklists already in use, such as reporting guidelines and risk of bias tools. An earlier version of this template has been adapted for prognostic reviews, using terminology used in prognostic systematic reviews. A blank template table is provided in supplementary materials (Table S1).
The template can also be used for reporting what was done in a review and changes between the protocol and the review. In addition, we have found the template useful for peer review of DTA and prognostic reviews, either at the protocol or full review stage.
As with other checklists and tools in medical research, TOMAS-R and its guidance will require updating as methods for diagnostic accuracy studies develop and further validation is undertaken. We recommend downloading the latest version of TOMAS-R and accompanying guidance, including detailed examples, from the OSF open repository site (https://osf.io/).
Availability of data and materials
Abbreviations
CA125: Cancer antigen 125
DTA: Diagnostic test accuracy
ELISA: Enzyme-linked immunosorbent assay
EOC: Epithelial ovarian cancer
GP: General practitioner (primary care clinician)
HE4: Human epididymis protein 4
NIHR: National Institute for Health Research
OVA1: Ovarian cancer test OVA1
PCR: Polymerase chain reaction
QUADAS: Quality assessment of diagnostic accuracy studies
QUADAS-C: Quality assessment of diagnostic accuracy studies comparison
RDT: Rapid diagnostic tests
ROC: Receiver operating characteristic
SROC: Summary receiver operating characteristic
TOMAS-R: Template of Multiplicity and Analysis in Systematic Reviews
Table of study characteristics
References
Oxman AD, Guyatt GH. The science of reviewing research. Ann N Y Acad Sci. 1993;703:125–33; discussion 133–4.
Bossuyt PMM. Chapter 4: Understanding the design of test accuracy studies. In: Deeks JJ, Bossuyt PMM, Leeflang MMG, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. London: Cochrane; 2022.
Deeks JJ, Wisniewski S, Davenport C. Chapter 4: Guide to the contents of a Cochrane Diagnostic Test Accuracy Protocol. In: Deeks JJ, Bossuyt PMM, Gatsonis C, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. The Cochrane Collaboration; 2013.
Leeflang MMG, Davenport C, Bossuyt PMM. Chapter 6: Defining the review question. In: Deeks JJ, Bossuyt PMM, Leeflang MMG, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. London: Cochrane; 2022.
Macaskill P, Takwoingi Y, Deeks JJ, Gatsonis C. Chapter 10: Understanding meta-analysis. In: Deeks JJ, Bossuyt PMM, Leeflang MMG, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. London: Cochrane; 2022.
Cochrane Screening and Diagnostic Tests Methods Group (SDTM). https://methods.cochrane.org/sdt. Accessed 21 July 2022.
Wijedoru L, Mallett S, Parry CM. Rapid diagnostic tests for typhoid and paratyphoid (enteric) fever. Cochrane Database Syst Rev. 2017;5:CD008892.
Rai N, Champaneria R, Snell K, Mallett S, Bayliss SE, Neal RD, et al. Symptoms, ultrasound imaging and biochemical markers alone or in combination for the diagnosis of ovarian cancer in women with symptoms suspicious of ovarian cancer. Cochrane Database Syst Rev. 2015. https://doi.org/10.1002/14651858.CD011964.
Steinhauser S, Schumacher M, Rucker G. Modelling multiple thresholds in meta-analysis of diagnostic test accuracy studies. BMC Med Res Methodol. 2016;16(1):97.
Jones HE, Gatsonis CA, Trikalinos TA, Welton NJ, Ades AE. Quantifying how diagnostic test accuracy depends on threshold in a meta-analysis. Stat Med. 2019;38(24):4789–803.
Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
Yang B, Mallett S, Takwoingi Y, Davenport CF, Hyde CJ, Whiting PF, et al. QUADAS-C: A Tool for Assessing Risk of Bias in Comparative Diagnostic Accuracy Studies. Ann Intern Med. 2021;174(11):1592–9.
Halligan S, Altman DG, Taylor SA, Mallett S, Deeks JJ, Bartram CI, et al. CT colonography in the detection of colorectal polyps and cancer: systematic review, meta-analysis, and proposed minimum data set for study level reporting. Radiology. 2005;237(3):893–904.
Takwoingi Y, Dendukuri N, Schiller I, Rücker G, Jones HE, Partlett C, et al. Chapter 11: Undertaking meta-analysis. In: Deeks JJ, Bossuyt PMM, Leeflang MMG, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. London: Cochrane; 2022.
Ma X, Lian Q, Chu H, Ibrahim JG, Chen Y. A Bayesian hierarchical model for network meta-analysis of multiple diagnostic tests. Biostatistics. 2018;19(1):87–102.
Owen RK, Cooper NJ, Quinn TJ, Lees R, Sutton AJ. Network meta-analysis of diagnostic test accuracy studies identifies and ranks the optimal diagnostic tests and thresholds for health care policy and decision-making. J Clin Epidemiol. 2018;99:64–74.
Takwoingi Y, Partlett C, Riley RD, Hyde C, Deeks JJ. Methods and reporting of systematic reviews of comparative accuracy were deficient: a methodological survey and proposed guidance. J Clin Epidemiol. 2020;121:1–14.
Takwoingi Y, Leeflang MM, Deeks JJ. Empirical evidence of the importance of comparative studies of diagnostic test accuracy. Ann Intern Med. 2013;158(7):544–54.
Dinnes J, Deeks JJ, Leeflang MMG, Li T. Chapter 9: Collecting data. In: Deeks JJ, Bossuyt PMM, Leeflang MMG, Takwoingi Y, editors. Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. London: Cochrane; 2022.
Halligan S, Boone D, Bhatnagar G, Ahmad T, Bloom S, Rodriguez-Justo M, et al. Prognostic biomarkers to identify patients destined to develop severe Crohn's disease who may benefit from early biological therapy: protocol for a systematic review, meta-analysis and external validation. Syst Rev. 2016;5(1):206.
Acknowledgements
We would like to thank all those who trialled the template and provided feedback during workshops or informally.
Funding
SM receives funding from the NIHR and the NIHR UCL/UCLH Biomedical Research Centre. YT receives funding from the NIHR through an NIHR Postdoctoral Fellowship and is supported by the NIHR Birmingham Biomedical Research Centre. LFR is supported by York Health Economics Consortium (YHEC). JD is supported by the NIHR Birmingham Biomedical Research Centre. This paper presents independent research supported by the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Consent for publication
Ethics approval and consent to participate
Authors' information
All authors have worked extensively on various aspects of many Cochrane DTA reviews, and have also peer reviewed published Cochrane DTA protocols and full reviews. JD, SM and YT are members of the Cochrane DTA Editorial Board and the Cochrane Screening and Diagnostic Tests Methods Group, and have provided training for Cochrane and other DTA review authors. JD, SM and YT are also authors of chapters in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. In addition, YT is an Associate Editor of the Handbook and an editor in the Cochrane Infectious Diseases Group.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Mallett, S., Dinnes, J., Takwoingi, Y. et al. TOMAS-R: A template to identify and plan analysis for clinically important variation and multiplicity in diagnostic test accuracy systematic reviews. Diagn Progn Res 6, 18 (2022). https://doi.org/10.1186/s41512-022-00131-z
Keywords
- Diagnostic test accuracy
- Systematic review