Summary of review (step 1) Title: Rapid diagnostic tests for typhoid and paratyphoid (enteric) fever. Primary objective: to assess the diagnostic accuracy of rapid diagnostic tests (RDTs) for detecting enteric fever in persons living in endemic areas who present to a healthcare facility with fever. Secondary objectives: to identify which types and brands of commercial test best detect enteric fever, and to investigate sources of heterogeneity between study results, including: Salmonella enterica serovar (Typhi or Paratyphi A); case-control study design; test population; reference standard; index test format; index test sample; population disease endemicity; participant age; and geographical location. | ||||
Participants: clinically-suspected enteric fever patients or unselected febrile patients. Index test(s): all rapid diagnostic tests specifically designed to detect enteric fever, applied to patient blood or urine samples. Role of test for patient: diagnosis. Role of test in planned clinical pathway: replacement. Target condition: typhoid and paratyphoid (enteric) fever. Reference standards: bone marrow culture, peripheral blood culture, and polymerase chain reaction (PCR) on blood. Study designs: prospective cohort, retrospective case-control. | ||||
Domain 1: Participants | ||||
Potential sources of clinically important complexity | STEP 2: List categories identified from review scoping | STEP 3: Report which categories will be separate or combined. Give reason | STEP 4: Data extraction. Report if any categories will be preferentially extracted. | STEP 5: Presentation and meta-analysis. Report how categories will be treated |
1.1: Clinical pathway/prior tests/different comorbidities/geographical regions Are there important differences between participants that could affect test accuracy? Examples • Different clinical pathways or healthcare settings (primary, secondary, tertiary care) • Different prior tests (referral based on different prior tests) • Differences in other conditions likely to be present at the same time • Different geographical settings | Clinical pathway/prior tests Two groupings: • clinically-suspected enteric fever patients • unselected febrile patients (some studies may include a mixture of patients) | Keep as separate groups if possible. Retain studies with mixed or unclear populations. Report grouping based on study inclusion criteria in TOC. Reason: studies could include populations with varying pre-test probabilities of disease, or other concomitantly circulating infectious diseases | Preferential data extraction in separate groupsb. Otherwise extract as a mixed population group. | Planned SROC or forest plots with groups indicated for each study. Planned heterogeneity analysis. |
Level of disease endemicity Use two groups for level of disease endemicity to take account of pre-test probability of disease (e.g. medium versus high, using the classification of Crump 2004). | Keep as separate groups if possible. Report grouping based on study inclusion criteria in table of study characteristics. Use prevalence in study as a measure of endemicity if not otherwise reported. Reason: tests have potential for varying performance in endemic and non-endemic regions | If a study reports data for two different disease endemicity levels separately (e.g. different centres or different seasons), preferentially extract data in separate groups. Where not reported, or populations are mixed, use study prevalence as a proxy for endemicity. | Planned SROC or forest plots with groups indicated for each study or ordered by individual study prevalence. Planned heterogeneity analysis. | |
Geographical location Use two groups (sub-Saharan Africa versus the rest of the world). | If sufficient studies then keep separate, otherwise combine. Report country in TOC. Reason: in sub-Saharan Africa, non-typhoidal Salmonellae are an important cause of bacteraemia, which may affect the performance of enteric fever RDTs in this region. | Preferential data extraction in separate groups. Otherwise extract as a mixed population group. | Planned SROC or forest plots with groups indicated for each study. Planned heterogeneity analysis. | |
1.2: Disease type or severity Are there groupings within participants by disease type or severity that could affect test accuracy? Examples • Different severity of disease, e.g. patients with mild disease vs patients with severe disease • Different disease state, e.g. active vs past (inactive) disease • Different types of disease, e.g. pigmented vs non-pigmented lesions in skin cancer | Not considered in this review | Not applicable in this review Reason: diagnosis is presence of typhoid disease. Severity of disease is measured by patient symptoms and signs. | Not applicable in this review | Not applicable in this review |
1.3: Participant demographics Are there any important groupings by participant age, gender, or ethnicity? Examples Separate groups by • Different ages, such as children and adults • Different demographics, such as gender, ethnicity, genetic groups | Two groups by age • Adults (over 16 years) • Children (16 years or younger) | If sufficient studies then keep separate, otherwise combine. Reason: test might perform differently in children and adults, in part due to different prevalence of other infectious diseases. | Preferential data extraction in separate groups. Otherwise extract as a mixed population group. | Planned SROC or forest plots with groups indicated for each study. |
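Where endemicity grouping is not reported, the rows above fall back on study prevalence as a proxy. As a minimal illustrative sketch (all counts are invented, not taken from the review), prevalence follows directly from a study's 2x2 table of index test result against reference standard:

```python
# Illustrative sketch only: deriving study prevalence (used above as a proxy
# for disease endemicity) from a diagnostic 2x2 table. Counts are invented.
def study_prevalence(tp, fp, fn, tn):
    """Prevalence = reference-standard positives / all participants."""
    return (tp + fn) / (tp + fp + fn + tn)

print(study_prevalence(tp=45, fp=10, fn=15, tn=130))  # 0.3
```

Studies could then be ordered by this value on a forest or SROC plot, as planned in STEP 5 of the endemicity row.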
Domain 2: Index test(s) Criteria used to focus review to most clinically relevant test(s): | ||||
Reason for potential groupings or categories. | STEP 2: List categories identified from review scoping | STEP 3: Report which categories will be separate or combined. Give reason | STEP 4: Data extraction. Report if any categories will be preferentially extracted. | STEP 5: Presentation and meta-analysis. Report how categories will be treated |
2.1 Type of underlying index test Is more than one underlying type of index test included that could affect test accuracy? Examples • Different indications of disease presence, e.g. DNA of infectious agent, antibodies against infectious agent • Different formats of test, e.g. ELISA, PCR, dipstick • Different equipment needed that affects the test, e.g. laboratory test using specialist equipment, point of care test | 3 main commercial tests: 1. Typhidot 2. TUBEX 3. KIT. Other tests include PanBio Multi-test Dip-S-Tick, Mega Salmonella and SD Bioline tests | Separate groups for the 3 main tests, with variations within a test grouped together: 1. Typhidot 2. TUBEX 3. KIT. All other tests (PanBio Multi-test Dip-S-Tick, Mega Salmonella, SD Bioline) considered separately. Reason: the 3 main tests were identified as having sufficient studies to consider meta-analysis. Meta-analysis across the remaining tests, which use different test approaches, would not be useful for the review | Extract all test data from each study. | Separate meta-analysis for each commercial testa,c. Where there are insufficient studies for meta-analysis, graphical data presentation with descriptive analysis. Where sufficient studies are available, make comparisons between tests. |
2.2 Index test methods within an index test grouping Is there more than one method or manufacturer for a test that could affect test accuracy? Also consider whether the test might be done by people with different levels of experience or using different approaches to interpretation. Examples • Different versions of tests • Different participant samples used to detect disease, e.g. blood sample, urine sample • Differences in staff, e.g. trained laboratory staff vs nurse point of care test • Different treatment of inconclusive test results • Different approaches to assist test interpretation, e.g. algorithms or checklists | Different test versions 1. Typhidot; Typhidot-M; TyphiRapid Tr-02 - grouping within Typhidot by IgM or IgG antibody detection 2. KIT: dipstick assay; latex agglutination assay; lateral flow immunochromatographic test 3. TUBEX: 1 format | Different versions within a test will be combined. Reason: main review question is about the accuracy of the 3 main test types. | Record test version in TOC. | All test versions combined in meta-analysis as single groupa,b. |
Different samples | Separate by sample type Reason: sample type considered important for test results | Separate data extraction by sample type | Planned heterogeneity analysis if sufficient studies. In review, no heterogeneity analysis as all studies use blood samples | |
Different treatment of inconclusive results: Typhidot test only | Will combine Typhidot tests regardless of treatment of inconclusive results. Reason: most studies report Typhidot results such that IgM results can be extracted, so we expect data extraction to standardise the reporting of inconclusive results. | For Typhidot, we extracted IgM in preference to IgG. Reason: IgM indicates recent infection, whereas IgG can pick up previously resolved infections. Using Typhidot IgM also allowed better comparison with TUBEX and KIT, as both tests detect IgM antibodies | Individual results will be presented in an SROC plot labelled by method of treatment of inconclusive results. Main analysis across all Typhidota,c, but with sensitivity analysis limited to studies reporting inconclusive results or where the test format means there are no inconclusive results. Reason: different treatment of inconclusive results could influence results | |
2.3 Threshold(s) for positive index test result Are different thresholds used to define a positive result that could affect test accuracy? Has a clinically relevant index test threshold been identified for this review? Examples • Different test thresholds used to define a positive test result for semi-quantitative or continuous test results | Some KIT tests provide semi-quantitative test results where different thresholds can be used to define positive test results. Other test formats provide qualitative test results without any thresholds. | Keep KIT thresholds separate. Main result for KIT based on threshold of > 1+ which was judged the most meaningful clinically. Reason: reporting results at clinically relevant test threshold(s) is most important result for clinical practice. Results combined across very different thresholds do not give a result that can be interpreted at any clinically relevant threshold, but correspond to an average result reflecting how often different thresholds are reported. | Results extracted separately for each KIT test threshold. KIT threshold of > 1+ was judged the most meaningful clinically | Meta-analysis undertaken for the threshold of > 1+ only, as this was judged the most meaningfula,c. Individual study results presented in SROC graphs |
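The per-test analyses planned above all start from each study's 2x2 table at the chosen positivity threshold (for KIT, > 1+). A minimal sketch of that step, using invented counts rather than data from the review:

```python
# Illustrative sketch only: sensitivity and specificity from one study's 2x2
# table at a single positivity threshold. Counts are invented, not review data.
def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from a diagnostic 2x2 table."""
    sensitivity = tp / (tp + fn)  # proportion of reference-positive cases detected
    specificity = tn / (tn + fp)  # proportion of reference-negative cases correctly negative
    return sensitivity, specificity

sens, spec = accuracy_from_2x2(tp=45, fp=10, fn=15, tn=80)
print(round(sens, 3), round(spec, 3))  # 0.75 0.889
```

For semi-quantitative KIT results, a separate 2x2 table (and hence a separate sensitivity/specificity pair) exists at each threshold, which is why pooling across mixed thresholds yields only an uninterpretable average.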
Domain 3: Target condition | ||||
Reason for potential groupings or categories. | STEP 2: List categories identified from review scoping | STEP 3: Report which categories will be separate or combined. Give reason | STEP 4: Data extraction. Report if any categories will be preferentially extracted. | STEP 5: Presentation and meta-analysis. Report how categories will be treated |
3.1 Types of target condition Are there different target conditions included that could affect test accuracy? Examples • Different causes of disease (e.g. different organisms causing typhoid infection, different causes of trauma injury) • Different types or severity of disease that are treated differently, e.g. malignant and borderline disease in ovarian cancer diagnosis, any melanoma or melanoma with high potential to progress to malignancy | Salmonella enterica serovar Typhi or Paratyphi A | Keep as separate groups if possible. Retain studies with mixed or unclear populations. Reason: tests likely to perform differently for different bacteria and bacterial subtypes | Preferential data extraction in separate groups. Otherwise extract as a mixed population group. | Protocol planned heterogeneity analysis if sufficient number of studies. In review, no heterogeneity analysis as all studies were of Salmonella Typhi infection |
3.2 Reference standards Are different methods used to verify disease presence or absence that could affect test accuracy? Examples • Different methods to detect typhoid infection, e.g. detection of bacterial DNA or bacterial culture | Four methods: bone marrow culture, blood culture, PCR on peripheral blood, and combinations of tests. For labelling of studies, two groupings of reference standards are defined. A Grade 1 study was defined as one using both bone marrow culture and peripheral blood culture. A Grade 2 study was defined as one using either peripheral blood culture only, or peripheral blood culture and peripheral blood PCR as the composite reference standard. | Keep reference standards separate. Categorise as Grade 1 or Grade 2 for TOC. For subsequent analysis it may be important to compare different reference standards within studies where possible. Reason: reference standards are likely to have differing ability to detect low levels of infection (blood vs bone marrow) and to differ in detecting live untreated bacteria (both culture and PCR), treated bacteria (unlikely with culture but should be detected with PCR) and dead bacteria (not culture but should be detected with PCR). | Extract all test data from each study, so index test data may be extracted for two or more reference standards. | Meta-analysis using the most commonly used reference standards as prioritya,c. Planned SROC or forest plot with groups indicated for each study. Planned heterogeneity analysis if sufficient number of studies. In review, no heterogeneity analysis as insufficient studies, but SROC plot displayed data for different reference standards. |
3.3 Thresholds for reference standard Are different criteria or thresholds used to define presence of disease that could affect test accuracy? Examples • Different definitions of fasting blood glucose to define diabetes | Not applicable in this review | Not applicable in this review Reason: infection classified as present or not | Not applicable in this review | Not applicable in this review |
3.4 Time of reference standard determination Are there differences in when the reference standard is completed that could affect test accuracy? Examples • Different time points at which the reference standard is assessed • Different maximum or minimum time intervals between reference standard and index test | Not applicable in this review | Not applicable in this review Reason: index test and reference standard are determined at a single time point. | Not applicable in this review | Not applicable in this review |
Domain 4: Study design and quality | ||||
Reason for potential groupings or categories. | STEP 2: List categories identified from review scoping | STEP 3: Report which categories will be separate or combined. Give reason | STEP 4: Data extraction. Report if any categories will be preferentially extracted. | STEP 5: Presentation and meta-analysis. Report how categories will be treated |
4.1: Unit of analysis: Are there differences in whether there is a single test result or more than one test result per participant (unit of analysis)? Examples • Test results could refer to individual participants, lesions, organs, clinic visits or imaging scans | Per participant | Not applicable Reason: in this review all results were reported based on the disease status of patients | Not applicable | Not applicable |
4.2: Risk of bias: QUADAS-2/QUADAS-C item or domain Based on risk of bias assessed using QUADAS-2/QUADAS-C, are there important differences between studies that could affect test accuracy? Examples • Single signalling question, e.g. specific design criteria (case-control vs better design using cohort or nested case-control) • Differences in QUADAS-2/QUADAS-C overall domain assessment of bias, e.g. participant domain | Study design: case-control, prospective cohort, randomised controlled trial, paired comparative trial | Decision to retain all study designs in the main analyses and to present variation in study design graphically. Reason: in this review, test type and test threshold were prioritised as sources of variation, due to the limited number of studies. The bias due to case-control design will be presented graphically. Case-control studies were only included where controls consisted of patients with a similar clinical presentation. Case-control studies with the most extreme bias, due to use of healthy control patients, were excluded from the review. | Only one group of data at study level. | Planned SROC plot with groups indicated for each study. Protocol planned heterogeneity analysis if sufficient number of studies. In review, no heterogeneity analysis but descriptive comment. |
4.3: Applicability: QUADAS-2 item or domain Based on applicability of study results assessed using QUADAS-2, are there important differences between studies that could affect test accuracy? Applicability of participants could depend on several factors and might be best summarised by analysis grouped by applicability of participant recruitment assessed in QUADAS-2. Examples Differences in QUADAS-2 domains for applicability of: • Participants • Index tests • Reference standard | Not considered | Not applicable Reason: the main biases from individual QUADAS-2 domains are addressed in the presentations and analyses above. Participant domain: case-control vs cohort presentation. Index test domain: threshold bias addressed; interpretation of inconclusive results addressed. Reference standard domain: grades of reference standard addressed. Flow and timing domain: verification bias not applicable; time intervals not an issue in this review; handling of inconclusive results covers missing data issues | Not applicable | Not applicable |
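Several rows above plan "separate meta-analysis per commercial test" with study-level results displayed on forest or SROC plots. A toy sketch of that grouping step (invented counts; test names from the review): each study's 2x2 table is reduced to a sensitivity/specificity pair and grouped by test brand. A real DTA meta-analysis would then fit a bivariate random-effects model per group rather than simply listing per-study estimates.

```python
# Toy sketch of grouping hypothetical per-study 2x2 counts by test brand,
# yielding the per-study (sensitivity, specificity) pairs that would feed a
# forest or SROC plot. All numbers are invented, not data from the review.
from collections import defaultdict

studies = [  # (test, tp, fp, fn, tn) -- invented counts
    ("TUBEX", 40, 12, 10, 88),
    ("TUBEX", 55, 20, 25, 100),
    ("Typhidot", 60, 15, 20, 105),
]

by_test = defaultdict(list)
for test, tp, fp, fn, tn in studies:
    by_test[test].append((tp / (tp + fn), tn / (tn + fp)))

for test, results in sorted(by_test.items()):
    for sens, spec in results:
        print(f"{test}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Keeping the per-test groups separate, as STEP 5 specifies, avoids averaging across tests with different antigens and formats.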