Whole exome sequencing of cell-free DNA – A systematic review and Bayesian individual patient data meta-analysis

Molecular profiling of tumor-derived cell-free DNA (cfDNA) is gaining ground as a prognostic and predictive biomarker. However, to what extent cfDNA reflects the full metastatic landscape as currently determined by tumor tissue analysis remains controversial. Though technically challenging, whole exome sequencing (WES) of cfDNA enables thorough evaluation of somatic alterations. Here, we review the feasibility of WES of cfDNA and determine the sensitivity of WES-detected single nucleotide variants (SNVs) in cfDNA at the individual patient data level using paired tumor tissue as reference ($\frac{\text{shared SNVs}}{\text{all tissue SNVs}} \times 100\%$). The pooled sensitivity was 50% (95% credible interval (CI): 29–72%). The tissue mutant allele frequency (MAF) of variants exclusively identified in tissue was significantly lower (12.5%, range: 0.5–18%) than the tissue MAF of variants identified in both tissue and cfDNA (23.9%, range: 17–38%), p = 0.004. The overall agreement ($\frac{\text{shared SNVs}}{\text{all SNVs}} \times 100\%$) between SNVs in cfDNA and tumor tissue was 31% (95% CI: 15–49%). The number of detected SNVs was positively correlated with circulating tumor DNA (ctDNA) fraction (p = 0.016). A sub-analysis of samples with ctDNA fractions ≥ 25% improved the sensitivity to 69% (95% CI: 46–89%) and agreement to 46% (95% CI: 36–59%), suggesting that WES is mainly feasible for patients with high ctDNA fractions. Pre- and post-analytical procedures were highly variable between studies, rendering comparisons problematic. In conclusion, WES of cfDNA is still largely in its investigative phase, and standardization of methodologies is highly needed to bring this promising technique to its clinical potential.


Introduction
Next generation sequencing of tumor tissue is increasingly being performed since more and more targeted treatments require presence of specific genomic alterations [1][2][3]. Although metastatic tissue can be obtained for this analysis, it is a cumbersome procedure for patients and repetitive sampling is frequently not feasible. Therefore, genomic profiling of plasma derived cell-free DNA (cfDNA) is considered as a minimally-invasive surrogate to predict outcome and predict or monitor treatment efficacy [4].
CfDNA consists of short fragments of DNA derived from normal and tumor cells; the tumor-derived fraction is termed circulating tumor DNA (ctDNA). Contrary to a single tumor tissue biopsy, ctDNA might give a more accurate representation of the entire mutational profile present across the different lesions within an individual cancer patient [5][6][7]. Although significant progress has been made for tracking previously detected tumor mutations using targeted gene panels or single gene assays [8], whole exome sequencing (WES) enables a more comprehensive analysis covering the complex landscape of somatic alterations. Hence, it can be used as a tool to gain insight into tumor biology, for example into the genomic mechanisms by which tumor cells acquire resistance.
In addition, WES enables the identification of genomic signatures such as tumor mutational burden (TMB) and microsatellite instability (MSI), both recognized as biomarkers for selected therapies such as immunotherapy [9,10]. So, compared to targeted panels comprising a relatively limited number of genes, WES analysis of ctDNA holds great promise to identify emerging genes that are of interest in treatment resistance and to capture DNA signatures important for treatment decision making. However, WES on cfDNA is technically challenging due to the often low tumor fractions in a high background of normal cfDNA.

The aims of this systematic review were to (1) describe to what extent WES of cfDNA in cancer patients is technically feasible and which approaches are being used, and to (2) analyze the sensitivity of WES-detected single nucleotide variants (SNVs) in cfDNA using tumor tissue as reference ($\frac{\text{shared SNVs}}{\text{all tissue SNVs}} \times 100\%$) as well as the agreement between cfDNA and tumor tissue ($\frac{\text{shared SNVs}}{\text{all SNVs}} \times 100\%$).

Literature search
PubMed was searched from May 2013 to July 2019 to find full publications. Search terms included cell free DNA and whole exome sequencing. Synonyms of these terms and MeSH terms were also used (Table A1). For the technical feasibility analysis, studies were eligible if (1) they were written in English, (2) WES was used for molecular profiling of cfDNA, and (3) patients had solid tumors. Exclusion criteria were: (1) a sole focus on a bioinformatics pipeline without presenting unique data, (2) cfDNA derived from liquids other than blood, and (3) patients without cancer. Subsequently, for the sensitivity and agreement meta-analyses, studies that reported WES-detected SNVs in cfDNA and matched tumor tissue were included. Studies were excluded if (1) the time between collection of tumor tissue and cfDNA for individual cases exceeded 2 months, or (2) SNVs in tumor tissue and cfDNA were not reported on individual patient level.

Data extraction
Two authors (M.B. and L.A.) independently performed the article selection and data extraction. For all studies the following data were extracted using a data-extraction form (Table A2): year of publication, sample size, cancer type, time between plasma and tissue collection, pre-analytical variables (amount of DNA input, ctDNA fraction), analytical conditions (sequencing methods and coverage), post-analytical conditions (variant calling and analysis), and the mutant allele frequency (MAF) of detected variants. An overview of used source files is available in Table A3. In case of discordances the authors reached agreement during a consensus meeting.

Pooled sensitivity and agreement analysis
To calculate a pooled sensitivity and agreement rate of WES-detected SNVs in paired cfDNA and tumor tissue (irrespective of primary or metastatic lesion) we extracted individual patient data from each study. Per sample we collected the number of "shared SNVs" (SNVs detected in both tumor tissue and cfDNA), SNVs only found in tissue and SNVs only present in cfDNA. In addition, cfDNA input and sequencing coverage were extracted on individual sample level. Using SNVs detectable in tissue as reference, sensitivity was calculated as follows:

$$\text{Sensitivity} = \frac{\text{shared SNVs}}{\text{all tissue SNVs}} \times 100\%$$

The agreement rate between WES-detected SNVs in tumor tissue and cfDNA was calculated as follows:

$$\text{Agreement} = \frac{\text{shared SNVs}}{\text{all SNVs}} \times 100\%$$

in which "all SNVs" was defined as: SNVs only detected in tissue + SNVs only detected in cfDNA + shared SNVs. Patients without detectable SNVs in tumor tissue were excluded from the sensitivity and also from the agreement analysis to keep both groups comparable. We did not calculate specificity, since we were unable to calculate the numbers of true negatives (wild type genes).
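As an illustration of the two definitions above, the per-sample calculation can be sketched in Python as follows. The function name and the example counts are hypothetical and are not taken from any included study; returning `None` for a pair without tissue SNVs mirrors the exclusion rule stated above.

```python
from typing import Optional, Tuple


def sensitivity_and_agreement(
    shared: int, tissue_only: int, cfdna_only: int
) -> Optional[Tuple[float, float]]:
    """Return (sensitivity %, agreement %) for one tissue/cfDNA sample pair.

    Returns None when no SNVs were detected in tumor tissue, mirroring the
    exclusion of such patients from both analyses.
    """
    tissue_snvs = shared + tissue_only            # all SNVs detected in tissue
    all_snvs = shared + tissue_only + cfdna_only  # SNVs in tissue and/or cfDNA
    if tissue_snvs == 0:
        return None
    sensitivity = 100.0 * shared / tissue_snvs
    agreement = 100.0 * shared / all_snvs
    return sensitivity, agreement


# Hypothetical counts for a single sample pair (illustration only).
example = sensitivity_and_agreement(shared=30, tissue_only=30, cfdna_only=40)
print(example)  # (50.0, 30.0)
```

Because cfDNA-only SNVs enter the denominator of the agreement rate but not of the sensitivity, agreement can never exceed sensitivity for the same sample pair.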

Additional WES-detected SNVs in cfDNA
For all studies included in the meta-analysis, we extracted the number of additionally detected SNVs in cfDNA for each sample pair and calculated the fraction of uniquely detected variants versus all variants in cfDNA: $\frac{\text{ctDNA variants unique to plasma}}{\text{all ctDNA variants}} \times 100\%$. Per study we displayed the median of the individual sample data. To score the clinical potential of exclusively detected SNVs in plasma, we used the clinical annotation database OncoKB [11] (September 1st, 2019). Additionally, per study we scored whether variants detected exclusively in cfDNA had been described previously in the corresponding tumor type using cBioPortal for Cancer Genomics (September 4th, 2019) [12]. We reported SNVs with a MAF ≥ 2%.
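The per-study summary described here (fraction of plasma-unique variants per sample, then the median per study) can be sketched as below. This is a minimal illustration only: the function name, the study labels and all counts are hypothetical, and skipping samples without any cfDNA variant is an assumption made because the ratio is undefined for them.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def median_unique_fraction_per_study(
    samples: Iterable[Tuple[str, int, int]]
) -> Dict[str, float]:
    """Median percentage of cfDNA-only variants per study.

    Each sample is (study label, cfDNA-only SNVs, shared SNVs). Samples
    without any cfDNA variant are skipped (assumption: ratio undefined).
    """
    per_study = defaultdict(list)
    for study, cfdna_only, shared in samples:
        all_cfdna = cfdna_only + shared
        if all_cfdna == 0:
            continue
        per_study[study].append(100.0 * cfdna_only / all_cfdna)
    return {study: median(values) for study, values in per_study.items()}


# Hypothetical sample pairs (illustration only).
pairs = [("Study A", 10, 30), ("Study A", 20, 20), ("Study A", 45, 15),
         ("Study B", 0, 12)]
print(median_unique_fraction_per_study(pairs))
# {'Study A': 50.0, 'Study B': 0.0}
```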

Statistical analysis
An individual patient data (IPD) meta-analysis was used to estimate the overall sensitivity and agreement rates across all the studies. To account for heterogeneity, patient-specific and study-specific effects were included as random effects in the (multilevel) model. For this purpose, Bayesian IPD meta-analyses were employed. Results of these analyses were shown using a forest plot, where the median and the 95% highest posterior density (HPD) credible interval (CI) were reported for each study separately and pooled in an overall sensitivity and agreement rate. A sub-analysis was performed to estimate the sensitivity and agreement on a subset of cfDNA samples which contained an estimated tumor fraction ≥ 25%.
Computations and graphics were performed in the R programming language [13]. All Bayesian computations were performed using the Markov Chain Monte Carlo (MCMC) sampler through the JAGS [14] interface in R, and relatively non-informative priors were used for the parameters in the model. For each analysis, the MCMC sampling was run for 200,000 iterations after discarding the first 200,000 iterations (burn-in) to reach convergence.
To assess the correlation between the total number of SNVs and ctDNA fraction, Spearman's ρ was used. To compare the MAF in tumor tissue versus cfDNA, a Mann-Whitney U test was performed.
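The exact model specification is not given in the text. One common way to write a binomial multilevel model with study-specific and patient-specific random effects is shown below, purely as an assumed sketch: the logit link, the normally distributed random effects and all symbols are assumptions made for illustration, not the authors' stated model.

$$
\begin{aligned}
y_{ij} &\sim \mathrm{Binomial}(n_{ij},\, p_{ij}),\\
\operatorname{logit}(p_{ij}) &= \mu + u_j + v_{ij},\\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_{ij} \sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
$$

Here $y_{ij}$ would be the number of shared SNVs for patient $i$ in study $j$, and $n_{ij}$ the corresponding denominator (all tissue SNVs for sensitivity, all SNVs for agreement). Under this sketch the pooled rate corresponds to $\operatorname{logit}^{-1}(\mu)$, with $u_j$ capturing between-study and $v_{ij}$ between-patient heterogeneity.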
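For readers unfamiliar with the rank correlation used here, Spearman's ρ is the Pearson correlation of the rank-transformed values. The pure-Python sketch below illustrates the statistic only (no p-value is computed); the helper names and the data are hypothetical and do not reproduce the analysis in this review.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ctDNA fractions (%) and SNV counts (illustration only).
ctdna_fraction = [5, 12, 25, 40, 60]
snv_count = [8, 15, 14, 52, 90]
print(round(spearman_rho(ctdna_fraction, snv_count), 2))  # 0.9
```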

Feasibility of WES of cfDNA
To evaluate the technical feasibility of WES of cfDNA we summarized pre-analytical and analytical parameters of all studies performing WES of cfDNA (Table 1). In total, WES has been performed on 303 samples, with a median coverage of 137X (range: 43-500X) using a median of 15 ng cfDNA (range: 2-100 ng). Most studies either extracted cfDNA from plasma collected in EDTA tubes (n = 7) or did not report the tube type used (n = 6). Four studies performed WES only on samples with a high tumor fraction, defined as ≥ 10% [15,17] or ≥ 25% [33], or as "high" without further specification [26]. Overall, the median tumor fraction of all samples for which individual tumor fractions were available was 37%, based on estimation by different platforms such as ultra-low pass whole genome sequencing (ULP-WGS), Sequenza [34] or maximum MAF of variants.
To focus on tumor-specific SNVs, all studies except for Dietz et al. [20] sequenced germline DNA derived from leukocytes or normal tissue. In addition, most studies used a combination of databases such as dbSNP [35], 1000 Genomes Project [36] or Exome Sequencing Project [37] to filter out single nucleotide polymorphisms (SNPs) present in germline DNA. Final selection of variants based on MAF, coverage and sequencing quality was highly variable (Table A4). For example, some studies only called SNVs above a minimum MAF, which ranged from 1% [31,33] to 5% [16,17]. Sequencing quality scores involved either in-house developed algorithms or Phred scores with different cut-offs. Finally, not all studies performed an exome-wide final analysis: some only reported data on cancer-associated genes [16] or genes involved in MAPK-pathway analysis [17], limiting the number of detected SNVs per patient. We further studied the correlation between the total number of detected SNVs in cfDNA and ctDNA fraction on IPD, which showed a positive correlation, p = 0.016 (Fig. A1). Since only a minority of studies provided IPD on cfDNA input or coverage, these variables were not individually tested.

Pooled sensitivity and agreement rate of cell-free DNA versus tumor tissue
Out of the 303 cfDNA samples on which WES was performed, WES data of matched tumor tissue was available for only 71 unique sample pairs. To calculate a pooled sensitivity and agreement between WES-detected SNVs in cfDNA versus tumor tissue we performed a Bayesian random-effect meta-analysis on this subset (Table 2; Table A5 for IPD). Most studies compared SNVs between metastatic tumor tissue and cfDNA [15,16,17,21,26,27,32], whilst three studies analyzed primary tumors [20,29,38] and two studies analyzed both [19,23]. The pooled sensitivity of WES-detected SNVs in cfDNA using tumor tissue as reference was 50% (95% CI: 29-72%) (Fig. 2). The tissue MAF of variants exclusively identified in tissue was significantly lower (12.5%, range: 0.5-18%) than the tissue MAF of shared variants (23.9%, range: 17-38%), p = 0.004. For cfDNA, the median MAF of variants detected in both tumor tissue and cfDNA (12.2%, range: 2.1-26.9%) was higher than the MAF of variants detected in cfDNA only (4.6%, range: 0.4-9.0%), although the difference was not statistically significant (p = 0.093). The pooled agreement between SNVs in cfDNA and tumor tissue was 31% (95% CI: 15-49%).

Table 1
Pre-analytical and analytical parameters of all studies which performed WES on cfDNA.
* The original article mentions 'an average sequence depth of > 500 bp and genome coverage > 98%', which we interpreted as 500× depth.

Table 2
Overview of studies investigating WES-detected SNVs in cfDNA and tissue. Shared SNVs: Single nucleotide variants in both plasma and tissue. Tissue SNVs: All single nucleotide variants in tissue (independent of plasma). All SNVs: All single nucleotide variants in plasma and/or tissue. 1 Values are calculated medians unless ranges are reported.

Additional value of WES of cfDNA
To evaluate the additional value of WES of cfDNA to identify clinically useful SNVs that were not present in tissue, we calculated the number of additional SNVs unique to cfDNA and the ratio between ctDNA variants unique to plasma versus all variants detected in ctDNA (Table 3). Of all plasma-detected SNVs per sample a median of 43% (range: 0-96%) was exclusively detected in plasma. The median number of additionally detected SNVs per sample was 17 (range: 0-2840). Of these additionally detected SNVs, 36 variants found in 20 out of 53 patients (38%) were located in cancer-associated genes as reported in cBioPortal. Matching IPD with targetable genes according to OncoKB, we identified 11 variants in 9 out of 53 patients (17%). The targetability of these variants ranged from level 1 (FDA-approved) to level 4 (biological evidence) (Table 3).

Discussion
The increasing interest in capturing the complex genomic landscape of individual cancer patients in real time and in a minimally invasive way has initiated efforts on technical developments in the field of WES of cfDNA. The main purpose of this systematic review was to evaluate the technical feasibility of WES of cfDNA and to analyze the sensitivity and agreement of WES-detected SNVs in cfDNA using tumor tissue as reference.
It has become clear that there was significant variability between studies in the pre- and post-analytical conditions used (Table 1; Table A4), which severely impacted comparability of results. Differences between studies were observed regarding technical aspects of sequencing, including sequencing coverage and amount of cfDNA used. Sequencing coverage in particular was highly variable, ranging from 43 to 500X, which theoretically results in lower limits of variant detection of 2.3% and 0.2% respectively, assuming that the variant is heterozygous and the genome is diploid. Although cfDNA input generally consisted of 10-20 ng, inputs ranged from 2 ng to 100 ng. In addition, the post-analytical part, in which different bioinformatics pipelines were used, also impacted the final variant calling, since most studies used their own set of criteria to filter SNPs and to perform final variant calling (Table A4).
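The quoted limits of 2.3% and 0.2% correspond to a single variant-supporting read at the given depth (1/43 and 1/500). A minimal sketch of that arithmetic follows; the function name and the `min_variant_reads` parameter are hypothetical, and thresholds above one read are an assumed extension rather than something reported in the reviewed studies.

```python
def lower_limit_of_detection(coverage: int, min_variant_reads: int = 1) -> float:
    """Lowest detectable MAF (%) if `min_variant_reads` reads must carry the variant."""
    return 100.0 * min_variant_reads / coverage


for depth in (43, 500):
    print(f"{depth}X -> {lower_limit_of_detection(depth):.1f}%")
# 43X -> 2.3%
# 500X -> 0.2%
```

Requiring more than one supporting read raises the limit proportionally; for example, three reads at 43X would give roughly 7%.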
Clonal hematopoiesis has been identified as an important factor affecting accurate variant interpretation. During the process of aging different mutations accumulate in hematopoietic stem cells. This phenomenon occurs frequently in the elderly and its prevalence has been estimated at 31% [39]. The mutations resulting from clonal hematopoiesis are often detected during cfDNA sequencing analysis, since the majority of cfDNA is derived from leukocytes. Recently, it has been demonstrated that 53.2% of all mutations detected by cfDNA sequencing analysis result from clonal hematopoiesis, indicating the need for collection and sequencing of leukocytes as a reference [39].
Although WES data of matching tumor tissue was available for only 71 out of 303 cfDNA samples, the merit of our comparison is that we performed the meta-analysis on IPD level, which allowed us to adjust for patient-specific effects in our model. Compared to large studies describing agreement between cfDNA and tumor tissue based on targeted sequencing approaches covering pre-specified gene sets [40], the number of sample pairs analyzed by WES is considerably limited. Most studies lacked IPD on cfDNA input and sequencing coverage, hampering analysis of the impact of those variables on sensitivity and agreement.
Some studies only performed WES on samples with a minimum tumor fraction [15,17,26,27,33]. Overall, the samples selected for WES had a median tumor fraction of 37%, much higher than generally occurs in cancer patients [41]. By using techniques such as ultra-low-pass whole-genome sequencing with 10% tumor fraction as a cutoff value to pre-select samples, Adalsteinsson et al. [15] showed that only 34% of cfDNA samples from metastatic breast and prostate cancer patients were feasible for WES analysis, including samples from all treatment lines. This implies that the number of samples with a sufficient tumor fraction in earlier lines of treatment might be even lower. Notably, studies have not reported success rates of WES in correlation to tumor fraction.
Our meta-analysis of the sensitivity of WES-detected SNVs in cfDNA versus tumor tissue has shown that 50% of SNVs present in tumor tissue are also detected in cfDNA. The reason for the rather low sensitivity of WES of cfDNA is probably multifactorial, including technical and biological aspects. A major technical issue is the generally low sequencing coverage used for WES, resulting in false-negative results for cfDNA variants present below the limit of detection. This is supported by the comparison of WES (coverage 226X) with targeted deep sequencing (coverage 1806X) on the same sample, demonstrating that some variants with low MAFs (< 5%) were detected by targeted deep sequencing only [5]. However, with the introduction of unique molecular identifiers and Elimination of Recurrent Artefacts and Stochastic Errors (ERASE-Seq) [42], discrimination of sequencing artefacts from true variants can be improved, enabling detection of DNA variants at ultralow frequency.
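The effect of coverage on false negatives can be illustrated with a simple binomial read-sampling model. This is a sketch under stated assumptions, not the method of any cited study: reads are treated as independent draws, sequencing errors are ignored, and the 2% MAF and 5-read calling threshold are hypothetical. Only the two coverages (226X and 1806X) come from the comparison described above.

```python
from math import comb


def detection_probability(depth: int, maf: float, min_reads: int) -> float:
    """P(at least `min_reads` variant reads) under Binomial(depth, maf)."""
    p_below = sum(
        comb(depth, k) * maf**k * (1.0 - maf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_below


# Coverages from the WES versus targeted deep sequencing comparison;
# the 2% MAF and the 5-read threshold are hypothetical.
for depth in (226, 1806):
    print(depth, round(detection_probability(depth, maf=0.02, min_reads=5), 3))
```

Under these assumptions the expected number of variant reads is about 4.5 at 226X versus about 36 at 1806X, so a low-MAF variant is far more likely to clear a fixed read threshold at the higher depth.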
Another technical aspect possibly affecting WES sensitivity is the tube type used for blood collection. Previous studies have demonstrated that cfDNA isolated from serum instead of plasma increased the background of normal DNA by the release of germline DNA (gDNA) due to lysis of leukocytes during coagulation [43,44]. This might partly explain the low sensitivity of Dietz et al. [20]. Size selection of short fragments before sequencing might positively influence the ratio between gDNA and cfDNA as well [45]. Biological challenges are the amount of cfDNA available for sequencing, generally low overall tumor fraction present in the sample and the subclonal presence of clones bearing alterations associated with treatment resistance [19]. The importance of the ctDNA fraction for the sensitivity of WES is supported by our finding that sensitivity improves when analyzing samples with a ctDNA fraction ≥ 25%.
Our findings demonstrate that in addition to tumor-tissue detected SNVs, WES of cfDNA also discovers SNVs exclusively detected in cfDNA. When calculating the ratio of these SNVs versus all cfDNA variants, we observed that the fraction of variants unique to cfDNA was highly variable amongst studies. The variability might partly be explained by factors such as sequencing coverage and cfDNA input. This assumption, however, could not be substantiated by our data, as availability of sequencing coverage and cfDNA input on individual patient level was insufficient. Nevertheless, SNVs which are exclusively detected in cfDNA potentially reflect intra- and inter-tumor heterogeneity. Whether these additionally detected SNVs in plasma are derived from clonal or subclonal fractions in tumor tissue remains a topic of interest. Adalsteinsson et al. [15] estimated clonality and subclonality of SNVs detected in plasma and demonstrated that on average 88% of the clonal and 45% of the subclonal mutations were confirmed in the tumor. Assuming that all plasma-detected SNVs are true variants, i.e. free of sequencing artefacts, these results imply that the majority of SNVs exclusively detected in cfDNA were subclonal in this study. Furthermore, SNVs indicated as subclonal in cfDNA might be of clonal origin in tumor tissue from other metastatic sites. Studies comparing cfDNA to multiple tumor region sampling support this hypothesis [5,25]. Huang et al. found that nearly all SNVs exclusively detected in plasma by WES were also detected in tumor samples from different liver lesions using targeted deep sequencing (average 98.7%, range: 69.3-100%) [5]. Another study also showed that when two tissue biopsies were taken and compared to cfDNA, the number of overlapping alterations between cfDNA and tumor tissue increased [25].
Importantly, some of these exclusively identified mutations were previously associated with therapy resistance (ESR1, ERBB2 and NF1) [46] and treatment outcome (PIK3CA) [47] in breast cancer. The clinical relevance of these findings and to what extent the MAF and its dynamics will impact outcome on certain therapies have thus far not been elucidated in prospective clinical studies. Secondly, the number of additional identified targetable mutations in cfDNA is currently very limited, but might be improved by efforts unraveling new actionable targets or profiles.
The added value of WES thus currently resides mainly in the discovery of resistance mechanisms and of genomic alterations for which a wide coverage of the genome is needed, such as TMB and mutational signatures. Goodall et al. [21] demonstrated this discovery capacity of WES by identifying frameshifts in germline and somatic DNA repair mutations as mechanisms of resistance to PARP inhibitors in prostate cancer. For estimation of TMB, large targeted sequencing panels can be used [48]. However, when taking estimation of TMB by whole genome sequencing (WGS) as reference, 30% of patients were misclassified, either as false negative or false positive, when targeted sequencing panels were used. Concordance improved with an increasing number of megabases (Mb) covered by the targeted panel [49]. Furthermore, reported correlations between cfDNA and tissue TMB using targeted panels (Spearman's correlation coefficients of 0.64 and 0.6 [50]) are lower than the correlation reported using WES (0.85 [25]). To this end, further design and validation of highly needed targeted sequencing panels for TMB estimation are currently ongoing.
Altogether, WES is an attractive tool for identification of genomic signatures and discovery of resistance mechanisms. In this IPD meta-analysis we show that the sensitivity of WES of cfDNA is 50% and that the overall agreement is 31%. Furthermore, we describe large variability in pre- and post-analytical conditions of WES of cfDNA. Moreover, our results underline that the applicability of WES mainly resides in a selected group of patients with high tumor fractions. We recognize that WES is still in its developmental phase and that implementation of methods such as unique molecular barcoding and ERASE-Seq will further improve sensitivity of WES. However, standardization of methodologies is highly needed to further define the clinical utility of this promising approach.

Contributors
MKB and LA wrote the manuscript, which was edited, reviewed and approved by AJ, MPHMJ, JWMM and SS. KN performed the statistical analyses.