Systematic or Meta-analysis Studies
Systematic review of the clinical and economic value of gene expression profiles for invasive early breast cancer available in Europe

Gene expression profiles with prognostic capacities have shown good performance in multiple clinical trials. However, with multiple assays available and numerous types of validation studies performed, the added value for daily clinical practice is still unclear. In Europe, the MammaPrint, OncotypeDX, PAM50/Prosigna and EndoPredict assays are commercially available. In this systematic review, we aim to assess these assays on four important criteria: assay development and methodology, clinical validation, clinical utility and economic value. We performed a literature search covering PubMed, Embase, Web of Science and Cochrane, for studies related to one or more of the four selected assays. We identified 147 papers for inclusion in this review. MammaPrint and OncotypeDX both have evidence available, including level IA clinical trial results for both assays. Both assays provide prognostic information. Predictive value has only been shown for OncotypeDX. In the clinical utility studies, a higher reduction in chemotherapy was achieved by OncotypeDX, although the number of available studies differs considerably between tests. On average, economic evaluations estimate that genomic testing results in a moderate increase in total costs, but that these costs are acceptable in relation to the expected improved patient outcome. PAM50/Prosigna and EndoPredict showed comparable prognostic capacities, but with fewer economic and clinical utility studies. Furthermore, for these assays no level IA trial data are available yet. In summary, all assays have shown excellent prognostic capacities. The differences in the quantity and quality of evidence are discussed. Future studies should focus on the selection of appropriate subgroups for testing and long-term outcome of validation trials, in order to determine the place of these assays in daily clinical practice.


Introduction
In the past decades, there has been a steady increase in the survival rates of patients with breast cancer. Alongside other factors such as early screening and awareness, the majority of this effect is attributed to the concept of adjuvant therapy [1,2]. However, among all patients receiving adjuvant chemotherapy, the majority would not have developed metastases even without adjuvant therapy, whereas some patients without an indication for adjuvant therapy still develop distant metastases. A recent advance in optimizing this patient selection is the development of genomic profiling assays [3]. We chose four crucial criteria for determining the value of these assays.

Assay development and methodology
The first criterion is the methodological robustness, both during development and during the commercial activities. For example, the tests should be validated in a cohort independent from the training cohort, and should not be used in a patient population in which the test was not validated unless re-validation is performed. Furthermore, there should be little to no inter-test variation when the same tissue samples are tested multiple times.

Economic value
The fourth, and last criterion for genomic testing is the economic value of the test. Due to the commercialisation of the assays, the tests are more expensive than the regular pathological assessment, with costs ranging from €1800 to €3700 per test. In an era of emphasis on healthcare efficiency, the costs of the test should be justified by its clinical and health benefits, and the reduction in costs by reducing adjuvant therapy use.

Test descriptions
The first test, which was first developed in 2002 by van 't Veer et al. and for which the prognostic capacities were shown simultaneously by van de Vijver et al., is the 70-gene prognosis profile, better known as MammaPrint (Agendia BV, Amsterdam, The Netherlands) [5,6]. This assay uses the mRNA expression of 70 genes using microarray technology, to categorize patients in either a low or high risk. These 70 genes were identified from a total of 25,000 genes using supervised clustering.
The second test in this review is the 21-gene Recurrence Score, also known as the OncotypeDX Recurrence Score (RS) (Genomic Health Inc., Redwood City, CA). The test is based on the expression of 21 genes in FFPE cancer tissue, determined using reverse transcriptase PCR (RT-PCR) [7]. Of these genes, 16 are cancer-related and were selected out of 250 rationally selected candidate genes based on their prognostic capacity and consistency in test performance [7]. Based on these relative expressions, the Recurrence Score is calculated, ranging from 0 to 100, with low risk ranging from 0 to 17, intermediate risk ranging from 18 to 30, and high risk ranging from 31 to 100. However, for the most important validation trial of this test, the risk categories were adjusted to 0-10, 11-25 and 26-100 for low, intermediate and high risk, respectively [8].
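The two sets of Recurrence Score cut-offs described above can place the same patient in different risk categories. The following is a minimal sketch of that mapping, using only the thresholds stated in the text; the function name and the scheme labels "original" and "adjusted" are illustrative assumptions and not part of the assay.

```python
# Upper bounds (inclusive) of the low and intermediate categories, as stated
# in the text. The scheme labels are illustrative names, not official terms.
CUTOFFS = {
    "original": (17, 30),  # low 0-17, intermediate 18-30, high 31-100
    "adjusted": (10, 25),  # low 0-10, intermediate 11-25, high 26-100
}


def rs_category(score: int, scheme: str = "original") -> str:
    """Map a Recurrence Score (0-100) to a risk category (hypothetical helper)."""
    if not 0 <= score <= 100:
        raise ValueError("Recurrence Score must lie between 0 and 100")
    low_max, intermediate_max = CUTOFFS[scheme]
    if score <= low_max:
        return "low"
    if score <= intermediate_max:
        return "intermediate"
    return "high"


# The same score can change category depending on the cut-offs used.
for score in (15, 28):
    print(score, rs_category(score), rs_category(score, "adjusted"))
```

A score of 15, for example, is low risk under the original cut-offs but intermediate risk under the adjusted ones, which is why the cut-off scheme needs to be considered when comparing trial results.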
The third test included in this review is the Prosigna, based on the better-known PAM50 test (NanoString Technologies, Seattle, WA). This test, based on the expression of 46 genes using quantitative PCR (qPCR), is able to distinguish between the molecular subtypes of breast cancer (luminal A, luminal B, HER2-enriched, normal-like and basal-like) [9]. Furthermore, it provides the risk of recurrence score (ROR) and the subsequent risk category. The test was adapted by NanoString in order to allow its use in local pathology laboratories [10].
The fourth and last test which will be discussed in this systematic review is the EndoPredict (Myriad Genetics Inc, Salt Lake City, UT). This assay uses the expression of 8 cancer-related and 3 reference genes determined by RT-PCR, which results in a risk score from 0 to 15 (EP), which is subsequently divided into low and high risk [11]. A special feature of the EndoPredict is the integration of tumor size and nodal status, resulting in an EP clinical score (EPclin). The EndoPredict can be performed in local laboratories, in contrast to the MammaPrint and OncotypeDX which are centrally determined and therefore need more elaborate logistical planning.
In this review, we evaluate four genomic assays available in Europe using a systematic evaluation focusing on all four major criteria with the aim to assess each test individually for its strengths and weaknesses.

Search strategy
This systematic review aimed to comprehensively cover the four commercially available genomic profiling tests in Europe on four different aspects: developmental and methodological robustness, extent of clinical validation, clinical utility and economic value. These items were chosen after a consensus meeting and cover the evaluation criteria we deemed most important. We searched PubMed, Embase, Web of Science and Cochrane for articles published before April 2016. The search strategy (supplementary document 1) was applied on April 7th 2016, and after evaluation of all abstracts it was updated on September 9th 2016. Records were screened for relevance based on the title and abstract, and remaining full-text articles were screened based on the inclusion criteria.

Selection criteria
Articles were selected if they studied one of the four tests available in Europe: OncotypeDX, MammaPrint, Prosigna or EndoPredict. Furthermore, the article had to be original peer-reviewed research; abstracts, posters, reviews and meta-analyses were excluded. The article needed to cover one of the four criteria: development of the test, clinical validation, clinical utility or an economic evaluation. For the clinical validation studies, survival analysis was required, evaluating either the differences in survival between test-outcome groups, or the benefit of therapy in one or more test-outcome groups. For the clinical utility studies, decision impact studies had to be performed in a representative cohort, and had to report both the absolute increase or decrease in chemotherapy and the shift from one treatment category to the other. Retrospective large-scale population-based impact studies were also included, reporting real-life shifts in the use of genomic testing and the subsequent changes in therapy decisions. Two reviewers (EJB, EB) independently selected articles that met the above inclusion criteria based on title and abstract. Next, full texts of potentially relevant articles were screened. Agreement concerning eligibility was reached by consensus.

Data extraction and statistics
Data extraction was independently performed by the two reviewers. Data were collected concerning the performed test, the number of included patients, the results of the test, and survival outcome or change in treatment where appropriate. Disagreements in data extraction and interpretation were resolved during a consensus meeting. There were no changes in eligibility criteria during the selection of articles. All studies that fulfilled the inclusion criteria were included, independent of their methodological quality; no risk of bias assessment was performed. Both retrospective and prospective studies were included without exclusion of particular study designs, with an emphasis on prospective RCTs (where available). Data were recorded in the tables as mentioned in the articles; no additional statistics were performed. Both point estimates and 95% CIs were recorded, where appropriate and mentioned in the selected articles.
Due to the heterogeneity of the included studies, their patient selection and the endpoints reported, no further statistical analyses could be performed. Results were stratified by (1) the test used and (2) nodal status: lymph node positive patients, lymph node negative patients, or articles in which the distinction could not be made or both groups were included.
For the clinical utility, extracted data from decision-impact studies were pooled (weighted by the number of patients) to give an estimate of the chemo-reduction and shift in therapy a test can establish. We only considered a change in chemotherapy and recorded the percentage of patients who would receive chemotherapy before the test, and after the test (as mentioned in the included articles). For the table on clinical validation, the number of patients who were high or low risk according to the test were recorded and the outcome in the groups. Outcomes were recorded as mentioned in the articles: distant metastasis or distant recurrence free survival, breast cancer specific survival, and overall survival were most frequently reported. Where known, both the point estimate and the 95%CI were recorded. The Hazard Ratio and corresponding 95%CI for the difference in outcome between the risk groups was recorded if this was mentioned in the articles. For the economic review, original evaluations were included if they compared costs beyond the assay costs alone. Evaluations could be cost minimization analyses (CMA), cost effectiveness analyses (CEA, comparing costs to life years) or cost utility analyses (CUA, comparing costs to quality-adjusted life years (QALYs)). To aggregate, QALYs were imputed for CMAs and CEAs (as predicted by the average and the life year gain, respectively) and costs were updated to Euros at price level 2016. When more than one (non-) genomic strategy was included in an economic evaluation, the (non-) genomic strategy with the highest QALYs was used in the review.
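The pooling of decision-impact data described above, weighted by the number of patients, amounts to a patient-weighted mean of the reported percentages. A minimal sketch is given below; the function name and the two example studies are hypothetical and do not correspond to any of the included articles.

```python
from typing import Iterable, Tuple


def pooled_percentage(studies: Iterable[Tuple[int, float]]) -> float:
    """Patient-weighted mean of percentages.

    Each study is a (number_of_patients, percentage) pair, e.g. the
    percentage of patients recommended chemotherapy before or after testing.
    """
    studies = list(studies)
    total_patients = sum(n for n, _ in studies)
    if total_patients == 0:
        raise ValueError("At least one patient is required for pooling")
    return sum(n * pct for n, pct in studies) / total_patients


# Hypothetical example: a small study reporting 50% and a larger one 30%.
example = [(100, 50.0), (300, 30.0)]
print(pooled_percentage(example))  # larger studies pull the estimate towards their value
```

Because the weights are the study sizes, a single large study can dominate the pooled estimate, which is one reason such pooled figures need careful interpretation when the number of studies per test differs.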

Results
Using our search strategy, we identified 1345 unique titles and abstracts. Limiting ourselves to the manuscripts only related to the topics of this review, we selected 280 studies for further full-text evaluation. From these 280 full-text manuscripts, we selected 149 papers for inclusion in this review: 11 about developmental validation, 12 about biomarker prediction, 50 about clinical studies, 28 about clinical utility and the effect on chemotherapy reduction, 44 economic evaluations and 4 studies making direct head-to-head comparisons on test outcome between two or more of the included tests (Fig. 1).

Assay development and methodology
In the development of MammaPrint, multiple evolutions were necessary to allow high-throughput screening of FFPE tissue. Glas et al. first converted the original research-based micro-array containing approximately 25,000 probes to a mini-assay with good concordance and reproducibility [12,13].
A second step was the conversion from frozen to FFPE tissue by Mittempergher et al., with an R² of 0.94 [14]. After this proof of principle, Sapino et al. further developed the MammaPrint towards an FFPE platform, again with a good correlation between FFPE and frozen tissue (r = 0.92), and a high concordance between high- and low-risk classifications between both methods (κ-score 0.82) [15]. Beuner et al. validated both the conversion to a mini-assay and the conversion from frozen tissue to FFPE retrospectively, by comparing the scores of both methods [16].
Gyanchandani et al. studied whether intratumoral heterogeneity might influence the outcome of a gene expression test in 74 ER-positive cases using most included gene expression panels, by assessing different tumor regions from the same FFPE block [17]. They showed that genomic assays with a higher number of included genes resulted in a lower rate of discordant samples. Drury et al. studied the use of 0.6 mm cores and compared these with full sections, to establish whether tissue-microarrays (TMAs) could be used for genomic profiling using OncotypeDX [18]. Although the total RNA yield was lower from tissue cores compared to full sections, the OncotypeDX Recurrence Score results from individual cores clustered closely, and had an excellent correlation with full-section RS (Spearman R = 0.91).
For the EndoPredict, the use of pre-surgery biopsies and surgical sections from 40 ER-positive HER2-negative tumours was compared. The two results had a Pearson correlation coefficient of 0.92, showing that core needle biopsies can be used for genomic profiling using EndoPredict [19]. Another aspect of the EndoPredict is decentralised assessment, meaning that every individual pathology laboratory can perform this test, thereby reducing the logistical strain of the testing procedure. Denkert et al. tested this decentralised evaluation [20]. The Pearson correlation coefficient for all measurements was a near-perfect 0.994, and 100% of the samples were assigned to the same EP risk group as the reference test. Furthermore, Kronenwett et al. showed that this decentralised approach had excellent precision and reproducibility, although with a small sample size [21].
Although these published studies showed a good reliability and reproducibility, the MINDACT trial shows that there can be problems which hamper the reliability and feasibility of a test. Between May 2009 and January 2010, 162 patients were falsely identified as being high risk, due to a change in RNA-extraction solution [22]. Furthermore, of all 11,288 screened patients, there was a screening failure in 1182 patients (10%) in which the MammaPrint was not feasible [22].
Another concern for the reliability of test results is the ratio between tumor and normal tissue in the tested specimen. Elloumi et al. showed that an increase of normal tissue in the specimen leads to biased test results when compared to uncontaminated tumor tissue test results [23]. For the PAM50 this bias was linear, showing a more favourable outcome with increasing normal tissue content. For the MammaPrint and OncotypeDX the bias was unpredictable, switching both from low to high risk and vice versa with increasing normal tissue content. All tests have since developed strategies to mitigate this bias.
A couple of studies directly compared the test results of multiple tests performed on one tumor. In the OPTIMA Prelim trial, patients were randomized between standard therapy and OncotypeDX-directed therapy [24,25]. Among others, MammaPrint and Prosigna tests were also performed. Strikingly, the kappa measurements were between 0.40 and 0.53. In the same cohort of patients, OncotypeDX predicted 17.9% to be high risk, compared to 38.6% and 34.5% for MammaPrint and Prosigna, respectively. This pilot trial is now followed by the OPTIMA trial, in which treatment directed by the Prosigna assay is compared with regular care. In a smaller prospective study, 52 samples were analysed with both the OncotypeDX and Prosigna, showing a Spearman correlation coefficient of 0.08 [26]. Remarkably, 57.1% of the patients classified as high risk by Prosigna were classified as low risk by OncotypeDX. In a similar study comparing EndoPredict and OncotypeDX results in 34 samples, a Pearson correlation of 0.65 was shown, with a concordance between risk categories of 76% [27].
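To make the kappa values above more concrete, the sketch below computes a chance-corrected agreement between the risk categories assigned by two tests to the same tumours. It assumes Cohen's kappa for two raters; the source does not state which kappa variant the trial used, and the function name and example labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two sets of categorical labels on the same samples."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("Both label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples given the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each test's marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both tests assign one identical category to every sample
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: two tests agree on three of four tumours.
test_1 = ["low", "low", "high", "high"]
test_2 = ["low", "high", "high", "high"]
print(cohens_kappa(test_1, test_2))
```

In this toy example the raw agreement is 75%, but the kappa is considerably lower because part of that agreement is expected by chance alone. This is why a kappa between 0.40 and 0.53 indicates only moderate agreement between tests.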

Prediction of test results
Theoretically, a genomic profile can have an excellent prognostic value but be fully predicted by other markers, in which case it has no added value. Therefore, it is crucial to establish the added value of the test, by testing whether the test result can be predicted by standard clinicopathological parameters. This testing could identify subgroups for which the test is not valuable. We identified 12 studies evaluating this effect, which are reported in Table 1. In general, tumours which are (a combination of) grade 1, PR-positive and/or have a Ki-67 expression lower than 10% are almost always low risk when genomic testing is performed. Similarly, tumours which are (a combination of) grade 3, PR-negative and/or have a Ki-67 score of more than 40% are almost always high risk. For these subgroups, genomic profiling provides little additional information.

Clinical validation
A total of 50 studies were identified assessing the clinical benefit of the genomic assays: 21 assessing MammaPrint, 20 assessing OncotypeDX, 5 assessing PAM50/Prosigna and 4 assessing EndoPredict. Most studies retrospectively stratified the cohort into separate risk categories determined by the test and showed a difference in distant metastasis-free, disease-free or overall survival. Table 2 shows the results of the included retrospective studies, according to test and patient inclusion. In general, the studies are difficult to compare due to different patient inclusion and outcome measures. All published studies showed a good differentiation between high and low risk, and the risk categories were associated with survival (both Distant Metastasis/Recurrence Free Survival (DMFS/DRFS) and Overall Survival (OS)). In more detail, MammaPrint was reported to be of significant prognostic value for patients with lymph node negative breast cancer, and the results of the test correlated well with Adjuvant!, St Gallen and NIH guidelines and the NPI. For lymph node positive disease, the hazard ratios for DMFS and Breast Cancer Specific Survival (BCSS) showed a significant difference in prognosis for low versus high risk according to MammaPrint. In the remaining articles (without specific classification or LN− and LN+ combined) the MammaPrint was also of prognostic value; most of the results showed a significant difference in outcome between low and high risk.
With respect to OncotypeDX, most of the studies in patients with LN-negative disease studied the DRFS and showed a significant difference in outcome between low-, intermediate- and high-risk patients. Paik et al. showed a statistically different effect of chemotherapy in the three risk groups, with a significant interaction term between chemotherapy and the Recurrence Score. One case-control study showed a significant difference between both groups. In addition, the study in LN+ disease also showed a significant interaction between the RS and clinical benefit of chemotherapy for the first 5 years after treatment. The remaining studies (combined LN− and LN+ and one study in patients with metastatic disease) showed a good discrimination between the three risk groups and a significant difference in outcome in most of the studies.
Studies that used the PAM50 showed a good discrimination, and a significant interaction between treatment and outcome in one study; this was, however, not confirmed by Liu et al. Three studies showed a significant association with distant recurrences. For studies that used EndoPredict, differences between high and low risk were associated with outcome, or the studies showed a low proportion of distant metastases in the low-risk group.
Both the PAM50/Prosigna and EndoPredict have level B quality of evidence, according to Simon et al., in all of their validation studies, since these were performed in established clinical trials [83]. For MammaPrint one level A trial is available [22]; all other studies are level C quality or lower. For OncotypeDX, there is a mix of two level A trials [8,36], some level B studies showing predictive capacities of OncotypeDX, and level C/D evidence from retrospective or case-control studies. All level A evidence will be discussed in the next paragraphs.

MINDACT
The MINDACT trial evaluated the use of the MammaPrint together with Adjuvant Online, an online tool using clinicopathological information for risk stratification [22]. Patients with discordant risks based on the clinical and genomic assessment, were randomized between chemotherapy or no chemotherapy. The primary study subgroup were the patients with a clinical high and genomic low risk tumor who were randomly allocated to receive no chemotherapy. The distant metastasis-free survival of this group was 94.7% at 5 years, which was significantly higher compared to a pre-determined null hypothesis of 92%. Therefore, it was concluded that the prognosis of these clinically high-risk, but genomic low risk patients without chemotherapy was good enough to justify the abstention of chemotherapy.
The trial is labelled as a phase 3 RCT and the results are regarded as level IA evidence. However, the design of the primary analysis is that of a cohort study, since it only assessed the patients who had a discordant risk and did not receive chemotherapy. In a secondary per-protocol analysis, comparing the c-high/g-low patients with and without chemotherapy, a HR of around 0.65 was shown in favor of chemotherapy, which was significant for DFS (90.3% vs 93.3%, p = 0.026), but not for DMFS (94.8% vs 96.7%, p = 0.106) or OS (97.3% vs 98.8%, p = 0.245). In summary, although the prognosis of this clinically high-risk group is good without chemotherapy, DFS is significantly better when receiving chemotherapy.
Another secondary outcome is the effect of chemotherapy in patients who were clinically assessed as low risk, but with a genomic high-risk profile. In this subgroup, no statistically significant benefit of chemotherapy was observed for DMFS (HR 0.90, 95% CI 0.40-2.01), DFS (HR 0.74, 95% CI 0.40-1.39) or OS (HR 0.72, 95% CI 0.23-2.24), indicating that a high-risk MammaPrint test result does not predict an effect of chemotherapy for these clinically low-risk patients. Although this analysis is underpowered, and no formal interaction test was performed, the authors conclude that the MammaPrint failed to show its value as a predictive biomarker, not being capable of identifying patients who would benefit from chemotherapy.

TAILORx
The TAILORx trial was designed to assess the clinical use of OncotypeDX to decide on chemotherapy administration, especially in the intermediate-risk group. For this, 10,273 patients were enrolled, who all had ER- and/or PR-positive, node-negative disease but did have an indication for chemotherapy based on the NCCN guidelines. Low-risk patients (based on Recurrence Score) received endocrine therapy only; high-risk patients received both endocrine and chemotherapy. Intermediate-risk patients were randomly allocated to either endocrine therapy alone or a combination of endocrine and chemotherapy. Until now, only the results of the low-risk patients have been published [8]. A total of 1626 patients with a low-risk OncotypeDX test result received no chemotherapy. The rate of DFS at 5 years was 93.8%, the freedom from distant recurrence was 99.3% and the overall survival was 98%. Similar to the MINDACT trial, this shows that genomic testing can identify patients with a good prognosis without chemotherapy, despite a clinical indication for chemotherapy.
In a similarly designed trial (RxPonder), node-positive patients with HR+ breast cancer and a low or intermediate test result are randomly assigned to hormone therapy with or without chemotherapy [84]. Results of this trial will show whether it is safe to withhold chemotherapy in a population with a low or intermediate test result despite the high-risk nodal status.

WSG PlanB
In the West German Study Group Phase III PlanB Trial, 3198 clinically high-risk patients were enrolled, including 41.1% with node-positive disease. Although originally designed to compare two regimes of chemotherapy, after inclusion of 274 patients the study was amended to omit chemotherapy in patients with a low-risk OncotypeDX test result, despite their high clinical risk [36].
In this high-risk population, 348 patients received no chemotherapy based on a low-risk Recurrence Score of <12. At 3 years of follow-up, the disease-free survival was 98.4% in this subgroup, indicating again that genomic subtyping can identify a clinically high-risk subgroup with an excellent prognosis without chemotherapy, although longer follow-up is warranted for definite conclusions. Similar to the TAILORx, this study used an alternative cut-off for low-risk scores, which needs to be considered when interpreting the results.

Clinical utility
A total of 28 studies evaluating the clinical utility of the assays were identified, of which 22 for OncotypeDX, four for MammaPrint, and one each for Prosigna and EndoPredict. Almost all studies compared the (hypothetical) application of chemotherapy for the same patient, with and without the results of the genomic test. In general, de-escalation from chemotherapy to no therapy or endocrine therapy alone was more frequent than escalation towards chemotherapy, which led to a decrease in chemotherapy use for all tests. When the results were pooled per assay, the decrease in chemotherapy was most pronounced for OncotypeDX (45.7% from chemotherapy to endocrine therapy alone or no adjuvant therapy) compared to MammaPrint (32.2% decrease) (Table 3). However, these pooled results should be interpreted carefully, since there is a large difference in the number of studies per test, the baseline patient populations and study designs.
For OncotypeDX, three other studies evaluated the use of chemotherapy in population studies [113][114][115]. Two of them observed a decrease in chemotherapy use during the designated years, and an increase in genomic testing [114,115]. However, no direct relation was observed between both results. In the study of Su et al., performed in a US Medicare population between 2008 and 2011, no difference in the use of chemotherapy was observed despite an increase of assay use from 9 to 17.2% [113]. Two other studies compared the use of chemotherapy between patients with and without genomic testing [116,117]. In the large study performed by Ray et al. (n = 7004), chemotherapy use was 22% in patients without testing, whereas 26% received chemotherapy after genomic profiling. In contrast, Stemmer et al. (n = 951) observed, in a node-positive population, 70% chemotherapy use without testing and 24.5% chemotherapy use after genomic testing.
In a similar study design, Kuijer et al. observed a 10% lower rate of chemotherapy for patients with genomic testing using MammaPrint [118].

Economic value
Forty-four original economic evaluations were found, of which 32 on OncotypeDX, 7 on MammaPrint, 1 on EndoPredict and 4 direct comparisons between tests (Table 4). Most evaluations compared genomic testing to a variety of strategies without genomic testing; four evaluations were head-to-head comparisons between genomic policies. Of the evaluations, 5 only estimated costs (CMAs), 1 estimated life years without QALYs (CEA) and 38 estimated QALYs (CUAs).
Methodologically, only 2 evaluations (both CMA) compared measured outcomes between two actual patient groups with and without genomic testing [113,119]. The remaining 42 evaluations all used mathematical (mostly Markov) modelling to compare estimated outcomes for different policies, for the same actual or hypothetical group of patients. These mathematical models typically estimated a decrease in chemotherapy (because the shift to low risk exceeds the shift to high risk), a decrease in recurrence (because the decrease in high risk exceeds the increase in low risk), and an increase in life years and QALYs (due to the decrease in recurrence and toxicity). Total health care costs may go up or down, depending on the balance between the assay costs and savings on chemotherapy and recurrence. Three studies also included savings on productivity [120][121][122]. Fig. 2 shows the estimated impact of genomic testing on QALYs and costs, according to the 40 evaluations comparing genomic testing to a strategy without genomic testing. The horizontal axis shows the impact on QALYs: all studies but one [123] reported that genomic testing resulted in better patient outcome with a positive impact on QALYs. The vertical axis shows the impact on costs: genomic testing was cost saving in 14 (35%) evaluations and cost increasing in 26 (65%) of the evaluations. On average, total costs increased by 449 euro per patient with an improvement in patient outcome of 0.16 life years and 0.20 QALYs. In general, there were no apparent differences between the estimated outcomes for the different genomic tests. Also, the range of costs was comparable in node-negative and node-positive patients, but the estimated QALY gain was larger in node-negative patients (on average, 0.24 ...).
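As a rough illustration of how the average figures above relate costs to outcome, the sketch below divides the average incremental cost by the average incremental QALY gain. This ratio is not reported in the text and is only an approximation: a ratio of averages is not the same as the average of the cost-effectiveness ratios of the individual evaluations. The function name is hypothetical.

```python
def incremental_cost_per_qaly(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost divided by incremental QALYs (a cost-utility ratio)."""
    if delta_qaly == 0:
        raise ValueError("The ratio is undefined when there is no QALY difference")
    return delta_cost / delta_qaly


# Averages quoted in the text: +449 euro and +0.20 QALYs per patient.
ratio = incremental_cost_per_qaly(449.0, 0.20)
print(f"approximately {ratio:.0f} euro per QALY gained")
```

Under these assumptions the ratio comes out at roughly 2245 euro per QALY gained, which illustrates why a moderate average cost increase can still be considered acceptable in relation to the estimated gain in outcome.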

Discussion
In this systematic review, we evaluated four commercially available prognostic genomic profiles on four selected crucial aspects. On all aspects, the tests are well studied, with multiple well-designed and well-performed studies available. In terms of quantity, MammaPrint and especially OncotypeDX are more extensively studied than the more recently developed EndoPredict and Prosigna/PAM50 assays. At this stage of development, both OncotypeDX and MammaPrint are suitable assays which can be helpful in the clinical setting. However, this review also identified some caveats which will need to be addressed before genomic profiling can be optimally applied.

Assay development and methodology
The first topic for improvement is the identification of a subgroup that benefits most from genomic profiling. This has already been investigated for OncotypeDX, and to a lesser extent for MammaPrint. For Prosigna and EndoPredict we did not identify studies that examined for which clinicopathological subtypes genomic profiling is valuable. In general, the studies show that patients with grade 3, PR-negative tumours and a high Ki-67 have no benefit from testing, since they are almost always high risk. Likewise, patients with grade 1, ER+PR+ tumours and Ki-67 <10% have no benefit from testing either, since (almost) all of them had a low-risk result. As suggested by the flowchart built by Allison et al., all other patients would have an indication for genomic profiling [38]. However, most of these studies were performed in a node-negative cohort. MINDACT has shown that despite node-positive disease, it could be considered to withhold chemotherapy at a low genomic risk score. Therefore, it is crucial that this test-result predicting model is validated and adjusted in large trial cohorts like MINDACT and the WSG PlanB trial.

Clinical validation
One of the most important (theoretical) benefits of a genomic profiling test is the selection of patients in whom treatment with adjuvant chemotherapy will have a significant benefit. Currently, genomic profiles perform this task mainly through their prognostic capacity, i.e. the ability to identify patients with a poor prognosis for recurrence or survival. However, the results of the studies in this review, especially those of MINDACT, show that this does not automatically translate into a benefit of chemotherapy for these higher-risk patients. So far, no genomic test has shown its predictive capacity in a prospective trial design. The only evidence for a predictive value was obtained in two prospective studies conducted on archived tissue (prospective-retrospective design), in which OncotypeDX retrospectively identified patients who benefit more from the chemotherapy to which they were randomly allocated [62,65].

Clinical utility
Currently, the clinical consensus on adjuvant chemotherapy is that we are most likely over-treating our patients, since we are not capable of identifying patients that will or will not benefit from chemotherapy using the current clinicopathological parameters [157,158]. It is therefore no surprise that the studies evaluating the clinical utility of genomic profiling mainly show a reduction in chemotherapy use. However, absolute numbers should be interpreted carefully, since some tests are less frequently studied than others, which increases the risk of bias and skewed data. Interestingly, in retrospective population-based cohorts, implementation of genomic testing did not lead to a reduction in chemotherapy use [113][114][115]. This is in accordance with Petkov et al., who retrospectively matched OncotypeDX use with SEER registry data for over 40,000 patients [159]. Although the risk categories were indeed prognostic for five-year breast-cancer-specific mortality in this real-life population, patients with node-negative, HR+, HER2− breast cancer who underwent testing (n = 40,134, 22.7% chemotherapy) had no lower chemotherapy use compared to patients who were not tested (n = 144,056, 22.2% chemotherapy). Therefore, conclusions about genomic profiling leading to a decrease in chemotherapy cannot be drawn from these analyses.
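The size of the difference in the registry comparison above can be checked with simple arithmetic. The following minimal sketch uses only the group sizes and chemotherapy percentages quoted from Petkov et al. [159]; the patient counts it derives are approximations obtained by rounding, not figures reported in that study.

```python
# Group sizes and chemotherapy rates as quoted in the text.
tested_n, tested_rate = 40_134, 0.227
untested_n, untested_rate = 144_056, 0.222

# Approximate number of patients receiving chemotherapy in each group
# (derived by rounding; not reported as such in the source).
tested_chemo = round(tested_n * tested_rate)
untested_chemo = round(untested_n * untested_rate)

# Absolute difference in chemotherapy use, in percentage points.
diff_pp = (tested_rate - untested_rate) * 100

print(f"Tested:   ~{tested_chemo} of {tested_n} received chemotherapy")
print(f"Untested: ~{untested_chemo} of {untested_n} received chemotherapy")
print(f"Difference: {diff_pp:.1f} percentage points")
```

The difference of about half a percentage point is in the direction of slightly higher, not lower, chemotherapy use among tested patients. Because the groups were not randomised, this unadjusted comparison says nothing about causation.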

Economic value
Our review of economic evaluations identified 44 original publications, whereas earlier reviews included at most 11 or 18 published evaluations [160,161]. Except for the oldest evaluation [123], all studies reported improved patient outcome in terms of QALYs. Despite estimated savings on chemotherapy, recurrence and productivity, a small majority (65%) of the evaluations estimated that genomic testing resulted in an increase in total costs. Nevertheless, most evaluations (90%) estimated that genomic testing is cost-effective, with costs that are acceptable in relation to patient outcome. These economic results should be considered with caution. Firstly, the separate evaluations should not be interpreted as independent primary studies, because the models obtain their data from overlapping sources: the diagnostic data are mostly taken from the landmark trials and then applied to the care patterns of a particular country. Secondly, the economic studies generally evaluate the use of genomic testing in large groups of women, instead of trying to combine genomic profiling with other prognostic factors to identify those individual women for whom genomic testing does not have sufficient added value or could even be harmful. Thirdly, compared to trials, economic evaluations are more likely to suffer from publication bias.
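To illustrate what "costs that are acceptable in relation to patient outcome" means in practice, the sketch below computes an incremental cost-effectiveness ratio (ICER) from the average increments reported in this review (449 euro and 0.20 QALYs per patient). Two points are assumptions: the willingness-to-pay threshold of 20,000 euro per QALY is hypothetical and not taken from the review, and a ratio of averages across evaluations is not the same as the average of the ICERs of the individual evaluations.

```python
def icer(delta_cost: float, delta_qaly: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if delta_qaly <= 0:
        raise ValueError("this sketch only handles a positive QALY gain")
    return delta_cost / delta_qaly


# Average increments reported in this review (testing vs. no testing).
AVG_DELTA_COST_EUR = 449.0
AVG_DELTA_QALY = 0.20

# Hypothetical willingness-to-pay threshold (assumption, not from the review).
THRESHOLD_EUR_PER_QALY = 20_000.0

ratio = icer(AVG_DELTA_COST_EUR, AVG_DELTA_QALY)
is_cost_effective = ratio <= THRESHOLD_EUR_PER_QALY

print(f"ICER: {ratio:.0f} euro per QALY gained")
print(f"Below assumed threshold: {is_cost_effective}")
```

Under these assumptions the ratio is roughly 2,245 euro per QALY gained, which shows how a strategy can increase total costs and still be judged cost-effective. The conclusion depends on the threshold chosen, which differs between countries.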

Future perspectives
In the near future, trial results from RxPonder, TAILORx and WSG Plan-B will become available, contributing to understanding the role of OncotypeDX in daily practice in both node-positive and node-negative disease. Furthermore, subgroup analyses and long-term follow-up of MINDACT will follow later and help define the place for MammaPrint in the diagnostic process, and the long-term safety of withholding chemotherapy in clinically high-risk patients on the basis of a low-risk test result. The OPTIMA trial, randomizing high-risk ER+ HER2− patients between standard chemotherapy and treatment directed by Prosigna test results, will be the first trial to show level A evidence for the Prosigna/PAM50 test.
Another interesting development is the use of gene expression assays for the indication of endocrine therapy. Very recently, a retrospective analysis from Sweden identified an ultra-low category within the low-risk category of MammaPrint (15% of all patients, 26% of low-risk patients) [162]. Patients with this ultra-low risk score (n = 98) had a breast-cancer-specific survival of 94% at 20 years without any adjuvant therapy, and 97% at 20 years with just 2 years of tamoxifen, whereas 5+ years of therapy is the current standard for these patients [163]. Upon validation, these findings could lead to the implementation of gene expression assays in the indication for adjuvant endocrine therapy.

Conclusions
In summary, in this systematic review we have evaluated the four most frequently used assays in Europe on four relevant aspects. Regarding the amount of evidence, there is a clear separation between the more established MammaPrint and OncotypeDX on the one hand, and the newer Prosigna and Endopredict on the other. Comparing MammaPrint and OncotypeDX, both assays have been shown to be useful prognostic tests which could lead to a reduction in chemotherapy use, with in general a favourable cost-benefit ratio. Both MammaPrint and OncotypeDX have shown in prospective trials that a patient with a low-risk result can safely forego chemotherapy, despite clinical risk factors. In contrast, the benefit of chemotherapy with a high-risk test result has so far only been shown for OncotypeDX, albeit in retrospective analyses of archived tissue from prospective trials. Therefore, there is still a need for further prospective studies on all evaluated assays.

Conflicts of interest
None.