METHOD FOR MOLECULAR TYPING OF TUMORS IN A SINGLE TARGETED NEXT GENERATION SEQUENCING EXPERIMENT
Field of the invention
The present invention concerns a new method of analyzing a cancer of a patient by detecting gene mutations, chromosomal alterations and DNA methylation status in a targeted Next generation sequencing (NGS) experiment.
The present invention applies in the medical field, particularly to improve tumor classification for each patient.
In the description below, the references between square brackets ([ ]) refer to the list of references presented at the end of the text.
Background of the invention
Genomics enables high-throughput detection of molecular variations at several levels: at the gene expression level (by transcriptome [1-3] and miRnome experiments [1,4]), which has helped, for example, to distinguish tumors with good or poor prognosis by identifying different molecular types, as in adrenocortical carcinoma [1]; at the genomic DNA sequence level (by targeted, exome or whole genome sequencing); at the chromosomal structure level (by SNP and CGH arrays, and by exome or whole genome sequencing [5-7]); and at the genomic DNA methylation level (by methylation arrays [8,9] or by DNA sequencing, after either bisulfite treatment of DNA or methylcytosine immunoprecipitation).
Tumor tissues have been extensively studied by pangenomic approaches.
Indeed, an ever increasing number of tumor types have now been extensively screened by these techniques, with the global aim of identifying molecular subtypes and unraveling the molecular mechanisms of tumorigenesis [3,7,11-15]. Some of the molecular features thus identified may have important implications for patient care. These include strong diagnostic and prognostic markers [16,17], and molecular signatures orienting towards specific treatments, including targeted therapies [17,18]. In the post-genomic era, it is therefore of major importance to be able to translate the massive pangenomic features into targeted molecular measurements compatible with clinical routine [19-21]. Thus pangenomic studies have identified distinct molecular classes for many cancers, with major clinical applications. However, routine use requires conversion to cost-effective assays.
Targeted next generation sequencing (NGS) is a powerful, robust and cost-effective technology in clinical practice [22,23]. Several applications are now emerging, including rapid screening of multiple genes in genetic diseases, identification of specific somatic mutations in different cancer types [24,25], and characterization of viruses and bacteria [26-28]. For these reasons targeted NGS is rapidly spreading across clinical departments.
All these applications are primarily based on the ability of targeted NGS to identify DNA sequence variations, namely base substitutions and indels. Unconventional uses of targeted NGS are emerging: several methods for detecting copy number based on NGS have been reported. Some show the ability to detect homozygous deletions [29,30]. Other methods propose to identify variations of one DNA copy and loss of heterozygosity (LOH) [5,6,29-33], using approaches similar to methods developed for SNP arrays [34-36]. These methods have been developed for analyzing large genomic regions, e.g. whole genome sequencing (WGS) or whole exome sequencing (WES) data, which are much larger than the common targeted NGS panels used in clinical oncogenetics. Therefore, these methods are neither suitable nor optimized for small targeted NGS panels.
Thus there is still a need for a method for the molecular typing of tumors that is compatible with clinical routine, at a limited cost increase. Such routine molecular typing might be of particular advantage in cancer diagnosis and/or prognosis, as well as in the choice of a pertinent treatment.
Detailed description of the invention
Considering the wide use of small targeted Next-Generation Sequencing (NGS) panels, the aim of the present invention was to assess whether these panels can serve for detecting specific gene mutations and for calling chromosomal alterations and DNA methylation status, these three combined analyses allowing classification of tumors according to their molecular type.
The present invention especially relates to an NGS method for classifying cancerous tumors comprising using a set of genes of chromosome regions specifically identified in the art for said cancer, for which a search for mutations, an analysis of chromosomal abnormalities and an analysis of targeted hypermethylated regions are performed in only one run. This method thus allows analysis, in a single NGS experiment, of various events useful for typing tumors: mutations, chromosomal alterations (loss of heterozygosity, alterations, duplication, deletion) and methylation.
As shown in the experimental section, the inventors have been able to characterize chromosomal alterations in 449 tumors from 42 different cancer types through the NGS experiment of the invention which, as shown for adrenocortical cancer, when combined with analyses of mutations and the assessment of DNA methylation, led to the precise molecular typing of tumors. Hence, this invention is particularly suitable, in clinical routine, for the molecular typing and classification of the tumor of each patient. Furthermore, in the same NGS experiment, a second sequencing library is added to include DNA methylation status, which is of particular advantage for oncogenetic analyses.
Accordingly, in an embodiment, the invention relates to a Next-Generation DNA Sequencing (NGS) method of analysing a cancer of a patient comprising the detection, in a sample of said patient, of:
- at least one characteristic alteration of chromosome regions identified for said cancer from a set of genes from these regions,
- specific gene mutations or at least one characteristic pattern of mutations in a set of genes identified in said cancer, and
- at least one characteristic pattern of DNA methylation status of chromosome regions identified as having an altered methylation status in said cancer,
all these three detection steps being implemented in a single NGS experiment.
As explained above, in a further embodiment, the step of detecting at least one characteristic alteration of chromosome regions comprises identifying homozygous deletions and/or loss of heterozygosity (LOH) within the set of genes of said chromosome regions identified for said cancer.
Advantageously, in the method according to the invention, the detection step of at least one characteristic alteration of chromosome regions further comprises analysing at least 5 SNPs per chromosome arm of interest for searching heterozygous deletions or LOH (Loss of Heterozygosity), said at least 5 SNPs being known to be highly heterozygous in the general population. Inventors have indeed identified that detecting loss of one DNA copy using the method of the invention implies providing allelic ratios of heterozygous SNPs on each chromosome of interest. In another particular embodiment, said at least 5 SNPs are sequenced from patient leucocytes in addition to tumor. In a more particular embodiment, said at least 5 SNPs are sequenced from tumor only.
In a particular embodiment, the step of detecting at least one specific pattern of DNA methylation status is carried out on bisulfite-treated DNA. In a more particular embodiment the analysis of the methylation status is implemented on CpG islands known as having an altered methylation status in said cancer. In a more particular embodiment the step of detecting at least one specific pattern of DNA methylation status is implemented on a subset of CpG islands, identified as sufficient for the cancer analysis of said patient.
The inventors have furthermore identified a method for increasing the alignment efficiency by more than 5-fold over commonly used methylation sequencing methods. Therefore, methylation status analysis is advantageously operated (i) after a step of replacing each stretch of identical bases by a single corresponding base, except around the CpGs, the dinucleotides CG, TG and CA being excluded from this compression, and (ii) with the alignment over the reference sequence restricted to the use of the 3' end of the primers.
The methods of the invention are therefore suitable for detecting a mutation in a tumor suppressor gene while knowing the status of the other allele and the proportion of cells harboring the mutation. This is of particular interest for targeted therapies, as it is not currently assessed whether all tumor cells harbor a targeted mutation or whether only a sub-clonal population will be targeted. Furthermore, the methods of the invention are also efficient in detecting homozygous deletions and high level amplifications, which are common ways of inactivating tumor suppressor genes or activating oncogenes respectively, or even the gain or loss of only one DNA copy. The methods of the invention also comprise a step of analysing DNA methylation status. DNA methylation status is also important for prognosis and potentially for treatment orientation; indeed CpG island hypermethylation is a well-known mechanism of tumor suppressor gene inactivation. Thus, besides calling mutations, the method of the invention allows the major determinants of molecular typing of tumors to be detected in a single analysis.
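As an illustration of step (i), the following Python sketch shows one possible reading of the compression rule; it is not the inventors' implementation. It assumes that every base belonging to a CG, TG or CA dinucleotide is left untouched, and that each remaining run of identical bases is reduced to a single base. The function name `compress_homopolymers` and the test sequences are hypothetical.

```python
# Dinucleotides excluded from compression (CpG, and its bisulfite-converted forms)
PROTECTED = {"CG", "TG", "CA"}

def compress_homopolymers(seq: str) -> str:
    """Collapse runs of identical bases, keeping CG/TG/CA dinucleotides verbatim.

    Assumed rule (one interpretation of the text): bases that are part of a
    protected dinucleotide are never removed; every maximal run of identical
    unprotected bases is replaced by a single base.
    """
    seq = seq.upper()
    protected = [False] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i:i + 2] in PROTECTED:
            protected[i] = protected[i + 1] = True

    out = []
    prev_base, prev_protected = None, False
    for base, keep in zip(seq, protected):
        # Skip a base only when it repeats the previous base and neither is protected
        if base == prev_base and not keep and not prev_protected:
            continue
        out.append(base)
        prev_base, prev_protected = base, keep
    return "".join(out)

print(compress_homopolymers("TTTTCGAAAA"))
print(compress_homopolymers("AAATTTGGG"))
```

In this sketch the same compression would have to be applied to both the reads and the reference before alignment, so that coordinates remain comparable; that pairing is also an assumption.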
This is of particular advantage over the pangenomic studies which are not suitable for diagnosis or prognosis in clinical routine. Accordingly, the use of the above methods for assigning a patient to a specific group of patients corresponding to a specific molecular type of tumor is an object of the invention.
In a particular embodiment, said group of patients and/or the corresponding molecular type of tumor is indicative of the response of the disease to a treatment, and/or of the survival of the patient who is assigned to this particular group and/or molecular type of tumor.
In another particular embodiment, the methods of the invention are used for stratifying patients during clinical trials and/or for identifying molecular types of tumors that are indicative of a response to a treatment or that allow patients to be classified according to expected survival time. Consequently, in a more particular embodiment, the invention also relates to a method of adapting the treatment of a cancer of a patient, comprising the implementation of the steps of the targeted NGS method of the invention and a step of choosing the best therapeutic option for said patient as a function of the molecular type of tumor thereby identified for said patient. More particularly, "adapting the treatment of cancer" or "choosing the best therapeutic option" can comprise determining whether or not the patient is a responder to a treatment and, in a particular embodiment, thereby avoiding the administration of a useless treatment. It can also comprise choosing a targeted therapy known to be effective for the molecular type of cancer identified.
In the frame of the invention, "sample of a patient" comprises any tumor biopsy sample, such as an incisional biopsy, excisional biopsy or needle biopsy. "Sample of a patient" also comprises any autopsy sample, frozen sample dedicated to histologic analyses, or fixed or wax-embedded sample. As used herein, the term "sample of a patient" preferably designates a tumor biopsy sample, but it can also be a metastasis biopsy or lymph node biopsy from a subject suffering or suspected to suffer from cancer or from cancer relapse. "Sample of a patient" can also comprise cells, cell lines, organoids or patient-derived xenografts (PDX) derived from patient tumor samples.
As shown in the experimental section, the methods of the invention are suitable for detecting chromosomal alterations in any type of cancer. Indeed, the methods of the invention have been validated for detecting chromosomal alterations in 449 tumors from 42 different cancer types (Table 1), besides adrenocortical carcinoma. In a particular embodiment, the methods of analysing a cancer of a patient of the invention are implemented for a cancer selected from breast cancer, colorectal cancer, ovarian cancer, lung cancer, pancreatic cancer, sarcoma, urothelial cancer, head and neck squamous cell carcinoma (HNSCC), adenocarcinoma with unknown primary tumor (ACUP), endometrial cancer, cervical cancer, oesogastric cancer, adenoid cystic carcinoma (ACC), cholangiocarcinoma, neuroendocrine tumor, melanoma, anal squamous cell carcinoma (Anal SCC), kidney cancer, uveal melanoma, germline tumor, hepatocellular carcinoma (HCC), parotid cancer, thyroid cancer, undifferentiated nasopharyngeal cancer of the cavum (UCNT CAVUM), Merkel cell carcinoma, mesothelioma, penile squamous cell carcinoma, peritoneal cancer, chemodectoma, corticosurrenaloma, desmoid tumor, epithelioid hemangiocarcinoma, meningioma, midline carcinoma, myxopapillary ependymoma, non-adenoid cystic carcinoma (ACC) salivary gland tumor, ocular adenocarcinoma, pelvic squamous cell carcinoma (pelvic SCC), pleiomorphic carcinoma of the tongue, prostate adenocarcinoma, thymic cancer, or squamous cell carcinoma of the vulva.
Cancer                   Number of tumors    Cancer                                  Number of tumors
Breast cancer            70                  Parotid cancer                          3
Colorectal cancer        52                  Thyroid cancer                          3
Ovarian cancer           49                  UCNT CAVUM                              3
Lung cancer              41                  Merkel cell carcinoma                   2
Pancreatic cancer        36                  Mesothelioma                            2
Sarcoma                  25                  Penile squamous cell carcinoma          2
Urothelial cancer        17                  Peritoneal cancer                       2
HNSCC                    16                  Chemodectoma                            1
ACUP                     15                  Corticosurrenaloma                      1
Endometrial cancer       15                  Desmoid tumor                           1
Cervical cancer          12                  Epithelioid hemangiocarcinoma           1
Oesogastric cancer       12                  Meningioma                              1
ACC                      11                  Midline carcinoma                       1
Cholangiocarcinoma       10                  Myxopapillary ependymoma                1
Neuroendocrine tumor     8                   Non-ACC salivary gland tumor            1
Melanoma                 6                   Ocular adenocarcinoma                   1
Anal SCC                 5                   Pelvic SCC                              1
Kidney cancer            5                   Pleiomorphic carcinoma of the tongue    1
Uveal melanoma           5                   Prostate adenocarcinoma                 1
Germline tumor           4                   Thymic cancer                           1
HCC                      4                   Vulva SCC                               1
Regarding adrenocortical carcinoma specifically, the NGS method of the invention constitutes the first predictor of patient survival based only on DNA analysis of tumors in a single NGS experiment. Said method uses a "set of targets" comprising, regarding mutations and homozygous deletions, at least one gene selected from HSD3B2, CTNNB1, APC, CYP21A2, DAXX, BAH, CDKN2A, CYP17A1, MEN1, MCAM, RB1, TP53, RNF43, ZNRF3, MED12 and CDK4, said at least one gene being distributed within about 10 regions identified as showing frequent loss of heterozygosity (LOH), and about 4 CpG-rich regions selected from Cg07384961, Cg14021073, Cg21494776, Cg23130254, Cg20312228, Cg01635061, Cg27234090, Cg04582938, Cg06039392, Cg10167296, Cg10743104, Cg27425675, Cg16689634, Cg01120165 and Cg15284635, wherein methylation events are known to occur (hypermethylated regions in aggressive cancers). Applying the set of targets as defined above, the NGS method of the invention allows the analysis of various types of abnormalities of tumoral DNA on targeted regions, and comprises:
1) Analysis of recurrent mutations in at least one gene selected from HSD3B2, CTNNB1, APC, CYP21A2, DAXX, BAH, CDKN2A, CYP17A1, MEN1, MCAM, RB1, TP53, RNF43, ZNRF3, MED12, and CDK4.
2) Identification of SNP genotypes by measuring allelic ratios and copy numbers in at least one of the above genes. After capture of these genes by PCR, the amplicon coverage of the captured regions is evaluated. By comparison with the coverage of other genes, it is thus possible to identify a drop in coverage affecting all amplicons of a specific gene. To optimize this quantification, 5 to 10 SNPs with high heterozygosity located on the same chromosome arm are also captured, so as to have an internal control for a better discrimination of homozygous deletions from heterozygous deletions.
3) Searching for loss of heterozygosity in at least one of the chromosome arms selected from 1p, 1q, 2p, 2q, 9p, 11p, 11q, 17p, 18p, 18q, and 22q.
4) Analysis of the methylation status of 4 CpG-rich regions selected from Cg07384961, Cg14021073, Cg21494776, Cg23130254, Cg20312228, Cg01635061, Cg27234090, Cg04582938, Cg06039392, Cg10167296, Cg10743104, Cg27425675, Cg16689634, Cg01120165 and Cg15284635.
In a particular embodiment said at least one gene includes ZNRF3, TP53, RB1, CDKN2A, and CDK4, because of the frequent homozygous deletions (ZNRF3, TP53, RB1, CDKN2A) or amplification (CDK4) found for these genes in adrenocortical carcinoma.
In another particular embodiment, in order to better discriminate homozygous deletions from heterozygous deletions, loss of heterozygosity is also searched for chromosome arms 22q, 17p and 9p, carrying ZNRF3, TP53 and CDKN2A respectively.
In another particular embodiment, the 4 analysed CpG-rich regions are Cg07384961, Cg14021073, Cg21494776 and Cg23130254.
As shown in the experimental section, using at least 5 highly heterozygous SNPs (heterozygosity close to 0.5 in the general population) on each chromosome arm of interest allows gains or losses of just one DNA copy to be detected. Thus, in a particular embodiment, the methods of the invention further comprise analysing at least 5 SNPs per chromosome arm of interest for searching heterozygous deletions or LOH (loss of heterozygosity), said at least 5 SNPs being known to be highly heterozygous in the general population.
In a particular embodiment, the NGS method of the invention, when used for the molecular typing of adrenocortical carcinoma, comprises detecting the heterozygosity status of chromosome arms 11p and/or 17p, which has been shown to have diagnostic value.
In a more particular embodiment, the NGS method of the invention comprises detecting a loss of heterozygosity of 1p combined with the absence of loss of heterozygosity of 1q, which is indicative of poor prognosis (data not shown).
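The selection of highly heterozygous SNPs can be illustrated by a short Python sketch. It assumes biallelic SNPs at Hardy-Weinberg equilibrium, for which the expected heterozygosity is 2p(1-p) and reaches its maximum of 0.5 at an allele frequency p of 0.5. The SNP identifiers, the allele frequencies and the 0.45 cut-off are hypothetical and serve only as an example.

```python
def expected_heterozygosity(alt_allele_freq: float) -> float:
    """Expected heterozygosity 2p(1-p) of a biallelic SNP (Hardy-Weinberg assumption)."""
    p = alt_allele_freq
    return 2.0 * p * (1.0 - p)

def pick_informative_snps(snps, min_het=0.45, min_per_arm=5):
    """Keep SNPs with heterozygosity >= min_het and report arms with too few of them.

    snps: iterable of (snp_id, chromosome_arm, alt_allele_freq) tuples.
    Returns (selected SNP ids per arm, sorted list of arms below min_per_arm).
    """
    snps = list(snps)
    by_arm = {}
    for snp_id, arm, freq in snps:
        if expected_heterozygosity(freq) >= min_het:
            by_arm.setdefault(arm, []).append(snp_id)
    all_arms = {arm for _, arm, _ in snps}
    short = sorted(a for a in all_arms if len(by_arm.get(a, [])) < min_per_arm)
    return by_arm, short

# Hypothetical candidates on two chromosome arms
candidates = [
    ("snp_a", "17p", 0.50), ("snp_b", "17p", 0.48), ("snp_c", "17p", 0.10),
    ("snp_d", "22q", 0.45),
]
selected, arms_needing_more = pick_informative_snps(candidates)
print(selected, arms_needing_more)
```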
As demonstrated in the experimental section and exposed above the method presented herein can be applied to the molecular analysis of many cancers, for which a set of targets is or can be determined by routine methods well known in the art. Furthermore combination in a single targeted NGS experiment and rapid determination of results allow rapid analysis, at high frequency and at lower cost, of patient tumors. It is of particular interest in clinical departments.
Further objects of this invention are kits allowing the implementation of the method of molecular typing of cancer tumors and, more specifically, of adrenocortical cancer tumors.
In a particular embodiment, the invention thus also relates to a kit comprising a single NGS design for analysing (i) specific alterations of chromosome regions, (ii) specific gene mutations, and (iii) DNA methylation status of specific chromosome regions.
The term "single NGS design" as used herein refers to a single NGS experiment comprising the preparation of two libraries, one for the analysis of the pattern of mutations and for identifying alterations of chromosome regions, and the other for the analysis of DNA methylation status of chromosomes. Advantageously, the two libraries prepared in parallel can be sequenced on the same sequencing chip, following common steps of NGS sequencing. Downstream analysis includes: i) calling mutations in a targeted set of genes, ii) calling chromosome arm alterations through the TARGOMICs method especially developed by the inventors (see below), iii) calling DNA methylation status through TARGOMICs. Altogether these targeted measures recapitulate mutations, chromosome status and DNA methylation for each tumor, enabling individual classification into specific molecular classes. As exposed below, such a method is implemented on reduced NGS libraries, wherein only genes or DNA regions that are specific for the molecular typing of the studied tumor are analyzed, and is consequently called hereinafter "Targeted NGS experiment", in contrast to the whole genome or large genomic region NGS experiments commonly used.
In a more particular embodiment said kit comprises a targeted NGS design comprising one or more primer sets corresponding to at least one gene selected from HSD3B2, CTNNB1, APC, CYP21A2, DAXX, BAH, CDKN2A, CYP17A1, MEN1, MCAM, RB1, TP53, RNF43, ZNRF3, MED12 and CDK4. In an even more particular embodiment said targeted NGS design comprises one or more primer sets corresponding to at least one gene selected from ZNRF3, TP53, RB1, CDKN2A, and CDK4. In a more specific embodiment said targeted NGS design comprises primer sets corresponding to genes ZNRF3, TP53, RB1, CDKN2A, and CDK4.
In a particular embodiment said kit comprises a targeted NGS design comprising one primer set corresponding to at least 5 highly heterozygous SNPs located on chromosome arms 1p, 1q, 2p, 2q, 9p, 11p, 11q, 17p, 18p, 18q, and/or 22q.
In a further embodiment the above mentioned kit also comprises a targeted NGS design comprising one or more primer sets corresponding to at least one CpG island selected from Cg07384961, Cg14021073, Cg21494776, Cg23130254, Cg20312228, Cg01635061, Cg27234090, Cg04582938, Cg06039392, Cg10167296, Cg10743104, Cg27425675, Cg16689634, Cg01120165 and Cg15284635. In a very particular embodiment said one or more primer sets correspond to at least one CpG island selected from Cg07384961, Cg14021073, Cg21494776 and Cg23130254. In an even more particular embodiment said primer sets correspond to CpG islands Cg07384961, Cg14021073, Cg21494776 and Cg23130254.
Brief description of the drawings and tables
Figure 1: Example of chromosome arms from a tumor with various types of chromosomal alterations: obtaining information with SNP arrays (panels A and C) and NGS experiments (panels B and D). A. With SNP arrays, SNP genotypes are measured by allelic ratios and copy numbers. Allelic ratios (upper panel) are quantified by the BAF (B-allele frequency). BAF is the proportion of the alternative (« B ») allele in each SNP genotype. For a normal chromosome arm, heterozygous SNPs (genotype AB) have a BAF of 0.5 (50% of « B » allele in the « AB » genotype). Homozygous SNPs « AA » and « BB » form bands of SNPs with BAFs of 0 (0% of « B » in « AA ») or 1 (100% of « B » in « BB »). This is illustrated for chromosome 5q, with a band of SNPs centered on BAF=0.5. When chromosomes are lost in almost all tumor cells (6p, 9p, 11q, 17p, 17q, 22q), the band of heterozygous SNPs is split into 2 bands with BAF close to 0 or 1, depending on whether the B or the A allele is lost respectively. When chromosomes are lost in a subset of tumor cells (8q, 10q), these 2 bands show intermediate BAF values, due to the remaining cells with « AB » genotype. MBAF is a simplification of BAF, fitting all BAF values between 0.5 and 1 after « folding » the BAF plot along the 0.5 horizontal axis (formula: MBAF = abs(0.5 - BAF) + 0.5). MBAF values are close to 0.5 for normal chromosomes, close to 1 when chromosomes are lost in almost all tumor cells, and intermediate when chromosomes are lost in a subset of tumor cells. Copy numbers (lower panel) are quantified by the LogR ratio (LRR). B. With targeted NGS, allelic ratios can be quantified by the ratios of read counts of heterozygous SNPs, as exemplified in the upper panel. MBAF values are close to the MBAF values measured by SNP array. In addition, copy numbers can be quantified by read counts of amplicons, as exemplified in the lower panel. Read count profiles are close to the LRR measured by SNP array. C. Scatterplot of allelic ratios (MBAF) and DNA copy number (LRR) generated by SNP array. Three clusters of dots can be identified: cluster 1: a group of SNPs with no allelic imbalance, corresponding to a diploid heterozygous chromosome arm, defining a normal chromosome arm. Thus with this normal chromosome arm, the number of reads corresponding to 2 copies of DNA can be unambiguously deduced; cluster 2: a group of SNPs with some allelic imbalance and a decreased number of reads, corresponding to chromosome arms with loss of one copy in a subclone (approximately 30% of cells); cluster 3: a group of SNPs with almost complete allelic imbalance and a decreased number of reads, corresponding to chromosome arms with loss of one copy in a majority of tumor cells. In this latter cluster, MBAF values close to 1 indicate a low tumor contamination by normal cells. D. Scatterplot of allelic ratios (MBAF) and DNA copy number (N Reads) generated by NGS. A pattern of clusters similar to that of panel C is observed.
Figure 2: Performance of allelic ratios and copy numbers measured by targeted NGS: comparison with SNP arrays. A. Correlation of allelic ratios between NGS and SNP arrays. Allelic ratios are quantified as MBAFs. B. Correlation of copy numbers (CN) between NGS and SNP arrays. CN are expressed relative to ploidy. For instance, for diploid cells, CN values of 0.5, 1 and 1.5 correspond to 1, 2 and 3 DNA copies respectively.
Figure 3: TARGOMICs: automated detection of chromosome alterations from targeted NGS combining copy number (CN) and allelic ratio (MBAF). A. TARGOMICs graphical output from a sample. Upper panel: MBAF of heterozygous SNPs for each gene (line: median value). Intermediate panel: read counts for each gene. Light grey line: mean of read counts. Homozygous deletions are called when read counts drop below a threshold (dashed line; default: 1/3 of the mean of read counts). Lower panel: scatterplot of genes combining SNP allelic ratios (MBAF, x axis) and amplicon DNA copy number normalized to baseline (CN, y axis). The surface is divided into 3 regions (grey areas), each corresponding to a type of chromosome status (heterozygous diploid, gain or loss). B, C, D: performance of TARGOMICs for calling chromosome loss, diploid heterozygous chromosomes, and chromosome gains respectively. Each panel is a scatterplot of genes combining MBAF and CN. False positive, true positive and false negative calls are plotted as squares, circles and triangles respectively. E, F, G: performance of TARGOMICs as in B, C, D in the validation set of 449 tumors. Se: sensitivity; Sp: specificity; NPV: negative predictive value; PPV: positive predictive value.
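The BAF and MBAF definitions given for Figure 1 can be illustrated by a minimal Python sketch for targeted NGS data. The MBAF formula is the one stated in the legend; the assumption that BAF is obtained as the alternative-allele read count divided by the total read count at the SNP, the function names and the example read counts are illustrative only.

```python
from statistics import median

def baf(ref_reads: int, alt_reads: int) -> float:
    """B-allele frequency of one SNP: proportion of reads carrying the alternative allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads covering this SNP")
    return alt_reads / total

def mbaf(baf_value: float) -> float:
    """Fold the BAF around 0.5: MBAF = abs(0.5 - BAF) + 0.5, so that 0.5 <= MBAF <= 1."""
    return abs(0.5 - baf_value) + 0.5

def gene_mbaf(snp_counts):
    """Median MBAF over the heterozygous SNPs of a gene.

    snp_counts: iterable of (ref_reads, alt_reads) pairs, one per heterozygous SNP.
    """
    return median(mbaf(baf(ref, alt)) for ref, alt in snp_counts)

# Hypothetical read counts for three heterozygous SNPs of one gene
print(gene_mbaf([(50, 50), (80, 20), (10, 90)]))
```

Only SNPs that are heterozygous in the patient are informative here; when leucocyte DNA is not sequenced, a homozygous SNP and a heterozygous SNP with complete allelic loss both give an MBAF close to 1.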
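The homozygous deletion rule described for the intermediate panel of Figure 3A can be sketched in Python as follows. The default fraction of 1/3 of the mean read count is taken from the legend; the assumptions that read counts are summarized per gene, already normalized, and averaged over all genes of the sample are illustrative, as are the function name and the example values.

```python
def call_homozygous_deletions(gene_read_counts, fraction=1 / 3):
    """Flag genes whose read count falls below a fraction of the sample mean.

    gene_read_counts: {gene: normalized read count} for one sample.
    Returns (set of flagged genes, threshold used).
    """
    if not gene_read_counts:
        raise ValueError("no genes provided")
    mean_reads = sum(gene_read_counts.values()) / len(gene_read_counts)
    threshold = fraction * mean_reads
    flagged = {gene for gene, n in gene_read_counts.items() if n < threshold}
    return flagged, threshold

# Hypothetical per-gene read counts for one tumor
counts = {"ZNRF3": 20, "TP53": 300, "RB1": 280, "CDK4": 320}
deleted, cutoff = call_homozygous_deletions(counts)
print(deleted, round(cutoff, 1))
```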
Figure 4: Measurement of DNA methylation by targeted NGS. A. Comparison of CpG coverage using BISMARK (diffuse seeding alignment), BISMARK after compressing the stretches of homopolymers, and TARGOMICs (seeding alignment restricted to the 3' end of the primers, compression of homopolymers). B. Correlation between the proportion of methylated CpGs generated by BISMARK and by TARGOMICs. C. Proportion of methylated CpGs in 6 tumors characterized by methylation array. Methylation array could classify tumors into high methylation (CIMP-high; black dots), intermediate methylation (CIMP-intermediate; grey dots) or not hypermethylated (non CIMP; white dots). CIMP-high and CIMP-intermediate tumors showed higher proportions of methylation in most CpG islands (x axis). D. Proportions of methylated CpGs in 26 tumors, measured either by TARGOMICs (y axis) or by MS-MLPA (x axis), are strongly correlated.
Figure 5: Combining allelic ratios and DNA copy number allows chromosome arm alterations and tetraploidy to be detected in a tumor. A. SNP array profiles of allelic ratios (MBAF) and copy numbers (LRR) show various levels of alterations. B. With targeted NGS, SNP allelic ratios (MBAF) deduced from ratios of read counts of heterozygous SNPs, and amplicon copy numbers deduced from read counts, show patterns similar to those of the SNP array. C. Scatterplot of allelic ratios (MBAF) and DNA copy number (LRR) generated by SNP array. Four clusters of dots can be identified: clusters 1 and 2: two groups of SNPs with no allelic imbalance (MBAF close to 0.5) but distinct copy numbers, corresponding to a diploid heterozygous chromosome arm (cluster 1; genotype « AB ») and a tetraploid chromosome arm (cluster 2; genotype « AABB »); cluster 3: a group of SNPs with some allelic imbalance and a number of reads between cluster 1 and cluster 2, corresponding to heterozygous chromosome arms with gain of one copy (3 copies of DNA; genotypes « AAB » and « ABB »); cluster 4: a group of SNPs with almost complete allelic imbalance and a number of reads corresponding to cluster 2, corresponding to homozygous chromosome arms with 2 copies of DNA (2 copies of the same allele; genotypes « AA » or « BB »). In this latter cluster, MBAF values close to 0.85 indicate some tumor contamination by normal cells (around 30% of cells). D. Scatterplot of allelic ratios (MBAF) and DNA copy number (N Reads) for heterozygous SNPs, generated by NGS. A pattern of clusters similar to that of panel C is observed.
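The link between MBAF and the proportion of altered cells mentioned for Figure 5 can be made explicit with a short Python sketch. The formulas below are not taken from the text; they are derived under stated assumptions: the unaltered cells are diploid heterozygous (one A and one B copy), a single alteration is present in a fraction p of the cells, and read counts are proportional to allele copies. Under these assumptions, MBAF = 1/(2 - p) for the loss of one copy, (1 + p)/2 for two copies of the same allele, and (1 + p)/(2 + p) for the gain of one copy. The function name and the alteration labels are hypothetical.

```python
def fraction_altered(mbaf: float, alteration: str) -> float:
    """Proportion p of cells carrying an alteration, deduced from the MBAF.

    Assumes a mixture of altered cells and diploid heterozygous (AB) cells.
    alteration: "loss" (AB -> B), "cn_loh" (AB -> BB) or "gain" (AB -> ABB).
    """
    if not 0.5 <= mbaf <= 1.0:
        raise ValueError("MBAF must lie between 0.5 and 1")
    if alteration == "loss":        # MBAF = 1 / (2 - p)
        p = 2.0 - 1.0 / mbaf
    elif alteration == "cn_loh":    # MBAF = (1 + p) / 2
        p = 2.0 * mbaf - 1.0
    elif alteration == "gain":      # MBAF = (1 + p) / (2 + p), at most 2/3 when p = 1
        if mbaf >= 1.0:
            raise ValueError("MBAF of 1 is not compatible with a single-copy gain")
        p = (2.0 * mbaf - 1.0) / (1.0 - mbaf)
    else:
        raise ValueError(f"unknown alteration: {alteration}")
    # Clamp small numerical overshoots (and MBAF values beyond the model) to [0, 1]
    return min(max(p, 0.0), 1.0)

# MBAF close to 0.85 for two copies of the same allele: about 70% altered cells,
# i.e. around 30% of normal cells, as stated for cluster 4
print(round(fraction_altered(0.85, "cn_loh"), 2))
```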
Table 2 represents NGS panel design.
Table 6 represents an example of chromosome status called by TARGOMICs.
« Calls » generated automatically by TARGOMICs include « diploid heterozygous » (for normal chromosomes), « chromosome loss » or « chromosome gain » when heterozygous SNPs are available; « 0- », « 1- », « 2- », « 3- », or « 4- DNA copies » when no heterozygous SNP is available, in which case DNA copy number is inferred only from read counts and is thus less reliable; « homozygous deletion » and « high level amplification » for major shifts of DNA copy numbers.
Other columns: « CN »: copy number determined from amplicon read counts, expressed relative to baseline (1: 2 copies, 0.5: chromosome loss, 1.5: chromosome gain); « MBAF »: median of MBAFs for all heterozygous SNPs of one gene; « Call_distance »: Euclidean distance to the center of each region corresponding to chromosome alterations in the MBAF/CN scatter plot (see figure 3A); « Proportion_of_cells »: proportion of cells with the chromosome alteration (deduced from the MBAF); « N_amplicons »: number of amplicons in the gene; « N_snpHet »: number of heterozygous SNPs in the gene; « Chr », « Start_37 », « End_37 »: physical positions; « Start_ind », « End_ind »: index position in the targeted NGS data set.
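The « Call_distance » column can be illustrated by a minimal Python sketch computing Euclidean distances in the MBAF/CN plane. The region centers used below are hypothetical: they are the theoretical positions expected when an alteration is present in all cells (MBAF 0.5 and CN 1 for a diploid heterozygous arm, MBAF 1 and CN 0.5 for a loss, MBAF 2/3 and CN 1.5 for a gain), not the centers or region boundaries actually used by TARGOMICs, which are not given in this text.

```python
from math import hypot

# Hypothetical (MBAF, CN) centers, assuming the alteration is present in 100% of cells
REGION_CENTERS = {
    "diploid heterozygous": (0.5, 1.0),
    "chromosome loss": (1.0, 0.5),
    "chromosome gain": (2 / 3, 1.5),
}

def call_distances(mbaf, cn, centers=REGION_CENTERS):
    """Euclidean distance from a gene's (MBAF, CN) point to each region center."""
    return {label: hypot(mbaf - m, cn - c) for label, (m, c) in centers.items()}

def nearest_call(mbaf, cn, centers=REGION_CENTERS):
    """Label of the closest center, with the corresponding distance."""
    distances = call_distances(mbaf, cn, centers)
    label = min(distances, key=distances.get)
    return label, distances[label]

print(nearest_call(0.95, 0.55))
```

A nearest-center rule is a simplification: subclonal alterations lie between the diploid center and the loss or gain center, so a call based on regions of the plane, as in figure 3A, would treat them differently from this sketch.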
EXAMPLE 1: MATERIAL AND METHODS
A training set of 109 adrenocortical carcinoma samples was analyzed. It included 77 samples both sequenced by targeted NGS (tumor and leucocyte) and analyzed by SNP arrays (tumors), and 32 samples sequenced by NGS after bisulfite treatment; of these 32 samples, 6 were also analyzed by methylation array and 20 were also analyzed by MS-MLPA (see below).
These tumors were snap frozen early after surgery, and kept in liquid nitrogen until use. DNA extraction was performed using standard protocols as previously described.
A validation set of 449 cancer samples from 42 distinct cancer types was analyzed (Table 1), both sequenced by targeted NGS and analyzed by SNP array.
B- SNP and methylation arrays
In the training set, SNP array and methylation array experiments were performed using the Illumina HumanCore-12v1 and the Illumina Human Meth27 Beadchips respectively, following the manufacturer's recommendations as previously described.
In the validation set, 449 additional SNP arrays (Affymetrix Cytoscan HD, Santa Clara) were also included as a validation cohort, generated in the SHIVA study. Chromosome alterations were called using GAP. Chromosome alterations were validated graphically, visualizing the logR ratio (LRR) and B-allele frequency (BAF) of SNPs along the genome (see figure 1A for example). Briefly, calling A and B the 2 alleles, BAF is calculated to reflect the genotype: BAF is 0 for genotypes AA, 0.5 for genotypes AB, and 1 for genotypes BB. BAF can be summarized by the following formula:
BAF = Signal_B / (Signal_A + Signal_B)
For Illumina SNP arrays, BAF is directly provided. Segments were averaged into a single call for each chromosome arm. For Affymetrix SNP arrays, BAF was calculated for each segment called by GAP, combining allelic difference Y and copy number CN in the following formula:
This formula is deduced from the definition of Y and CN. Indeed, Y is the subtraction of signals from allele A and allele B:
And the copy number is the addition of signals from allele A and allele B:
CN = log(A) + log(B)
CN was adjusted to be centered on 1 instead of 0 (1: 2 copies of DNA, 0.5: 1 copy, 1.5: 3 copies of DNA).
C- Targeted NGS sequencing
1- Generating read counts and SNP genotypes (training set)
In the training set of 77 adrenocortical carcinomas, multiplex PCR was performed targeting a panel of 15 genes, sequenced following the manufacturer's recommendations (Table 2). Briefly, these 15 genes were amplified with 442 primer pairs (covering 66,266 bp) from two pools of multiplex primers designed with the AmpliSeq Designer V2.0 (ThermoFisher, Villebon sur Yvette, FRA). Libraries were massively-parallel sequenced by semiconductor sequencing technology on a PGM (ThermoFisher).
2- Generating read counts and SNP genotypes (validation set)
In the validation set of 449 cancers, commercial cancer panels designed for the screening of hotspot mutations were used, either the Ion Ampliseq cancer panel V1 or the Ion Ampliseq cancer panel V2 (ThermoFisher). Libraries were generated with the Ampliseq library kit v2.0 and sequenced on a PGM (ThermoFisher).
3- Sequencing PCR products of bisulfite-treated DNA
For all samples, 2 µg of tumor DNA were used for bisulfite treatment with the EZ DNA Methylation-Gold Kit (Zymo Research, CA, USA), following the manufacturer's protocol. Bisulfite treatment converts unmethylated cytosines into uracils, which are read as thymines after PCR. The bisulfite-treated DNA was then amplified by PCR using methylation-insensitive primers designed with MethPrimer. The list of probes is provided in Table 3 below. PCR was performed with TaqGold (ThermoFisher), with the following program: 8 min at 95°C; 40 cycles of 1 min at 95°C, 1 min at 58°C and 1 min at 72°C; and a final extension of 8 min at 72°C. For each tumor, the different PCR products, corresponding to different CpG islands, were mixed together. A single NGS library was then generated with the Ion Plus Fragment Library Kit (ThermoFisher) following the manufacturer's recommendations, except that the End-repair enzyme (ThermoFisher) was replaced by the End-It™ DNA End-Repair Kit (Epicentre, Madison).
D- Treatment of NGS data
A set of original scripts optimized for targeted NGS data was specifically developed, gathered under the name "TARGOMICs", which makes use of read count, allelic ratio, allelic ratio for heterozygous SNPs, as well as identified methylated regions.
1 - Getting read count, allelic ratio, and allelic ratio for heterozygous SNPs
Getting read count
For each amplicon, the number of properly aligned reads was extracted using the Coverage Analysis plugin of the Ion Torrent Suite v3.6.2 (ThermoFisher).
Amplicons and samples with a mean coverage <30 reads per amplicon were discarded. For the remaining amplicons, for each sample, the 2 libraries were normalized to reach an equal mean number of reads by amplicon (see below).
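As a rough illustration of the filtering and normalization just described, the following Python sketch discards low-coverage samples and amplicons, then rescales the two libraries of one sample to the same mean number of reads per amplicon. The data layout (dictionaries keyed by sample and amplicon), the function names, the order of the two filters and the choice of the common target (the average of the two library means) are assumptions made for the example; the original scripts were written in R and are not reproduced here.

```python
from statistics import mean

MIN_MEAN_COVERAGE = 30  # reads per amplicon, threshold given in the text


def filter_low_coverage(counts):
    """Drop samples, then amplicons, whose mean coverage is below the threshold.

    counts: {sample: {amplicon: read_count}}, with the same amplicons in
    every sample (assumed layout).
    """
    samples = {s: c for s, c in counts.items()
               if mean(c.values()) >= MIN_MEAN_COVERAGE}
    if not samples:
        return {}
    amplicons = next(iter(samples.values())).keys()
    kept = {a for a in amplicons
            if mean(c[a] for c in samples.values()) >= MIN_MEAN_COVERAGE}
    return {s: {a: n for a, n in c.items() if a in kept}
            for s, c in samples.items()}


def normalize_libraries(lib1, lib2):
    """Rescale two libraries of one sample to an equal mean reads per amplicon."""
    m1, m2 = mean(lib1.values()), mean(lib2.values())
    target = (m1 + m2) / 2  # assumed common target, not specified in the text
    return ({a: n * target / m1 for a, n in lib1.items()},
            {a: n * target / m2 for a, n in lib2.items()})
```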
Getting allelic ratios
Allelic ratios are expressed as MBAF.
For each heterozygous SNP, calling A and B the 2 alleles, the proportion of B-allele, also called "BAF" (B-allele frequency), can be calculated to reflect the genotype: BAF is 0 for genotypes AA, 0.5 for genotypes AB, and 1 for genotypes BB. BAF was determined from the read counts (Nreads), using the following formula:
BAF_SNP = N_reads(allele B) / (N_reads(allele A) + N_reads(allele B))
Allelic ratios are then normalized as Mirror B Allele Frequencies (MBAF):
BAFs, which span from 0 to 1, were then converted into MBAFs, which span from 0.5 to 1, by applying this formula:
MBAF_SNP = |BAF_SNP - 0.5| + 0.5
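The two formulas above translate directly into code. The sketch below is a minimal Python illustration; the function names are chosen for the example and are not those of the TARGOMICs scripts.

```python
def baf(n_reads_a, n_reads_b):
    """B-allele frequency of one SNP from the read counts of alleles A and B."""
    total = n_reads_a + n_reads_b
    if total == 0:
        raise ValueError("SNP not covered by any read")
    return n_reads_b / total


def mbaf(baf_value):
    """Mirror B-allele frequency: folds BAF (0 to 1) onto the range 0.5 to 1."""
    return abs(baf_value - 0.5) + 0.5
```

For instance, a SNP covered by 30 reads of allele A and 70 reads of allele B has a BAF of 0.7, and a SNP with 70 reads of allele A and 30 reads of allele B has a BAF of 0.3; both have the same MBAF of 0.7, which is the purpose of the mirroring.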
Selecting heterozygous SNPs
In the training set, leucocytes were sequenced for each patient, in addition to tumors. Therefore, heterozygous SNPs were identified from the leucocyte genotypes, considering SNPs with MBAF<0.6. These SNPs were subsequently studied in the tumor, computing the MBAF from the reads in the tumor. In the validation set, no leucocyte data were available. To exclude germline homozygous SNPs, all SNPs with an MBAF>0.95 were excluded. Among the remaining SNPs, the next step was to discriminate germline heterozygous SNPs (the informative SNPs) from somatic mutations, which are particularly numerous in cancers when using NGS panels optimized for catching somatic mutation hotspots. For that aim, only the SNPs commonly found in the general population (frequency >5%) were considered.
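The two selection strategies can be sketched as follows in Python. The thresholds (0.6, 0.95 and 5%) are those given in the text; the function names and the dictionary inputs (MBAF per SNP, population frequency per SNP) are hypothetical, and the source of the population frequencies is not specified here.

```python
def informative_snps_paired(leucocyte_mbaf, max_mbaf=0.6):
    """Training set: SNPs heterozygous in leucocytes (MBAF < 0.6)."""
    return {snp for snp, m in leucocyte_mbaf.items() if m < max_mbaf}


def informative_snps_unpaired(tumor_mbaf, population_freq,
                              homozygous_mbaf=0.95, min_freq=0.05):
    """Validation set: no leucocyte data available.

    Excludes likely germline homozygous SNPs (tumor MBAF > 0.95) and keeps
    only variants common in the general population (frequency > 5%), so as
    to set aside somatic mutations. SNPs without a known population
    frequency are treated as rare (assumption).
    """
    return {snp for snp, m in tumor_mbaf.items()
            if m <= homozygous_mbaf
            and population_freq.get(snp, 0.0) > min_freq}
```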
2- Calling "normal", "gained" and "lost" chromosomes (TARGOMICs)
Each gene was defined as an independent chromosome segment.
Allelic ratio for each gene was obtained by averaging the MBAF of heterozygous SNPs of each gene.
Read counts for each gene were converted into relative copy number (CN), i.e. relative to a baseline CN. This baseline was determined for each sample from a set of "normal genes", i.e. genes with 2 copies of DNA and no allelic imbalance (assessed with the heterozygous SNPs described above). The first step was to identify the CN shared by the largest number of genes. For that aim, the read counts of the amplicons were compared between genes, using a Student t-test. Genes with no significant difference (p>0.05) were considered identical in terms of CN. The largest group of genes sharing an identical CN was identified as the "baseline genes". A baseline read count was calculated as the mean read count of the "baseline genes". All read counts were subsequently divided by the baseline read count, thus generating relative CN (i.e. relative to baseline). The second step was to check whether the "baseline genes" also showed no allelic imbalance, i.e. an MBAF close to 0.5 (<0.6). If no baseline gene met this condition, genes with a higher or lower relative CN and an MBAF close to 0.5 were sought. If found, the baseline CN was shifted to these genes: all relative CN were then divided by the relative CN of these genes.
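The conversion to relative CN and the baseline shift can be sketched as below. This Python example assumes that the "baseline genes" have already been identified (the Student t-test grouping is not reproduced), that each gene is summarized by its mean read count, and that when several balanced genes are found their mean relative CN is used as the new baseline; these choices and all names are assumptions made for illustration.

```python
from statistics import mean

BALANCED_MBAF = 0.6  # "MBAF close to 0.5" is taken as MBAF < 0.6, as in the text


def relative_copy_number(gene_reads, baseline_genes):
    """Divide each gene's mean read count by the baseline read count."""
    baseline = mean(gene_reads[g] for g in baseline_genes)
    return {g: n / baseline for g, n in gene_reads.items()}


def shift_baseline(relative_cn, gene_mbaf, baseline_genes):
    """Re-center relative CN on balanced genes if no baseline gene is balanced.

    Genes without heterozygous SNPs (absent from gene_mbaf) are treated as
    uninformative.
    """
    if any(gene_mbaf.get(g, 1.0) < BALANCED_MBAF for g in baseline_genes):
        return dict(relative_cn)  # baseline already shows no allelic imbalance
    balanced = [g for g in relative_cn
                if gene_mbaf.get(g, 1.0) < BALANCED_MBAF]
    if not balanced:
        return dict(relative_cn)  # nothing to shift to
    new_baseline = mean(relative_cn[g] for g in balanced)
    return {g: cn / new_baseline for g, cn in relative_cn.items()}
```

In a hypothetical sample where three genes share the same read count but all show allelic imbalance, and a fourth gene has twice as many reads with an MBAF of 0.52, the shift re-centers the baseline on the fourth gene: it becomes the 2-copy reference (relative CN of 1) and the three other genes are reported with a relative CN of 0.5.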
For each sample, a scatterplot of all genes was subsequently generated, with MBAF on the Y-axis and relative CN on the X-axis. The scatterplot was divided into distinct regions, corresponding to each type of chromosome status: "normal" chromosomes for MBAF close to 0.5 with relative CN close to 1; "lost" chromosomes for MBAF >0.5 with relative CN <1; "gained" chromosomes for MBAF >0.5 and <0.67 with relative CN >1 (Figure 3A).
3- Calling homozygous deletions and high level amplifications (TARGOMICs)
Homozygous deletions were called for any gene with a relative CN lower than a threshold (default value: 0.3). For detecting subregions of genes with homozygous deletions, the following algorithm was applied: any region with n (default value: 9) or more consecutive amplicons with a relative CN lower than TARGOMICs' threshold (default value: 0.3) was identified as a deleted segment.
High CN amplifications were called for any gene with a mean relative CN higher than TARGOMICs' threshold (default value: 3).
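The calling rules of sections 2 and 3 can be sketched as follows in Python. The default values 0.3, 9 and 3 are those given in the text. The exact boundaries of the scatterplot regions are those of Figure 3A and are not given numerically; the cut-offs used in `call_chromosome_status` (MBAF < 0.6 for "close to 0.5", a tolerance of 0.25 around a relative CN of 1) are placeholders, and all function names are hypothetical.

```python
def call_chromosome_status(mbaf, rel_cn, balanced_mbaf=0.6,
                           max_gain_mbaf=0.67, cn_tolerance=0.25):
    """Assign one gene to a scatterplot region (placeholder boundaries)."""
    if mbaf < balanced_mbaf:
        return "normal" if abs(rel_cn - 1.0) <= cn_tolerance else "unclassified"
    if rel_cn < 1.0:
        return "lost"
    if rel_cn > 1.0 and mbaf < max_gain_mbaf:
        return "gained"
    return "unclassified"


def call_gene_cn_event(mean_rel_cn, deletion_threshold=0.3,
                       amplification_threshold=3.0):
    """Gene-level homozygous deletion or high-level amplification call."""
    if mean_rel_cn < deletion_threshold:
        return "homozygous deletion"
    if mean_rel_cn > amplification_threshold:
        return "high-level amplification"
    return None


def deleted_segments(amplicon_cn, threshold=0.3, min_run=9):
    """Find runs of >= min_run consecutive amplicons below the threshold.

    amplicon_cn: relative CN of the amplicons of one gene, in genomic order.
    Returns (first_index, last_index) pairs, both inclusive.
    """
    segments, start = [], None
    for i, cn in enumerate(amplicon_cn):
        if cn < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(amplicon_cn) - start >= min_run:
        segments.append((start, len(amplicon_cn) - 1))
    return segments
```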
4- Aligning bisulfite-treated DNA and counting the methylated cytosines
Alignment and counting with TARGOMICs
A specific alignment script was originally created. For each alignment, two specific reference sequences are generated by in silico bisulfite treatment, one for the methylated allele -cytosines of CpGs remain cytosines-, and one for the unmethylated allele -cytosines of CpGs are replaced by thymines.
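A minimal Python sketch of the in silico bisulfite treatment is given below. It assumes that only the forward strand is considered and that every cytosine outside a CpG is converted to thymine in both references, which is the usual expectation after bisulfite treatment but is not spelled out in the text; the function name is hypothetical.

```python
import re


def bisulfite_references(sequence):
    """Return the (methylated, unmethylated) in silico converted references.

    Methylated reference: cytosines of CpGs remain cytosines, all other
    cytosines become thymines (assumption for non-CpG cytosines).
    Unmethylated reference: every cytosine becomes a thymine.
    """
    seq = sequence.upper()
    methylated = re.sub(r"C(?!G)", "T", seq)  # convert C unless followed by G
    unmethylated = seq.replace("C", "T")
    return methylated, unmethylated
```

For the toy sequence ACGTCCA, the methylated reference is ACGTTTA and the unmethylated reference is ATGTTTA: the two references differ only at the CpG position, which is where C and T alleles are later counted.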
For each CpG island sequenced, several steps are performed:
1. Compression of all homopolymers into a single base, except around the CpGs (the dinucleotides CG, TG and CA are systematically excluded from this compression). This removes the numerous indel artefacts that semiconductor sequencing generates within homopolymers, and thereby decreases the mismatches between sequences during the alignment.
2. Identification of reads aligned on the 3' end of each primer (by default the 15 last bases), using the forward and complementary reverse sequences. This allows for the selection of reads, their proper alignment and orientation.
3. Testing each read aligned on primers for its alignment on reference sequence, using the methylated reference -excluding the CpG positions-, tolerating a maximal error rate (10% by default).
4. Counting C and T alleles for each CpG from all properly aligned reads. The proportion of C reflects the proportion of methylated cytosines for each CpG (not shown).
Alignment and counting with BISMARK
For the purpose of comparison, alignment of bisulfite-treated DNA and counting of the methylated cytosines were also performed using BISMARK [41]. Two methods were applied, both based on Bismark (Bismark v0.16.1; dependency: Bowtie2 version 2.2.9; default settings): (i) with CpG island sequences after in silico transformation of cytosines into thymines, reflecting bisulfite DNA treatment; (ii) with similar sequences, but after compression of homopolymers (see above). Information was extracted CpG by CpG with bismark_methylation_extractor (parameters: "--scaffold --split_by_chromosome --comprehensive --bedGraph --counts").
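Homopolymer compression (step 1 of the TARGOMICs procedure, also applied to the sequences in BISMARK method (ii)) and the per-CpG counting of step 4 can be sketched as follows in Python. The text states that the dinucleotides CG, TG and CA are excluded from the compression; how a homopolymer run that touches such a dinucleotide is handled is not specified, so the rule used here (bases of a protected dinucleotide are always kept, and any other base is dropped when it repeats its neighbour) is an assumption, as are the function names.

```python
PROTECTED = ("CG", "TG", "CA")  # dinucleotides excluded from compression


def compress_homopolymers(sequence):
    """Collapse homopolymer runs to one base, keeping CG/TG/CA intact."""
    seq = sequence.upper()
    n = len(seq)
    protected = [False] * n
    for i in range(n - 1):
        if seq[i:i + 2] in PROTECTED:
            protected[i] = protected[i + 1] = True
    out = []
    for i, base in enumerate(seq):
        if protected[i]:
            out.append(base)
            continue
        repeats_next = i + 1 < n and seq[i + 1] == base
        repeats_prev = bool(out) and out[-1] == base
        if not (repeats_next or repeats_prev):
            out.append(base)
    return "".join(out)


def methylated_fraction(bases_at_cpg):
    """Proportion of C among the C and T alleles read at one CpG position."""
    c = bases_at_cpg.count("C")
    t = bases_at_cpg.count("T")
    return c / (c + t) if c + t else None
```

With this rule, sequences that differ only by the length of a homopolymer run, such as TTTG and TTG, compress to the same string (TG), which is the property sought when removing homopolymer indel artefacts before alignment.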
Optimization of CpGs selection
The most informative CpGs were selected. For that aim, a training set of 6 tumors, studied both by methylation array and by NGS, was used. Only the CpGs with a significant and positive correlation between the NGS measure and the global methylation measured by methylation array were considered (global methylation is defined as the mean M-value calculated for each tumor from the top 1000 probes with the highest standard deviation among adrenocortical carcinomas [51], not shown). These CpGs were normalized and averaged for each CpG island. These CpGs are Cg07384961, Cg10743104, Cg21494776 and Cg23130254.
Methylation specific-multiplex ligation-dependent probe amplification (MS-MLPA) experiments
20 adrenocortical carcinomas were analyzed by MS-MLPA to validate the proportions of methylated CpGs as determined by TARGOMICs, using the SALSA MLPA ME002 tumor suppressor-2 probe mix, combined with the SALSA MLPA EK1-Cy5 or EK1-FAM reagent kits (MRC-Holland). The methylation level was deduced for each sample from the average methylation level of 4 genes (GSTP1, PYCARD, PAX6, PAX5), as previously described.
5- Bioinformatic Codes
All the scripts were programmed in R (www.r-project.org). TARGOMICs source codes are freely available for research use only; the codes have been filed for commercial uses.
EXAMPLE 2: RESULTS
Information obtained from NGS allelic ratios and read counts regarding somatic chromosomal alterations
A chromosome loss can be detected either by a decreased DNA copy number (CN), or by loss of heterozygosity (LOH; Figure 1). LOH is the extreme form of allelic imbalance (AI), where one allele is completely lost (Figure 1). Using NGS, read counts for each amplicon should reflect DNA CN. Similarly, for heterozygous SNPs, ratios of read counts measured for each allele, termed allelic ratios, should reflect AI. This was tested in a training set of 77 adrenocortical carcinomas genotyped both by SNP array and NGS, using SNP array data as a gold standard (Figure 1).
Indeed, using NGS and considering all heterozygous SNPs, allelic ratios were strongly correlated with AI measured by SNP array (Pearson r=0.88, p<10^-12, Figure 2A). Read counts of NGS amplicons also correlated with CN determined by SNP array, but the correlation was weaker than for allelic ratios (Pearson r=0.49; p<10^-12, Figure 2B).
Based on these correlations between NGS and the gold standard (SNP array), simulations were performed to test the ability of NGS allelic ratios and read counts to discriminate AI and CN respectively. Concerning allelic ratios (normalized as MBAFs, which range from 0.5 for unaltered heterozygous SNPs to 1 for SNPs with complete LOH; Figure 1A and 1B), variations of 0.2 were detectable with a single SNP, and variations of 0.1 or below were detectable with at least 4 consecutive SNPs (Table 4).
In grey: MBAF variations are properly detected in >95% of cases.
Concerning read counts (normalized as relative CNs, with "1" for 2 DNA copies, and "0.5" for chromosome losses (1 DNA copy) in a pure tissue of diploid cells), variations of 0.5 were detectable with five amplicons, and variations of 0.25 with at least 20 amplicons (Table 5).
In grey: CN variations are properly detected in >95% of cases.
It was found that read counts may be efficient for detecting homozygous deletions (loss of the two DNA copies) and high level amplifications (data not shown). However, read counts did not perform well for detecting chromosome losses or gains of one copy, especially in heterogeneous tissues such as tumor samples. It was also found that allelic ratios were more robust than read counts, and should therefore be used as the main information for calling chromosome gains or losses of just one DNA copy. Indeed, allelic ratios properly detect losses of one DNA copy. More specifically, MBAF (calculated from allelic ratios, see above) increases from 0.5 to 1 when a loss of one DNA copy occurs in all cells. In heterogeneous tissues, the MBAF increase is lower but remains substantial, for instance from 0.5 to 0.75 for a loss occurring in half of the cells. Of note, detecting chromosome gains was not as efficient as detecting chromosome losses, despite the use of MBAF. This is related to the more limited impact of chromosome gains on allelic ratios. Indeed, a chromosome gain from two to three DNA copies is associated with an MBAF increase from 0.5 to 0.666 when the gain occurs in all cells. As soon as contaminating cells are present, this shift drops to lower, barely detectable values. For instance, for a chromosomal gain in half of the cells, MBAF shifts from 0.5 to 0.583.
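The MBAF values quoted for an alteration present in half of the cells (0.75 for a loss, 0.583 for a gain) correspond to a linear interpolation between the MBAF of unaltered cells (0.5) and the MBAF of a pure population of altered cells. The Python sketch below reproduces these figures and, for comparison, evaluates a simple allele-counting model in which each allele contributes reads in proportion to its number of copies in the mixture. This second model is an assumption that is not stated in the text; it gives somewhat lower values for the same examples (about 0.667 and 0.6). Function names are hypothetical.

```python
def mbaf_linear(mbaf_pure, altered_fraction):
    """Linear interpolation between 0.5 and the MBAF of a pure altered sample."""
    return 0.5 + altered_fraction * (mbaf_pure - 0.5)


def mbaf_allele_count(major_copies, minor_copies, altered_fraction):
    """Allele-counting model (assumption): altered cells carry
    major_copies / minor_copies of the two alleles, other cells carry 1 / 1."""
    major = altered_fraction * major_copies + (1 - altered_fraction)
    minor = altered_fraction * minor_copies + (1 - altered_fraction)
    return major / (major + minor)


# Loss of one copy (1 + 0 copies) and gain of one copy (2 + 1 copies),
# present in all cells or in half of the cells.
for label, major, minor in (("loss", 1, 0), ("gain", 2, 1)):
    pure = mbaf_allele_count(major, minor, 1.0)
    print(label,
          round(pure, 3),                                 # all cells
          round(mbaf_linear(pure, 0.5), 3),               # half, interpolation
          round(mbaf_allele_count(major, minor, 0.5), 3)) # half, allele counts
```

Under either calculation, the shift caused by a one-copy gain is much smaller than the shift caused by a one-copy loss, which is consistent with the lower sensitivity reported for gains.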
Calling chromosome gains or losses of just one DNA copy requires heterozygous SNPs for each chromosome of interest. Based on original simulations (Table 4), the method includes, when designing targeted NGS panels, several SNPs as internal controls (at least 5 to 10 per chromosome arm) with high heterozygosity in the population (close to 0.5) for each chromosome arm of interest. Such a low number of SNPs does not dramatically increase the cost of the design or of the sequencing, and is very effective for reliable calls of chromosome alterations, particularly losses of one DNA copy. In this study, heterozygous SNPs were identified by sequencing the patients' leucocytes, in addition to tumors, using the same NGS panel. The advantage is to catch all heterozygous SNPs, including those with low heterozygosity in the population, independently of their allelic ratio in the tumor. Alternatively, it is possible to choose 5 to 10 SNPs per chromosome arm instead of sequencing leucocytes, thereby avoiding this costly step. The present analysis demonstrated that allelic ratios are precise (MBAF variations of 0.1 are detectable). However, some artefactual calls can reach high allelic ratios. These artefacts may be filtered out by including a few (5 to 10) SNPs with high heterozygosity in the population for each chromosome arm. Indeed, such a combination of SNPs can precisely estimate the chromosome arm allelic ratio, and therefore provide an expected allelic ratio for all heterozygous SNPs on this chromosome arm.
Allelic ratios are thus more informative than read counts for detecting somatic chromosomal alterations.
To confirm these findings, correlations between NGS and SNP array were further tested in an independent cohort of 449 tumors from 42 distinct tumor types (Table 1). Allelic ratios measured by NGS were found to strongly correlate with AI measured by SNP array (Pearson r=0.81, p<10^-12). Similarly, read counts of NGS amplicons also correlated with CN determined by SNP array, but to a lesser extent (Pearson r=0.47; p<10^-12).
Integration of NGS allelic ratios and read counts into single calls of chromosomal alterations
Using SNP arrays, combining copy number and allelic imbalance into a single analysis can (1) identify and discriminate normal chromosomes from chromosomes with copy number alterations and/or loss of heterozygosity, (2) estimate the proportion of tumor cells with a chromosomal alteration, with applications for determining the clonality and the proportion of normal cells in a tumor sample, and (3) determine tumor ploidy [39,43] (Figure 1C; Figure 3C). Using the method described above (Example 1, section D), the inventors tested whether such information can be obtained from targeted NGS.
Scatterplots of SNPs with allelic ratios and read counts generated by NGS showed distinct clusters of SNPs, corresponding to distinct chromosomal statuses. An example is provided in Figure 1C and 1D, showing a specific type of chromosome alteration (chromosome losses), the proportion of tumor cells (about 85%), and a subclonal event (additional chromosome losses in 40% of cells). Tumor tetraploidy can also be deduced (Figure 3C and D).
An automated detection of chromosomal gains and losses was implemented in TARGOMICs. An original combination of allelic ratio and read count scatterplots was used, which takes into account the better reliability of allelic ratios compared to read counts (Figure 3A). A restricted variety of chromosomal statuses was used: "normal", "loss" or "gain". Lost, normal and gained chromosomes were detected with sensitivities of 89, 72 and 31% and specificities of 81, 93 and 98% respectively, using SNP arrays as a gold standard (Figure 3B, 3C and 3D, Table 6). Of note, 38% of false positive chromosome losses and 26% of false negative normal diploid heterozygous chromosomes correspond to a subset of chromosomal regions with obvious loss of heterozygosity (Figure 3B and 3C), suggesting focal events detected by NGS at the gene level, but missed by SNP arrays at the chromosome level.
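For reference, sensitivity and specificity for one chromosomal status against the gold standard can be computed one status against the two others, as in the Python sketch below. The function name and the list-based inputs are hypothetical, and whether the reported figures were computed exactly this way is an assumption.

```python
def sensitivity_specificity(gold_standard, calls, status):
    """Sensitivity and specificity for one status, versus all other statuses.

    gold_standard, calls: equal-length lists of "loss", "normal" or "gain",
    from SNP array and from NGS respectively. Both classes must be present
    in the gold standard, otherwise a division by zero occurs.
    """
    pairs = list(zip(gold_standard, calls))
    tp = sum(g == status and c == status for g, c in pairs)
    fn = sum(g == status and c != status for g, c in pairs)
    tn = sum(g != status and c != status for g, c in pairs)
    fp = sum(g != status and c == status for g, c in pairs)
    return tp / (tp + fn), tn / (tn + fp)
```

With a toy gold standard of loss, loss, normal, normal, gain, normal and NGS calls of loss, normal, normal, loss, normal, normal, the function returns a sensitivity of 0.5 and a specificity of 0.75 for "loss".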
In the validation cohort of 449 tumors, TARGOMICs performance was comparable: lost, normal and gained chromosomes were detected with sensitivities of 87, 80 and 23% and specificities of 83, 78 and 96% respectively, using SNP arrays as a gold standard (Figure 3E, 3F and 3G). An automated detection of gene homozygous deletions and high level gene amplifications was also implemented, based on read counts only. 18/22 homozygous deletions were detected (sensitivity: 82%; specificity: 100%; Figure 5, Table 4).
Assessment of DNA methylation by targeted NGS
Tumoral methylation status is often evaluated for CpG islands in gene promoter regions. Determining methylation status by NGS is challenging for several reasons. First, the sequences are repetitive (CG > 50%). Moreover, these genomic regions commonly display stretches of homopolymers. The latter are responsible for false positive indels, especially when using semiconductor targeted NGS. In addition, when determining DNA methylation status by bisulfite treatment and NGS, unmethylated cytosines are transformed into uracils (then thymines after PCR), whereas methylated cytosines remain cytosines. As a result, cytosines become rare, and the 4-base genetic code becomes a 3-base code. For these reasons, standard aligners do not perform well.
90 CpG islands (15 distinct islands from 12 samples of adrenocortical carcinomas) were analyzed. Using BISMARK [41], an aligner specifically developed for NGS after bisulfite treatment, the number of properly aligned sequences was low, with a median coverage depth of 635 reads per CpG despite a high starting number of reads (median: 137,484 reads per sample for 8 CpG islands; range: 84,075 to 560,494; Figure 4A). In this context, whether removing homopolymers could increase the alignment performance was tested. After compression of homopolymers (which removes their common false positive indels), coverage depth was not increased, with a median of 528 reads per CpG (Student p=0.13; Figure 4A).
BISMARK alignment (using Bowtie) is based on multiple seeding all over the reference sequences. The inventors had the idea of testing whether restricting the alignment seeding to the 3' end of the primers would increase coverage. This original alignment technique was implemented in TARGOMICs. Using TARGOMICs, median coverage depth significantly increased to 3990 reads per CpG (Student p<10^-12; Figure 4A). Of note, the proportions of methylated allele counted for each CpG by BISMARK and TARGOMICs remained highly correlated (Pearson r=0.86, p<10^-16; Figure 4B), thus validating TARGOMICs.
For 6 tumors, pangenomic methylation arrays were performed. These tumors were classified as CpG island methylator phenotype (CIMP) high (N=2), CIMP intermediate (N=2), or non-CIMP (N=2) depending on the global level of methylation in CpG islands. Targeted methylation measurements were consistent with this classification, with proportions of methylated CpGs higher in CIMP-high and CIMP-intermediate tumors than in non-CIMP tumors (paired Student p=0.049 and 0.11 respectively; Figure 4C). Using these 6 tumors as a training set, the CpGs within the CpG islands showing a significant and positive correlation between the global tumor methylation level measured by methylation array and the methylation measured by TARGOMICs were sought. 22 CpGs from 4 islands were identified (not shown).
The performance of methylation measurements by the method of the Inventors (so called TARGOMICs) was confirmed in an extended cohort of 20 additional adrenocortical carcinomas, using the 22 selected CpGs. Proportions of methylated CpGs measured by NGS strongly correlated with methylation status measured by MS-MLPA (not shown).
The inventors have developed a targeted NGS method to detect chromosomal alterations and DNA methylation status, in addition to calling mutations, using an original algorithm. Combining such independent information into a single analysis should improve tumor classification for each patient. This opens the way to fully exploiting, in clinical routine, the recent molecular discoveries arising from massive pangenomic analyses.
The performance of NGS for calling chromosomal alterations was assessed against SNP arrays. Specifically, SNP arrays were used to generate DNA copy number and allelic ratios for entire chromosome arms. Considering entire chromosome arms instead of gene regions warrants a robust and sensitive detection of chromosome alterations by SNP array. Indeed, the targeted NGS method developed by the inventors is even able to properly detect losses of one DNA copy, provided that allelic ratios of heterozygous SNPs are considered.
In oncogenetics, DNA methylation status is important for prognosis and potentially for treatment orientation. CpG island hypermethylation is a well-known mechanism of tumor suppressor gene silencing. The inventors also developed a pipeline optimized for calling methylation status from targeted NGS, which is fully integrated within the targeted NGS method of the invention. In terms of experimental procedure, this requires adding a second sequencing library (a single-pool library from bisulfite-treated DNA) to the first two-pool library required for the standard targeted NGS panel and heterozygous SNPs. Including methylation analysis in the same NGS experiment provides a particular advantage over other techniques for methylation analysis, such as pyrosequencing, MS-MLPA or MeDIP-seq. Indeed, it does not add much to the duration of the NGS experiment, all results are available at once, no extra equipment is required, and bisulfite treatment and PCR are easy to handle. The cost of an extra library is also limited, especially since a limited sequencing depth is sufficient. In addition, using the algorithm developed by the inventors (TARGOMICs), the alignment efficiency was increased more than 5 times compared to BISMARK [41], a commonly used aligner and methylation caller.
1. Zheng S, Cherniack AD, Dewal N, Moffitt RA, Danilova L, Murray BA, Lerario AM, Else T, Knijnenburg TA, Ciriello G, Kim S, Assie G, Morozova O, Akbani R, Shih J, Hoadley KA, Choueiri TK, Waldmann J, Mete O, Robertson AG, Wu H-T, Raphael BJ, Shao L, Meyerson M, Demeure MJ, Beuschlein F, Gill AJ, Sidhu SB, Almeida MQ, Fragoso MCBV, Cope LM, Kebebew E, Habra MA, Whitsett TG, Bussey KJ, Rainey WE, Asa SL, Bertherat J, Fassnacht M, Wheeler DA, Hammer GD, Giordano TJ, Verhaak RGW: Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma. Cancer Cell 2016, 29:723-736.
2. Hrdlickova R, Toloue M, Tian B: RNA-Seq methods for transcriptome analysis: RNA-Seq. Wiley Interdisciplinary Reviews: RNA [Internet] 2016 [cited 2016 Jun 24], . Available from: http://doi.wiley.com/10.1002/wrna.1364
3. Tomczak K, Czerwinska P, Wiznerowicz M: Review The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Wspotczesna Onkologia
4. Hausser J, Zavolan M: Identification and consequences of miRNA-target interactions — beyond repression of gene expression. Nature Reviews Genetics 2014, 15:599-612.
5. Russo CD, Di Giacomo G, Cignini P, Padula F, Mangiafico L, Mesoraca A, D'Emidio L, McCluskey MR, Paganelli A, Giorlandino C: Comparative study of aCGH and Next Generation Sequencing (NGS) for chromosomal microdeletion and microduplication screening. Journal of prenatal medicine 2014, 8:57.
6. Abel HJ, Duncavage EJ: Detection of structural DNA variation from next generation sequencing data: a review of informatic approaches. Cancer Genetics 2013, 206:432-440.
7. Assie G, Jouinot A, Bertherat J: The "omics" of adrenocortical tumours for personalized medicine. Nature Reviews Endocrinology 2014, 10:215-228.
8. Weisenberger DJ: Characterizing DNA methylation alterations from The Cancer Genome Atlas. Journal of Clinical Investigation 2014, 124:17-23.
9. Shull AY, Noonepalle SK, Lee E-J, Choi J-H, Shi H: Sequencing the Cancer Methylome. In: Verma M, editor. Cancer Epigenetics [Internet], New York, NY, Springer New York, 2015 [cited 2016 Jun 24], pp. 627-651. Available from: http://link.springer.com/10.1007/978-1-4939-1804-1_33
10. Sonnet M, Baer C, Rehli M, Weichenhan D, Plass C: Enrichment of methylated DNA by methyl-CpG immunoprecipitation. Methods Mol Biol 2013, 971:201-212.
11. Offit K: Decade in review—genomics: A decade of discovery in cancer genomics. Nature Reviews Clinical Oncology 2014, 11:632-634.
12. Cline MS, Craft B, Swatloski T, Goldman M, Ma S, Haussler D, Zhu J: Exploring TCGA Pan-Cancer Data at the UCSC Cancer Genomics Browser. Scientific Reports [Internet] 2013 [cited 2016 Jun 24], 3. Available from: http://www.nature.com/articles/srep02652
13. Zhao Q, Shi X, Xie Y, Huang J, Shia B, Ma S: Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. Briefings in Bioinformatics 2015, 16:291-303.
14. Neapolitan R, Horvath CM, Jiang X: Pan-cancer analysis of TCGA data reveals notable signaling pathways. BMC Cancer [Internet] 2015 [cited 2016 Jun 13], 15. Available from: http://www.biomedcentral.com/1471-2407/15/516
15. Nakagawa H, Wardell CP, Furuta M, Taniguchi H, Fujimoto A: Cancer whole-genome sequencing: present and future. Oncogene 2015, 34:5943-
16. Ching T, Peplowska K, Huang S, Zhu X, Shen Y, Molnar J, Yu H, Tiirikainen M, Fogelgren B, Fan R, Garmire LX: Pan-Cancer Analyses Reveal Long Intergenic Non-Coding RNAs Relevant to Tumor Diagnosis, Subtyping and Prognosis. EBioMedicine 2016, 7:62-72.
17. Vockley JG, Niederhuber JE: Diagnosis and treatment of cancer using genomics. BMJ 2015, 350:h1832-h1832.
18. Wijesinghe P, Bollig-Fischer A: Lung Cancer Genomics in the Era of Accelerated Targeted Drug Development. In: Ahmad A, Gadgeel SM, editors. Lung Cancer and Personalized Medicine: Novel Therapies and Clinical Management [Internet], Cham, Springer International Publishing, 2016 [cited 2016 Jun 24], pp. 1-23. Available from: http://link.springer.com/10.1007/978-3-319-24932-2_1
19. Tang B, Hsu P-Y, Huang TH-M, Jin VX: Cancer omics: From regulatory networks to clinical outcomes. Cancer Letters 2013, 340:277-283.
20. Desai A, Jere A: Next-generation sequencing: ready for the clinics? Clinical Genetics 2012, 81:503-510.
21. Xuan J, Yu Y, Qing T, Guo L, Shi L: Next-generation sequencing in the clinic: Promises and challenges. Cancer Letters 2013, 340:284-295.
22. Dietel M, Johrens K, Laffert MV, Hummel M, Blaker H, Pfitzner BM, Lehmann A, Denkert C, Darb-Esfahani S, Lenze D, others: A 2015 update on predictive molecular pathology and its role in targeted cancer therapy: a review focussing on clinical relevance. Cancer Gene Therapy 2015, 22:417-430.
23. Sikkema-Raddatz B, Johansson LF, de Boer EN, Almomani R, Boven LG, van den Berg MP, van Spaendonck-Zwarts KY, van Tintelen JP, Sijmons RH, Jongbloed JDH, Sinke RJ: Targeted Next-Generation Sequencing can Replace Sanger Sequencing in Clinical Diagnostics. Human Mutation 2013, 34:1035-1042.
24. Shao D, Lin Y, Liu J, Wan L, Liu Z, Cheng S, Fei L, Deng R, Wang J, Chen X, Liu L, Gu X, Liang W, He P, Wang J, Ye M, He J: A targeted next-generation sequencing method for identifying clinically relevant mutation profiles in lung adenocarcinoma. Scientific Reports 2016, 6:22338.
25. Devarajan B, Prakash L, Kannan TR, Abraham AA, Kim U, Muthukkaruppan V, Vanniarajan A: Targeted next generation sequencing of RB1 gene for the molecular diagnosis of Retinoblastoma. BMC Cancer [Internet] 2015 [cited 2016 Jun 17], 15. Available from: http://www.biomedcentral.com/1471-2407/15/320
26. Boonham N, Kreuze J, Winter S, van der Vlugt R, Bergervoet J, Tomlinson J, Mumford R: Methods in virus diagnostics: From ELISA to next generation sequencing. Virus Research 2014, 186:20-31.
27. Pak TR, Kasarskis A: How Next-Generation Sequencing and Multiscale Data Analysis Will Transform Infectious Disease Management. Clinical Infectious Diseases 2015, :civ670.
28. Barzon L, Lavezzo E, Militello V, Toppo S, Palù G: Applications of Next-Generation Sequencing Technologies to Diagnostic Virology. International Journal of Molecular Sciences 2011, 12:7861-7884.
29. Shen R, Seshan VE: FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Research 2016, :gkw520.
30. Johansson LF, van Dijk F, de Boer EN, van Dijk-Bos KK, Jongbloed JDH, van der Hout AH, Westers H, Sinke RJ, Swertz MA, Sijmons RH, Sikkema-Raddatz B: CoNVaDING: Single Exon Variation Detection in Targeted NGS Data. Human Mutation 2016, 37:457-464.
31. Boeva V, Popova T, Bleakley K, Chiche P, Cappo J, Schleiermacher G, Janoueix-Lerosey I, Delattre O, Barillot E: Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 2012, 28:423-425.
32. Duan J, Zhang J-G, Deng H-W, Wang Y-P: CNV-TV: A robust method to discover copy number variation from short sequencing reads. BMC Bioinformatics 2013, 14:150.
33. Zhao X, Wang A, Walter V, Patel NM, Eberhard DA, Hayward MC, Salazar AH, Jo H, Soloway MG, Wilkerson MD, Parker JS, Yin X, Zhang G, Siegel MB, Rosson GB, Earp HS, Sharpless NE, Gulley ML, Weck KE, Hayes DN, Moschos SJ: Combined Targeted DNA Sequencing in Non-Small Cell Lung Cancer (NSCLC) Using UNCseq and NGScopy, and RNA Sequencing Using UNCqeR for the Detection of Genetic Aberrations in NSCLC. Calogero RA, editor. PLOS ONE 2015, 10:e0129280.
34. Winchester L, Yau C, Ragoussis J: Comparing CNV detection methods for SNP arrays. Briefings in Functional Genomics and Proteomics 2009, 8:353-366.
35. Zhang X, Du R, Li S, Zhang F, Jin L, Wang H: Evaluation of copy number variation detection for a SNP array platform. BMC bioinformatics 2014, 15:1.
36. Zhang D, Qian Y, Akula N, Alliey-Rodriguez N, Tang J, Gershon ES, Liu C, others: Accuracy of CNV detection from GWAS data. PLoS One 2011, 6:e14511.
37. Barreau O, de Reynies A, Wilmot-Roussel H, Guillaud-Bataille M, Auzan C, Rene-Corail F, Tissier F, Dousset B, Bertagna X, Bertherat J, Clauser E, Assie G: Clinical and Pathophysiological Implications of Chromosomal Alterations in Adrenocortical Tumors: An Integrated Genomic Approach. The Journal of Clinical Endocrinology & Metabolism 2012, 97:E301-E311.
38. Assie G, Letouze E, Fassnacht M, Jouinot A, Luscap W, Barreau O, Omeiri H, Rodriguez S, Perlemoine K, Rene-Corail F, Elarouci N, Sbiera S, Kroiss M, Allolio B, Waldmann J, Quinkler M, Mannelli M, Mantero F, Papathomas T, De Krijger R, Tabarin A, Kerlan V, Baudin E, Tissier F, Dousset B, Groussin L, Amar L, Clauser E, Bertagna X, Ragazzon B, Beuschlein F, Libe R, de Reynies A, Bertherat J: Integrated genomic characterization of adrenocortical carcinoma. Nature Genetics 2014, 46:607-612.
39. Popova T, Manie E, Stoppa-Lyonnet D, Rigaill G, Barillot E, Stern MH: Genome Alteration Print (GAP): a tool to visualize and mine complex cancer genomic profiles obtained by SNP arrays. Genome biology 2009, 10:1.
40. Li L-C, Dahiya R: MethPrimer: designing primers for methylation PCRs. Bioinformatics 2002, :1427-1431.
41. Krueger F, Andrews SR: Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 2011, 27:1571-1572.
42. Tariq K, Ghias K: Colorectal cancer carcinogenesis: a review of mechanisms. Cancer biology & medicine 2016, 13:120.
43. Assie G, LaFramboise T, Platzer P, Bertherat J, Stratakis CA, Eng C: SNP arrays in heterogeneous tissue: highly accurate collection of both germline and somatic genetic information from unpaired single tumor samples. Am J Hum Genet 2008, 82:903-915.
44. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2012, 9:357-359.
45. Harrison CJ: Targeting signaling pathways in acute lymphoblastic leukemia: new insights. Hematology Am Soc Hematol Educ Program 2013, 2013:118-125.
46. Moalic-Allain V, Mercier B, Gueguen P, Ferec C: Next generation sequencing with a semi-conductor technology (Ion Torrent PGM™) for HLA typing: overall workflow performance and debate. Ann Biol Clin (Paris) 2016,
47. Witte T, Plass C, Gerhauser C: Pan-cancer patterns of DNA methylation. Genome Med 2014, 6:66.
48. Jeong HM, Lee S, Chae H, Kim R, Kwon MJ, Oh E, Choi Y-L, Kim S, Shin YK: Efficiency of methylated DNA immunoprecipitation bisulphite sequencing for whole-genome DNA methylation analysis. Epigenomics 2016, 8:1061-1077.
49. Gicquel C, Bertagna X, Gaston V, Coste J, Louvel A, Baudin E, Bertherat J, Chapuis Y, Duclos JM, Schlumberger M, Plouin PF, Luton JP, Le Bouc Y: Molecular markers and long-term recurrences in a large cohort of patients with sporadic adrenocortical tumors. Cancer Res 2001, 61(18):6762-6767.
50. Le Tourneau C, Delord J-P, Gonçalves A, Gavoille C, Dubot C, Isambert N, Campone M, Tredan O, Massiani M-A, Mauborgne C, Armanet S, Servant N, Bieche I, Bernard V, Gentien D, Jezequel P, Attignon V, Boyault S, Vincent-Salomon A, Servois V, Sablin M-P, Kamal M, Paoletti X, SHIVA investigators: Molecularly targeted therapy based on tumour molecular profiling versus conventional therapy for advanced cancer (SHIVA): a multicentre, open-label, proof-of-concept, randomised, controlled phase 2 trial. Lancet Oncol 2015, 16:1324-1334.
51. Jouinot A, Assie G, Libe R, Fassnacht M, Papathomas T, Barreau O, de la Villeon B, Faillot S, Hamzaoui N, Neou M, Perlemoine K, Rene-Corail F, Rodriguez S, Sibony M, Tissier F, Dousset B, Sbiera S, Ronchi C, Kroiss M, Korpershoek E, de Krijger R, Waldmann J, Bartsch DK, Quinkler M, Haissaguerre M, Tabarin A, Chabre O, Sturm N, Luconi M, Mantero F, Mannelli M, Cohen R, Kerlan V, Touraine P, Barrande G, Groussin L, Bertagna X, Baudin E, Amar L, Beuschlein F, Clauser E, Coste J, Bertherat J: DNA Methylation Is an Independent Prognostic Marker of Survival in Adrenocortical Cancer. J Clin Endocrinol Metab 2017, 102:923-932.
Claims
1. A Next-Generation DNA Sequencing (NGS) method of analysing a cancer of a patient comprising the detection, in a sample of said patient, of:
- at least one characteristic alteration of chromosome regions identified for said cancer from a set of genes from these regions,
- specific gene mutations or at least one characteristic pattern of mutations in a set of genes identified in said cancer, and
- at least one characteristic pattern of DNA methylation status of chromosome regions identified as having an altered methylation status in said cancer,
all these three detection steps being implemented in a single NGS design wherein NGS is used in all these three detection steps.
2. The NGS method according to claim 1, wherein the detection step of detecting at least one characteristic alteration of chromosome regions comprises identifying homozygous deletions and loss of heterozygosity (LOH) within the set of genes of said chromosome regions identified for said cancer.
3. The NGS method according to the preceding claim, wherein the detection step of detecting at least one characteristic alteration of chromosome regions further comprises analysing at least 5 SNPs per chromosome arm of interest for searching heterozygous deletions or LOH (Loss of Heterozygosity), said at least 5 SNPs being known to be highly heterozygous in the general population.
4. The NGS method according to any of claims 2 and 3, wherein the step of detecting at least one characteristic alteration of chromosome regions is performed by a combined analysis of the allelic ratio and the amplicon read counts.
5. The NGS method according to any of claims 1 to 4, wherein the step of detecting at least one specific pattern of DNA methylation status is carried out on bisulfite-treated DNA.
6. The NGS method according to claim 5, wherein the detection of at least one specific pattern of DNA methylation comprises the analysis of the methylation status of the CpG islands.
7. The NGS method according to claim 5 or 6, wherein the alignment of sequences in the NGS methylation analysis is performed (i) after a step of replacing the stretches of identical bases by only one corresponding base, except around the CpGs, the dinucleotides CG, TG and CA being excluded from this compression, and (ii) with the alignment over the reference sequence being restricted to the use of the 3' end of the primers.
8. The NGS method according to any of the preceding claims, wherein analysing the cancer of a patient comprises the assignment of said patient to a specific group of patients corresponding to a specific molecular type of tumor.
9. The NGS method according to the preceding claim, wherein the assignment of said patient to a specific group of patients is used to prognose the response of said patient to a treatment, or to prognose the survival time of said patient.
10. The NGS method according to claim 1, wherein the cancer is selected from the group comprising adrenocortical carcinoma, breast cancer, colorectal cancer, ovarian cancer, lung cancer, pancreatic cancer, sarcoma, urothelial cancer, head and neck squamous cell carcinoma, adenoma carcinoma with unknown primitive tumor, endometrial cancer, cervical cancer, oesogastric cancer, adenoid cystic carcinoma, cholangiocarcinoma, neuroendocrine tumor, melanoma, anal squamous cell carcinoma, kidney cancer, uveal melanoma, germline tumor, hepatocellular carcinoma, parotid cancer, thyroid cancer, undifferentiated nasopharyngeal cancer of the cavum, Merkel cell carcinoma, mesothelioma, penile squamous cell carcinoma, peritoneal cancer, chemodectoma, corticosurrenaloma, desmoid tumor, epithelioid hemangiocarcinoma, meningioma, midline carcinoma, myxopapillary ependymoma, non-adenoid cystic carcinoma salivary gland tumor, ocular adenocarcinoma, pelvic squamous cell carcinoma, pleomorphic carcinoma of the tongue, prostate adenocarcinoma, thymic cancer, and squamous cell carcinoma of the vulva.
11. The NGS method according to claim 10, wherein the cancer is an adrenocortical carcinoma and:
the chromosome regions identified as bearing at least one characteristic pattern of alteration and at least one characteristic pattern of mutations for said cancer are the regions comprising at least one of the genes selected from HSD3B2, CTNNB1, APC, CYP21A2, DAXX, BAH, CDKN2A, CYP17A1, MEN1, MCAM, RB1, TP53, RNF43, ZNRF3, MED12 and CDK4, and
the chromosome regions identified as having an altered methylation status comprise at least one CpG region selected from Cg07384961, Cg14021073, Cg21494776, Cg23130254, Cg20312228, Cg01635061, Cg27234090, Cg04582938, Cg06039392, Cg10167296, Cg10743104, Cg27425675, Cg16689634, Cg01120165 and Cg15284635.
12. A kit comprising a single NGS design for analysing (i) specific alterations of chromosome regions by using one or more primer sets corresponding to at least one gene selected from ZNRF3, TP53, RB1, CDKN2A and CDK4, (ii) specific gene mutations by using one or more primer sets corresponding to at least one gene selected from ZNRF3, TP53, RB1, CDKN2A and CDK4, and (iii) DNA methylation status of at least one CpG island selected from Cg07384961, Cg14021073, Cg21494776 and Cg23130254 using one or more corresponding primer sets.
13. Use of a kit according to claim 12 for analysing adrenocortical carcinoma.
14. An NGS method for sequencing CpG islands comprising the steps of:
i) replacing the stretches of identical bases by only one base, except around the CpGs, the dinucleotides CG, TG and CA being excluded from this compression, and
ii) performing the alignment of sequences by restricting the alignment seeding to the 3' end of each primer.
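By way of illustration only, the compression step recited in step i) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used by the inventors: the function name `compress_homopolymers` is hypothetical, and it is assumed here that "except around the CpGs" means that every base belonging to a CG, TG or CA dinucleotide is kept as-is, and that only runs of identical unprotected bases are collapsed to a single base.

```python
# Hypothetical illustration of step i): collapse stretches of identical bases,
# while leaving the dinucleotides CG, TG and CA untouched.
# Assumption: a base is "protected" if it belongs to a CG, TG or CA dinucleotide.
PROTECTED_DINUCLEOTIDES = {"CG", "TG", "CA"}


def compress_homopolymers(seq: str) -> str:
    """Return seq with runs of identical unprotected bases reduced to one base."""
    seq = seq.upper()

    # Mark both positions of every CG, TG or CA dinucleotide as protected.
    protected = [False] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i:i + 2] in PROTECTED_DINUCLEOTIDES:
            protected[i] = protected[i + 1] = True

    out = []
    for i, base in enumerate(seq):
        # A new run starts at the first base, when the base changes,
        # or right after a protected base (protected bases never absorb a run).
        starts_new_run = i == 0 or base != seq[i - 1] or protected[i - 1]
        if protected[i] or starts_new_run:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    # The AAA and TTT runs collapse; the CG dinucleotide is kept,
    # and the GG run following it collapses to a single G.
    print(compress_homopolymers("AAACGGGTTT"))  # ACGGT
```

Under these assumptions the same function would be applied both to the reads and to the reference sequence before alignment, so that the two remain comparable.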