Methods Of Generating Libraries And Uses Thereof

(12) United States Patent    (10) Patent No.: US 9,637,556 B2
  et al.    (45) Date of Patent: *May 2, 2017

(54)Methods of generating libraries and uses thereof 
    
(75)Inventor: ANAPTYSBIO, INC.,  San Diego, CA (US) 
(73)Assignee:AnaptysBio, Inc.,  San Diego, CA (US), Type: US Company 
(*)Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 642 days. 
  This patent is subject to a terminal disclaimer. 
(21)Appl. No.: 14/052,451 
(22)Filed: Oct.  11, 2013 
(65)Prior Publication Data 
 US 2014/0094392 A1 Apr.  3, 2014 
 Related U.S. Patent Documents 
(62) Division of application No. 12/070,904, filed on Feb. 20, 2008, now Pat. No. 8,603,950.
 
(60)Provisional application No. 60/902,414, filed on Feb.  20, 2007.
 
 Provisional application No. 60/904,622, filed on Mar.  1, 2007.
 
 Provisional application No. 60/995,970, filed on Sep.  28, 2007.
 
 Provisional application No. 61/020,124, filed on Jan.  9, 2008.
 
CPC: C07K 16/461 (2013.01); C07K 16/00 (2013.01); C07K 16/22 (2013.01); C07K 16/40 (2013.01); C07K 2317/21 (2013.01); C07K 2317/565 (2013.01); C07K 2317/92 (2013.01); C40B 50/06 (2013.01)
(51)Int. Cl. C40B 050/06 (20060101); C07K 016/46 (20060101); C07K 016/00 (20060101); C07K 016/22 (20060101); C07K 016/40 (20060101)
(58)Field of Search  None

 
(56)References Cited
 
 U.S. PATENT DOCUMENTS
 3,773,919  A  11/1973    Boswell et al.     
 4,275,149  A  6/1981    Litman et al.     
 4,318,980  A  3/1982    Boguslaski et al.     
 4,356,270  A  10/1982    Itakura     
 4,683,195  A  7/1987    Mullis et al.     
 4,683,202  A  7/1987    Mullis et al.     
 4,737,456  A  4/1988    Weng et al.     
 4,816,397  A  3/1989    Boss et al.     
 4,816,567  A  3/1989    Cabilly et al.     
 4,937,190  A  6/1990    Palmenberg et al.     
 4,959,317  A  9/1990    Sauer     
 5,070,012  A  12/1991    Nolan et al.     
 5,122,464  A  6/1992    Wilson et al.     
 5,221,623  A  6/1993    Legocki et al.     
 5,464,758  A  11/1995    Gossen et al.     
 5,514,548  A  5/1996    Krebber et al.     
 5,571,698  A  11/1996    Ladner et al.     
 5,650,289  A  7/1997    Wood     
 5,654,182  A  8/1997    Wahl et al.     
 5,674,713  A  10/1997    McElroy et al.     
 5,677,177  A  10/1997    Wahl et al.     
 5,683,888  A  11/1997    Campbell     
 5,698,417  A  12/1997    Robinson et al.     
 5,698,426  A  12/1997    Huse     
 5,723,323  A  3/1998    Kauffman et al.     
 5,741,657  A  4/1998    Tsien et al.     
 5,750,335  A  5/1998    Gifford     
 5,770,359  A  6/1998    Wilson et al.     
 5,770,428  A  6/1998    Boris-Lawrie     
 5,814,618  A  9/1998    Bujard et al.     
 5,827,739  A  10/1998    Wilson et al.     
 5,843,746  A  12/1998    Tatsumi et al.     
 5,843,757  A  12/1998    Vogelstein et al.     
 5,885,793  A  3/1999    Griffiths et al.     
 5,885,827  A  3/1999    Wahl et al.     
 5,955,604  A  9/1999    Tsien et al.     
 6,027,933  A  2/2000    Huse     
 6,083,719  A  7/2000    Momparler et al.     
 6,146,894  A  11/2000    Nicolaides et al.     
 6,291,158  B1  9/2001    Winter et al.     
 6,291,159  B1  9/2001    Winter et al.     
 6,291,160  B1  9/2001    Lerner et al.     
 6,291,161  B1  9/2001    Lerner et al.     
 6,294,353  B1  9/2001    Pack et al.     
 6,300,064  B1  10/2001    Knappik et al.     
 6,331,415  B1  12/2001    Cabilly et al.     
 6,455,253  B1  9/2002    Patten et al.     
 6,576,468  B1  6/2003    Nicolaides et al.     
 6,610,477  B1  8/2003    Haseltine et al.     
 6,645,492  B2  11/2003    Levitt et al.     
 6,653,068  B2  11/2003    Frisch et al.     
 6,656,736  B2  12/2003    Nicolaides et al.     
 6,675,105  B2  1/2004    Hogarth et al.     
 6,696,248  B1  2/2004    Knappik et al.     
 6,699,658  B1  3/2004    Wittrup et al.     
 6,706,484  B1  3/2004    Knappik et al.     
 6,713,279  B1  3/2004    Short     
 6,723,433  B2  4/2004    Bacon, Jr.     
 6,737,268  B1  5/2004    Nicolaides et al.     
 6,740,506  B2  5/2004    Short et al.     
 6,806,079  B1  10/2004    McCafferty et al.     
 6,808,894  B1  10/2004    Nicolaides et al.     
 6,815,194  B2  11/2004    Honjo et al.     
 6,825,038  B2  11/2004    Nicolaides et al.     
 6,828,422  B1  12/2004    Achim et al.     
 6,835,753  B2  12/2004    Baell et al.     
 6,893,845  B1  5/2005    Huse     
 6,900,370  B2  5/2005    Nicolaides et al.     
 6,919,183  B2  7/2005    Fandl et al.     
 6,921,666  B2  7/2005    Nicolaides et al.     
 6,969,586  B1  11/2005    Lerner et al.     
 7,083,966  B2  8/2006    Honjo     
 7,112,715  B2  9/2006    Chambon et al.     
 7,122,339  B2  10/2006    Sale et al.     
 7,314,621  B2  1/2008    Honjo et al.     
 8,603,950  B2  12/2013    Bowers et al.     
 2002/0051976  A1  5/2002    Patten et al.
 2002/0155453  A1  10/2002    Sale et al.
 2002/0164743  A1  11/2002    Honjo et al.
 2003/0087236  A1  5/2003    Sale et al.
 2003/0096401  A1  5/2003    Huse et al.
 2003/0108889  A1  6/2003    Sale et al.
 2003/0119190  A1  6/2003    Wang et al.
 2003/0153038  A1  8/2003    Ohlin et al.
 2003/0198971  A1  10/2003    Balint et al.
 2004/0038317  A1  2/2004    Balint
 2004/0115695  A1  6/2004    Grasso et al.
 2004/0132066  A1  7/2004    Balint
 2004/0158886  A1  8/2004    Nicolaides et al.
 2004/0175756  A1  9/2004    Kolkman et al.
 2004/0219144  A1  11/2004    Shelton et al.
 2004/0228862  A1  11/2004    Shelton et al.
 2004/0237124  A1  11/2004    Pons et al.
 2004/0242517  A1  12/2004    Cascalho et al.
 2004/0253244  A1  12/2004    Shelton et al.
 2005/0008625  A1  1/2005    Balint et al.
 2005/0014761  A1  1/2005    Hoffmann et al.
 2005/0022686  A1  2/2005    Wessels et al.
 2005/0026246  A1  2/2005    Sale et al.
 2005/0037421  A1  2/2005    Honda et al.
 2005/0048051  A1  3/2005    Reynaud et al.
 2005/0048512  A1  3/2005    Kolkman et al.
 2005/0048578  A1  3/2005    Zhang
 2005/0048621  A1  3/2005    Grasso et al.
 2005/0053973  A1  3/2005    Kolkman et al.
 2005/0054073  A1  3/2005    Honjo et al.
 2005/0074821  A1  4/2005    Wild et al.
 2005/0089932  A1  4/2005    Kolkman et al.
 2005/0095712  A1  5/2005    Martin et al.
 2005/0106667  A1  5/2005    Fellouse et al.
 2005/0119455  A1  6/2005    Fuh et al.
 2005/0142106  A1  6/2005    Wittrup et al.
 2005/0164301  A1  7/2005    Kolkman et al.
 2005/0188428  A1  8/2005    Nicolaides et al.
 2005/0220795  A1  10/2005    Wittrup et al.
 2005/0221384  A1  10/2005    Kolkman et al.
 2005/0255552  A1  11/2005    Flynn et al.
 2005/0255555  A1  11/2005    Johns et al.
 2005/0265994  A1  12/2005    Shelton et al.
 2005/0266000  A1  12/2005    Bond et al.
 2006/0003334  A1  1/2006    Achim et al.
 2006/0003387  A1  1/2006    Peele et al.
 2006/0019262  A1  1/2006    Petersen-Mahrt et al.
 2006/0052585  A1  3/2006    Grawunder et al.
 2006/0080745  A1  4/2006    Bergsagel et al.
 2006/0088884  A1  4/2006    Seifer et al.
 2006/0099679  A1  5/2006    Tsien et al.
 2006/0134098  A1  6/2006    Bebbington et al.
 2006/0147450  A1  7/2006    Shelton
 2007/0111260  A1  5/2007    Gao et al.
 2007/0186292  A1  8/2007    Buerstedde et al.
 2009/0093024  A1  4/2009    Bowers et al.
 2011/0287485  A1  11/2011    Bowers et al.

 
 FOREIGN PATENT DOCUMENTS 
 
       EP       0 404 097       A2                6/1990      
       EP       1 174 509       A1                1/2002      
       EP       1 345 495       A4                9/2003      
       EP       1 556 508       A2                7/2005      
       EP       1 572 935       A1                9/2005      
       EP       1 572 971       A2                9/2005      
       JP       2004-33137       A                2/2004      
       WO       WO 92/08796       A1                5/1992      
       WO       WO 93/11161       A1                6/1993      
       WO       WO 93/12228       A1                6/1993      
       WO       WO 94/28143                         12/1994      
       WO       WO 95/12689       A1                5/1995      
       WO       WO 00/22111       A1                4/2000      
       WO       WO 00/73346       A1                12/2000      
       WO       WO 02/100998       A2                12/2002      
       WO       WO 03/095636       A3                11/2003      
       WO       WO 2004/055182       A1                7/2004      
       WO       WO 2005/011735       A1                2/2005      
       WO       WO 2005/014642       A1                2/2005      
       WO       WO 2005/023865                         3/2005      
       WO       WO 2005/023865       A2                3/2005      
       WO       WO 2005/056599       A2                6/2005      
       WO       WO 2005/080431       A2                9/2005      
       WO       WO 2006/053021       A2                5/2006      

 OTHER PUBLICATIONS
  
  Adetugbo et al., “Molecular analysis of spontaneous somatic mutants,” Nature 265:299-304 (1977).
  Aggarwal et al., “Synthesis and Screening of a Random Dimeric Peptide Library Using the One-Bead-One-Dimer Combinatorial Approach,” Bioconj. Chem., 17:335-340 (2006).
  Akamatsu et al., “Construction of a Human Ig Combinatorial Library from Genomic V Segments and Synthetic CDR3 Fragments,” J. Immunol., 151:4651-4659 (1993).
  Akamatsu et al., “Whole IgG surface display on mammalian cells: Application to isolation of neutralizing chicken monoclonal anti-IL-12 antibodies,” J. Immunol. Methods, 327:40-52 (2007).
  Akselband et al., “Isolation of Rare Isotype Switch Variants in Hybridoma Cell Lines Using an Agarose Gel Microdrop-Based Protein Secretion Assay,” Assay and Drug Development Technologies, 1(5):619-626 (2003).
  Alla et al., “Extracellular Domains of the Bradykinin B2 Receptor Involved in Ligand Binding and Agonist Sensing Defined by Anti-peptide Antibodies,” J. Biol. Chem., 271(3):1748-1755 (1996).
  Alt et al., “Immunoglobulin heavy-chain expression and class switching in a murine leukaemia cell line,” Nature, 296(5855):325-331 (1982).
  Amit et al., “Three-Dimensional Structure of an Antigen-Antibody Complex at 2.8 Å Resolution,” Science, 233:747-753 (1986).
  Anaptys Presentation at Bio-Europe 2007, Nov. 12-14, Congress Center Hamburg (CCH) Germany, 29 pages.
  Andersen et al., “Screening for Epitope Specificity Directly on Culture Supernatants in the Early Phase of Monoclonal Antibody Production by an ELISA with Biotin-Labeled Antigen,” J. Immunoassay & Immunochemistry, 25(2):147-157 (2004).
  Andersson et al., “Affinity selection and repertoire shift: paradoxes as a consequence of somatic mutation?” Immunological Reviews, 162:173-182 (1998).
  Arakawa et al., “Requirement of the Activation-induced Deaminase (AID) Gene for Immunoglobulin Gene Conversion,” Science, 295:1301-1306 (2002).
  Atanasiu et al., “ORC binding to TRF2 stimulates OriP replication,” EMBO Reports, 7(7):716-721 (2006).
  Atochina et al., “Comparison of results using the gel microdrop cytokine secretion assay with ELISPOT and intracellular cytokine staining assay,” Cytokine, 27:120-128 (2004).
  Ayriss et al., “High-Throughput Screening of Single-Chain Antibodies Using Multiplexed Flow Cytometry,” J. Proteome Res., 6:1072-1082 (2007).
  Azuma, “Somatic hypermutation in mouse λ chains,” Immunological Reviews, 162:97-105 (1998).
  Babcock et al., “Ligand Binding Characteristics of CXCR4 Incorporated Into Paramagnetic Proteoliposomes,” J. Biol. Chem., 276(2):38433-38440 (2001).
  Bachl et al., “Increased transcription levels induce higher mutation rates in a hypermutating cell line,” J. Immunol., 166(8):5051-5057 (2001).
  Bahler et al., “Clonal evolution of a follicular lymphoma: Evidence for antigen selections,” PNAS USA, 89:6770-6774 (1992).
  Barbas et al., “In vitro Evolution of a Neutralizing Human Antibody to Human Immunodeficiency Virus Type I to Enhance Affinity and Broaden Strain Cross-Reactivity,” PNAS, 91:3809-3813 (1994).
  Bass et al., “Hormone Phage: An Enrichment Method for Variant Proteins With Altered Binding Properties,” Proteins, 8:309-314 (1990).
  Batista et al., “Affinity Dependence of the B Cell Response to Antigen: A Threshold, a Ceiling, and the Importance of Off-Rate,” Immunity, 9:751-759 (1998).
  Becker et al., “Ultra-high-throughput screening based on cell-surface display and fluorescence-activated cell sorting for the identification of novel biocatalysts,” Curr. Op. Biotech., 15(4):323-329 (2004).
  Becker et al., “A Three-Hybrid Approach to Scanning the Proteome for Targets of Small Molecule Kinase Inhibitors,” Chem. Biol., 11:211-223 (2004).
  Bemark et al., “The c-MYC allele that is translocated into the IgH locus undergoes constitutive hypermutation in a Burkitt's lymphoma line,” Oncogene, 19(30):3404-3410 (2000).
  Berek et al., “The dynamic nature of the antibody repertoire,” Immunol. Rev., 105:5-26 (1988).
  Berger et al., “Secreted placental alkaline phosphatase: a powerful new quantitative indicator of gene expression in eukaryotic cells,” Gene, 66:1-10 (1988).
  Besmer et al., “The transcription elongation complex directs activation-induced cytidine deaminase-mediated DNA deamination,” Mol. Cell. Biol., 26(11):4378-85 (2006).
  Betz et al., “Discriminating intrinsic and antigen-selected mutational hotspots in immunoglobulin V genes,” Immunol. Today, 14:405-411 (1993).
  Betz et al., “Elements Regulating Somatic Hypermutation of an Immunoglobulin κ Gene: Critical Role for the Intron Enhancer/Matrix Attachment Region,” Cell, 77:239-248 (1994).
  Bezzubova et al., “Reduced X-ray resistance and homologous recombination frequencies in a RAD54-/- mutant of the chicken DT40 cell line,” Cell, 89:185-193 (1997).
  Bichet et al., “The ‘Bringer’ Strategy,” Applied Biochem & Biotech., 117:115-122 (2004).
  Bird et al., “Single-Chain Antigen-Binding Proteins,” Science, 242:423-426 (1988).
  Blanden et al., “The signature of somatic hypermutation appears to be written into the germline IgV segment repertoire,” Immunological Reviews, 162:117-132 (1998).
  Boder et al., “Yeast surface display for screening combinatorial polypeptide libraries,” Nat. Biotech., 15:553-557 (1997).
  Boder et al., “Optimal Screening of Surface-Displayed Polypeptide Libraries,” Biotechnol. Prog., 14:55-62 (1998).
  Boder et al., “Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity,” PNAS, 97:10701-10705 (2000).
  Boder et al., “Yeast Surface Display for Directed Evolution of Protein Expression, Affinity, and Stability,” Meth. Enzymol., 328:430-444 (2000).
  Bonfield et al., “A new DNA-sequence assembly program,” Nucleic Acids Res., 23:4992-4999 (1995).
  Borth et al., “Efficient Selection of High-Producing Subclones During Gene Amplification of Recombinant Chinese Hamster Ovary Cells by Flow Cytometry and Cell Sorting,” Biotech. Bioeng., 71(4):266-273 (2000-2001).
  Bouvet et al., “From natural polyreactive autoantibodies to a la carte monoreactive antibodies to infectious agents: is it a small world after all?” Infect. Immun., 66:1-4 (1998).
  Boyle, “Harnessing Somatic Hypermutation,” Anaptys Biosciences, Inc., Dec. 13, 2007 Presentation, 27 pages.
  Braeuninger et al., “Hodgkin and Reed-Sternberg cells in lymphocyte predominant Hodgkin disease represent clonal populations of germinal center-derived tumor B cells,” PNAS USA, 94:9337-9342 (1997).
  Bransteiter et al., “Activation-induced cytidine deaminase deaminates deoxycytidine on single-stranded DNA but requires the action of RNase,” PNAS, 100(7):4102-4107 (2003).
  Brar et al., “Activation-induced Cytosine Deaminase (AID) Is Actively Exported out of the Nucleus but Retained by the Induction of DNA Breaks,” J. Biol. Chem., 279(25):26395-26401 (2004).
  Brenneman et al., “XRCC3 is required for efficient repair of chromosome breaks by homologous recombination,” Mutat. Res., 459:89-97 (2000).
  Bross et al., “DNA double-strand breaks in immunoglobulin genes undergoing somatic hypermutation,” Immunity, 13:589-597 (2000).
  Bruggemann et al., “Immunoglobulin V region variants in hybridoma cells. 1. Isolation of a variant with altered idiotypic and antigen binding specificity,” EMBO J., 1:629-634 (1982).
  Buerstedde et al., “Light chain gene conversion continues at high rate in an ALV-induced cell line,” EMBO J., 9:921-927 (1990).
  Canfield et al., “The Binding Affinity of Human IgG for its High Affinity Fc Receptor Is Determined by Multiple Amino Acids in the Ch2 Domain and Is Modulated by the Hinge Region,” J. Exp. Med., 173:1483-1491 (1991).
  Capizzi et al., “A table for the estimation of the spontaneous mutation rate of cells in culture,” Mutat. Res., 17:147-148 (1973).
  Carpentier et al., “Limiting factors governing protein expression following polyethylenimine-mediated gene transfer in HEK293-EBNA1 cells,” J. Biotech., 128:268-280 (2007).
  Casali et al., “Structure and function of natural antibodies,” Curr. Top. Microbiol. Immunol., 10:167-179 (1996).
  Ceccarelli et al., “Functional Analyses of the EBNA1 Origin DNA Binding Protein of Epstein-Barr Virus,” J. Virol., 74(11):4939-4948 (2000).
  Chang et al., “A Sequence Analysis of Human Germline Ig Vh and VL Genes,” Ann. N.Y. Acad. Sci., 170-179 (1994) Elsevier Science Ltd. 0167-699/94/507.00.
  Chang et al., “The CDR1 sequences of a major proportion of human germline Ig VH genes are inherently susceptible to amino acid replacement,” Imm. Today Jeanette Greenspan Laboratory for Cancer Research (1994).
  Chapman et al., “In vitro selection of catalytic RNAs,” Curr. Op. Struct. Biol., 4:618-622 (1994).
  Chapman et al., “Analysis of VH genes used by neoplastic B cells in endemic Burkitt's lymphoma shows somatic hypermutation and intraclonal heterogeneity,” Blood, 85:2176-2181 (1995).
  Chao et al., “Isolating and engineering human antibodies using yeast surface display,” Nature Protocols, 1(2):755-768 (2006).
  Chappel et al., “Identification of the Fcγ receptor class I binding site in human IgG through the use of recombinant IgG1/IgG2 hybrid and point-mutated antibodies,” PNAS USA, 88:9036-9040 (1991).
  Chau et al., “Dynamic Chromatin Boundaries Delineate a Latency Control Region of Epstein-Barr Virus,” J. Virol., 78(22):12308-12319 (2004).
  Chen et al., “Identification of Key Amino Acid Residues in a Thyrotropin Receptor Monoclonal Antibody Epitope Provides Insight Into Its Inverse Agonist and Antagonist Properties,” Endocrinology, published ahead of print Apr. 3, 2008 doi: 10.1210/en.2008-0207.
  Chothia et al., “Canonical Structures for the Hypervariable Regions of Immunoglobulins,” J. Mol. Biol., 196:901-917 (1987).
  Chothia et al., “Structural Repertoire of the Human VH Segments,” J. Mol. Biol., 227:799-817 (1992).
  Chui et al., “A reporter gene to analyse the hypermutation of immunoglobulin genes,” J. Mol. Biol., 249:555-563 (1995).
  Clackson et al., “Making antibody fragments using phage display libraries,” Nature, 352:624-628 (1991).
  Clackson et al., “In vitro selection from protein and peptide libraries,” Trends Biotechnol., 12:173-184 (1994).
  Coffino et al., “Rate of somatic mutation in immunoglobulin production by mouse myeloma cells,” PNAS USA, 68(1):219-223 (1971).
  Cohen et al., “Generation of a Monoclonal Antibody Agonist to Toll-Like Receptor 4,” Hybridoma, 24(1):27-35 (2005).
  Colby et al., “Engineering Antibody Affinity by Yeast Surface Display,” Hereditary Disease Foundation NIH CA96504 grant (2004).
  Coker et al., “Genetic and In Vitro Assays of DNA Deamination,” Meth. Enzymol., 408:156-170 (2006).
  Conese et al., “Gene Therapy Progress and Prospects: Episomally maintained self-replicating systems,” Gene Therapy, 11:1735-1741 (2004).
  Conticello et al., “Evolution of the AID/APOBEC family of polynucleotide (deoxy)cytidine deaminases,” Mol. Biol. Evol., 22(2):367-377 (2005).
  Craenenbroeck et al., “Orientation-dependent gene expression with Epstein-Barr virus-derived vectors,” FEBS Lett., 555:489-494 (2003).
  Cui et al., “The XRCC2 and XRCC3 repair genes are required for chromosome stability in mammalian cells,” Mutat. Res., 434:75-88 (1999).
  Cull et al., “Screening for receptor ligands using large libraries of peptides linked to the C terminus of the lac repressor,” PNAS USA, 89:1865-1869 (1992).
  Cumbers et al., “Generation and iterative affinity maturation of antibodies in vitro using hypermutating B-cell lines,” Nat. Biotech., 20(11):1129-1134 (2002).
  Daugherty et al., “Flow cytometric screening of cell-based libraries,” J. Immunol. Methods, 243(1-2):211-227 (2000).
  Davidson et al., “Apolipoprotein B: mRNA Editing, Lipoprotein Assembly, and Presecretory Degradation,” Ann. Rev. Nutr., 20:169-193 (2000).
  Davies et al., “Interactions of protein antigens with antibodies,” PNAS USA, 93:7-12 (1996).
  Deans et al., “Xrcc2 is required for genetic stability, embryonic neurogenesis and viability in mice,” EMBO J., 19:6675-6685 (2000).
  Denepoux et al., “Induction of somatic mutation in a human B cell line in vitro,” Immunity, 6:35-46 (1997).
  Deng et al., “Telomere Repeat Binding Factors TRF1, TRF2, and hRAP1 Modulate Replication of Epstein-Barr Virus OriP,” J. Virol., 77(22):11992-12001 (2003).
  Deng et al., “Inhibition of Epstein-Barr Virus OriP Function by Tankyrase, a Telomere-Associated Poly-ADP Ribose Polymerase That Binds and Modifies EBNA1,” J. Virol., 79(8):4640-4650 (2005).
  Diaz et al., “Evolution of somatic hypermutation and gene conversion in adaptive immunity,” Immunological Reviews, 162:13-24 (1998).
  Diaz et al., “Evolution and the molecular basis of somatic hypermutation of antigen receptor genes,” Philos. Trans. R. Soc. Lond. B Biol. Sci., 356:67-72 (2001).
  Di Scala et al., “Conformational state of human cardiac 5-HT4(g) receptors influences the functional effects of polyclonal anti-5-HT4 receptor antibodies,” Biochem. Pharmacol., 73:964-971 (2007).
  Dmitriev et al., “Analysis of Bispecific Monoclonal Antibody Binding to Immobilized Antigens Using an Optical Biosensor,” Biochem, 67(12):1356-1365 (2002).
  Dorner et al., “Analysis of the targeting of the hypermutational machinery and the impact of subsequent selection on the distribution of nucleotide changes in human VHDJH rearrangements,” Immunological Reviews, 162:161-171 (1998).
  Drake et al., “Rates of spontaneous mutation,” Genetics, 148:1667-1686 (1998).
  Drummond et al., “Why High-error-rate Random Mutagenesis Libraries are Enriched in Functional and Improved Proteins,” J. Mol. Biol., 350:806-816 (2005).
  Duquette et al., “AID binds to transcription-induced structures in c-MYC that map to regions associated with translocation and hypermutation,” Oncogene, 24:5791-5798 (2005).
  Durandy et al., “Activation-Induced Cytidine Deaminase: Structure-Function Relationship as Based on the Study of Mutants,” Hum. Mutat., 27(12):1185-1191 (2006).
  Eglin, “An Overview of High Throughput Screening at G Protein Coupled Receptors,” Frontiers Drug Design Disc., 1:97-111 (2005).
  Elies et al., “Immunochemical and functional characterization of an agonist-like monoclonal antibody against the M2 acetylcholine receptor,” Eur. J. Biochem., 251:659-666 (1998).
  Ellington et al., “In vitro selection of RNA molecules that bind specific ligands,” Nature, 346:818-822 (1990).
  Ewert et al., “Stability Improvement of antibodies for extracellular and intracellular applications: CDR grafting to stable frameworks and structure-based framework engineering,” Methods, 34:184-199 (2004).
  Farinas et al., “Fluorescence Activated Cell Sorting for Enzymatic Activity,” Comb. Chem. High Throughput Screen., 9(4):321-328 (2006).
  Feldhaus et al., “Flow-cytometric isolation of human antibodies from a nonimmune Saccharomyces cerevisiae surface display library,” Nature Biotech., 21(2):163-170 (2003).
  Fellouse et al., “Synthetic antibodies from a four-amino-acid code: a dominant role for tyrosine in antigen recognition,” PNAS USA, 101:12467-12472 (2004).
  Fellouse et al., “Molecular recognition by a binary code,” J. Mol. Biol., 348:1153-1162 (2005).
  Foote et al., “Breaking the affinity ceiling for antibodies and T cell receptors,” PNAS, 97(20):10679-10681 (2000).
  Fruton et al., “IUPAC-IUB Commission on Biochemical Nomenclature Symbols for Amino-Acid Derivatives and Peptides Recommendations,” Biochem., 11(9):1726-1732 (1972).
  Fuhrmann-Benzakein, “Inducible and irreversible control of gene expression using a single transgene,” Nucl. Acid Res., 28(23):e99 (2000).
  Gearhart et al., “Emerging Links Between Hypermutation of Antibody Genes and DNA Polymerases,” Nature Rev. Immunol., (12):187-192 (2001).
  Gearhart, “Antibody Wars: Extreme Diversity,” J. Immunol., 177:4235-4236 (2006).
  Geddie et al., “High Throughput Microplate Screens for Directed Protein Evolution,” Meth. Enzymol., 388:134-145 (2004).
  Ghosh et al., “Design, synthesis, and progress toward optimization of potent small molecule antagonists of CC chemokine receptor 8 (CCR8),” J. Med. Chem., 49(9):2669-2672 (2006).
  Gift et al., “FACS-based isolation of slowly growing cells: Double encapsulation of yeast in gel microdrops,” Nature Biotechnol., 14(7):884-887 (1996).
  Giordano et al., “Biopanning and rapid analysis of selective interactive ligands,” Nat. Med., 7(11):1249-1253 (2001).
  Gold et al., “Diversity of Oligonucleotide Functions,” Annual Rev. Biochem., 64:763-797 (1995).
  Gomez-Gonzalez et al., “Activation-induced cytidine deaminase action is strongly stimulated by mutations of the THO complex,” PNAS, 104(20):8409-8414 (2007).
  Gonzalez-Fernandez et al., “Analysis of somatic hypermutation in mouse Peyer's patches using immunoglobulin κ light-chain transgenes,” PNAS USA, 90:9862-9866 (1993).
  Goodman et al., “Identifying protein-protein interactions in somatic hypermutation,” JEM, 201(4):493-496 (2005).
  Goossens et al., “Frequent occurrence of deletions and duplications during somatic hypermutation: Implications for oncogene translocations and heavy chain disease,” PNAS USA, 95:2463-2468 (1998).
  Goshorn et al., “Common Structural Features among Monoclonal Antibodies Binding the Same Antigenic Region of Cytochrome C*,” J. Biol. Chem., 266(4):2134-2142 (1991).
  Goyenechea et al., “Cells strongly expressing Igκ transgenes show clonal recruitment of hypermutation: a role for both MAR and the enhancers,” EMBO J., 16(13):3987-3994 (1997).
  Graff et al., “Directed evolution of an anti-carcinoembryonic antigen scFv with a 4-day monovalent dissociation half-time at 37° C,” Protein Eng. Des. Selection, 17(4):293-304 (2004).
  Green et al., “Selection of a Ribozyme That Functions as a Superior Template in a Self-Copying Reaction,” Science, 258(5090):1910-1915 (1992).
  Green et al., “Ig V Region Hypermutation in B Cell Hybrids Mimics In Vivo Mutation and Allows for Isolation of Clonal Variants,” Mol. Immunol., 34(15):1095-1103 (1997).
  Green et al., “Immunoglobulin hypermutation in cultured cells,” Immunological Reviews, 162:77-87 (1998).
  Gribskov et al., “The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression,” Nucl. Acids Res., 12(1):539-549 (1984).
  Griffin et al., “Mammalian recombination-repair genes XRCC2 and XRCC3 promote correct chromosome segregation,” Nat. Cell Biol., 2:757-761 (2000).
  Gronowicz et al., “Surface Ig isotypes on cells responding to Lipopolysaccharide by IgM and IgG secretion,” J. Immunol., 123(5):2049-2056 (1979).
  Gunneriusson et al., “Surface Display of a Functional Single-Chain Fv Antibody on Staphylococci,” J. Bacteriol., 78(5):1341-1346 (1996).
  Gupta et al., “Conformation State-Sensitive Antibodies to G-protein-coupled Receptors,” J. Biol. Chem., 282(8):5116-5124 (2007).
  Gupta et al., “Post-activation-mediated Changes in Opioid Receptors Detected by N-terminal Antibodies,” J. Biol. Chem., 283(16):10735-10744 (2008).
  Gurevich et al., “How and why do GPCRs dimerize?” Trends in Pharmacol. Sciences, 29(5): 234-240 (2008) doi:10.1016/j.tips.2008.02.004.
  Gustafsson et al., “Codon bias and heterologous protein expression,” Trends Biotech., 22(7):346-353 (2004).
  Haas et al., “Continuous Autotropic Signaling by Membrane-expressed Tumor Necrosis Factor,” J. Biol. Chem., 274(25):18107-18112 (1999).
  Harris et al., “AID is Essential for Immunoglobulin V Gene Conversion in a Cultured B Cell Line,” Curr. Biol., 12:493-503 (2001).
  Hawkins et al., “The Contribution of Contact and Non-contact Residues of Antibody in the Affinity of Binding to Antigen,” J. Mol. Biol., 234:958-964 (1993).
  Hebner et al., “The spacing between adjacent binding sites in the family of repeats affects the functions of Epstein-Barr nuclear antigen 1 in transcription activation and stable plasmid maintenance,” Virology, 311:263-274 (2003).
  Heinzel et al., “Use of Simian Virus 40 Replication to Amplify Epstein-Barr Virus Shuttle Vectors in Human Cells,” J. Virol., 62(10):3738-3746 (1988).
  Hershberg et al., “Differences in potential for amino acid change after mutation reveals distinct strategies for κ and λ light-chain variation,” PNAS, 103(43):15963-15968 (2006).
  Hirai et al., “Replication Licensing of the EBV oriP Minichromosome,” Curr. Top. Microbiol. Immunol., 258:13-33 (2001).
  Hoffmann et al., “Rapid translation system: A novel cell-free way from gene to protein,” Biotechnol. Annu. Rev., 10:1-30 (2004).
  Ho et al., “Genetically Engineered Saccharomyces yeast Capable of Effective Cofermentation of Glucose and Xylose,” Appl. Environ. Microbiol., 64(5):1852-1859 (1998).
  Holliger et al., “Diabodies: Small bivalent and bispecific antibody fragments,” PNAS USA, 90:6444-6448 (1993).
  Holliger et al., “Engineered antibody fragments and the rise of single domains,” Nature Biotech., 23(9):1126-1136 (2005).
  Holowaty et al., “Protein Profiling with Epstein-Barr Nuclear Antigen-1 Reveals an Interaction with the Herpesvirus-associated Ubiquitin-specific Protease HAUSP/USP7,” J. Biol. Chem., 278(32):29987-29994 (2003).
  Holmes et al., “Improved cell line development by a high throughput affinity capture surface display technique to select for high secretors,” J. Immunol. Methods, 230:141-147 (1999).
  Hoogenboom, “Selecting and screening recombinant antibody libraries,” Nature Biotech., 23(9):1105-1116 (2005).
  Hsu, “Mutation, selection, and memory in B lymphocytes of exothermic vertebrates,” Immunological Reviews, 162:25-36 (1998).
  Huang et al., “Notch-Induced E2A Degradation Requires CHIP and Hsc70 as Novel Facilitators of Ubiquitination,” Mol. Cell Biol., 24(20):8951-8962 (2004).
  Huston et al., “Protein engineering of antibody binding sites: Recovery of specific activity in an anti-digoxin single-chain Fv analogue produced in Escherichia coli,” PNAS USA, 85:5879-5883 (1988).
  Iglesias-Ussel et al., “Forced expression of Aid facilitates the isolation of class switch variants from hybridoma cells,” J. Immunol. Methods, 316:59-66 (2006).
  Indra et al., “Temporally-controlled site-specific mutagenesis in the basal layer of the epidermis: comparison of the recombinase activity of the tamoxifen-inducible Cre-ERT2 recombinases,” Nuc. Acid Res., 27(22):4324-4327 (1999).
  Ito et al., “Activation-induced cytidine deaminase shuttles between nucleus and cytoplasm like apolipoprotein B mRNA editing catalytic polypeptide 1,” PNAS, 101(7):1975-1980 (2004).
  Jacobs et al., “Hypermutation of Immunoglobulin Genes in Memory B Cells of DNA Repair-deficient Mice,” J. Exp. Med., 187(11):1735-1743 (1998).
  Jain et al., “A potential role for antigen selection in the clonal evolution of Burkitt's lymphoma,” J. Immunol., 153:45-52 (1994).
  Jankelevich et al., “A nuclear matrix attachment region organizes the Epstein-Barr viral plasmid in Raji cells into a single DNA domain,” EMBO J., 11(3):1165-1176 (1992).
  Johnson et al., “Mammalian XRCC2 promotes the repair of DNA double-strand breaks by homologous recombination,” Nature, 401:397-399 (1999).
  Jolly et al., “The targeting of somatic hypermutation,” Semin. Immunol., 8:159-168 (1996).
  Jolly et al., “Rapid methods for the analysis of immunoglobulin gene hypermutation: application to transgenic and gene targeted mice,” Nucleic Acids Research, 25(10):1913-1919 (1997).
  Jones et al., “Replacing the complementarity-determining regions in a human antibody with those from a mouse,” Nature, 321:522-525 (1986).
  Jones et al., “Regulation of cancer cell migration and bone metastasis by RANKL,” Nature, 440:692-696 (2006).
  Joyce, “In vitro evolution of nucleic acids,” Curr. Op. Struct. Biol., 4:331-336 (1994).
  Jung et al., “Selection for Improved Protein Stability by Phage Display,” J. Mol. Biol., 294:163-180 (1999).
  Kabat et al., “Attempts to locate complementarity-determining residues in the variable positions of light and heavy chains,” Ann. NY Acad. Sci., 190:382-393 (1971).
  Kabat et al., “Unusual distributions of amino acids in complementarity-determining (hypervariable) segments of heavy and light chains of immunoglobulins and their possible roles in specificity of antibody combining sites,” J. Biol. Chem., 252(19):6609-6616 (1977).
  Kabat, “Antibody Diversity Versus Antibody Complementarity,” Pharmacol. Rev., 34(1):23-38 (1982).
  Kallberg et al., “Somatic mutation of immunoglobulin V genes in vitro,” Science, 271(5253):1285-1289 (1996).
  Kapoor et al., “Reconstitution of Epstein-Barr virus-based plasmid partitioning in budding yeast,” EMBO J., 20(1-2):222-230 (2001).
  Kapoor et al., “Methods for measuring the replication and segregation of Epstein-Barr virus-based plasmids,” Methods Mol. Biol., 292:247-266 (2005).
  Kavli et al., “Uracil in DNA—General mutagen, but normal intermediate in acquired immunity,” DNA Repair, doi:10.1016/j.dnarep.2006.10.014, 12 pages (2006).
  Kawahara et al., “A Growth Signal with an Artificially Induced Erythropoietin Receptor-gp130 Cytoplasmic Domain Heterodimer,” J. Biochem., 130(i):305-312 (2001).
  Kawahara et al., “Bypassing antibiotic selection: positive screening of genetically modified cells with an antigen-dependent proliferation switch,” Nucl. Acids Res., 31(7):e32 (2003).
  Kawahara et al., “Improved growth response of antibody/receptor chimera attained by the engineering of transmembrane domain,” Protein Eng. Des. Sel., 17(10):715-719 (2004).
  Kawamura et al., “DNA polymerase theta is preferentially expressed in lymphoid tissues and upregulated in human cancers,” Int. J. Cancer, 109(1):9-16 (2004).
  Keitel et al., “Crystallographic Analysis of Anti-p24 (HIV-1) Monoclonal Antibody Cross-Reactivity and Polyspecificity,” Cell, 91:811-820 (1997).
  Kim et al., “Ongoing diversification of the rearranged immunoglobulin light-chain gene in a bursal lymphoma cell line,” Mol. Cell Biol., 10(6):3224-3231 (1990).
  Kinoshita et al., “Linking Class-Switch Recombination with Somatic Hypermutation,” Mol. Cell Biol., 2:493-503 (2001).
  Kirchmaier et al., “Plasmid Maintenance of Derivatives of oriP of Epstein-Barr Virus,” J. Virol., 69(2):1280-1283 (1995).
  Kitamura et al., “Nuclear Import of Epstein-Barr Virus Nuclear Antigen 1 Mediated by NPI-1 (Importin α5) Is Up- and Down-Regulated by Phosphorylation of the Nuclear Localization Signal for Which Lys379 and Arg380 Are Essential,” J. Virol., 80(4):1979-1991 (2006).
  Klein et al., “An EBV-genome-negative cell line established from an American Burkitt lymphoma; receptor characteristics, EBV infectibility and permanent conversion into EBV-positive sublines by in vitro infection,” Intervirology, 5:319-334 (1975).
  Klein et al., “Somatic hypermutation in normal and transformed human B cells,” Immunological Reviews, 162:261-280 (1998).
  Klionsky et al., “A Polyclonal Antibody to the Prepore Loop of Transient Receptor Potential Vanilloid Type 1 Blocks Channel Activation,” J. Pharmacol. Exp. Ther., 319(1):192-198 (2006).
  Klix et al., “Multiple sequences from downstream of the Jκ cluster can combine to recruit somatic hypermutation to a heterologous, upstream mutation domain,” Eur. J. Immunol., 28:317-326 (1998).
  Knappik et al., “Fully Synthetic Human Combinatorial Antibody Libraries (HuCAL) Based on Modular Consensus Frameworks and CDRs Randomized with Trinucleotides,” J. Mol. Biol., 296:57-86 (2000).
  Knight et al., “Somatic diversification of IgH genes in rabbit,” Immunological Reviews, 162:37-47 (1998).
  Kobrin et al., “The Somatic Instability of Immunoglobulin Genes in Cultured cells,” pp. 11-28 in Ch. 2 of Somatic hypermutation in V regions (ed. Steele, E.J.), CRC Press, Boca Raton, FL (1990).
  Komori et al., “Biased dA/dT somatic hypermutation as regulated by the heavy chain intronic iEμ enhancer and 3′ E alpha enhancers in human lymphoblastoid B cells,” Mol. Immunol., 43:1817-1826 (2006).
  Kong et al., “Recombination-based mechanisms for somatic hypermutation,” Immunological Reviews, 162:67-76 (1998).
  Kosmas et al., “Somatic hypermutation of immunoglobulin variable region genes: focus on follicular lymphoma and multiple myeloma,” Immunological Reviews, 162:281-292 (1998).
  Kou et al., “Expression of activation-induced cytidine deaminase in human hepatocytes during hepatocarcinogenesis,” Int. J. Cancer, published online Oct. 25, 2006 http://www3.interscience.wiley.com.revproxy.brown.edu/cgi-bin/fulltext/113441207/main.html.joumal.
  Kramer, “Transgene Control Engineering in Mammalian Cells,” Methods Mol. Biol., 308:123-144 (2005).
  Krause et al., “The cytidine deaminases AID and APOBEC-1 exhibit distinct functional properties in a novel yeast selectable system,” Mol. Immunol., 43(4):295-307 (2006).
  Kronick, “The use of phycobiliproteins as fluorescent labels in immunoassay,” J. Immunol. Methods, 92:1-13 (1986).
  Kunaparaju et al., “Epi-CHO, an Episomal Expression System for Recombinant Protein Production in CHO Cells,” Biotechnol. Bioeng., 91(6):670-677 (2005).
  Kuppers et al., “Mechanisms of chromosomal translocations in B cell lymphomas,” Oncogene, 20:5580-5594 (2001).
  Kuriyan, “Allostery and Coupled Sequence Variation in Nuclear Hormone Receptors,” Cell, 116(3):354-356 (2004).
  Lagerstrom et al., “Structural diversity of G protein-coupled receptors and significance for drug discovery,” Nature Reviews/Drug Discovery, 7:339-357 (2008).
  Langle-Rouault et al., “Up to 100-fold Increase of Apparent Gene Expression in the Presence of Epstein-Barr Virus oriP Sequences and EBNA1: Implications of the Nuclear Import of Plasmids,” J. Virol., 72(7):6181-6185 (1998).
  Lantto et al., “Uneven distribution of Repetitive Trinucleotide Motifs in Human Immunoglobulin Heavy Variable Genes,” J. Mol. Evol., 54:346-353 (2002).
  Larijani et al., “Methylation protects cytidines from AID-mediated deamination,” Mol. Immunol., 42:599-604 (2005).
  Larijani et al., “AID Associates with Single-Stranded DNA with High Affinity and a Long Complex Half-Life in a Sequence-Independent Manner,” Mol. Cell Biol., 27(1):20-30 (2007).
  Lazorchak et al., “E2A and IRF-4/Pip Promote Chromatin Modification and Transcription of the Immunoglobulin κ Locus in Pre-B Cells,” Mol. Cell Biol., 26(3):810-821 (2006).
  Leight et al., “Establishment of an oriP Replicon Is Dependent upon an Infrequent, Epigenetic Event,” Mol. Cell. Biol., 21(13):4149-4161 (2001).
  Leight et al., “The cis-Acting Family of Repeats Can Inhibit as well as Stimulate Establishment of an oriP Replicon,” J. Virol., 75(22):10709-10720 (2001).
  Li et al., “Rad51 expression and localization in B cells carrying out class switch recombination,” PNAS USA, 93:10222-10227 (1996).
  Li et al., “The generation of antibody diversity through somatic hypermutation and class switch recombination,” Genes & Dev., 18:1-11 (2004).
  Lin et al., “Sequence dependent hypermutation of the immunoglobulin heavy chain in cultured B cells,” PNAS USA, 94(10):5284-5289 (1997).
  Lin et al., “The effects of E-mu, 3′-alpha (hs 1,2) and 3′-kappa enhancers on mutation of an Ig-VDJ-C-gamma-2a Ig immunoglobulin heavy gene in cultured B cells,” Intl. Immunol., 10(8):1121-1129 (1998).
  Lindner et al., “The plasmid replicon of Epstein-Barr virus: Mechanistic insights into efficient, licensed, extrachromosomal replication in human cells,” Plasmid, 58:1-12 (2007).
  Lingbeck et al., “E12 and E47 modulate cellular localization and proteasome-mediated degradation of MyoD and Id1,” Oncogene, 24:6376-6384 (2005).
  Lipovsek et al., “Selection of Horseradish Peroxidase Variants with Enhanced Enantioselectivity by Yeast Surface Display,” Chem. Biol., 14:1176-1185 (2007).
  Lippow et al., “Computational design of antibody-affinity improvement beyond in vivo maturation,” Nature Biotech., 25(10):1171-1176 (2007).
  Liu et al., “XRCC2 and XRCC3, new human Rad51-family members, promote chromosome stability and protect against DNA cross-links and other damages,” Mol. Cell, 1:783-793 (1998).
  Lopez et al., “A single VH family and long CDR3s are the targets for hypermutation in bovine immunoglobulin heavy chains,” Immunological Reviews, 162:55-66 (1998).
  Lluis et al., “E47 phosphorylation by p38 MAPK promotes MyoD/E47 association and muscle-specific gene transcription,” EMBO J., 24:974-984 (2005).
  Lundberg et al., “High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus,” Gene, 108:1-6 (1991).
  Luria et al., “Mutations of bacteria from virus sensitivity to virus resistance,” Genetics, 28:491-511 (1943).
  Mage, “Diversification of rabbit VH genes by gene-conversion-like and hypermutation mechanisms,” Immunological Reviews, 162:49-54 (1998).
  Maelicke et al., “Epitope Mapping Employing Antibodies Raised against Short Synthetic Peptides: A Study of the Nicotinic Acetylcholine Receptor,” Biochem., 28:1396-1405 (1989).
  Maizels, “Somatic hypermutation: how many mechanisms diversify V region sequences?” Cell, 83:9-12 (1995).
  Manser et al., “The roles of antibody variable region hypermutation and selection in the development of the memory B-cell compartment,” Immunological Reviews, 162:182-196 (1998).
  Mantyh et al., “Rapid endocytosis of a G protein-coupled receptor: Substance P-evoked internalization of its receptor in the rat striatum in vivo,” PNAS, 92:2622-2626 (1995).
  Manz et al., “Analysis and sorting of live cells according to secreted molecules, relocated to a cell-surface affinity matrix,” PNAS USA, 92:1921-1925 (1995).
  Marchalonis et al., “Exquisite specificity and peptide epitope recognition promiscuity, properties shared by antibodies from sharks to humans,” J. Mol. Recognition, 14:110-121 (2001).
  Margolskee et al., “Epstein-Barr Virus Shuttle Vector for Stable Episomal Replication of cDNA Expression Libraries in Human Cells,” Mol. Cell. Biol., 8(7):2837-2847 (1988).
  Marks et al., “By-Passing Immunization Human Antibodies from V-gene Libraries Displayed on Phage,” J. Mol. Biol., 222:581-597 (1991).
  Martin et al., “AID and mismatch repair in antibody diversification,” Nat. Rev. Immunol., 2(8):605-614 (2002).
  Martin et al., “Activation-induced cytidine deaminase turns on somatic hypermutation in hybridomas,” Nature, 415:802-806 (2002).
  Martin et al., “Somatic hypermutation of the AID transgene in B and non-B cells,” PNAS, 99(19):12304-12308 (2002).
  Mason et al., “The Kinetics of Antibody Binding to Membrane Antigens in Solution and at the Cell Surface,” Biochem. J., 187:1-20 (1980).
  Mastrobattista et al., “High-Throughput Screening of Enzyme Libraries: In Vitro Evolution of a β-Galactosidase by Fluorescence-Activated Sorting of Double Emulsions,” Chem. Biol., 12(12):1291-1300 (2005).
  Masuda et al., “Absence of DNA polymerase θ results in decreased somatic hypermutation frequency and altered mutation patterns in Ig genes,” DNA Repair, doi:10.1016/j.dnarep.2006.06.006, 8 pages (2006).
  Mattheakis et al., “An in vitro polysome display system for identifying ligands from very large peptide libraries,” PNAS USA, 91:9022-9026 (1994).
  Matthias et al., “Eukaryotic expression vectors for the analysis of mutant proteins,” NAR, 17:6418 (1989).
  Mattes, “Binding parameters of antibodies reacting with multivalent antigens: functional affinity or pseudo-affinity,” J. Immunol. Methods, 202:97-101 (1997).
  Max et al., “The Nucleotide Sequence of a 5.5-kilobase DNA Segment Containing the Mouse κ Immunoglobulin J and C Region Genes,” J. Biol. Chem., 256:5116-5120 (1981).
  Mazor et al., “Isolation of engineered, full-length antibodies from libraries expressed in Escherichia coli,” Nat. Biotech., 25(5):563-565 (2007).
  McBride et al., “Somatic hypermutation is limited by CRM1-dependent nuclear export of activation-induced deaminase,” J. Exp. Med., 199(9):1235-1244 (2004).
  McBride et al., “Regulation of hypermutation by activation-induced cytidine deaminase phosphorylation,” PNAS, 103(23):8798-8803 (2006).
  McCafferty et al., “Phage antibodies: filamentous phage displaying antibody variable domains,” Nature, 348:552-554 (1990).
  McCormack et al., “Germ line maintenance of the pseudogene donor pool for somatic immunoglobulin gene conversion in chickens,” Mol. Cell Biol., 13(2):821-830 (1993).
  McIntosh et al., “Somatic hypermutation in autoimmune thyroid disease,” Immunological Reviews, 162:219-231 (1998).
  McKean et al., “Generation of antibody diversity in the immune response of BALB/c mice to influenza virus hemagglutinin,” PNAS USA, 81:3180-3184 (1984).
  Meyer et al., “The immunoglobulin κ locus contains a second, stronger B-cell-specific enhancer which is located downstream of the constant region,” EMBO J., 8(7):1959-1964 (1989).
  Mian et al., “Structure, Function and Properties of Antibody Binding Sites,” J. Mol. Biol., 217:133-151 (1991).
  Midelfort et al., “Context-dependent mutations predominate in an engineered high-affinity single chain antibody fragment,” Protein Science, 15:324-334 (2006).
  Monteiro et al., “Molecular methods for the detection of mutations,” Teratog. Carcinog. Mutagen., 20(6):357-386 (2000).
  Moore, “Exploration by lamp light,” Nature, 374:766-767 (1995).
  Morino et al., “Antibody fusions with fluorescent proteins: a versatile reagent for profiling protein expression,” J. Immunol. Methods, 257:175-184 (2001).
  Moza et al., “Long-range cooperative binding effects in a T cell receptor variable domain,” PNAS, 103(26):9867-9872 (2006).
  Muramatsu et al., “Class Switch Recombination and Hypermutation Require Activation-Induced Cytidine Deaminase (AID), a Potential RNA Editing Enzyme,” Cell, 102:553-563 (2000).
  Muschen et al., “Somatic Mutation of the CD95 Gene in Human B Cells as a Side-Effect of the Germinal Center Reaction,” J. Exp. Med., 192(12):1833-1839 (2000).
  Muto et al., “Negative regulation of activation-induced cytidine deaminase in B cells,” PNAS, 103(8):2752-2757 (2006).
  Muyldermans et al., “Sequence and structure of VH domain from naturally occurring camel heavy chain immunoglobulins lacking light chains,” Protein Eng., 7(9):1129-1135 (1994).
  Nakamura et al., “Codon usage tabulated from international DNA sequence databases: status for the year 2000,” Nucl. Acid Res., 28(1):292 (2000).
  Nakayama et al., “A limited number of genes are involved in the differentiation of germinal center B cells,” J. Cell. Biol., Published online Jun. 22, 2006 DOI: 10.1002/jcb.20952.
  Navaratnam et al., “An Overview of Cytidine Deaminases,” Intl. J. Hematol., 83:195-200 (2006).
  Neuberger et al., “Somatic hypermutation,” Curr. Op. Immunol., 7:248-254 (1995).
  Neuberger et al., “Monitoring and interpreting the intrinsic features of somatic hypermutation,” Immunological Reviews, 162:107-116 (1998).
  Neuberger et al., “Somatic hypermutation at A·T pairs: polymerase error versus dUTP incorporation,” Nature Rev. Immunol., 5(2):171-178 (2005).
  Ng et al., “The immunology of AIDS-associated lymphomas,” Immunological Reviews, 162:293-298 (1998).
  Nie et al., “Notch-induced E2A ubiquitination and degradation are controlled by MAP kinase activities,” EMBO J., 22(21):5780-5792 (2003).
  Nussinov, “Eukaryotic Dinucleotide Preference Rules and Their Implications for Degenerate Codon Usage,” J. Mol. Biol., 149:125-131 (1981).
  Odegard et al., “Histone modifications associated with somatic hypermutation,” Immunity, 23:101-110 (2005).
  Odegard et al., “Targeting of somatic hypermutation,” Nature Rev. Imm., 6:573-583 (2006).
  Ohki et al., “Telomere-bound TRF1 and TRF2 stall the replication fork at telomeric repeats,” Nucl. Acids Res., 32(5):1627-1637 (2004).
  Okragly et al., “Elevated Tryptase, Nerve Growth Factor, Neurotrophin-3 and Glial Cell Line-Derived Neurotrophic Factor Levels in the Urine of Interstitial Cystitis and Bladder Cancer Patients,” J. Urol., 161:438-442 (1991).
  Olsen et al., “High-Throughput FACS Method for Directed Evolution of Substrate Specificity,” Meth. Mol. Biol., 230:329-342 (2003).
  Omori et al., “Regulation of Class-Switch Recombination and Plasma Cell Differentiation by Phosphatidylinositol 3-Kinase Signaling,” Immunity, 25:1-13 (2006).
  Osbourn et al., “Directed selection of MIP-1α neutralizing CCR5 antibodies from a phage display human antibody library,” Nat. Biotechnol., 16:778-781 (1998).
  Otte et al., “Molecular basis for the binding polyspecificity of an anti-cholera toxin peptide 3 monoclonal antibody,” J. Mol. Recognition, 19:49-59 (2006).
  Okazaki et al., “The AID enzyme induces class switch recombination in fibroblasts,” Nature, 416:340-345 (2002).
  Papavasiliou et al., “Cell-cycle-regulated DNA double-stranded breaks in somatic hypermutation of immunoglobulin genes,” Nature, 408:216-221 (2000).
  Papavasiliou et al., “Somatic hypermutation of immunoglobulin genes: merging mechanisms for genetic diversity,” Cell, 109(Suppl.):S35-S44 (2002).
  Parham, ed., Immunol. Reviews, “Somatic hypermutation of immunoglobulin genes,” vol. 162, Apr. 1998 (Copenhagen, Denmark: Munksgaard).
  Pasqualucci et al., “BCL-6 mutations in normal germinal center B cells: Evidence of somatic hypermutation acting outside Ig loci,” Immunol., 95:11816-11821 (1998).
  Pasqualucci et al., “Hypermutation of multiple proto-oncogenes in B-cell diffuse large-cell lymphomas,” Nature, 412:341-346 (2001).
  Pasqualucci et al., “PKA-mediated phosphorylation regulates the function of activation-induced deaminase (AID) in B cells,” PNAS, 103(2):395-400 (2006).
  Perini et al., “In vivo transcriptional regulation of N-Myc target genes is controlled by E-box methylation,” PNAS, 102(34):12117-12122 (2005).
  Persson et al., “A Focused Antibody Library for Improved Hapten Recognition,” J. Mol. Biol., 357:607-620 (2006).
  Peter et al., “Antibodies against the melanocortin-4 receptor act as inverse agonists in vitro and in vivo,” Am. J. Physiol. Regul. Integr. Comp. Physiol., 292:R2151-R2158 (2007).
  Peters et al., “Somatic Hypermutation of Immunoglobulin Genes is Linked to Transcription Initiation,” Immunity, 4:57-65 (1996).
  Petersen-Mahrt et al., “AID mutates E. coli suggesting a DNA deamination mechanism for antibody diversification,” Nature, 418:99-104 (2002).
  Pham et al., “Impact of Phosphorylation and Phosphorylation-null Mutants on the Activity and Deamination Specificity of Activation-induced Cytidine Deaminase,” JBC Papers in Press, published Apr. 16, 2008, Manuscript M802121200, doi:10.1074/jbc.M802121200.
  Phung et al., “Hypermutation in Ig V genes from mice deficient in the MLH1 mismatch repair protein,” J. Immunol., 162(6):3121-3124 (1999).
  Pierce et al., “XRCC3 promotes homology-directed repair of DNA damage in mammalian cells,” Genes Dev., 13:2633-2638 (1999).
  Pioszak et al., “Molecular recognition of parathyroid hormone by its G protein-coupled receptor,” PNAS, 105(13):5034-5039 (2008).
  Pluckthun, “Antibody Engineering: Advances from the Use of Escherichia coli Expression Systems,” Biotechnology, 9:545-551 (1991).
  Polonskaya et al., “Role for a region of helically unstable DNA within the Epstein-Barr virus latent cycle origin of DNA replication oriP in origin function,” Virology, 328:282-291 (2004).
  Poltoratsky et al., “Error-prone Candidates Vie for Somatic Mutation,” J. Exp. Med., 192(10):F27-F30 (2000).
  Poltoratsky, “Down regulation of DNA polymerase beta accompanies somatic hypermutation in human BL2 cell lines,” DNA Repair, 6(7):244-253 (2007).
  Poltoratsky et al., “Negligible impact of pol ι expression on the alkylation sensitivity of pol β-deficient mouse fibroblast cells,” DNA Repair, 7:830-833 (2008).
  Pons et al., “Energetic analysis of an antigen/antibody interface: alanine scanning mutagenesis and double mutant cycles on the HyHEL-10/lysozyme interaction,” Protein Science, 8:958-968 (1999).
  Presta, “Antibody Engineering,” Curr. Op. Struct. Biol., 2:593-596 (1992).
  Rada et al., “Hot spot focusing of somatic hypermutation in MSH2-deficient mice suggests two stages of mutational targeting,” Immunity, 9:135-141 (1998).
  Rada et al., “The intrinsic hypermutability of antibody heavy and light chain genes decays exponentially,” EMBO J., 20(16):4570-4576 (2001).
  Rakestraw et al., “A Flow Cytometric Assay for Screening Improved Heterologous Protein Secretion in Yeast,” Biotechnol. Prog., 22:1200-1208 (2006).
  Ratech, “Rapid cloning of rearranged immunoglobulin heavy chain genes from human B-cell lines using anchored polymerase chain reaction,” Biochem. Biophys. Res. Commun., 182(3):1260-1263 (1992).
  Reason et al., “Codon insertion and deletion functions as a somatic diversification mechanism in human antibody repertoires,” Biol. Direct., 1:24-45 (2006).
  Ren et al., “Establishment and Applications of Epstein-Barr Virus-Based Episomal Vectors in Human Embryonic Stem Cells,” Stem Cells, 24:1338-1347 (2006).
  Revy et al., “Activation-Induced Cytidine Deaminase (AID) Deficiency Causes the Autosomal Recessive Form of the Hyper-IgM Syndrome (HIGM2),” Cell, 102(5):565-575 (2000).
  Reynaud et al., “A hyperconversion mechanism generates the chicken light chain preimmune repertoire,” Cell, 48:379-388 (1987).
  Reynaud et al., “Somatic hyperconversion diversifies the single VH gene of the chicken with a high incidence in the D region,” Cell, 59:171-183 (1989).
  Robey et al., “Specificity mapping of human anti-T cell receptor monoclonal natural antibodies: defining the properties of epitope recognition promiscuity,” FASEB J., 16:642-652 (2002).
  Rogozin et al., “Somatic hypermutagenesis of immunoglobulin genes. II. Influence of neighbouring base sequences on mutagenesis,” Biochem. Biophys. Acta, 1171:11-18 (1992).
  Rogozin et al., “Cutting Edge: DGYW/WRCH Is a Better Predictor of Mutability at G:C Bases in Ig Hypermutation Than the Widely Accepted RGYW/WRCY Motif and Probably Reflects a Two-Step Activation-induced Cytidine Deaminase Triggered Process,” J. Immunol., 172:3382-3384 (2004).
  Rogozin et al., “The cytidine deaminase AID exhibits similar functional properties in yeast and mammals,” Mol. Immunol., 43:1481-1484 (2006).
  Romanow et al., “E2A and EBF Act in Synergy with the V(D)J Recombinase to Generate a Diverse Immunoglobulin Repertoire in Nonlymphoid Cells,” Mol. Cell, 5:343-353 (2000).
  Ronai et al., “Complex regulation of somatic hypermutation by cis-acting sequences in the endogenous IgH gene in hybridoma cells,” PNAS USA, 102(33):11829-11834 (2005).
  Rooney et al., “Paired Epstein-Barr virus-carrying lymphoma and lymphoblastoid cell lines from Burkitt's lymphoma patients: comparative sensitivity to non-specific and to allo-specific cytotoxic responses in vitro,” Int. J. Cancer, 34:339-348 (1984).
  Rowe et al., “Differences in B cell growth phenotype reflect novel patterns of Epstein-Barr virus latent gene expression in Burkitt's lymphoma cells,” EMBO J., 6(9):2743-2751 (1987).
  Rucci et al., “Tissue-specific sensitivity to AID expression in transgenic mouse models,” Gene, 377:150-158 (2006).
  Ruckerl et al., “Activation induced cytidine deaminase fails to induce a mutator phenotype in the human pre-B cell line Nalm6,” Eur. J. Immunol., 35:290-298 (2005).
  Ruckerl et al., “Episomal vectors to monitor and induce somatic hypermutation in human Burkitt-Lymphoma cell lines,” Mol. Immunol., 43(10):1645-1652 (2006).
  Sagawa et al., “Thermodynamic and kinetic aspects of antibody evolution during the immune response to hapten,” Mol. Immunol., 39:801-808 (2003).
  Saini et al., “Exceptionally long CDR3H region with multiple cysteine residues in functional bovine IgM antibodies,” Eur. J. Immunol., 29:2420-2426 (1999).
  Salazar et al., “Evaluating a Screen and Analysis of Mutant Libraries,” Methods Mol. Biol., 230:85-97 (2003).
  Sale et al., “TdT-accessible breaks are scattered over the immunoglobulin V domain in a constitutively hypermutating B cell line,” Immunity, 9:859-869 (1998).
  Sale et al., “Ablation of XRCC2/3 transforms immunoglobulin V gene conversion into somatic hypermutation,” Nature, 412(6850):921-926 (2001).
  Santa-Marta et al., “HIV-1 vif protein blocks the cytidine deaminase activity of B-cell specific AID in the E. coli by a similar mechanism of action,” Mol. Imm., 44:583-590 (2006).
  Schoetz et al., “E2A Expression Stimulates Ig Hypermutation,” J. Immunol., 177:395-400 (2006).
  Schoonbroodt et al., “Oligonucleotide-assisted cleavage and ligation: a novel directional DNA cloning technology to capture cDNAs. Application in the construction of a human immune antibody phage-display library,” Nucl. Acids Res., 33(9):e81 (2005).
  Sciammas et al., “Graded Expression of Interferon Regulatory Factor-4 Coordinates Isotype Switching with Plasma Cell Differentiation,” Immunity, 25:225-236 (2006).
  Sears et al., “Metaphase Chromosome Tethering Is Necessary for the DNA Synthesis and Maintenance of oriP Plasmids but Is Insufficient for Transcription Activation by Epstein-Barr Nuclear Antigen 1,” J. Virol., 77(21):11767-11780 (2003).
  Sharpe et al., “Somatic hypermutation of immunoglobulin κ may depend on sequences 3′ of Cκ and occurs on passenger transgenes,” EMBO J., 10(8):2139-2145 (1991).
  Shen et al., “Mutation of BCL-6 Gene in Normal B Cells by the Process of Somatic Hypermutation of Ig Genes,” Science, 280:1750-1752 (1998).
  Shen et al., “The TATA binding protein, c-Myc and survivin genes are not somatically hypermutated, while Ig and BCL6 genes are hypermutated in human memory B cells,” Intl. Immunol., 12(7):1085-1093 (2000).
  Shen et al., “Somatic hypermutation and class switch recombination in Msh6-/-Ung-/- double-knockout mice,” J. Immunol., 177:5386-5392 (2006).
  Shinkura et al., “Separate domains of AID are required for somatic hypermutation and class-switch recombination,” Nat. Immunol., 5(7):707-712 (2004).
  Shire et al., “EBP2, a Human Protein That Interacts with Sequences of the Epstein-Barr Virus Nuclear Antigen 1 Important for Plasmid Maintenance,” J. Virol., 73(4):2587-2595 (1999).
  Shire et al., “Regulation of the EBNA1 Epstein-Barr Virus Protein by Serine Phosphorylation and Arginine Methylation,” J. Virol., 80(11):5261-5272 (2006).
  Siehler, “Cell-based assays in GPCR drug discovery,” Biotech. J., 3:1-13 (2008).
  Silverman, “Multivalent avimer proteins evolved by exon shuffling of a family of human receptor domains,” Nature Biotech., 23:1493-1494 (2005).
  Sitaraman et al., “A novel cell-free protein synthesis system,” J. Biotechnol., 110(3):257-263 (2004).
  Storb et al., “Cis-acting sequences that affect somatic hypermutation of Ig genes,” Immunological Reviews, 162:153-160 (1998).
  Smit et al., “Antigen receptors and somatic hypermutation in B-cell chronic lymphocytic leukemia with Richter's transformation,” Haematologica, 91(7):903-911 (2006).
  Smith, “Filamentous Fusion Phage: Novel Expression Vectors that Display Cloned Antigens on the Virion Surface,” Science, 228(4705):1315-1317 (1985).
  Smith-Gill et al., “VL-VH Expression by Monoclonal Antibodies Recognizing Avian Lysozyme,” J. Immunol., 132(2):963-967 (1984).
  Sonderegger et al., “Evolutionary Engineering of Saccharomyces cerevisiae for Anaerobic Growth on Xylose,” Appl. Environ. Microbiol., 69(14):1990-1998 (2003).
  Song et al., “Antibody feedback and somatic mutation in B cells: regulation of mutation by immune complexes with IgG antibody,” Immunological Reviews, 162:211-218 (1998).
  Spencer et al., “Characteristics of Sequences Around Individual Nucleotide Substitutions in IgVH Genes Suggest Different GC and AT Mutators,” J. Immunol., 162:6596-6601 (1999).
  Spillmann et al., “Endogenous Expression of Activation-Induced Cytidine Deaminase in Cell Line WEHI-231,” J. Immunol., 173:1858-1867 (2004).
  Stevenson et al., “Insight into the origin and clonal history of B-cell tumors as revealed by analysis of immunoglobulin variable region genes,” Immunological Reviews, 162:247-259 (1998).
  Storb, “Progress in understanding the mechanism and consequences of somatic hypermutation,” Immunological Reviews, 162:5-11 (1998).
  Storb et al., “Somatic hypermutation of immunoglobulin and non-immunoglobulin genes,” Phil. Trans. R. Soc. Lond. B, 356:13-19 (2001).
  Storb et al., “The E Box Motif CAGGTG Enhances Somatic Hypermutation without Enhancing Transcription,” Immunity, 19:235-242 (2003).
  Steele et al., “Computational analyses show A-to-G mutations correlate with nascent mRNA hairpins at somatic hypermutation hotspots,” DNA Repair, doi:10.1016/j.dnarep.2006.06.002 (2006).
  Stemmer, “DNA shuffling by random fragmentation and reassembly: In vitro recombination for molecular evolution,” PNAS USA, 91(22):10747-10751 (1994).
  Taddei et al., “Role of mutator alleles in adaptive evolution,” Nature, 387:700-702 (1997).
  Takata et al., “Homologous recombination and non-homologous end-joining pathways of DNA double-strand break repair have overlapping roles in the maintenance of chromosomal integrity in vertebrate cells,” EMBO J., 17(18):5497-5508 (1998).
  Takata et al., “The Rad51 paralog Rad51B promotes homologous recombinational repair,” Mol. Cell Biol., 20(17):6476-6482 (2000).
  Takata et al., “Chromosome Instability and Defective Recombinational Repair in Knockout Mutants of the Five Rad51 Paralogs,” Mol. Cell Biol., 21(8):2858-2866 (2001).
  Tam, “Synthetic peptide vaccine design: Synthesis and properties of a high-density multiple antigenic peptide system,” PNAS USA, 85:5409-5413 (1988).
  Teh et al., “The 1.48 Å Resolution Crystal Structure of the Homotetrameric Cytidine Deaminase from Mouse,” Biochem., 45:7825-7833 (2006).
  Teng et al., “MicroRNA-155 Is a Negative Regulator of Activation-Induced Cytidine Deaminase,” Immunity, 28:621-629 (2008).
  Terskikh et al., “Peptabody: A new type of high avidity binding protein,” PNAS, 94:1663-1668 (1997).
  Tomlinson, (1997) “V Base database of human antibody genes;” Medical Research Council, Centre for Protein Engineering, UK Confirmation No. 149990 http://www.mrc-cpe.cam.ac.uk/.
  Tuerk et al., “Systematic Evolution of Ligands by Exponential Enrichment: RNA Ligands to Bacteriophage T4 DNA Polymerase,” Science, 249(4968):505-510 (1990).
  Tumas-Brundage et al., “The Transcriptional Promoter Regulates Hypermutation of the Antibody Heavy Chain Locus,” J. Exp. Med., 185(2):239-250 (1997).
  Turner, “Directed evolution of enzymes for applied biocatalysis,” Trends Biotech., 21(11):474-478 (2003).
  Unniraman et al., “Strand-Biased Spreading of Mutations During Somatic Hypermutation,” Science, 317:1227-1230 (2007).
  Vanantwerp et al., “Fine Affinity Discrimination by Yeast Surface Display and Flow Cytometry,” Biotechnol. Prog., 16:31-37 (2000).
  Wabl et al., “Hypermutation at the immunoglobulin heavy chain locus in a pre-B-cell line,” PNAS USA, 82:479-482 (1985).
  Wagner et al., “Codon bias targets mutation,” Nature, 376:732 (1995).
  Wang et al., “Enhancement of scFv fragment reactivity with target antigens in binding assays following mixing with anti-tag monoclonal antibodies,” J. Immunol. Meth., 294:23-35 (2004).
  Wang et al., “Evolution of new nonantibody proteins via iterative somatic hypermutation,” PNAS USA, 101(19):16745-16749 (2004).
  Wang et al., “Genome-wide somatic hypermutation,” PNAS USA, 101(19):7352-7356 (2004).
  Wang et al., “Hypermutation Rate Normalized by Chronological Time,” J. Immunol., 174(9):5650-5654 (2005).
  Wang et al., “Mutant Library Construction in Directed Molecular Evolution,” Mol. Biotech., 34:55-68 (2006).
  Ward et al., “Binding activities of a repertoire of single immunoglobulin variable domains secreted from Escherichia coli,” Nature, 341:544-546 (1989).
  Watanabe et al., “Rad18 guides pol eta to replication stalling sites through physical interaction and PCNA monoubiquitination,” EMBO J., 23(19):3886-3896 (2004).
  Weaver et al., “Gel microdrop technology for rapid isolation of rare and high producer cells,” Nature Medicine, 3(5):583-585 (1997).
  Weill et al., “Rearrangement/hypermutation/gene conversion: when, where and why?” Immunol. Today, 17(2):92-97 (1996).
  Wendelburg et al., “An enhanced EBNA1 variant with reduced IR3 domain for long-term episomal maintenance and transgene expression of oriP-based plasmids in human cells,” Gene Therapy, 5:1389-1399 (1998).
  Werthen et al., “Cooperativity in the antibody binding to surface-adsorbed antigen,” BBA, 1162:326-332 (1993).
  White et al., “Sequences Adjacent to oriP Improve the Persistence of Epstein-Barr Virus-Based Episomes in B Cells,” J. Virol., 75(22):11249-11252 (2001).
  Wiens et al., “Harmful somatic mutations: lessons from the dark side,” Immunological Reviews, 162:197-209 (1998).
  Wilson et al., “Amino acid insertions and deletions contribute to diversify the human Ig repertoire,” Immunological Reviews, 162: 143-151 (1998).
  Wilson et al., “Somatic hypermutation introduces insertions and deletions into immunoglobulin V genes,” J. Exp. Med., 187:59-70 (1998).
  Wilson et al., “MSH2-MSH6 stimulates DNA polymerase eta, suggesting a role for A:T mutations in antibody genes,” J. Exp. Med., 201(4):637-645 (2005).
  Winter et al., “Making Antibodies by Phage Display Technology,” Ann. Rev. Immunol., 12:433-455 (1994).
  Winter et al., “Dual enigma of somatic hypermutation of immunoglobulin variable genes: targeting and mechanism,” Immunological Reviews, 162:89-96 (1998).
  Wolfe et al., “Beyond the ‘Recognition Code’: Structures of Two Cys2His2 Zinc Finger/TATA Box Complexes,” Structure, 9(8):717-723 (2001).
  Wu et al., “An Analysis of the Sequences of the Variable Regions of Bence Jones Proteins and Myeloma Light Chains and Their Implications for Antibody Complementarity,” J. Exp. Med., 132:211-250 (1970).
  Wu et al., “A human follicular lymphoma B cell line hypermutates its functional immunoglobulin genes in vitro,” Eur. J. Immunol., 25:3263-3269 (1995).
  Wu et al., “The Somatic Hypermutation Activity of Follicular Lymphoma Links to Large Insertions and Deletions of Immunoglobulin Genes,” Scand. J. Immunol., 42:52-59 (1995).
  Wu et al., “Separation of the DNA Replication, Segregation, and Transcriptional Activation Functions of Epstein-Barr Nuclear Antigen 1,” J. Virol., 76(5):2480-2490 (2002).
  Wysocki et al., “Somatic origin of T-cell epitopes within antibody variable regions: significance to monoclonal therapy and genesis of systemic autoimmune disease,” Immunological Reviews, 162:233-246 (1998).
  Xie et al., “The structure of a yeast RNA-editing deaminase provides insight into the fold and function of activation-induced deaminase and APOBEC-1,” PNAS, 101(21):8114-8119 (2004).
  Xu et al., “Two monoclonal antibodies to precisely the same epitope of type II collagen select non-crossreactive phage clones by phage display: implications for autoimmunity and molecular mimicry,” Mol. Immunol., 41:411-419 (2004).
  Yamaguchi-Iwai et al., “Homologous recombination, but not DNA repair, is reduced in vertebrate cells deficient in RAD52,” Mol. Cell Biol., 18(11):6430-6435 (1998).
  Yang et al., “Activation-Induced Cytidine Deaminase (AID)-Mediated Sequence Diversification Is Transiently Targeted to Newly Integrated DNA Substrates,” JBC Papers in Press, published Jul. 5, 2007 Manuscript M704231200.
  Yang et al., “Targeting of AID-Mediated Sequence Diversification by cis-Acting Determinants,” Advances in Immunol., 94:109-125 (2007).
  Yates et al., “The Minimal Replicator of Epstein-Barr Virus oriP,” J. Virol., 74(10):4512-4522 (2000).
  Yelamos et al., “Targeting of non-Ig sequences in place of V segment by somatic hypermutation,” Nature, 376:225-229 (1995).
  Yoshikawa et al., “AID Enzyme-Induced Hypermutation in an Actively Transcribed Gene in Fibroblasts,” Science, 296(5574):2033-2036 (2002).
  Zan et al., “Induction of Ig Somatic Hypermutation and Class Switching in a Human Monoclonal IgM+ IgD+ B Cell Line in Vitro:Definition of the Requirements and Modalities of Hypermutation,” J. Immunol., 162:3437-3447 (1999).
  Zan et al., “B Cell Receptor Engagement and T Cell Contact Induce bcl-6 Somatic Hypermutation in Human B Cells: Identity with Ig Hypermutation,” J. Immunol., 165(2):830-839 (2000).
  Zan et al., “The translesion DNA polymerase θ plays a dominant role in immunoglobulin gene somatic hypermutation,” EMBO J., 24:3757-3769 (2005).
  Zemlin et al., “Expressed Murine and Human CDR-H3 Intervals of Equal Length Exhibit Distinct Repertoires that Differ in their Amino Acid Composition and Predicted Range of Structures,” J. Mol. Biol., 334:733-749 (2003).
  Zeng et al., “DNA polymerase eta is an A-T mutator in somatic hypermutation of immunoglobulin variable genes,” Nat. Immunol., 2(6):537-541 (2001).
  Zhao et al., “Directed evolution of enzymes and pathways for industrial biocatalysis,” Curr. Op. Biotechnol., 13(2):104-110 (2002).
  Zhang et al., “Clonal instability of V region hypermutation in the Ramos Burkitt's lymphoma cell line,” Int. Immunol., 13(9):1175-1184 (2001).
  Zhang et al., “Broadly cross-reactive HIV neutralizing human monoclonal antibody Fab selected by sequential antigen panning of a phage display library,” J. Immunol. Methods, 283:17-25 (2003).
  Zhang et al., “Identification and Characterization of a New Cross-Reactive Human Immunodeficiency Virus Type 1-Neutralizing Human Monoclonal Antibody,” J. Virology, 78(17):9233-9242 (2004).
  Zhang et al., “The development of anti-CD79 monoclonal antibodies for treatment of B-cell neoplastic disease,” Ther. Immunol., 2:191-202 (1995).
  Zheng et al., “Immunoglobulin gene hypermutation in germinal centers is independent of the RAG-I V(D)J recombinase,” Immunological Reviews, 162:133-141 (1998).
  Zheng et al., “Intricate targeting of immunoglobulin somatic hypermutation maximizes the efficiency of affinity maturation,” JEM, 201(9):1467-1478 (2005).
  Zhou et al., “Cell cycle regulation of chromatin at an origin of DNA replication,” EMBO J., 24(7):1406-1417 (2005).
  Zhu et al., “A well-differentiated B cell line is permissive for somatic mutation of a transfected immunoglobulin heavy-chain gene,” PNAS USA, 92:2810-2814 (1995).
  Zou et al., “Subtle differences in antibody responses and hypermutation of λ light chains in mice with a disrupted κ constant region,” Eur. J. Immunol., 25:2154-2162 (1995).
  United States Patent and Trademark Office, International Search Report issued in International Application No. PCT/US08/02397 (Jun. 16, 2008).
  United States Patent and Trademark Office, International Search Report issued in International Application No. PCT/US08/02396 (Jun. 26, 2008).
  Barbas III et al., “Synthetic Human Antibodies: Selecting and Evolving Functional Proteins,” Methods: A Companion to Methods in Enzymology, 8:94-103 (1995).
  Barbas III et al., “Semisynthetic combinatorial antibody libraries: A chemical solution to the diversity problem,” Proc. Natl. Acad. Sci. USA, 89:4457-4461 (1992).
  De Haard et al., “A Large Non-immunized Human Fab Fragment Phage Library That Permits Rapid Isolation and Kinetic Analysis of High Affinity Antibodies,” Journal of Biological Chemistry, 274(26):18218-18230 (1999).
  Rader et al., “A phage display approach for rapid antibody humanization: Designed combinatorial V gene libraries,” Proc. Natl. Acad. Sci. USA, 95:8910-8915 (1998).
  Japanese Patent Office, Office Action in Japanese Patent Application No. 550935/2009 (Jun. 3, 2014).
  Japanese Patent Office, Office Action in Japanese Patent Application No. 094218/2013 (Sep. 9, 2014).
  European Patent Office, Office Action in European Patent Application No. 08 725 984.2 (Sep. 11, 2014).
  Chowdhury, P.S. “Targeting Random Mutations to Hotspots in Antibody Variable Domains for Affinity Improvement,” Methods in Molecular Biology, 178: 269-285 (Dec. 1, 2001).
  Ho et al., “In Vitro Antibody Evolution Targeting Germline Hot Spots to Increase Activity of an Anti-CD22 Immunotoxin,” The Journal of Biological Chemistry, 280(1): 607-617 (Jan. 7, 2005).
  Indian Patent Office, Office Action in Indian Patent Application No. 5743/DELNP/2009 (Jan. 14, 2015).
  De Kruif et al., “Selection and Application of Human Single Chain Fv Antibody Fragments from a Semi-synthetic Phage Antibody Display Library with Designed CDR3 Regions,” J. Mol. Biol. 248(1): 97-105 (Jan. 1, 1995).
  European Patent Office, Office Action in European Patent Application No. 08 725 984.2 (Oct. 8, 2015).
  Hoogenboom et al., “By-passing Immunisation Human Antibodies from Synthetic Repertoires of Germline VH Gene Segments Rearranged in Vitro,” J. Mol. Biol., 227(2): 381-388 (Sep. 20, 1992).
  Nissim et al., “Antibody fragments from a ‘single pot’ phage display library as immunochemical reagents,” The EMBO Journal, 13(3): 692-698 (Jan. 1, 1994).
  U.S. Appl. No. 12/070,904, filed Feb. 20, 2008.
  U.S. Appl. No. 12/070,905, filed Feb. 20, 2008.
  U.S. Appl. No. 12/972,328, filed Dec. 17, 2010.
  U.S. Appl. No. 13/109,106, filed May 17, 2011.
  U.S. Appl. No. 13/272,065, filed Oct. 12, 2011.
  U.S. Appl. No. 14/181,096, filed Feb. 14, 2014.
 
     Primary Examiner —Christian Boesen
     Art Unit — 1639
     Exemplary claim number — 1
 
(74)Attorney, Agent, or Firm — Leydig, Voit & Mayer, Ltd.

(57)

Abstract

This invention relates to methods for the generation of humanized antibodies, particularly a humanized antibody heavy chain protein and a humanized antibody light chain protein. The method comprises using cells that express or can be induced to express Activation Induced Cytidine Deaminase (AID).
7 Claims, 79 Drawing Sheets, and 76 Figures


CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional of co-pending U.S. patent application Ser. No. 12/070,904, filed Feb. 20, 2008, which claims the benefit of U.S. Provisional Application No. 60/902,414, filed Feb. 20, 2007, U.S. Provisional Application No. 60/904,622, filed Mar. 1, 2007, U.S. Provisional Application No. 60/995,970, filed Sep. 28, 2007, and U.S. Provisional Application No. 61/020,124, filed Jan. 9, 2008, each of which applications is incorporated herein by reference in its entirety.

INCORPORATION-BY-REFERENCE OF MATERIAL SUBMITTED ELECTRONICALLY

[0002] Incorporated by reference in its entirety herein is a computer-readable nucleotide/amino acid sequence listing submitted concurrently herewith and identified as follows: One 254,400 Byte ASCII (Text) file named “714404_ST25.TXT,” created on Oct. 10, 2013.

FIELD OF THE INVENTION

[0003] This invention relates to methods for the generation of polynucleotide seed libraries and the use of these libraries in generating novel mutants of recombinant proteins and, more particularly, for generating focused libraries of recombinant human antibodies and screening for their affinity binding with target antigens.

BACKGROUND OF THE INVENTION

[0004] The market for the use of recombinant protein therapeutics has increased steadily for the last quarter century. In 2005, six of the top 20 drugs were proteins, and overall, biopharmaceutical drugs accounted for revenues of approximately $40 billion, of which approximately $17 billion was based on the sales of monoclonal antibodies.
[0005] Monoclonal antibodies represent a distinct class of biotherapeutics with a great deal of promise. The antibody scaffold is well tolerated in the clinic, and glycosylated IgG molecules have favorable pharmacokinetic and pharmacodynamic properties. Comparison of the sequences of the approved antibody drugs, as well as those in development, demonstrates that some of the individual drug molecules are strikingly similar to each other, differing only by a few variations of amino acid residues located in the variable region of the immunoglobulin.
[0006] Typical monoclonal antibodies, like naturally occurring antibodies, have a “Y”-shaped structure, with the antigen binding portion located at the end of each short arm of the Y. The typical antibody molecule consists of four polypeptides: two identical copies of a heavy (H) chain and two copies of a light (L) chain, giving the general formula H2L2. Each heavy chain contains one N-terminal variable (VH) region plus three C-terminal constant (CH1, CH2 and CH3) regions, and each light chain contains one N-terminal variable (VL) region and one C-terminal constant (CL) region. The different variable and constant regions of either heavy or light chains are of roughly equal length (about 110 amino acid residues per region). Each light chain is linked to a heavy chain by disulphide bonds and the two heavy chains are linked to each other by disulphide bonds. Each heavy chain has at one end a variable domain followed by a number of constant domains, and each light chain has a variable domain at one end and a constant domain at the other end. The light chain variable domain is aligned with the variable domain of the heavy chain. The light chain constant domain is aligned with the first constant domain of the heavy chain. The remaining constant domains of the heavy chains are aligned with each other. The constant domains in the light and heavy chains are not involved directly in binding the antibody to the antigen.
[0007] Antibodies are typically divided into different classes on the basis of the structure of the constant region. In humans, for example, five major structural classes can be identified: immunoglobulin G (IgG), IgM, IgA, IgD and IgE. Each class is distinguished on the basis of its physical and biological characteristics, which relate to the function of the immunoglobulin in the immune system. IgGs can be further divided into four subclasses, IgG1, IgG2, IgG3 and IgG4, based on differences in the heavy chain amino acid composition and in disulphide bridging, giving rise to differences in biological behavior. A description of the classes and subclasses is set out in “Essential Immunology” by Ivan Roitt, Blackwell Scientific Publications.
[0008] The variable domains of each pair of light and heavy chains form the antigen binding site. They have the same general structure with each domain comprising a framework of four regions, whose sequences are relatively conserved, connected by three complementarity determining regions (CDRs). The four framework regions (FWs or FRs) largely adopt a beta-sheet conformation and the CDRs form loops connecting, and in some cases comprising part of, the beta-sheet structure. The CDRs are held in close proximity by the framework regions and, with the CDRs from the other domain, contribute to the formation of the antigen binding site.
[0009] The vertebrate immune system has evolved unique genetic mechanisms that enable it to generate an almost unlimited number of different light and heavy chains in a remarkably economical way by joining separate gene segments together before they are transcribed. The antibody chains are encoded by genes at three separate loci on different chromosomes. One locus encodes the heavy chain isotypes and there are separate loci for the kappa (κ) and lambda (λ) light chain isotypes, although a B-lymphocyte only transcribes from one of these light chain loci. For each type of Ig chain (heavy chains, lambda (λ) light chains, and kappa (κ) light chains) there is a separate pool of gene segments from which a single peptide chain is eventually synthesized. Each pool is on a different chromosome and usually contains a large number of gene segments encoding the V region of an Ig chain and a smaller number of gene segments encoding the C region. More specifically, the variable region of an H-chain is assembled from three gene fragments, i.e., V, D and J gene fragments, while the variable region of an L-chain is assembled from two gene fragments, i.e., V and J gene fragments, regardless of whether the L-chain is a lambda (λ) or kappa (κ) chain. During B cell development a complete coding sequence for each of the two Ig chains to be synthesized is assembled by site-specific genetic recombination, bringing together the entire coding sequence for a V region and the coding sequence for a C region.
[0010] The large number of inherited V, J and D gene segments available for encoding Ig chains makes a substantial contribution on its own to antibody diversity, but the combinatorial joining of these segments greatly increases this contribution. Further, imprecise joining of gene segments and somatic mutations introduced during the V-D-J segment joining at the pre-B cell stage greatly increase the diversity of the V regions.
[0011] In addition to these structural characteristics, analyses of natural antibody sequences together with structural studies have been instrumental in revealing how antibodies work (Chothia et al., 1992, J. Mol. Biol., 227: 799-817; Kabat, 1982, Pharmacological Rev., 34: 23-38; Kabat, 1987, Sequences of Proteins of Immunological Interest (National Institutes of Health, Bethesda, Md.)). These studies have shown that antigen recognition is primarily mediated by complementarity determining regions (CDRs) that are located at one end of the antibody variable domain and are connected by a β-sheet framework (Wu & Kabat, 1970, J. Exp. Med., 132: 211-250; Kabat & Wu, 1971, Annals New York Acad. Sci., 190: 382-393).
[0012] The sequence diversity of natural antibodies shows that the CDRs are hypervariable in comparison with the framework, and it is the CDR sequences that determine the antigen specificity of a particular antibody (Jones et al., 1986, Nature, 321: 522-5; Amit et al., 1986, Science, 233: 747-53). These studies have also revealed that the natural sequence diversity at most CDR positions is not completely random, as biases for particular amino acids occur in both a site-specific manner and in terms of overall CDR composition (Davies & Cohen, 1996, Proc. Natl. Acad. Sci. USA, 93: 7-12; Kabat et al., 1977, J. Biol. Chem., 252: 6609-16; Zemlin et al., 2003, J. Mol. Biol., 334: 733-49; Mian et al., 1991, J. Mol. Biol., 217: 133-51; Padlan, 1994, Mol. Immunol., 31: 169-217).
[0013] In contrast to traditional small molecule based approaches, therapeutic antibodies have significant advantages: (i) they can be generated and validated quickly; (ii) they exhibit fewer side effects and have improved safety profiles; (iii) they have well understood pharmacokinetic characteristics and can be optimized to create long half-life products with reduced dosing frequency; (iv) they are versatile and exhibit flexibility in drug function; (v) their scale-up and manufacturing processes are robust and well understood; and (vi) they have a proven track record of clinical and regulatory success.
[0014] Even given the success of monoclonal antibodies, the antibody-as-drug modality continues to evolve and remains subject to inefficiency. Further, intrinsic biological bias within the native immune system often works against the more rapid development of improved therapeutics. These limitations include (i) the long development time for the isolation of biologically active antibodies with affinity constants of therapeutic caliber, (ii) the inability to raise antibodies to certain classes of protein targets (intractable targets), and (iii) the intrinsic affinity ceiling inherent in immune system based affinity selection.
[0015] Specifically, there is a need for methods to more rapidly develop antibodies with improved pharmacokinetics, cross-reactivity, safety profiles and superior dosing regimens. Central to this need is the development of methods that enable the systematic analysis of potential epitopes within a protein and the selective development of antibodies with the desired selectivity profiles.
[0016] One approach used by a number of companies combines random or semi-random mutagenesis (for example, error-prone PCR) with in vitro molecular evolution. This approach is based on the creation of random changes in protein structure and the generation of huge libraries of mutant polynucleotides that are subsequently screened for improved variants, usually through the expression of the encoded proteins within a living cell. From these libraries a few improved proteins may be selected for further optimization.
[0017] Such in vitro mutation approaches are generally limited by the inability to systematically search a significant fraction of sequence space, and by the relative difficulty of detecting very rare improvement mutants at heavy mutagenesis loads. This fundamental problem arises because the total number of possible mutants for a reasonably sized protein is massive. For example, a 100 amino acid protein has a potential diversity of 20¹⁰⁰ different sequences of amino acids, while existing high throughput screening methodologies are typically limited to a maximum screening capacity of 10⁷-10⁸ samples per week. Additionally, such approaches are relatively inefficient because of redundant codon usage, in which up to around 3¹⁰⁰ of the nucleotide sequences possible for a 100 amino acid residue protein actually encode the same amino acid sequence (Gustafsson et al. (2004) “Codon bias and heterologous protein expression,” Trends Biotech. 22(7): 346-353).
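The scale mismatch described in this paragraph can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names and the 52-week screening horizon are assumptions introduced here, while the 20-letter amino acid alphabet, the 100-residue length, and the upper screening capacity of 10⁸ samples per week are taken from the text.

```python
# Illustrative arithmetic only; all names below are hypothetical.
AMINO_ACIDS = 20            # size of the amino acid alphabet
SCREENS_PER_WEEK = 10**8    # upper end of the screening capacity quoted in the text


def sequence_space(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def weeks_to_screen(length: int, per_week: int = SCREENS_PER_WEEK) -> int:
    """Weeks needed to screen every sequence once (ceiling division)."""
    return -(-sequence_space(length) // per_week)


def max_exhaustive_length(per_week: int = SCREENS_PER_WEEK, weeks: int = 52) -> int:
    """Longest peptide whose complete sequence space fits in the screening budget."""
    budget = per_week * weeks
    length = 0
    while sequence_space(length + 1) <= budget:
        length += 1
    return length


if __name__ == "__main__":
    space = sequence_space(100)
    print(f"20^100 has {len(str(space))} decimal digits")
    print(f"weeks needed at 10^8 per week has {len(str(weeks_to_screen(100)))} decimal digits")
    print(f"longest exhaustively screenable peptide in a year: {max_exhaustive_length()} residues")
```

Under these assumptions, exhaustive screening is feasible only for peptides of a few residues, which illustrates why a random search of the full sequence space of a protein is impractical.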
[0018] A more sophisticated approach uses a mixture of random mutagenesis with recombination between protein domains in order to select for improved proteins (Stemmer (1994) Proc. Natl. Acad. Sci. 91(22): 10747-51). This approach exploits natural design concepts inherent in protein structures across families of proteins, but again requires significant recombinant DNA manipulation and the capacity to screen a large number of sequences to identify rare improvements. Both approaches require extensive follow-up mutagenesis and analysis to understand the significance of each mutation, and to identify the best combination of the many thousands or millions of mutants identified.

SUMMARY OF THE INVENTION

[0019] The present invention meets the foregoing and related needs by providing methods for the generation of polynucleotide libraries, including synthetic, semi-synthetic and/or seed libraries, and the use of these libraries in generating novel mutants of recombinant proteins. In certain embodiments, the methods provided herein are useful for generating focused libraries of recombinant human antibodies and screening for their affinity binding with target antigens. In one aspect, a synthetic gene is one that does naturally undergo SHM when expressed in a B cell (i.e., an antibody gene). In another aspect, a synthetic gene is one that does not naturally undergo SHM when expressed in a B cell (i.e., a non-antibody gene). In certain embodiments, the methods provided herein are useful for generating focused libraries of recombinant non-antibody proteins and screening for enhanced function or reduced susceptibility to somatic hypermutation.
[0020] In certain aspects of the present invention, provided herein are compositions of matter comprising a seed library of polynucleotides encoding a plurality of one or more polypeptide species of interest that have at least one region of interest of a protein of interest, wherein the seed library of polynucleotides comprise at least one synthetic nucleic acid sequence that encodes said at least one region of interest and has been modified to act as a substrate for AID mediated somatic hypermutation.
[0021] In certain aspects of the present invention, provided herein are compositions of matter comprising a seed library of polynucleotides encoding one or more proteins, wherein said seed library of polynucleotides comprises at least one synthetic polynucleotide that has been optimized for SHM by insertion of one or more preferred SHM codons. In other aspects, at least one synthetic polynucleotide has been optimized for SHM by reducing the density of non-preferred codons. Synthetic polynucleotides can be made resistant to SHM or made susceptible to SHM using the methods described herein.
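As a rough illustration of optimizing a synthetic polynucleotide by inserting preferred codons, the Python sketch below swaps each codon for a synonymous codon whenever that raises a hotspot score, leaving the encoded protein unchanged. It is a minimal sketch under stated assumptions, not the method of the invention: the score used here (top-strand WRC motifs, with W = A/T and R = A/G), the greedy left-to-right strategy, and all function names are hypothetical choices made for the example.

```python
import re
from itertools import product

# Standard genetic code with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption for this sketch: a hotspot is a top-strand WRC motif.
HOTSPOT = re.compile(r"(?=[AT][AG]C)")


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def hotspot_count(dna: str) -> int:
    """Count WRC motifs on the given strand."""
    return len(HOTSPOT.findall(dna))


def recode_for_hotspots(dna: str) -> str:
    """Greedily pick, codon by codon, the synonymous codon that maximizes the score."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for idx, codon in enumerate(codons):
        best, best_score = codon, hotspot_count("".join(codons))
        for alt, alt_aa in CODON_TABLE.items():
            if alt_aa != CODON_TABLE[codon]:
                continue  # only synonymous substitutions are allowed
            trial = codons[:idx] + [alt] + codons[idx + 1:]
            score = hotspot_count("".join(trial))
            if score > best_score:
                best, best_score = alt, score
        codons[idx] = best
    return "".join(codons)


if __name__ == "__main__":
    original = "TCTTCTAAT"  # encodes Ser-Ser-Asn
    recoded = recode_for_hotspots(original)
    print(original, translate(original), hotspot_count(original))
    print(recoded, translate(recoded), hotspot_count(recoded))
```

Reversing the comparison (keeping the synonym with the lowest score) gives the complementary case of a sequence made more resistant to SHM, again only as an illustration of the idea.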
[0022] In certain aspects, the compositions of the present invention can comprise a synthetic nucleic acid sequence that has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of somatic hypermutation motifs. In one embodiment, the synthetic nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of one or more preferred SHM codons. In another embodiment, the synthetic nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of one or more WAC motifs, WRC motifs or a combination thereof.
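The WRC and WAC motifs named above are written in IUPAC nucleotide codes, where W denotes A or T and R denotes A or G. The following Python sketch, offered only as an illustration, locates such motifs in a sequence and counts them on both strands. The function names, the choice to report the two strands separately, and the example sequence are assumptions made here and do not come from the specification.

```python
# IUPAC codes needed for the motif names used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "R": "AG"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_positions(seq: str, motif: str) -> list:
    """0-based start positions at which the IUPAC motif matches the sequence."""
    seq = seq.upper()
    k = len(motif)
    return [
        i
        for i in range(len(seq) - k + 1)
        if all(seq[i + j] in IUPAC[motif[j]] for j in range(k))
    ]


def motif_counts(seq: str, motifs=("WRC", "WAC")) -> dict:
    """Count each motif on the given (top) strand and on its reverse complement."""
    rc = reverse_complement(seq)
    return {
        m: {"top": len(motif_positions(seq, m)), "bottom": len(motif_positions(rc, m))}
        for m in motifs
    }


if __name__ == "__main__":
    example = "AACTGCAGCT"  # hypothetical sequence for illustration
    print(motif_positions(example, "WRC"))
    print(motif_counts(example))
```

Because every WAC match is also a WRC match, the WAC count can never exceed the WRC count on the same strand.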
[0023] In certain other aspects, the compositions of the present invention comprise a seed library of polynucleotides encoding a protein of interest that is an antibody. In one embodiment, the protein of interest is an antibody heavy chain or fragment thereof. In another embodiment, the antibody heavy chain comprises a variable region selected from those set forth in FIG. 20A. In still another embodiment, the antibody heavy chain comprises a variable region selected from the group consisting of IGHV6-1, IGHV4-34, IGHV4-59, IGHV3-30-3, IGHV3-7, IGHV3-23, IGHV5-51, IGHV1-2, or IGHV1-69.
[0024] In other embodiments, the protein of interest is an antibody light chain or fragment thereof. In one embodiment, the antibody light chain comprises a variable region selected from those set forth in FIG. 20B. In still another embodiment, the antibody light chain comprises a κ light chain variable region selected from the group consisting of IGKV2D-30, IGKV4-1, IGKV1-33, IGKV1D-39, or IGKV3-20. In yet another embodiment, the antibody light chain comprises a variable region selected from those set forth in FIG. 20C. In yet still another embodiment, the antibody light chain comprises a λ light chain variable region selected from the group consisting of IGLV7-43, IGLV1-40, IGLV2-11, or IGLV3-21.
[0025] In certain embodiments, the compositions of the present invention comprise at least one region of interest comprising an antibody heavy or light chain CDR1, CDR2 or CDR3 domain. In other embodiments, the compositions comprise at least one said region of interest comprising an antibody heavy or light chain CDR3.
[0026] In certain other aspects, the compositions of the present invention comprise a protein of interest that is a receptor. In other aspects, the protein of interest is an enzyme. In still other aspects, the protein of interest is a co-factor. In yet other aspects, the protein of interest is a transcription factor.
[0027] The present invention also provides a method of making a protein of interest with a desired property, the method comprising the steps of: a. synthesizing a seed library of polynucleotides encoding a plurality of one or more polypeptide species of interest that have at least one region of interest of a protein of interest, wherein the seed library of polynucleotides comprise at least one synthetic nucleic acid sequence that encodes at least one region of interest and has been modified to act as a substrate for AID mediated somatic hypermutation; b. joining in operable combination a seed library of polynucleotides encoding a plurality of one or more polypeptide species of interest of a protein of interest into an expression vector; c. transforming a host cell with the expression vector, so that the protein of interest is produced by expression of the seed library of polynucleotides encoding a plurality of one or more polypeptide species of interest of a protein of interest; and wherein the host cell expresses AID, or can be induced to express AID via the addition of an inducing agent; d. optionally inducing AID activity, or allowing AID mediated mutagenesis to occur on the seed library; e. identifying a cell or cells within the population of cells which expresses a mutated protein having a desired property, and f. establishing one or more clonal populations of cells from the cell or cells identified in step (e).
[0028] In other embodiments, provided herein is a method of making a protein of interest with a desired or identified property, said method comprising the steps of: (a) synthesizing a seed library of polynucleotides encoding one or more proteins, wherein said seed library of polynucleotides comprises at least one synthetic polynucleotide that has been optimized for SHM; (b) joining in operable combination said seed library of polynucleotides into an expression vector; (c) transforming a host cell with said expression vector, so that said one or more proteins is produced by expression of said seed library of polynucleotides; and wherein said host cell expresses AID activity or can be induced to express AID activity via the addition of an inducing agent; (d) if needed, inducing AID activity; (e) identifying a cell or cells within the population of cells which express(es) one or more mutated proteins having said desired or identified property, and (f) establishing one or more clonal populations of cells from the cell or cells identified in step (e).
[0029] In other embodiments, provided herein is a method of making an antibody or antigen-binding fragment thereof with a desired property, the method comprising the steps of: a. synthesizing a seed library of polynucleotides encoding a plurality of one or more antibody heavy chain proteins or fragments that have at least one CDR, wherein the polynucleotides comprise at least one synthetic nucleic acid sequence that encodes the at least one CDR and has been modified to act as a substrate for AID mediated somatic hypermutation; b. synthesizing a seed library of polynucleotides encoding a plurality of one or more antibody light chain proteins or fragments that have at least one CDR, wherein the seed library of polynucleotides comprise at least one synthetic nucleic acid sequence that encodes the at least one CDR and has been modified to act as a substrate for AID mediated somatic hypermutation; c. joining in operable combination the seed library of polynucleotides encoding the plurality of antibody heavy chain proteins or fragments thereof and the seed library of polynucleotides encoding the plurality of antibody light chain proteins or fragments thereof into expression vectors; d. transforming a host cell with the expression vectors, so that an antibody or an antigen-binding fragment thereof is produced by coexpression of a heavy chain sequence from the seed library of polynucleotides encoding a plurality of antibody heavy chain proteins or fragments thereof and a light chain sequence from the seed library of polynucleotides encoding a plurality of antibody light chain proteins or fragments thereof, either on the same or different expression vectors; and wherein the host cell expresses AID, or can be induced to express AID via the addition of an inducing agent; e. optionally inducing AID activity, or allowing AID mediated mutagenesis to occur on the seed libraries of polynucleotides; f. identifying a cell or cells within the population of cells which expresses a mutated antibody or an antigen-binding fragment thereof having the desired property, and g. establishing one or more clonal populations of cells from the cell or cells identified in step (f).
[0030] In other embodiments, provided herein is a method of making an antibody or antigen-binding fragment thereof with a desired or identified property, said method comprising the steps of: (a) synthesizing a first seed library of first polynucleotides encoding a plurality of one or more antibody heavy chain proteins or fragments thereof that have at least one heavy chain CDR, wherein said first seed library of polynucleotides comprises at least one first synthetic polynucleotide that has been optimized for SHM; (b) synthesizing a second seed library of second polynucleotides encoding said plurality of one or more antibody light chain proteins or fragments thereof that have at least one light chain CDR, wherein said second seed library of polynucleotides comprises at least one second synthetic polynucleotide that has been optimized for SHM; (c) joining in operable combination said first and second seed libraries of polynucleotides into expression vectors; (d) transforming a host cell with said expression vectors, so that an antibody or an antigen-binding fragment thereof is produced by coexpression of a heavy chain sequence from said first seed library of polynucleotides and a light chain sequence from said second seed library of polynucleotides (either on the same or different expression vectors); and wherein said host cell expresses AID activity or can be induced to express AID activity via the addition of an inducing agent; (e) if needed, inducing AID activity; (f) identifying a cell or cells within the population of cells which expresses one or more mutated antibodies or antigen-binding fragments thereof having the desired or identified property, and (g) establishing one or more clonal populations of cells from the cell or cells identified in step (f).
[0031] In still other embodiments, provided herein is a method of co-evolving a plurality of proteins, the method comprising the steps of: a. synthesizing a first seed library of polynucleotides encoding a plurality of one or more polypeptide species of interest that have at least one region of interest of a first protein of interest, wherein the seed library of polynucleotides comprises at least one synthetic nucleic acid sequence that encodes the at least one region of interest and has been modified to act as a substrate for AID mediated somatic hypermutation; b. synthesizing a second seed library of polynucleotides encoding a plurality of one or more polypeptide species of interest that have at least one region of interest of a second protein of interest, wherein the seed library of polynucleotides comprises at least one synthetic nucleic acid sequence that encodes the at least one region of interest and has been modified to act as a substrate for AID mediated somatic hypermutation; c. joining in operable combination the seed library of polynucleotides encoding the plurality of polypeptide species of interest of the first protein of interest and the seed library of polynucleotides encoding the plurality of polypeptide species of interest of the second protein of interest into expression vectors; d. transforming a host cell with the expression vectors, so that the first and second proteins of interest are produced by coexpression of the first and second seed libraries of polynucleotides, either on the same or different expression vectors; and wherein the host cell expresses AID, or can be induced to express AID via the addition of an inducing agent; e. optionally inducing AID activity, or allowing AID mediated mutagenesis to occur on the seed libraries of polynucleotides; f. identifying a cell or cells within the population of cells which expresses a mutated first or second protein of interest having the desired property, and g. establishing one or more clonal populations of cells from the cell or cells identified in step (f).
[0032] In one aspect, provided herein is a method of co-evolving a plurality of proteins, said method comprising the steps of: (a) synthesizing a first seed library of polynucleotides encoding one or more proteins, wherein said first seed library of polynucleotides comprises at least one first synthetic polynucleotide that has been optimized for SHM; (b) synthesizing a second seed library of polynucleotides encoding one or more proteins, wherein said second seed library of polynucleotides comprises at least one second synthetic polynucleotide that has been optimized for SHM; (c) joining in operable combination said first and second seed libraries of polynucleotides into expression vectors; (d) transforming a host cell with said expression vectors, so that said one or more first and second proteins are produced by coexpression of said first and second seed libraries of polynucleotides, either on the same or different expression vectors; and wherein said host cell expresses AID activity or can be induced to express AID activity via the addition of an inducing agent; (e) if needed, inducing AID activity; (f) identifying a cell or cells within the population of cells which expresses one or more mutated proteins having the desired or identified property, and (g) establishing one or more clonal populations of cells from the cell or cells identified in step (f).
[0033] In certain aspects, the methods described herein comprise at least one synthetic nucleic acid sequence that has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of somatic hypermutation motifs. In certain embodiments, the at least one synthetic nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of one or more preferred SHM codons. In other embodiments, the at least one synthetic nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of one or more WAC motifs, WRC motifs, or a combination thereof.
[0034] In one embodiment of any of these methods, the identified codon may be replaced with a preferred (canonical) SHM codon or preferred (canonical) hot spot SHM codon which introduces a conservative amino acid substitution, compared to either the wild-type or AID modified codon. In another embodiment of any of these methods, the identified codon may be replaced with a preferred SHM codon or preferred hot spot SHM codon which introduces a semi-conservative mutation at the amino acid level, compared to either the wild-type or AID modified codon. In another embodiment of any of these methods, the identified codon may be replaced with a preferred SHM codon or preferred hot spot SHM codon which introduces a non-conservative mutation at the amino acid level compared to either the wild-type or AID modified codon. In one embodiment, insertion of one or more preferred SHM codons is by insertion of one or more amino acid substitutions in said region of interest, said amino acid substitutions being silent, conservative, semi-conservative, non-conservative or a combination thereof. Modifications to polynucleotides made using the methods described herein can render at least one polynucleotide sequence susceptible or resistant to SHM.
[0035] In certain embodiments, the methods described herein comprise a host cell that is a prokaryotic cell. In one embodiment, the prokaryotic cell is an E. coli cell.
[0036] In certain other embodiments, the methods described herein comprise a host cell that is a eukaryotic cell. In one embodiment, the eukaryotic cell is a mammalian cell. In another embodiment, the host is a mammalian cell that is a Chinese hamster ovary (CHO) cell, a human embryonic kidney (HEK) 293 cell, a 3T3 cell, a HEK 293T cell, a PER.C6™ cell, or a lymphoid derived cell. In still other embodiments, the host cell is a lymphoid derived cell that is a RAMOS (CRL-1596) cell, a Daudi (CCL-213) cell, an EB-3 (CCL-85) cell, a DT40 (CRL-2111) cell, an 18-81 cell, a Raji (CCL-86) cell, or derivatives thereof.
[0037] In another embodiment, the methods described herein comprise a host cell that is a eukaryotic cell that is a yeast cell.
[0038] The present invention further provides a method for humanizing a non-human antibody, the method comprising the steps of: a. determining the sequence of the heavy and light chains of the non-human antibody to be humanized; b. synthesizing a seed library of polynucleotides encoding a plurality of one or more human antibody heavy chain protein scaffolds comprising at least one synthetic nucleic acid sequence which encodes at least one CDR, or a portion thereof, derived from the non-human antibody heavy chain protein, wherein the nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation; c. synthesizing a seed library of polynucleotides encoding a plurality of one or more human antibody light chain protein scaffolds comprising at least one synthetic nucleic acid sequence which encodes at least one CDR, or a portion thereof, derived from the non-human antibody light chain protein, wherein the nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation; d. joining in operable combination the seed library of polynucleotides encoding the plurality of antibody heavy chain protein scaffolds and the seed library of polynucleotides encoding the plurality of antibody light chain protein scaffolds into expression vectors; e. transforming a host cell with the expression vectors, so that an antibody or an antigen-binding fragment thereof is produced by coexpression of a heavy chain sequence from the seed library of polynucleotides encoding the plurality of antibody heavy chain protein scaffolds and a light chain sequence from the seed library of polynucleotides encoding the plurality of antibody light chain protein scaffolds, either on the same or different expression vectors; and wherein the host cell expresses AID, or can be induced to express AID via the addition of an inducing agent; f. optionally inducing AID activity, or allowing AID mediated mutagenesis to occur on the seed libraries; g. identifying a cell or cells within the population of cells which expresses a humanized antibody having binding characteristic of the non-human antibody, and h. establishing one or more clonal populations of cells from the cell or cells identified in step (g).
[0039] In certain embodiments, the method for humanizing a non-human antibody comprises human antibody heavy chain protein scaffolds comprising a variable region selected from FIG. 20A. In other embodiments, the human antibody heavy chain protein scaffolds comprise a variable region selected from FIG. 20A, wherein said selected variable region exhibits the highest amino acid homology to said non human antibody. In still other embodiments, the antibody heavy chain protein scaffolds comprise a variable region selected from the group consisting of IGHV6-1, IGHV4-34, IGHV4-59, IGHV3-30-3, IGHV3-7, IGHV3-23, IGHV5-51, IGHV1-2 or IGHV1-69.
[0040] In certain other embodiments, the method for humanizing a non-human antibody comprises human antibody light chain protein scaffolds comprising a variable region selected from FIG. 20B. In other embodiments, the human antibody light chain protein scaffolds comprise a variable region selected from FIG. 20B, wherein said selected variable region exhibits the highest amino acid homology to said non-human antibody. In still other embodiments, the antibody light chain protein scaffolds comprise a variable region selected from the group consisting of IGKV2D-30, IGKV4-1, IGKV1-33, IGKV1D-39, or IGKV3-20.
[0041] In certain other embodiments, the method for humanizing a non-human antibody comprises human antibody light chain protein scaffolds comprising a variable region selected from FIG. 20C. In other embodiments, the human antibody light chain protein scaffolds comprise a variable region selected from FIG. 20C, wherein said selected variable region exhibits the highest amino acid homology to said non-human antibody. In still other embodiments, the antibody light chain protein scaffolds comprise a variable region selected from the group consisting of IGLV7-43, IGLV1-40, IGLV2-11, or IGLV3-21.
[0042] In other aspects, the method for humanizing a non-human antibody described herein comprises at least one synthetic nucleic acid sequence that has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of somatic hypermutation motifs. In other aspects, the at least one synthetic nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of one or more preferred SHM codons. In still other aspects, the at least one synthetic nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation by the insertion of one or more WAC motifs, WRC motifs, or a combination thereof.
[0043] In other embodiments, the method for humanizing a non-human antibody described herein comprises a plurality of one or more human antibody heavy chain protein scaffolds comprising a synthetic nucleic acid sequence which encodes a CDR3 domain derived from said non-human antibody heavy chain protein, wherein said nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation.
[0044] In still other embodiments, the method for humanizing a non-human antibody described herein comprises a plurality of one or more human antibody light chain protein scaffolds comprising a synthetic nucleic acid sequence which encodes a CDR3 domain derived from said non-human antibody light chain protein, wherein said nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation.
[0045] In yet other embodiments, the method for humanizing a non-human antibody described herein comprises a plurality of one or more human antibody heavy chain protein scaffolds comprising a synthetic nucleic acid sequence which encodes a portion of a CDR3 domain derived from said non-human antibody heavy chain protein, wherein said nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation.
[0046] In still yet other embodiments, the method for humanizing a non-human antibody described herein comprises a plurality of one or more human antibody light chain protein scaffolds comprising a synthetic nucleic acid sequence which encodes a portion of a CDR3 domain derived from said non-human antibody light chain protein, wherein said nucleic acid sequence has been modified to act as a substrate for AID mediated somatic hypermutation.

INCORPORATION BY REFERENCE

[0047] All publications and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF THE DRAWINGS

[0048] A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
[0049] FIG. 1 and FIG. 2 show the 20 most common codon transitions observed in CDRs and FWs during SHM mediated affinity maturation and demonstrate how simple frame shifts can determine the two radically different patterns of mutagenesis seen in CDRs and FWs. These observations lead directly to a hypothesis that both functional selection during affinity maturation and the reading frame context determine the amino acid diversity generated at SHM hot spot codons.
[0050] FIG. 1—Shows that within CDRs, the codons AGC, TAT, and TAC (encoding serine and tyrosine) feed a directed flow of primary, secondary and tertiary SHM events generating amino acid diversity. Within CDRs, the most common codon transition observed is AGC to AAC (785 instances), leading to a serine to asparagine conversion. While that transition is also common in framework regions (354 instances), a simple frame shift of the same mutation in the same hotspot motif ( . . . TACAGCTAT . . . ; SEQ ID NO: 1) context leads to a CAG to CAA silent mutation that is common in framework regions (288 instances) but not commonly observed in CDRs.
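The frame-shift effect described for FIG. 1 can be checked by hand on the quoted motif. The following is a minimal sketch, not part of the original analysis: it applies the same G to A change to TACAGCTAT and reads the affected codon in two reading frames. The helper names and the reduced codon table (limited to the codons that occur in this example) are illustrative assumptions.

```python
# Reduced codon table: only the codons that appear in this example.
CODON_TO_AA = {
    "TAC": "Tyr", "TAT": "Tyr",
    "AGC": "Ser", "AAC": "Asn",
    "CAG": "Gln", "CAA": "Gln",
}

def mutate(seq, pos, base):
    """Return seq with the base at 0-based index pos replaced by base."""
    return seq[:pos] + base + seq[pos + 1:]

def codon_at(seq, start):
    """Return the three-base codon that begins at 0-based index start."""
    return seq[start:start + 3]

motif = "TACAGCTAT"             # hot spot context quoted for FIG. 1
mutant = mutate(motif, 4, "A")  # G -> A at the middle base of AGC

# Frame in which AGC is read as a codon (codon starts at index 3).
in_frame = (codon_at(motif, 3), codon_at(mutant, 3))
# Shifted frame in which the same G is the third base of CAG (index 2).
shifted = (codon_at(motif, 2), codon_at(mutant, 2))

for label, (before, after) in [("in-frame", in_frame), ("shifted", shifted)]:
    print(label, before, "->", after, ":",
          CODON_TO_AA[before], "->", CODON_TO_AA[after])
```

In the first frame the change converts serine to asparagine, while in the shifted frame both codons encode glutamine, so the same nucleotide change is silent.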
[0051] FIG. 2—In contrast to FIG. 1, the most commonly observed codon (amino acid) transition events in framework regions generate silent mutations (FIG. 2).
[0052] FIG. 3—A histogram of all possible 6-mer nucleotide z-scores describing their ability to attract (positive z-score) or repel (negative z-score) SHM-mediated mutations. Also shown (at the corresponding z-score) on the distribution are nucleotide sequences found in the WAC library. The dotted line indicates the boundary for the top 5% of all SHM recruiting hotspot motifs. As seen in the figure, nucleotide sequences contained in the WAC library provide a high density of hot spots. The assembly of degenerate codons (WACW) results in a subset of possible 4-mer hot spots described by Rogozin et al. (WRCH), where R=A or G, H=A or C or T, and W=T or A.
[0053] FIG. 4—Preferred SHM hot spot codons AAC and TAC, which can be the basis for a synthetic library, e.g. a seed library, can result in a set of primary and secondary mutation events that create considerable amino acid diversity, as judged by equivalent SHM mutation events observed in Ig heavy chain antibodies. From these two codons, basic amino acids (histidine, lysine, arginine), an acidic amino acid (aspartate), hydrophilic amino acids (serine, threonine, asparagine, tyrosine), hydrophobic amino acids (alanine and phenylalanine), and glycine are generated as a result of SHM events.
[0054] FIG. 5—A histogram of all possible 6-mer nucleotide z-scores describing their ability to attract (positive z-score) or repel (negative z-score) SHM-mediated mutations. Also shown (at the corresponding z-score) on the distribution are nucleotide sequences found in the WRC library. The dotted line indicates the boundary for the top 5% of all SHM recruiting hotspot motifs. As seen in the figure, nucleotide sequences contained in the WRC library provide a high density of hot spots. The assembly of degenerate codons (WRCW) results in a subset of possible 4-mer hot spots described by Rogozin et al. (WRCH), where R=A or G, H=A or C or T, and W=T or A.
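The subset relationship stated for FIGS. 3 and 5 follows directly from the degenerate-base definitions given above (W=T or A, R=A or G, H=A or C or T). As a minimal sketch, with the function name `expand` chosen here purely for illustration, the degenerate 4-mers can be expanded and compared:

```python
from itertools import product

# Degenerate nucleotide codes as defined in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "R": "AG", "H": "ACT"}

def expand(motif):
    """Return the set of concrete sequences matched by a degenerate motif."""
    return {"".join(bases) for bases in product(*(IUPAC[ch] for ch in motif))}

wrch = expand("WRCH")  # 4-mer hot spots of Rogozin et al.
wacw = expand("WACW")  # assembly of WAC degenerate codons
wrcw = expand("WRCW")  # assembly of WRC degenerate codons

print(sorted(wacw))
print(wacw <= wrch, wrcw <= wrch)
```

Under these definitions WACW expands to 4 sequences and WRCW to 8, and both sets fall within the 12 sequences of WRCH.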
[0055] FIG. 6—The series of mutation events that lead to the creation of amino acid diversity, starting from “preferred SHM hot spot codons” AGC and TAC, as observed in affinity matured IGV heavy chain sequences. 4200 primary and secondary SHM mutation events identified and analyzed from the NCBI database, starting from codons encoding serine and tyrosine, lead to a set of functionally diverse amino acids.
[0056] FIG. 7—Illustrates the convergence of sequence optimization with progressive iterations of replacement using the program SHMredesign. The figure shows both optimization toward an idealized hot and cold sequence, in this case starting with native canine AID nucleotide sequence.
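The idea of iterative replacement toward a hot or cold sequence can be illustrated with a toy example. The sketch below is not the SHMredesign program, whose algorithm is not given in this passage. It assumes a greedy scheme that swaps synonymous codons one position at a time and uses the number of WRCH matches on one strand as a stand-in score. The function names, the scoring rule, the reduced codon table and the nine-base input are all hypothetical.

```python
import re

# Synonymous codons for the residues in the toy sequence only.
SYNONYMS = {
    "S": ["AGC", "AGT", "TCA", "TCC", "TCG", "TCT"],
    "Y": ["TAC", "TAT"],
    "V": ["GTA", "GTC", "GTG", "GTT"],
}
CODON_TO_AA = {c: aa for aa, codons in SYNONYMS.items() for c in codons}

def hot_count(seq):
    """Count WRCH matches (overlaps allowed) as an assumed hot spot score."""
    return len(re.findall(r"(?=[AT][AG]C[ACT])", seq))

def translate(seq):
    """Translate using the reduced table above."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))

def redesign(seq, want_hot=True, max_iters=10):
    """Greedy synonymous replacement; stops when a full pass changes nothing."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    sign = 1 if want_hot else -1
    for _ in range(max_iters):
        improved = False
        for i in range(len(codons)):
            best = sign * hot_count("".join(codons))
            for alt in SYNONYMS[CODON_TO_AA[codons[i]]]:
                trial = codons[:i] + [alt] + codons[i + 1:]
                score = sign * hot_count("".join(trial))
                if score > best:
                    best, codons[i], improved = score, alt, True
        if not improved:
            break
    return "".join(codons)

start = "AGCTATGTC"  # hypothetical input encoding Ser-Tyr-Val
hot = redesign(start, want_hot=True)
cold = redesign(start, want_hot=False)
print(start, hot_count(start), hot, hot_count(hot), cold, hot_count(cold))
```

Because only synonymous codons are tried, the encoded amino acid sequence is unchanged in both directions; a real scoring function would presumably weight motifs by their z-scores rather than simply counting them.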
[0057] FIG. 8—Provides the amino acid (A; SEQ ID NO: 2), and polynucleotide sequence (B; SEQ ID NO: 3) of native blasticidin gene. Also shown is the initial analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E).
[0058] FIG. 9—Provides the amino acid (A; SEQ ID NO: 2), and polynucleotide sequence (B; SEQ ID NO: 4) of a synthetic, SHM resistant version of the blasticidin gene. Also shown is the analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) in the synthetic sequence.
[0059] FIG. 10—Provides the amino acid (A; SEQ ID NO: 2), and polynucleotide sequence (B; SEQ ID NO: 5) of a synthetic, SHM susceptible version of the blasticidin gene. Also shown is the analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) in the synthetic sequence.
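The kind of tally summarized for FIGS. 8-10 (hot spots and CpG occurrences in a sequence) can be sketched in a few lines. This example is an illustration only: it assumes the WRCH definition for hot spots, scans a single strand, and uses a short hypothetical sequence rather than the blasticidin gene. The criteria actually used to produce the figures may differ, and cold spot scoring is omitted because it is not defined in this passage.

```python
import re

def count_cpg(seq):
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    return seq.upper().count("CG")

def find_wrch(seq):
    """Return 0-based start positions of WRCH matches, overlaps allowed."""
    pattern = r"(?=[AT][AG]C[ACT])"
    return [m.start() for m in re.finditer(pattern, seq.upper())]

toy = "TACAGCTATCGGTCGTC"  # hypothetical sequence, not from the patent
print("CpG count:", count_cpg(toy))
print("WRCH starts:", find_wrch(toy))
```

Running the same two functions on a native and a redesigned sequence gives a simple before-and-after comparison of the sort the figures report.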
[0060] FIG. 11—Provides a sequence comparison of activation-induced cytidine deaminase (AID) from Homo sapiens (human; SEQ ID NO: 6), Mus musculus (mouse; SEQ ID NO: 7), Canis familiaris (dog; SEQ ID NO: 8), Rattus norvegicus (norv-) (rat; SEQ ID NO: 9) and Pan troglodytes (chimpanzee; SEQ ID NO: 10). Variations between the species are represented by bold amino acids.
[0061] FIG. 12—Provides the amino acid (A; SEQ ID NO: 11), and polynucleotide sequence (B; SEQ ID NO: 12) of native canine cytidine deaminase (AID) (L198A). Also shown is the analysis of hot spots (C), cold spots (D) and occurrences of CpGs (E) in the native sequence.
[0062] FIG. 13—Provides the polynucleotide sequence (A; SEQ ID NO: 13) of a synthetic SHM susceptible form of canine AID. Also shown is the analysis of hot spots (B), cold spots (C) and occurrences of CpGs (D).
[0063] FIG. 14—Provides the polynucleotide sequence (A; SEQ ID NO: 14) of a synthetic SHM resistant form of canine AID. Also shown is the analysis of hot spots (B), cold spots (C) and occurrences of CpGs (D).
[0064] FIG. 15—Provides a comparison of cDNA sequences of Canis familiaris (dog; SEQ ID NO: 15) and SHM-optimized (cold) Canis familiaris (dog; SEQ ID NO: 16), Homo sapiens (human; SEQ ID NO: 17) and Mus musculus (mouse; SEQ ID NO: 18) mRNA activation-induced cytidine deaminase (AID) sequences. GAG sequences are illustrated by bold, underlining. Variations between the sequences are illustrated by bold amino acid residues.
[0065] FIG. 16—Shows the predicted effect of AID activity on reversion frequency using a protein containing a mutable stop codon such as a fluorescent protein (16A). FIG. 16B shows the actual rates of loss of fluorescence achieved (shown as GFP extinction) with cells transfected with two different concentrations of an expression vector capable of expressing AID, and stably expressing GFP. FIG. 16C shows the initial rates of GFP reversion mediated by wild type human AID, and cold canine AID. Also shown is the effect of Ig enhancers on reversion rate.
[0066] FIG. 17—Provides schematics of Vector Formats 1 (17A) and 2 (17B).
[0067] FIG. 18—Provides schematics of Vector Formats 3 (18A) and 4 (18B).
[0068] FIG. 19—Provides schematics of Vector Format 5 (19A) and AB184 (19B).
[0069] FIG. 20—Shows the frequency with which various immunoglobulin heavy variable (IgVH) genes are found in the Genbank and PDB databases (20A). FIGS. 20B and 20C provide the same data for the kappa and lambda light chain variable regions, respectively.
[0070] FIG. 21—Illustrates the steps for generating the (A) heavy chain, (B) kappa and (C) lambda light chain libraries.
[0071] FIG. 22—Shown is a synthetic CDR3 that contains two circularly permuted ideal hot spots (AGCTAC; SEQ ID NO: 19) contained between 2 nonameric ideal cold spots (GTCGTCGTC; SEQ ID NO: 20). Here “V” represents variable domain derived sequences, “D” represents the synthetic polynucleotide sequence that has been optimized for SHM, but is naturally derived from CDR3 in the corresponding wild type antibody, “J” represents junction domain derived sequences, and “C” represents constant domain derived sequences. The synthetic CDR3 is placed within the context of the human IGHV4-34, IGHJ1, IgG1 germline sequence as more fully described in Examples 4-7. The nucleotide and amino acid sequences of FR3, CDR3, FR4 and a portion of the constant region are set forth in SEQ ID NO: 21 and 24, respectively. Alternate CDR3 nucleotide sequences are set forth as SEQ ID NOS: 22 and 23. Hot spots are underlined and are contained within 2 nonameric ideal cold spots (italics). Alternate amino acid sequences are set forth as SEQ ID NOS: 25 and 26.
[0072] FIG. 23—Provides a diagram of the synthesis and maturation of Nisin (23A) illustrating amino acid sequences set forth as SEQ ID NOS: 27-30.
[0073] FIG. 24—Provides the polynucleotide sequence of native NisB (SEQ ID NO: 31). Also shown is the analysis of hot spots, cold spots and occurrences of CpGs in the native sequence.
[0074] FIG. 25—Provides the polynucleotide sequence of a SHM resistant form of NisB (SEQ ID NO: 32). Also shown is the analysis of hot spots, cold spots and occurrences of CpGs in the synthetic sequence.
[0075] FIG. 26—Provides the polynucleotide sequence of native NisP (SEQ ID NO: 33). Also shown is the analysis of hot spots, cold spots and occurrences of CpGs in the native sequence.
[0076] FIG. 27—Provides the polynucleotide sequence of a SHM resistant form of NisP (SEQ ID NO: 34). Also shown is the analysis of hot spots, cold spots and occurrences of CpGs in the synthetic sequence.
[0077] FIG. 28—Provides the polynucleotide sequence of native NisT (28A; SEQ ID NO: 35), and SHM resistant form of NisT (28B; SEQ ID NO: 36).
[0078] FIG. 29—Provides the polynucleotide sequence of native NisA (29A; SEQ ID NO: 37), as well as the initial analysis of hot spots (29B), and cold spots (29C). Also shown is a synthetic form of NisA (29D; SEQ ID NO: 38) showing areas of SHM resistant sequence (underlined) and SHM susceptible sequence, and the analysis of hot (29E) and cold spots (29F).
[0079] FIG. 30—Provides the polynucleotide sequence of native NisC (SEQ ID NO: 39), as well as the initial analysis of hot spots (30B) and cold spots (30C).
[0080] FIG. 31—Shows a synthetic form of NisC (31A; SEQ ID NO: 40) showing the analysis of hot (31B) and cold spots (31C).
[0081] FIG. 32—Provides a schematic of a three zinc-finger protein making contacts to a DNA sequence. Each finger is composed of a small beta sheet and alpha helix that coordinate a zinc metal ion. While two histidines and two cysteines bind the zinc, the sidechains of key amino acids emanate from the beginning of the alpha helix to make base specific contacts. These positions may be targeted as SHM hotspots where mutations creating amino acid diversity are desirable. Structural and zinc binding positions of the finger should correspondingly be made cold. ATCGGCGGC (SEQ ID NO:41); TAGCCGCCG (SEQ ID NO: 42).
[0082] FIG. 33—Provides a schematic of an individual finger with structurally conserved positions shown in bold, and residues contacting DNA shown with a gray background (SEQ ID NO: 43). Portions of the amino acid sequence to be made hot or cold are shown, along with all possible corresponding nucleic acid sequences.
[0083] 
[00001] [TABLE-US-00001]
   
    V C      SEQ ID NO    E H      SEQ ID NO
    GTATGC   44           GAACAC   52
    GTATGT   45           GAACAT   53
    GTCTGC   46           GAGCAC   54
    GTCTGT   47           GAGCAT   55
    GTGTGC   48
    GTGTGT   49
    GTTTGC   50
    GTTTGT   51

The accompanying z-score for each nucleotide sequence indicates the degree to which that sequence recruits or repels SHM machinery to that site. Individual sequences from these lists may be chosen to enhance or limit SHM-mediated mutations at each site.
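The lists of candidate nucleic acid sequences for the V C and E H positions can be reproduced from the standard genetic code. The sketch below is illustrative only: the function name `encodings` is an assumption, the codon table is restricted to the four residues concerned, and no z-scores are computed because their values are not given in this passage.

```python
from itertools import product

# Standard genetic code, restricted to the four residues in the table.
CODONS = {
    "V": ["GTA", "GTC", "GTG", "GTT"],
    "C": ["TGC", "TGT"],
    "E": ["GAA", "GAG"],
    "H": ["CAC", "CAT"],
}

def encodings(peptide):
    """Return every nucleotide sequence that encodes the given peptide."""
    choices = (CODONS[aa] for aa in peptide)
    return sorted("".join(codons) for codons in product(*choices))

vc = encodings("VC")
eh = encodings("EH")
print(len(vc), vc)
print(len(eh), eh)
```

Valine has four codons and cysteine two, giving eight V C sequences; glutamate and histidine have two codons each, giving four E H sequences, which agrees with the entries tabulated above.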
[0084] FIG. 34 The 3-mer nucleotide motif AGC represents a preferred site for somatic hypermutation events. In the Figure, we see the number of mutations observed in our analysis (line graph) at each position of the AGC motif found in framework (FR) and complementarity-determining regions (CDR) for the heavy and light chains of antibodies. The font size for each nucleotide position of the motif shows how often each nucleotide serves as the first position of the codon reading frame. Within framework regions, no one reading frame dominates, whereas within CDRs, the first position (A) of the AGC motif is almost universally used as the first position of the codon.
[0085] FIG. 35 shows the 20 most common hot spot codon hypermutation transition events within the FR and CDR regions of heavy chain antibodies, where the numbers labeling the arrows indicate how often a codon transition event was observed. The codons AGC and AGT (serine), and to a lesser extent TAC and TAT (tyrosine), account for ˜50% of the originating mutations observed in affinity matured antibodies. Use of these hot spot codons within the correct reading frame, combined with affinity maturation, leads to many fewer observed silent mutations within CDRs compared to framework regions (highlighted by dotted circles in the figure).
[0086] FIGS. 36A-36D are tables which show numerical values of transition frequencies for a representative SHM system.
[0087] FIG. 37 shows the evolution of the codon AGC (serine), a preferred SHM codon, and the resulting codon frequencies over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
[0088] FIG. 38 shows the evolution of the codon AGC (serine), a preferred SHM codon, and the resulting amino acid frequencies encoded by the codons produced in situ, over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
[0089] FIG. 39 shows the evolution of the codon TCG (serine), a non-preferred SHM codon, and the resulting codon frequencies over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
[0090] FIG. 40 shows the evolution of the codon TCG (serine), a non-preferred SHM codon, and the resulting amino acid frequencies encoded by the codons produced in situ, over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
[0091] FIG. 41 shows the evolution of the codons AGC/TAC, the “WRC motif” (comprising preferred SHM codons encoding serine and tyrosine) and the resulting codon frequencies over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
[0092] FIG. 42 shows the evolution of the codons AGC/TAC, the “WRC motif” (comprising preferred SHM codons encoding serine and tyrosine) and the resulting amino acid frequencies encoded by the codons produced in situ, over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
[0093] FIG. 43 shows the evolution of the GGT codon (glycine), a preferred SHM codon, and the resulting codon frequencies over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model. The figure shows the immediate evolution of codons arising from single mutation events, such as GAT (aspartate), GCT (alanine), and AGT (serine). Secondary mutation events acting on these new codons give rise to a tertiary set of codons. For instance, both AGT and GGT under SHM produce the codon AAT, leading to acquisition of asparagine at this position.
[0094] FIG. 44 shows the evolution of a GGT codon (glycine), and the immediate evolution of amino acids arising from single mutation events, such as GAT (aspartate), GCT (alanine), and AGT (serine) over 50 rounds of SHM-mediated mutagenesis, as calculated in the Markov chain model.
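The Markov chain calculation referred to for FIGS. 37-44 can be outlined as repeated application of a per-round transition table to a codon frequency distribution. The sketch below is a toy under stated assumptions: the four-codon state space and every probability in `P` are hypothetical placeholders, not the transition frequencies of FIGS. 36A-36D, and the function names are chosen for illustration.

```python
# HYPOTHETICAL per-round transition probabilities between codons.
# Placeholder numbers only; each row sums to 1 and allows single-base changes.
P = {
    "AGC": {"AGC": 0.90, "AAC": 0.05, "AGT": 0.03, "GGC": 0.02},
    "AAC": {"AAC": 0.98, "AGC": 0.02},
    "AGT": {"AGT": 0.97, "AGC": 0.03},
    "GGC": {"GGC": 0.99, "AGC": 0.01},
}
CODON_TO_AA = {"AGC": "Ser", "AAC": "Asn", "AGT": "Ser", "GGC": "Gly"}

def step(dist):
    """Apply one round of mutagenesis to a codon frequency distribution."""
    new = {codon: 0.0 for codon in P}
    for src, prob in dist.items():
        for dst, t in P[src].items():
            new[dst] += prob * t
    return new

def evolve(start, rounds):
    """Start from a single codon and return the distribution after rounds."""
    dist = {codon: 0.0 for codon in P}
    dist[start] = 1.0
    for _ in range(rounds):
        dist = step(dist)
    return dist

def amino_acid_frequencies(dist):
    """Sum codon frequencies over the amino acid each codon encodes."""
    freqs = {}
    for codon, prob in dist.items():
        aa = CODON_TO_AA[codon]
        freqs[aa] = freqs.get(aa, 0.0) + prob
    return freqs

codon_freqs = evolve("AGC", 50)
aa_freqs = amino_acid_frequencies(codon_freqs)
print(codon_freqs)
print(aa_freqs)
```

Recording the distribution after each round, rather than only the final one, would give the codon and amino acid frequency curves of the kind plotted over 50 rounds.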
[0095] FIG. 45 HEK-293 cells transfected with a low affinity anti-HEL antibody (comprising the light chain mutation N31G) and a constitutive AID expression vector either after stable transfection and selection (panels A and C) or transiently with the addition of re-transfected AID expression vector (panels B and D) were incubated with either 50 pM HEL-FITC (A and B) or 500 pM HEL-FITC (C and D) and living HEL-FITC-binding cells were sorted and expanded in culture for another round of selection and sequence analysis.
[0096] FIG. 46 Previously sorted HEK-293 cells expressing anti-HEL antibodies and constitutive canine AID either after stable transfection and selection (A and C) or transiently with the addition of re-transfected AID expression vector (panels B and D) were incubated with either 50 pM HEL-FITC (A and B) or 500 pM HEL-FITC (C and D) and living HEL-FITC-binding cells were sorted and expanded in culture for another round of selection and sequence analysis.
[0097] FIG. 47 HEK-293 cells transfected with a low affinity anti-HEL antibody and evolved over 4 rounds of selection and evolution were analyzed by incubation with 50 pM HEL-FITC, as described in Example 13. Over 4 rounds of evolution, a clear increase in positive cells is evident in both the FACS scatter plot (panel A) and the total number of positive cells gated (panel B).
[0098] FIG. 48. Panel A shows a selection of amino acid sequences around the HyHEL10 light chain CDR1 (SEQ ID NOS: 56, 57 and 58), illustrating the evolved sequence around the site of the Asn 31 mutation introduced in the starting constructs. Panel B shows the corresponding nucleic acid sequences (SEQ ID NOS: 59, 60 and 61). Panel C shows a representation of the measured affinity of the evolved mutants.
[0099] FIG. 49 shows FACS scattergrams for the isolation of antibodies to NGF selected via the use of intact protein over 5 rounds of selection, as described in Example 15. Panels A and B show FACS results using NGF coupled to beads, and panels C, D and E show FACS scattergrams obtained using 50 nM (panel C) or 20 nM (panels D and E) NGF. Insets in the graphs show control incubations performed with control cells. In these graphs, the X-axis indicates the extent of IgG expression of the cells and the Y-axis specifies the magnitude of bead binding by cells as described in the Examples.
[0100] FIG. 50 shows the results of Biacore analysis of a representative antibody isolated from screening of the surface displayed antibody library with NGF as described in Example 15. A multivariate fit of these data produces a predicted dissociation constant (Kd) of 670 nM.
[0101] FIG. 51 provides the polynucleotide sequence (A; SEQ ID NO: 458) of an unmodified form of the Teal Fluorescent Protein (TFP). Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 40 CpG methylation sites were present (data not shown).
[0102] FIG. 52 provides the polynucleotide sequence (A; SEQ ID NO: 459) of a synthetic SHM susceptible (hot) form of the Teal Fluorescent Protein (TFP). Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 14 CpG methylation sites were present (data not shown).
[0103] FIG. 53 provides the polynucleotide sequence (A; SEQ ID NO: 460) of a synthetic SHM resistant (cold) form of the Teal Fluorescent Protein (TFP). Also shown is the analysis of hot spots (B) and cold spots (C) as illustrated by bold capital letters. 21 CpG methylation sites were present (data not shown).
[0104] FIG. 53D shows the mutations for a representative segment of the hot and cold TFP constructs. The central row shows the amino acid sequence of TFP (residues 59 through 87) in single letter format (SEQ ID NO: 461), and the “hot” and “cold” starting nucleic acid sequences encoding the two constructs are shown above (hot; SEQ ID NO: 462) and below (cold; SEQ ID NO: 463) the amino acid sequence. Mutations observed in the hot sequence are aligned and stacked on top of the gene sequences, while mutations in the cold TFP sequence are shown below. The results illustrate how “silent” changes to the coding sequences generate dramatic changes in observed AID-mediated SHM rates, demonstrating that engineered sequences can be effectively optimized to create fast or slow rates of SHM.
[0105] FIG. 53E shows that the spectrum of mutations generated by AID in the present in vitro tissue culture system mirrors those observed in other studies and those seen during in vivo affinity maturation. FIG. 53E shows the mutations generated in the present study (box (i), upper left, n=118), and compares them with mutations observed by Zan et al. (box (ii), upper right, n=702), Wilson et al. (box (iii), lower left, n=25,000), and a larger analysis of IGHV chains that have undergone affinity maturation (box (iv), lower right, n=101,926). The Y-axis in each chart indicates the starting nucleotide, the X-axis indicates the end nucleotide, and the number in each square indicates the percentage (%) of the time that the nucleotide change is observed. In the present study, the frequencies of transitions and transversions were similar to those seen in other data sets. Mutations of C to T and G to A are the direct result of AID activity on cytidines and account for 48% of all mutation events. In addition, mutations at bases A and T account for ˜30% of mutation events (i.e., slightly less than frequencies observed in other datasets).
[0106] FIG. 53F shows that mutation events are distributed throughout the SHM optimized nucleotide sequence of the hot TFP gene, with a maximum instantaneous rate of about 0.08 events per 1000 nucleotides per generation centered around 300 nucleotides from the beginning of the open reading frame. Stable transfection and selection of a gene with AID (for 30 days) produces a maximum rate of mutation of 1 event per 480 nucleotides. As a result, genes may contain zero, one, two or more mutations per gene.
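The statement that a gene may carry zero, one, two or more mutations can be illustrated with a simple Poisson calculation. This is a sketch under stated assumptions, not an analysis from the specification: it treats mutation events as independent, applies the figure of 1 event per 480 nucleotides uniformly along the gene (the text gives it as a maximum rate), and uses a hypothetical open reading frame length of 720 nucleotides.

```python
import math


def mutation_count_distribution(gene_length_nt, nt_per_event=480.0, max_count=5):
    """Poisson probabilities of observing 0..max_count mutations in one gene copy."""
    mean = gene_length_nt / nt_per_event  # expected number of events per gene
    return [math.exp(-mean) * mean**k / math.factorial(k) for k in range(max_count + 1)]


ASSUMED_GENE_LENGTH_NT = 720  # hypothetical ORF length, not taken from the text

probs = mutation_count_distribution(ASSUMED_GENE_LENGTH_NT)
for k, p in enumerate(probs):
    print(f"P({k} mutations) = {p:.3f}")
print(f"P(more than {len(probs) - 1}) = {1 - sum(probs):.3f}")
```

Under these assumptions the expected count is 1.5 events per gene, so unmutated, singly mutated and multiply mutated copies all occur at appreciable frequency in the same population.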
[0107] FIG. 53G illustrates the distribution of SHM-mediated events observed in hot TFP sequenced genes compared to the significantly reduced pattern of mutations seen in cold TFP (FIG. 53H).

DETAILED DESCRIPTION OF THE INVENTION

I. Somatic Hypermutation Systems

[0108] In vitro somatic hypermutation (SHM) systems as described in related priority application U.S. Provisional Application No. 60/902,414, entitled “SOMATIC HYPERMUTATION SYSTEMS,” filed on Feb. 20, 2007, involve the use of in vitro somatic hypermutation in conjunction with directed evolution and bioinformatic analysis to create integrated systems that include, but are not limited to, optimized, controlled systems for library design, screening, and selection, and integrated systems for data mining. These systems include:
[0109] I. An expression system designed to create SHM susceptible and/or SHM resistant DNA sequences, within a cellular or cell-free environment. The system enables the stable maintenance of a mutagenesis system that provides for high level targeted SHM in a gene template of interest, while significantly preventing non-specific mutagenesis of structural proteins, transcriptional control regions and selectable markers.
[0110] II. Polynucleotide libraries that are focused in size and specificity. These libraries can be synthetic libraries, semi-synthetic libraries, and/or seed libraries. In certain aspects, the polynucleotide libraries can be enriched for SHM to seed in situ diversity creation. In one such embodiment, a polynucleotide library can be enriched for SHM wherein the library comprises a plurality of polynucleotides having a nucleic acid sequence encoding a functional portion of a protein of interest that is modified to act as a substrate for SHM.
[0111] III. A process based on computational analysis of protein structure, intra-species and inter-species sequence variation, and the functional analysis of protein activity for selecting optimal epitopes that provide for the selection of antibodies with superior selectivity, cross species reactivity, and blocking activity.
[0112] The overall result of the integration of these approaches is an integrated system for creating targeted diversity in situ, and for the automated analysis and selection of proteins with improved traits.
[0113] In certain embodiments, the present invention is based in part on an improved understanding of the context of multiple rounds of SHM within the reading frame of a polynucleotide sequence, and the underlying logic relationships inherent within codon usage patterns.
[0114] In particular, the above systems for in vitro SHM provide new design possibilities for the creation of “seed” libraries that can efficiently serve as the substrate for SHM for the evolution and selection of improved proteins.

i. Definitions

[0115] As used herein and in the appended claims, the terms “a,” “an” and “the” can mean, for example, one or more, or at least one, of a unit unless the context clearly dictates otherwise. Thus, for example, reference to “an antibody” includes a plurality of such antibodies and reference to “a variable region” includes reference to one or more variable regions and equivalents thereof known to those skilled in the art, and so forth. In the event that there is a plurality of definitions for a term herein, those in this section prevail unless stated otherwise.
[0116] Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
[0117] The terms “comprise” or “comprising” are used in their open, non-limiting sense, that is to say permitting the presence of one or more features or components in addition to the recited feature or features.
[0118] The term “consisting essentially of” refers to a product, particularly a peptide sequence, of a defined number of residues which is not covalently attached to a larger product. In the case of the peptide of the invention referred to above, those of skill in the art will appreciate that minor modifications to the N- or C-terminal of the peptide may however be contemplated, such as the chemical modification of the terminal to add a protecting group or the like, e.g. the amidation of the C-terminus.
[0119] The term “isolated” refers to the state in which specific binding members or other specific proteins of the invention, or nucleic acids encoding such binding members or proteins will be, in accordance with the present invention. Binding members or other proteins, and nucleic acids encoding them will be free or substantially free of material with which they are naturally associated such as other polypeptides or nucleic acids with which they are found in their natural environment, or the environment in which they are prepared (e.g. cell culture) when such preparation is by recombinant DNA technology practiced in vitro or in vivo. It is to be understood, however, that binding members or other proteins, and nucleic acids encoding them may be formulated with diluents or adjuvants and still for practical purposes be isolated—for example binding members will normally be mixed with gelatin or other carriers if used to coat microtitre plates for use in immunoassays, or will be mixed with pharmaceutically acceptable carriers or diluents when used in diagnosis or therapy. Specific binding members or other specific proteins can be glycosylated, either naturally or by systems of heterologous eukaryotic cells, or they can be (for example if produced by expression in a prokaryotic cell) unglycosylated.
[0120] The term “selection” refers to the separation of one or more members, such as polynucleotides, proteins or cells from a library of such members. Selection can involve both detection and selection, for example where cells are selected by use of a fluorescence activated cell sorter (FACS) that detects a reporter gene and then sorts the cells accordingly.
[0121] As used herein, “pg” means picogram, “ng” means nanogram, “ug” or “μg” mean microgram, “mg” means milligram, “ul” or “μl” mean microliter, “ml” means milliliter, “l” means liter, “kb” means kilobases, “uM” or “μM” means micromolar, “nM” means nanomolar, “pM” means picomolar, “fM” means femtomolar.
[0122] The phrase “pharmaceutically acceptable” refers to molecular entities and compositions that are physiologically tolerable and do not typically produce an allergic or similar untoward reaction, such as gastric upset, dizziness and the like, when administered to a human.

Antibody Terminology

[0123] The term “antibody” describes an immunoglobulin whether natural or partly or wholly synthetically produced. The term also covers any polypeptide or protein having a binding domain which is, or is homologous to, an antigen-binding domain. CDR grafted antibodies are also contemplated by this term.
[0124] “Native antibodies” and “native immunoglobulins” are usually heterotetrameric glycoproteins of about 150,000 daltons, composed of two identical light (L) chains and two identical heavy (H) chains. Each light chain is typically linked to a heavy chain by one covalent disulfide bond, while the number of disulfide linkages varies among the heavy chains of different immunoglobulin isotypes. Each heavy and light chain also has regularly spaced intrachain disulfide bridges. Each heavy chain has at one end a variable domain (“VH”) followed by a number of constant domains (“CH”). Each light chain has a variable domain at one end (“VL”) and a constant domain (“CL”) at its other end; the constant domain of the light chain is aligned with the first constant domain of the heavy chain, and the light-chain variable domain is aligned with the variable domain of the heavy chain. Particular amino acid residues are believed to form an interface between the light- and heavy-chain variable domains.
[0125] The term “variable domain” refers to protein domains that differ extensively in sequence among family members (i.e. among different isoforms, or in different species). With respect to antibodies, the term “variable domain” refers to the variable domains of antibodies that are used in the binding and specificity of each particular antibody for its particular antigen. However, the variability is not evenly distributed throughout the variable domains of antibodies. It is concentrated in three segments called hypervariable regions both in the light chain and the heavy chain variable domains. The more highly conserved portions of variable domains are called the “framework region” or “FR.” The variable domains of native heavy and light chains each comprise four FRs (FR1, FR2, FR3 and FR4, respectively), largely adopting a β-sheet configuration, connected by three hypervariable regions, which form loops connecting, and in some cases forming part of, the β-sheet structure. The hypervariable regions in each chain are held together in close proximity by the FRs and, with the hypervariable regions from the other chain, contribute to the formation of the antigen-binding site of antibodies (see Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md. (1991), pages 647-669). The constant domains are not involved directly in binding an antibody to an antigen, but exhibit various effector functions, such as participation of the antibody in antibody-dependent cellular toxicity.
[0126] The term “hypervariable region” when used herein refers to the amino acid residues of an antibody which are responsible for antigen-binding. The hypervariable region comprises amino acid residues from three “complementarity determining regions” or “CDRs,” which directly bind, in a complementary manner, to an antigen and are known as CDR1, CDR2, and CDR3 respectively.
[0127] In the light chain variable domain, the CDRs typically correspond to residues 24-34 (CDRL1), 50-56 (CDRL2) and 89-97 (CDRL3), and in the heavy chain variable domain the CDRs typically correspond to residues 31-35 (CDRH1), 50-65 (CDRH2) and 95-102 (CDRH3) (Kabat et al., Sequences of Proteins of Immunological Interest, 5th Ed. Public Health Service, National Institutes of Health, Bethesda, Md. (1991)), and/or to those residues from a “hypervariable loop” (i.e. residues 26-32 (L1), 50-52 (L2) and 91-96 (L3) in the light chain variable domain and 26-32 (H1), 53-55 (H2) and 96-101 (H3) in the heavy chain variable domain; Chothia and Lesk, J. Mol. Biol. 196:901-917 (1987)).
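The two sets of residue ranges given above can be captured in a small lookup, shown here as a minimal sketch. The residue boundaries are taken from the paragraph above; the names `KABAT_CDRS`, `HYPERVARIABLE_LOOPS` and `region_of` are hypothetical. The sketch assumes plain integer positions and therefore ignores the insertion letters used in antibody numbering schemes (for example, positions such as 52A or 100A).

```python
# Residue ranges (inclusive) as listed in the text for each chain.
KABAT_CDRS = {
    "L": {"CDRL1": (24, 34), "CDRL2": (50, 56), "CDRL3": (89, 97)},
    "H": {"CDRH1": (31, 35), "CDRH2": (50, 65), "CDRH3": (95, 102)},
}
HYPERVARIABLE_LOOPS = {
    "L": {"L1": (26, 32), "L2": (50, 52), "L3": (91, 96)},
    "H": {"H1": (26, 32), "H2": (53, 55), "H3": (96, 101)},
}


def region_of(chain, position, table=KABAT_CDRS):
    """Name the region containing `position` in chain 'L' or 'H', else 'framework'."""
    for name, (start, end) in table[chain].items():
        if start <= position <= end:
            return name
    return "framework"


print(region_of("L", 31))                       # light chain position 31
print(region_of("H", 28))                       # sequence-based CDR definition
print(region_of("H", 28, HYPERVARIABLE_LOOPS))  # structural loop definition
```

The last two calls show why the paragraph lists both definitions: the same heavy chain position can fall in the framework under the sequence-based ranges and inside a loop under the structural ranges.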
[0128] As used herein, “variable framework region” or “VFR” refers to framework residues that form a part of the antigen binding pocket or groove and/or that may contact antigen. In some embodiments, the framework residues form a loop that is a part of the antigen binding pocket or groove. The amino acid residues in the loop may or may not contact the antigen. In an embodiment, the loop amino acids of a VFR are determined by inspection of the three-dimensional structure of an antibody, antibody heavy chain, or antibody light chain. The three-dimensional structure may be analyzed for solvent accessible amino acid positions as such positions are likely to form a loop and/or provide antigen contact in an antibody variable domain. Some of the solvent accessible positions can tolerate amino acid sequence diversity and others (e.g. structural positions) will be less diversified. The three-dimensional structure of the antibody variable domain may be derived from a crystal structure or protein modeling. In some embodiments, the VFR comprises, consists essentially of, or consists of amino acid positions corresponding to amino acid positions 71 to 78 of the heavy chain variable domain, the positions defined according to Kabat et al., 1991. In some embodiments, VFR forms a portion of Framework Region 3 located between CDRH2 and CDRH3. Preferably, VFR forms a loop that is well positioned to make contact with a target antigen or form a part of the antigen binding pocket.
[0129] Depending on the amino acid sequence of the constant domain of their heavy chains, immunoglobulins can be assigned to different classes. There are five major classes of immunoglobulins: IgA, IgD, IgE, IgG, and IgM, and several of these may be further divided into subclasses (isotypes), e.g., IgG1, IgG2, IgG3, IgG4, IgA1, and IgA2. The heavy-chain constant domains (Fc) that correspond to the different classes of immunoglobulins are called α, δ, ε, γ, and μ, respectively. The subunit structures and three-dimensional configurations of different classes of immunoglobulins are well known.
[0130] The “light chains” of antibodies (immunoglobulins) from any vertebrate species can be assigned to one of two clearly distinct types, called kappa (“κ”) and lambda (“λ”), based on the amino acid sequences of their constant domains.
[0131] The terms “antigen-binding portion of an antibody,” “antigen-binding fragment,” “antigen-binding domain,” “antibody fragment” or a “functional fragment of an antibody” are used interchangeably in the present invention to mean one or more fragments of an antibody that retain the ability to specifically bind to an antigen (see generally, Holliger et al., Nature Biotech. 23 (9) 1126-1129 (2005)). Non-limiting examples of antibody fragments encompassed by the term “antigen-binding portion” of an antibody include (i) a Fab fragment, a monovalent fragment consisting of the VL, VH, CL and CH1 domains; (ii) a F(ab′)2 fragment, a bivalent fragment comprising two Fab fragments linked by a disulfide bridge at the hinge region; (iii) a Fd fragment consisting of the VH and CH1 domains; (iv) a Fv fragment consisting of the VL and VH domains of a single arm of an antibody; (v) a dAb fragment (Ward et al., (1989) Nature 341:544-546), which consists of a VH domain; and (vi) an isolated complementarity determining region (CDR). Furthermore, although the two domains of the Fv fragment, VL and VH, are coded for by separate genes, they can be joined, using recombinant methods, by a synthetic linker that enables them to be made as a single protein chain in which the VL and VH regions pair to form monovalent molecules (known as single chain Fv (scFv); see e.g., Bird et al. (1988) Science 242:423-426; Huston et al. (1988) Proc. Natl. Acad. Sci. USA 85:5879-5883; and Osbourn et al. (1998) Nat. Biotechnol. 16:778). Such single chain antibodies are also intended to be encompassed within the term “antigen-binding portion” of an antibody. Any VH and VL sequences of specific scFv can be linked to human immunoglobulin constant region cDNA or genomic sequences, in order to generate expression vectors encoding complete IgG molecules or other isotypes. VH and VL can also be used in the generation of Fab, Fv or other fragments of immunoglobulins using either protein chemistry or recombinant DNA technology. Other forms of single chain antibodies, such as diabodies, are also encompassed.
[0132] “F(ab′)2” and “Fab′” moieties can be produced by treating immunoglobulin (monoclonal antibody) with a protease such as pepsin or papain, and include antibody fragments generated by digesting immunoglobulin near the disulfide bonds existing between the hinge regions in each of the two H chains. For example, papain cleaves IgG upstream of the disulfide bonds existing between the hinge regions in each of the two H chains to generate two homologous antibody fragments in which an L chain composed of VL (L chain variable region) and CL (L chain constant region), and an H chain fragment composed of VH (H chain variable region) and CHγ1 (γ1 region in the constant region of H chain) are connected at their C terminal regions through a disulfide bond. Each of these two homologous antibody fragments is called Fab′. Pepsin also cleaves IgG downstream of the disulfide bonds existing between the hinge regions in each of the two H chains to generate an antibody fragment slightly larger than the fragment in which the two above-mentioned Fab′ are connected at the hinge region. This antibody fragment is called F(ab′)2.
[0133] The Fab fragment also contains the constant domain of the light chain and the first constant domain (CH1) of the heavy chain. Fab′ fragments differ from Fab fragments by the addition of a few residues at the carboxyl terminus of the heavy chain CH1 domain including one or more cysteine(s) from the antibody hinge region. Fab′-SH is the designation herein for Fab′ in which the cysteine residue(s) of the constant domains bear a free thiol group. F(ab′)2 antibody fragments originally were produced as pairs of Fab′ fragments which have hinge cysteines between them. Other chemical couplings of antibody fragments are also known.
[0134] “Fv” is the minimum antibody fragment which contains a complete antigen-recognition and antigen-binding site. This region consists of a dimer of one heavy chain and one light chain variable domain in tight, non-covalent association. It is in this configuration that the three hypervariable regions of each variable domain interact to define an antigen-binding site on the surface of the VH-VL dimer. Collectively, the six hypervariable regions confer antigen-binding specificity to the antibody. However, even a single variable domain (or half of an Fv comprising only three hypervariable regions specific for an antigen) has the ability to recognize and bind antigen, although at a lower affinity than the entire binding site.
[0135] “Single-chain Fv” or “sFv” antibody fragments comprise the VH and VL domains of an antibody, wherein these domains are present in a single polypeptide chain. Generally, the Fv polypeptide further comprises a polypeptide linker between the VH and VL domains which enables the sFv to form the desired structure for antigen binding. For a review of sFv see Pluckthun in The Pharmacology of Monoclonal Antibodies, vol. 113, Rosenburg and Moore eds. Springer-Verlag, New York, pp. 269-315 (1994).
[0136] The term “Avimer™” refers to a new class of therapeutic proteins of human origin, which are unrelated to antibodies and antibody fragments, and are composed of several modular and reusable binding domains, referred to as A-domains (also referred to as class A module, complement type repeat, or LDL-receptor class A domain). They were developed from human extracellular receptor domains by in vitro exon shuffling and phage display (Silverman et al., 2005, Nat. Biotechnol. 23:1493-94; Silverman et al., 2006, Nat. Biotechnol. 24:220). The resulting proteins may comprise multiple independent binding domains that may exhibit improved affinity (in some cases sub-nanomolar) and specificity compared with single-epitope binding proteins. See, for example, U.S. Patent Application Publ. Nos. 2005/0221384, 2005/0164301, 2005/0053973, 2005/0089932, 2005/0048512, and 2004/0175756, each of which is hereby incorporated by reference herein in its entirety.
[0137] Each of the known 217 human A-domains comprises ˜35 amino acids (˜4 kDa) and domains are separated by linkers that average five amino acids in length. Native A-domains fold quickly and efficiently to a uniform, stable structure mediated primarily by calcium binding and disulfide formation. A conserved scaffold motif of only 12 amino acids is required for this common structure. The end result is a single protein chain containing multiple domains, each of which represents a separate function. Each domain of the proteins binds independently, and the energetic contributions of each domain are additive. These proteins were called “Avimers™” from avidity multimers.
[0138] As used herein, “natural” or “naturally occurring” antibodies or antibody variable domains, refers to antibodies or antibody variable domains having a sequence of an antibody or antibody variable domain identified from a nonsynthetic source, for example, from a differentiated antigen-specific B cell obtained ex vivo, or its corresponding hybridoma cell line, or from the serum of an animal. These antibodies can include antibodies generated in any type of immune response, either natural or otherwise induced. Natural antibodies include the amino acid sequences, and the nucleotide sequences that constitute or encode these antibodies, for example, as identified in the Kabat database.
[0139] The terms “synthetic polynucleotide,” “synthetic gene” or “synthetic polypeptide,” as used herein, mean that the corresponding polynucleotide sequence or portion thereof, or amino acid sequence or portion thereof, is derived from a sequence that has been designed, or synthesized de novo, or modified, compared to the equivalent naturally occurring sequence. Synthetic polynucleotides or synthetic genes can be prepared by methods known in the art, including but not limited to, the chemical synthesis of nucleic acid or amino acid sequences or amplification via PCR (or similar enzymatic amplification systems). Synthetic genes are typically different from unmodified genes or naturally occurring genes, either at the amino acid or polynucleotide level (or both), and are typically located within the context of synthetic expression control sequences. For example, synthetic gene sequences may include amino acid or polynucleotide sequences that have been changed, for example, by the replacement, deletion, or addition of one or more amino acids or nucleotides, thereby providing an antibody amino acid sequence or a polynucleotide coding sequence that is different from the source sequence. Synthetic gene or polynucleotide sequences may not necessarily encode proteins with different amino acids compared to the natural gene; for example, they can also encompass synthetic polynucleotide sequences that incorporate different codons but which encode the same amino acid, i.e. the nucleotide changes represent silent mutations at the amino acid level. In one embodiment, synthetic genes exhibit altered susceptibility to SHM compared to the naturally occurring or unmodified gene. Synthetic genes can be iteratively modified using the methods described herein and, in each successive iteration, a corresponding polynucleotide sequence or amino acid sequence is derived, in whole or part, from a sequence that has been designed, or synthesized de novo, or modified, compared to an equivalent unmodified sequence.
[0140] The terms “semi-synthetic polynucleotide” or “semi-synthetic gene,” as used herein, refer to polynucleotide sequences that consist in part of a nucleic acid sequence that has been obtained via polymerase chain reaction (PCR) or other similar enzymatic amplification system which utilizes a natural donor (e.g., peripheral blood monocytes) as the starting material for the amplification reaction. The remaining “synthetic” polynucleotides, i.e., those portions of semi-synthetic polynucleotide not obtained via PCR or other similar enzymatic amplification system, can be synthesized de novo using methods known in the art including, but not limited to, the chemical synthesis of nucleic acid sequences.
[0141] The term “synthetic variable regions” refers to synthetic polynucleotide sequences that are substantially comprised of optimal SHM hot spots and hot codons that, when combined with the activity of AID and/or one or more error-prone polymerases, can generate a broad spectrum of potential amino acid diversity at each position. Synthetic variable regions may be separated by synthetic framework sequences that encompass codons that are not specifically targeted for SHM, or that are resistant to SHM but that provide an optimal context for mutagenesis.
[0142] The term “diabodies” refers to small antibody fragments with two antigen-binding sites, which fragments comprise a heavy chain variable domain (VH) connected to a light chain variable domain (VL) in the same polypeptide chain (VH-VL). By using a linker that is too short to allow pairing between the two domains on the same chain, the domains are forced to pair with the complementary domains of another chain and create two antigen-binding sites. Diabodies are described more fully in, for example, EP 404,097; WO 93/11161; and Holliger et al., Proc. Natl. Acad. Sci. USA 90:6444-6448 (1993).
[0143] Antibodies of the present invention also include heavy chain dimers, such as antibodies from camelids and sharks. Camelid and shark antibodies comprise a homodimeric pair of two chains of V-like and C-like domains (neither has a light chain). Since the VH region of a heavy chain dimer IgG in a camelid does not have to make hydrophobic interactions with a light chain, the region in the heavy chain that normally contacts a light chain is changed to hydrophilic amino acid residues in a camelid. VH domains of heavy-chain dimer IgGs are called VHH domains. Shark Ig-NARs comprise a homodimer of one variable domain (termed a V-NAR domain) and five C-like constant domains (C-NAR domains).
[0144] In camelids, the diversity of antibody repertoire is determined by the complementarity determining regions (CDR) 1, 2, and 3 in the VH or VHH regions. The CDR3 in the camel VHH region is characterized by its relatively long length averaging 16 amino acids (Muyldermans et al., 1994, Protein Engineering 7(9): 1129). This is in contrast to CDR3 regions of antibodies of many other species. For example, the CDR3 of mouse VH has an average of 9 amino acids.
[0145] Libraries of camelid-derived antibody variable regions, which maintain the in vivo diversity of the variable regions of a camelid, can be made by, for example, the methods disclosed in U.S. Patent Application Ser. No. 20050037421, published Feb. 17, 2005.
[0146] “Humanized” forms of non-human (e.g., murine) antibodies are chimeric antibodies which contain minimal sequence derived from non-human immunoglobulin. For the most part, humanized antibodies are human immunoglobulins (recipient antibody) in which hypervariable region residues of the recipient are replaced by hypervariable region residues from a non-human species (donor antibody) such as mouse, rat, rabbit or nonhuman primate having the desired specificity, affinity, and capacity. In some instances, framework region (FR) residues of the human immunoglobulin are replaced by corresponding non-human residues. Furthermore, humanized antibodies may comprise residues which are not found in the recipient antibody or in the donor antibody. These modifications are made to further refine antibody performance. In general, the humanized antibody will comprise substantially all of at least one, and typically two, variable domains, in which all or substantially all of the hypervariable regions correspond to those of a non-human immunoglobulin and all, or substantially all, of the FRs are those of a human immunoglobulin sequence. The humanized antibody optionally also will comprise at least a portion of an immunoglobulin constant region (Fc), typically that of a human immunoglobulin. For further details, see Jones et al., Nature 321:522-525 (1986); Riechmann et al., Nature 332:323-329 (1988); and Presta, Curr. Op. Struct. Biol. 2:593-596 (1992).
[0147] A “humanized antibody” of the present invention includes synthetic and semi-synthetic antibodies prepared by in vitro somatic hypermutation driven affinity maturation of library-derived polynucleotides. Specifically included are monoclonal antibodies in which part, or all of the complementarity determining regions of the heavy and light chain are derived from a non-human monoclonal antibody, substantially all the remaining portions of the variable regions are derived from human variable region templates as described herein (both heavy and light chain), and the constant regions are derived from human constant region templates likewise described herein. In one aspect, such non-human CDR sequences comprise synthetic polynucleotide sequences that have been optimized for somatic hypermutation, and comprise preferred SHM codons, e.g., preferred SHM hot spot codons. In one embodiment, the CDR3 regions of the heavy and light chain are derived from the non-human antibody.
[0148] The term “monoclonal antibody” as used herein refers to an antibody obtained from a population of substantially homogeneous antibodies, i.e., the individual antibodies comprising the population are identical except for possible naturally occurring mutations that may be present in minor amounts. Monoclonal antibodies are highly specific, being directed against a single antigenic site. Furthermore, in contrast to conventional (polyclonal) antibody preparations which include different antibodies directed against different determinants (epitopes), each monoclonal antibody is directed against a single determinant on the antigen. The modifier “monoclonal” indicates the character of the antibody as being obtained from a substantially homogeneous population of antibodies, and is not to be construed as requiring production of the antibody by any particular method. For example, the monoclonal antibodies to be used in accordance with the present invention may be made by the hybridoma method first described by Kohler et al., Nature 256:495 (1975), or may be made by recombinant DNA methods (see, e.g., U.S. Pat. No. 4,816,567). In certain embodiments, the “monoclonal antibodies” may also be isolated from phage antibody libraries using the techniques described in Clackson et al., Nature 352:624-628 (1991) and Marks et al., J. Mol. Biol. 222:581-597 (1991), for example.
[0149] In other embodiments, monoclonal antibodies can be isolated and purified from the culture supernatant or ascites mentioned above by saturated ammonium sulfate precipitation, the euglobulin precipitation method, the caproic acid method, the caprylic acid method, ion exchange chromatography (DEAE or DE52), or affinity chromatography using an anti-immunoglobulin column or a protein A column.
[0150] A polyclonal antibody (antiserum) or monoclonal antibody of the present invention can be produced by known methods. Namely, mammals, preferably, mice, rats, hamsters, guinea pigs, rabbits, cats, dogs, pigs, goats, horses, or cows, or more preferably, mice, rats, hamsters, guinea pigs, or rabbits are immunized, for example, with an antigen mentioned above with Freund's adjuvant, if necessary. The polyclonal antibody can be obtained from the serum obtained from the animal so immunized. The monoclonal antibodies are produced as follows. Hybridomas are produced by fusing the antibody-producing cells obtained from the animal so immunized and myeloma cells incapable of producing autoantibodies. Then the hybridomas are cloned, and clones producing the monoclonal antibodies showing the specific affinity to the antigen used for immunizing the mammal are screened.
[0151] An “isolated specific binding member” is one which has been identified and separated and/or recovered from a component of its natural environment. Contaminant components of its natural environment are materials which would interfere with diagnostic or therapeutic uses for the specific binding member, and may include enzymes, hormones, and other proteinaceous or nonproteinaceous solutes. In preferred embodiments, the specific binding member will be purified (1) to greater than 95% by weight as determined by the Lowry or comparable assay method, and most preferably more than 99% by weight, (2) to a degree sufficient to obtain at least 15 residues of N-terminal or internal amino acid sequence by use of a spinning cup sequenator, or (3) to homogeneity by SDS-PAGE under reducing or nonreducing conditions using Coomassie blue or, preferably, silver stain. Isolated specific binding members include those in situ within recombinant cells since at least one component of the specific binding member's natural environment will not be present. Ordinarily, however, isolated specific binding members will be prepared by at least one purification step.
[0152] As used herein, an “intrabody or fragment thereof” refers to antibodies that are expressed and function intracellularly. Intrabodies typically lack disulfide bonds and are capable of modulating the expression or activity of target genes through their specific binding activity. Intrabodies include single domain fragments such as isolated VH and VL domains and scFvs. An intrabody can include sub-cellular trafficking signals attached to the N or C terminus of the intrabodies to allow them to be expressed at high concentrations in the sub-cellular compartments where a target protein is located. Upon interaction with the target gene, an intrabody modulates target protein function, and/or achieves phenotypic/functional knockout by mechanisms such as accelerating target protein degradation and sequestering the target protein in a non-physiological sub-cellular compartment. Other mechanisms of intrabody-mediated gene inactivation can depend on the epitope to which the intrabody is directed, such as binding to the catalytic site on a target protein or to epitopes that are involved in protein-protein, protein-DNA or protein-RNA interactions. In one embodiment, an intrabody is a scFv.
[0153] The “cell producing an antibody reactive to a protein or a fragment thereof” of the present invention means any cell producing the above-described antibodies or antigen-binding fragments of the present invention.
[0154] The term “germline gene segments” refers to the genes from the germline (the haploid gametes and those diploid cells from which they are formed). The germline DNA contains multiple gene segments that encode a single immunoglobulin heavy or light chain. These gene segments are carried in the germ cells but cannot be transcribed and translated into heavy and light chains until they are arranged into functional genes. During B-cell differentiation in the bone marrow, these gene segments are randomly shuffled by a dynamic genetic system capable of generating more than 10^8 specificities. Most of these gene segments are published and collected by the germline database.
[0155] As used herein, “library” refers to a plurality of polynucleotides, proteins, or cells comprising a collection of two, or two or more, non-identical but related members. A “synthetic library” refers to a plurality of synthetic polynucleotides, or a population of cells that comprise said plurality of synthetic polynucleotides. A “semi-synthetic library” refers to a plurality of semi-synthetic polynucleotides, or a population of cells that comprise said plurality of semi-synthetic polynucleotides. A “seed library” refers to a plurality of one or more synthetic or semi-synthetic polynucleotides, or cells that comprise said polynucleotides, that contain one or more sequences or portions thereof, that have been modified to act as a substrate for SHM, e.g., AID-mediated somatic hypermutation, and that are capable, when acted upon by somatic hypermutation, of creating a library of polynucleotides, proteins or cells in situ.
[0156] “Antigen” refers to substances that are capable, under appropriate conditions, of inducing a specific immune response and of reacting with the products of that response, that is, with specific antibodies or specifically sensitized T-lymphocytes, or both. Antigens may be soluble substances, such as toxins and foreign proteins, or particulates, such as bacteria and tissue cells; however, only the portion of the protein or polysaccharide molecule known as the antigenic determinant (epitopes) combines with the antibody or a specific receptor on a lymphocyte. More broadly, the term “antigen” may be used to refer to any substance to which an antibody binds, or for which antibodies are desired, regardless of whether the substance is immunogenic. For such antigens, antibodies may be identified by recombinant methods, independently of any immune response.
[0157] The term “affinity” refers to the equilibrium constant for the reversible binding of two agents and is expressed as Kd. Affinity of a binding protein to a ligand such as affinity of an antibody for an epitope can be, for example, from about 100 nanomolar (nM) to about 0.1 nM, from about 100 nM to about 1 picomolar (pM), or from about 100 nM to about 1 femtomolar (fM). The term “avidity” refers to the resistance of a complex of two or more agents to dissociation after dilution.
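By way of non-limiting illustration only, the relationship between Kd and binding can be sketched as follows. The sketch assumes a simple 1:1 reversible binding model at equilibrium with ligand in excess over binding sites, which is an assumption not stated in the text above; the function name `fraction_bound` is hypothetical.

```python
def fraction_bound(ligand_conc_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium for 1:1 binding.

    Assumes the free ligand concentration is approximately the total
    ligand concentration (ligand in excess over binding sites).
    """
    if ligand_conc_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ligand_conc_molar / (ligand_conc_molar + kd_molar)


NM = 1e-9  # one nanomolar, in mol/L

# At a ligand concentration equal to Kd, half of the sites are occupied.
half = fraction_bound(1 * NM, 1 * NM)
# A lower Kd (tighter binding) gives higher occupancy at the same concentration.
weak = fraction_bound(1 * NM, 100 * NM)
tight = fraction_bound(1 * NM, 0.1 * NM)
```

Under this model, a binder at the 0.1 nM end of the range recited above occupies most sites at 1 nM ligand, whereas a 100 nM binder occupies about 1% of them.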
[0158] “Epitope” refers to that portion of an antigen or other macromolecule capable of forming a binding interaction with the variable region binding pocket of an antibody. An epitope can be a linear peptide sequence (i.e., “continuous”) or can be composed of noncontiguous amino acid sequences (i.e., “conformational” or “discontinuous”). An antibody or antigen-binding fragment can recognize one or more amino acid sequences; therefore an epitope can define more than one distinct amino acid sequence. Epitopes recognized by an antibody or antigen-binding fragment can be determined by peptide mapping and sequence analysis techniques well known to one of skill in the art. Typically, such binding interaction is manifested as an intermolecular contact with one or more amino acid residues of a CDR. Often, the antigen binding involves a CDR3 or a CDR3 pair.
[0159] A “cryptic epitope” or a “cryptic binding site” is an epitope or binding site of a protein sequence that is not exposed or substantially protected from recognition within at least one native conformation of the polypeptide, but is capable of being recognized by an antibody or antigen-binding fragment in a second conformation of the polypeptide, or in the denatured, or proteolyzed polypeptide. Amino acid sequences that are not exposed, or are only partially exposed, in only one specific native conformation of the polypeptide structure are potential cryptic epitopes. If an epitope is not exposed, or only partially exposed, then it is likely that it is buried within the interior of the polypeptide, or masked by an interaction with a macromolecular structure. Candidate cryptic epitopes can be identified, for example, by examining the three-dimensional structure of a native polypeptide.
[0160] The term “specific” may be used to refer to the situation in which one member of a specific binding pair will not show any significant binding to molecules other than its specific binding partner(s). The term is also applicable where e.g. an antigen binding domain is specific for a particular epitope which is carried by a number of antigens, in which case the specific binding member carrying the antigen binding domain will be able to bind to the various antigens carrying the epitope.
[0161] The term “binding” refers to a direct association between two molecules, due to, for example, covalent, electrostatic, hydrophobic, and ionic and/or hydrogen-bond interactions under physiological conditions, and includes interactions such as salt bridges and water bridges.
[0162] The term “specific binding member” describes a member of a pair of molecules which have binding specificity for one another. The members of a specific binding pair may be naturally derived or wholly or partially synthetically produced. One member of the pair of molecules has an area on its surface, or a cavity, which specifically binds to and is therefore complementary to a particular spatial and polar organization of the other member of the pair of molecules. Thus, the members of the pair have the property of binding specifically to each other. Examples of types of specific binding pairs include antigen-antibody, Avimer™-substrate, biotin-avidin, hormone-hormone receptor, receptor-ligand, protein-protein, and enzyme-substrate.
[0163] The term “adjuvant” refers to a compound or mixture that enhances the immune response, particularly to an antigen. An adjuvant can serve as a tissue depot that slowly releases the antigen and also as a lymphoid system activator that non-specifically enhances the immune response (Hood et al., Immunology, Second Ed., 1984, Benjamin/Cummings: Menlo Park, Calif., p. 384). Often, a primary challenge with an antigen alone, in the absence of an adjuvant, will fail to elicit a humoral or cellular immune response. Previously known and utilized adjuvants include, but are not limited to, complete Freund's adjuvant, incomplete Freund's adjuvant, saponin, mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil or hydrocarbon emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful human adjuvant such as BCG (bacille Calmette-Guerin) and Corynebacterium parvum. Mineral salt adjuvants include but are not limited to: aluminum hydroxide, aluminum phosphate, calcium phosphate, zinc hydroxide and calcium hydroxide. Preferably, the adjuvant composition further comprises a lipid of fat emulsion comprising about 10% (by weight) vegetable oil and about 1-2% (by weight) phospholipids. Preferably, the adjuvant composition further optionally comprises an emulsion form having oily particles dispersed in a continuous aqueous phase, having an emulsion forming polyol in an amount of from about 0.2% (by weight) to about 49% (by weight), optionally a metabolizable oil in an emulsion-forming amount of up to 15% (by weight), and optionally a glycol ether-based surfactant in an emulsion-stabilizing amount of up to about 5% (by weight).
[0164] As used herein, the term “immunomodulator” refers to an agent which is able to modulate an immune response. An example of such modulation is an enhancement of antibody production.
[0165] An “immunological response” to a composition or vaccine comprised of an antigen is the development in the host of a cellular- and/or antibody-mediated immune response to the composition or vaccine of interest. Usually, such a response consists of the subject producing antibodies, B cells, helper T cells, suppressor T cells, and/or cytotoxic T cells directed specifically to an antigen or antigens included in the composition or vaccine of interest.

Molecular Biological Terminology

[0166] The term “nucleotide” as used herein refers to a monomeric unit of a polynucleotide that consists of a heterocyclic base, a sugar, and one or more phosphate groups. The naturally occurring bases (guanine (G), adenine (A), cytosine (C), thymine (T), and uracil (U)) are typically derivatives of purine or pyrimidine, though it should be understood that naturally and non-naturally occurring base analogs are also included. The naturally occurring sugar is the pentose (five-carbon sugar) deoxyribose (which forms DNA) or ribose (which forms RNA), though it should be understood that naturally and non-naturally occurring sugar analogs are also included. Nucleotides are typically linked via phosphate bonds to form nucleic acids, or polynucleotides, though many other linkages are known in the art (such as, though not limited to, phosphorothioates, boranophosphates and the like).
[0167] The terms “nucleic acid” and “polynucleotide” as used herein refer to a polymeric form of nucleotides of any length, either ribonucleotides (RNA) or deoxyribonucleotides (DNA). These terms refer to the primary structure of the molecule, and thus include double- and single-stranded DNA, and double- and single-stranded RNA. The terms include, as equivalents, analogs of either RNA or DNA made from nucleotide analogs and modified polynucleotides such as, though not limited to methylated and/or capped polynucleotides.
[0168] A “DNA molecule” refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in its either single stranded form, or a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5′ to 3′ direction along the non transcribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA).
[0169] A DNA “coding sequence” or “coding region” is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in vivo when placed under the control of appropriate expression control sequences. The boundaries of the coding sequence (the “open reading frame” or “ORF”) are determined by a start codon at the 5′ (amino) terminus and a translation stop codon at the 3′ (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and synthetic DNA sequences. A polyadenylation signal and transcription termination sequence will usually be located 3′ to the coding sequence. The term “noncoding sequence” or “noncoding region” refers to regions of a polynucleotide sequence that are not translated into amino acids (e.g., 5′ and 3′ untranslated regions).
[0170] The term “reading frame” refers to one of the six possible reading frames, three in each direction, of the double stranded DNA molecule. The reading frame that is used determines which codons are used to encode amino acids within the coding sequence of a DNA molecule.
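By way of non-limiting illustration only, the six reading frames of a double-stranded DNA molecule can be enumerated as sketched below. The function names `reverse_complement` and `six_frames`, the frame labels (“+1” to “-3”) and the example sequence are hypothetical and are not taken from the specification.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna: str) -> dict:
    """Split both strands into codons at offsets 0, 1 and 2.

    Keys "+1".."+3" are frames on the given strand; "-1".."-3" are
    frames on the reverse-complement strand. Trailing bases that do
    not fill a complete codon are dropped.
    """
    frames = {}
    for label, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames


frames = six_frames("ATGGCCTAA")
```

In this example only frame “+1” reads the sequence as a start codon (ATG), one sense codon (GCC) and a stop codon (TAA); the other five frames partition the same bases into different codons.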
[0171] As used herein, an “antisense” nucleic acid molecule comprises a nucleotide sequence which is complementary to a “sense” nucleic acid encoding a protein, e.g., complementary to the coding strand of a double-stranded cDNA molecule, complementary to an mRNA sequence or complementary to the coding strand of a gene. Accordingly, an antisense nucleic acid molecule can hydrogen bond to a sense nucleic acid molecule.
[0172] The term “base pair” (or “bp”) refers to a partnership of adenine (A) with thymine (T), or of cytosine (C) with guanine (G), in a double-stranded DNA molecule. In RNA, uracil (U) is substituted for thymine.
[0173] As used herein, a “codon” refers to the three nucleotides which, when transcribed and translated, encode a single amino acid residue; or, in the case of UAA, UGA or UAG, encode a termination signal. Codons encoding amino acids are well known in the art and are provided for convenience herein in Table 1.
[0174] 
  TABLE 1
 
  Codon Usage Table
  Codon   Amino acid   AA   Abbrev.   Codon   Amino acid   AA   Abbrev.
 
  UUU   Phenylalanine   Phe   F   UCU   Serine   Ser   S
  UUC   Phenylalanine   Phe   F   UCC   Serine   Ser   S
  UUA   Leucine   Leu   L   UCA   Serine   Ser   S
  UUG   Leucine   Leu   L   UCG   Serine   Ser   S
 
  CUU   Leucine   Leu   L   CCU   Proline   Pro   P
  CUC   Leucine   Leu   L   CCC   Proline   Pro   P
  CUA   Leucine   Leu   L   CCA   Proline   Pro   P
  CUG   Leucine   Leu   L   CCG   Proline   Pro   P
 
  AUU   Isoleucine   Ile   I   ACU   Threonine   Thr   T
  AUC   Isoleucine   Ile   I   ACC   Threonine   Thr   T
  AUA   Isoleucine   Ile   I   ACA   Threonine   Thr   T
  AUG   Methionine   Met   M   ACG   Threonine   Thr   T
 
  GUU   Valine   Val   V   GCU   Alanine   Ala   A
  GUC   Valine   Val   V   GCC   Alanine   Ala   A
  GUA   Valine   Val   V   GCA   Alanine   Ala   A
  GUG   Valine   Val   V   GCG   Alanine   Ala   A
 
  UAU   Tyrosine   Tyr   Y   UGU   Cysteine   Cys   C
  UAC   Tyrosine   Tyr   Y   UGC   Cysteine   Cys   C
  UAA     Stop     UGA     Stop
  UAG     Stop     UGG   Tryptophan   Trp   W
 
  CAU   Histidine   His   H   CGU   Arginine   Arg   R
  CAC   Histidine   His   H   CGC   Arginine   Arg   R
  CAA   Glutamine   Gln   Q   CGA   Arginine   Arg   R
  CAG   Glutamine   Gln   Q   CGG   Arginine   Arg   R
 
  AAU   Asparagine   Asn   N   AGU   Serine   Ser   S
  AAC   Asparagine   Asn   N   AGC   Serine   Ser   S
  AAA   Lysine   Lys   K   AGA   Arginine   Arg   R
  AAG   Lysine   Lys   K   AGG   Arginine   Arg   R
 
  GAU   Aspartate   Asp   D   GGU   Glycine   Gly   G
  GAC   Aspartate   Asp   D   GGC   Glycine   Gly   G
  GAA   Glutamate   Glu   E   GGA   Glycine   Gly   G
  GAG   Glutamate   Glu   E   GGG   Glycine   Gly   G
 
[0175] AA: amino acid; Abbrev.: abbreviation. It should be understood that the codons specified above are for RNA sequences. The corresponding codons for DNA have a T substituted for U. Optimal codon usage is indicated by codon usage frequencies for expressed genes, for example, as shown in the codon usage chart from the program “Human-High.cod” from the Wisconsin Sequence Analysis Package, Version 8.1, Genetics Computer Group, Madison, Wis. Codon usage is also described in, for example, R. Nussinov, “Eukaryotic Dinucleotide Preference Rules and Their Implications for Degenerate Codon Usage,” J. Mol. Biol. 149: 125-131 (1981). The codons which are most frequently used in highly expressed human genes are presumptively the optimal codons for expression in human host cells and, thus, form the basis for constructing a synthetic coding sequence.
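By way of non-limiting illustration only, the standard genetic code of Table 1 can be represented in software as sketched below. The sketch builds the 64-codon RNA table in U, C, A, G order, uses “*” to denote a stop codon, and accepts DNA input by substituting U for T as noted above. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# One-letter residues for the 64 codons in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate an RNA or DNA coding sequence, stopping at the first stop codon."""
    rna = seq.upper().replace("T", "U")  # DNA codons have T in place of U
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = translate("ATGGGCAAATAA")
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
```

The sketch reads the sequence in the first frame only and does not search for a start codon; it is intended to show the codon-to-residue lookup rather than gene finding.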
[0176] As used herein, a “wobble position” refers to the third position of a codon. Mutations in a DNA molecule within the wobble position of a codon typically result in silent or conservative mutations at the amino acid level. For example, there are four codons that encode Glycine, i.e., GGU, GGC, GGA and GGG, thus mutation of any wobble position nucleotide, to any other nucleotide, does not result in a change at the amino acid level of the encoded protein, i.e. is a silent substitution.
[0177] Accordingly a “silent substitution” or “silent mutation” is one in which a nucleotide within a codon is modified, but does not result in a change in the amino acid residue encoded by the codon. Examples include mutations in the third position of a codon, as well as in the first position of certain codons, such as the codon “CGG,” which when mutated to AGG, still encodes the amino acid Arginine (Arg, or R).
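By way of non-limiting illustration only, whether a given nucleotide substitution is silent can be checked against the standard genetic code as sketched below. The function names `is_silent` and `wobble_always_silent` are hypothetical, and “*” is used here to denote a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_silent(codon: str, mutant: str) -> bool:
    """True if two different codons encode the same residue (or are both stops)."""
    codon, mutant = (s.upper().replace("T", "U") for s in (codon, mutant))
    return codon != mutant and CODON_TABLE[codon] == CODON_TABLE[mutant]


def wobble_always_silent(codon: str) -> bool:
    """True if every substitution at the third (wobble) position is silent."""
    codon = codon.upper().replace("T", "U")
    return all(is_silent(codon, codon[:2] + b) for b in BASES if b != codon[2])


glycine_example = is_silent("GGU", "GGA")      # third-position change within Gly
arginine_example = is_silent("CGG", "AGG")     # first-position change within Arg
methionine_example = is_silent("AUG", "AUA")   # Met to Ile, not silent
```

The glycine codons satisfy `wobble_always_silent`, whereas a codon such as UGG (Trp) does not, which is why wobble-position mutations are described above as typically, rather than always, silent or conservative.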
[0178] The phrase “conservative amino acid substitution” or “conservative mutation” refers to the replacement of one amino acid by another amino acid with a common property. A functional way to define common properties between individual amino acids is to analyze the normalized frequencies of amino acid changes between corresponding proteins of homologous organisms (Schulz, G. E. and R. H. Schirmer, Principles of Protein Structure, Springer-Verlag). According to such analyses, groups of amino acids may be defined where amino acids within a group exchange preferentially with each other, and therefore resemble each other most in their impact on the overall protein structure (Schulz, G. E. and R. H. Schirmer, Principles of Protein Structure, Springer-Verlag).
[0179] Examples of amino acid groups defined in this manner include: a “charged/polar group,” consisting of Glu, Asp, Asn, Gln, Lys, Arg and His; an “aromatic, or cyclic group,” consisting of Pro, Phe, Tyr and Trp; and an “aliphatic group” consisting of Gly, Ala, Val, Leu, Ile, Met, Ser, Thr and Cys.
[0180] Within each group, subgroups may also be identified, for example, the group of charged/polar amino acids may be sub-divided into the sub-groups consisting of the “positively-charged sub-group,” consisting of Lys, Arg and His; the “negatively-charged sub-group,” consisting of Glu and Asp; and the “polar sub-group” consisting of Asn and Gln.
[0181] The aromatic, or cyclic group may be sub-divided into the sub-groups consisting of the “nitrogen ring sub-group,” consisting of Pro, His and Trp; and the “phenyl sub-group” consisting of Phe and Tyr.
[0182] The aliphatic group may be sub-divided into the sub-groups consisting of the “large aliphatic non-polar sub-group,” consisting of Val, Leu and Ile; the “aliphatic slightly-polar sub-group,” consisting of Met, Ser, Thr and Cys; and the “small-residue sub-group,” consisting of Gly and Ala.
[0183] Examples of conservative mutations include amino acid substitutions of amino acids within the sub-groups above, for example, Lys for Arg and vice versa such that a positive charge may be maintained; Glu for Asp and vice versa such that a negative charge may be maintained; Ser for Thr such that a free —OH can be maintained; and Gln for Asn such that a free —NH2 can be maintained.
[0184] “Semi-conservative mutations” include amino acid substitutions of amino acids within the same groups listed above that do not share the same sub-group. For example, the mutations of Asp for Asn, or Asn for Lys, both involve amino acids within the same group, but different sub-groups.
[0185] “Non-conservative mutations” involve amino acid substitutions between different groups, for example Lys for Leu, or Phe for Ser, etc.
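By way of non-limiting illustration only, the classification of a substitution as conservative, semi-conservative or non-conservative can be sketched as follows. The sketch encodes the groups and sub-groups recited above. Because His is recited both in the positively-charged sub-group and in the nitrogen ring sub-group, the sketch assumes that a residue may belong to more than one group and treats any shared sub-group or group as sufficient; that interpretation, and the names `SUBGROUPS`, `memberships` and `classify_substitution`, are assumptions made for illustration.

```python
# Sub-group name -> (parent group, member residues), as recited in the text.
SUBGROUPS = {
    "positively-charged": ("charged/polar", {"Lys", "Arg", "His"}),
    "negatively-charged": ("charged/polar", {"Glu", "Asp"}),
    "polar": ("charged/polar", {"Asn", "Gln"}),
    "nitrogen ring": ("aromatic/cyclic", {"Pro", "His", "Trp"}),
    "phenyl": ("aromatic/cyclic", {"Phe", "Tyr"}),
    "large aliphatic non-polar": ("aliphatic", {"Val", "Leu", "Ile"}),
    "aliphatic slightly-polar": ("aliphatic", {"Met", "Ser", "Thr", "Cys"}),
    "small-residue": ("aliphatic", {"Gly", "Ala"}),
}


def memberships(residue: str):
    """Return (sub-groups, groups) containing a three-letter residue code."""
    subs = {name for name, (_, members) in SUBGROUPS.items() if residue in members}
    if not subs:
        raise ValueError(f"unknown residue: {residue}")
    groups = {SUBGROUPS[name][0] for name in subs}
    return subs, groups


def classify_substitution(original: str, replacement: str) -> str:
    """Classify a substitution by shared sub-group, then by shared group."""
    if original == replacement:
        return "identical"
    subs_a, groups_a = memberships(original)
    subs_b, groups_b = memberships(replacement)
    if subs_a & subs_b:
        return "conservative"
    if groups_a & groups_b:
        return "semi-conservative"
    return "non-conservative"


examples = {
    pair: classify_substitution(*pair)
    for pair in [("Lys", "Arg"), ("Asp", "Asn"), ("Lys", "Leu")]
}
```

The three example pairs reproduce the classifications given in the text: Lys for Arg is conservative, Asp for Asn is semi-conservative, and Lys for Leu is non-conservative.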
[0186] The term “amino acid residue” refers to the radical derived from the corresponding alpha-amino acid by eliminating the OH portion of the carboxyl group and the H-portion of the alpha amino group. For the most part, the amino acids used in the application are those naturally occurring amino acids found in proteins, or the naturally occurring anabolic or catabolic products of such amino acids which contain amino and carboxyl groups. Alternatively, un-natural amino acids can be incorporated into proteins to facilitate the chemical conjugation to other proteins, toxins, small organic compounds or anti-cancer agents (Datta et al., J Am Chem Soc. (2002) 124 (20):5652-3). In general, the abbreviations used herein for designating the amino acids and the protective groups are based on recommendations of the IUPAC-IUB Commission on Biochemical Nomenclature (see Biochemistry (1972) 11: 1726-1732). The term “amino acid residue” also includes analogs, derivatives and congeners of any specific amino acid referred to herein, as well as C-terminal or N-terminal protected amino acid derivatives (e.g., modified with an N-terminal or C-terminal protecting group). For example, the present invention contemplates the use of amino acid analogs wherein a side chain is lengthened or shortened while still providing a carboxyl, amino or other reactive precursor functional group for cyclization, as well as amino acid analogs having variant side chains with appropriate functional groups.
[0187] The term “amino acid side chain” is that part of an amino acid exclusive of the —CH(NH2)COOH portion, as defined by K. D. Kopple, “Peptides and Amino Acids,” W. A. Benjamin Inc., New York and Amsterdam, 1966, pages 2 and 33; examples of such side chains of the common amino acids are —CH2CH2SCH3 (the side chain of methionine), —CH(CH3)—CH2CH3 (the side chain of isoleucine), —CH2CH(CH3)2 (the side chain of leucine) or H— (the side chain of glycine).
[0188] The amino acid residues described herein are preferred to be in the “L” isomeric form. However, residues in the “D” isomeric form can be substituted for any L-amino acid residue, as long as the desired functional property of antibody (immunoglobulin)-binding is retained by the polypeptide. NH2 refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide.
[0189] An “amino acid motif” is a sequence of amino acids, optionally a generic set of conserved amino acids, associated with a particular functional activity.
[0190] As used herein, the terms “protein,” “peptide” and “polypeptide” are used interchangeably to refer to polymers of amino acid residues of any length connected to one another by peptide bonds between the alpha-amino group and carboxy group of contiguous amino acid residues. Polypeptides, proteins and peptides may exist as linear polymers, branched polymers or in circular form. These terms also include forms that are post-translationally modified in vivo, or chemically modified during synthesis.
[0191] It should be noted that all amino-acid residue sequences are represented herein by formulae whose left and right orientation is in the conventional direction of amino-terminus to carboxy terminus. Furthermore, it should be noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino-acid residues.
[0192] The terms “gene,” “recombinant gene” and “gene construct” as used herein, refer to a DNA molecule, or portion of a DNA molecule, that encodes a protein. The DNA molecule can contain an open reading frame encoding the protein (as exon sequences) and can further include intron sequences. The term “intron” as used herein, refers to a DNA sequence present in a given gene which is not translated into protein and is generally found between exons. Usually, it is desirable for the gene to be operably linked to, (or it may comprise), one or more promoters, enhancers, repressors and/or other regulatory sequences to modulate the activity or expression of the gene, as is well known in the art.
[0193] As used herein, a “complementary DNA” or “cDNA” includes recombinant polynucleotides synthesized by reverse transcription of mRNA and from which intervening sequences (introns) have been removed.
[0194] The term “operably linked” as used herein, describes the relationship between two polynucleotide regions such that they are functionally related or coupled to each other. For example, a promoter (or other expression control sequence) is operably linked to a coding sequence if it controls (and is capable of effecting) the transcription of the coding sequence. Although an operably linked promoter is generally located upstream of the coding sequence, it is not necessarily contiguous with it.
[0195] “Expression control sequences” are DNA regulatory sequences, such as promoters, enhancers, polyadenylation signals, terminators, internal ribosome entry sites (IRES) and the like, that provide for the expression of a coding sequence in a host cell. Exemplary expression control sequences are described in Goeddel; Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990).
[0196] A “promoter” is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3′ direction) coding sequence. As used herein, the promoter sequence is bounded at its 3′ terminus by the transcription initiation site and extends upstream (5′ direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain “TATA” boxes and “CAT” boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the −10 and −35 consensus sequences.
[0197] A large number of promoters, including constitutive, inducible and repressible promoters, from a variety of different sources are well known in the art. Representative sources include, for example, viral, mammalian, insect, plant, yeast, and bacterial cell types, and suitable promoters from these sources are readily available, or can be made synthetically, based on sequences publicly available on line or, for example, from depositories such as the ATCC as well as other commercial or individual sources. Promoters can be unidirectional (i.e., initiate transcription in one direction) or bi-directional (i.e., initiate transcription in either a 3′ or 5′ direction). Non-limiting examples of promoters include, for example, the T7 bacterial expression system, the pBAD (araA) bacterial expression system, the cytomegalovirus (CMV) promoter, the SV40 promoter, and the RSV promoter. Inducible promoters include the Tet system (U.S. Pat. Nos. 5,464,758 and 5,814,618), the Ecdysone inducible system (No et al., Proc. Natl. Acad. Sci. (1996) 93 (8) 3346-3351), the T-REX™ system (Invitrogen, Carlsbad, Calif.), LacSwitch® (Stratagene, San Diego, Calif.) and the Cre-ERT tamoxifen inducible recombinase system (Indra et al. Nuc. Acid. Res. (1999) 27 (22) 4324-4327; Nuc. Acid. Res. (2000) 28 (23) e99; U.S. Pat. No. 7,112,715), or any promoter known in the art suitable for expression in the desired cells. See generally, Kramer & Fussenegger, Methods Mol. Biol. (2005) 308 123-144.
[0198] As used herein, a “minimal promoter” refers to a partial promoter sequence which defines the transcription start site but which by itself initiates transcription inefficiently, if at all. The activity of such minimal promoters depends on the binding of activators such as a tetracycline-controlled transactivator to operably linked binding sites.
[0199] The terms “IRES” or “internal ribosome entry site” refer to a polynucleotide element that acts to enhance the translation of a coding sequence encoded within a polycistronic messenger RNA. IRES elements mediate the initiation of translation by directly recruiting and binding ribosomes to a messenger RNA (mRNA) molecule, bypassing the 7-methyl guanosine-cap involved in typical ribosome scanning. The presence of an IRES sequence can increase the level of cap-independent translation of a desired protein. Early publications descriptively refer to IRES sequences as “translation enhancers.” For example, cardioviral RNA “translation enhancers” are described in U.S. Pat. No. 4,937,190 to Palmenberg et al. and U.S. Pat. No. 5,770,428 to Boris-Lawrie.
[0200] The terms “nuclear localization signal” and “NLS” refer to a domain, or domains capable of mediating the nuclear import of a protein or polynucleotide, or retention thereof, within the nucleus of a cell. A “strong nuclear import signal” represents a domain or domains capable of mediating greater than 90% subcellular localization in the nucleus when operatively linked to a protein of interest. Representative examples of NLSs include but are not limited to, monopartite nuclear localization signals, bipartite nuclear localization signals and N and C-terminal motifs. N terminal basic domains usually conform to the consensus sequence K-K/R-X-K/R which was first discovered in the SV40 large T antigen and which represents a monopartite NLS. One non-limiting example of an N-terminal basic domain NLS is PKKKRKV (SEQ ID NO: 439). Also known are bipartite nuclear localization signals which contain two clusters of basic amino acids separated by a spacer of about 10 amino acids, as exemplified by the NLS from nucleoplasmin: KR[PAATKKAGQA]KKKK (SEQ ID NO: 450). N and C-terminal motifs include, for example, the acidic M9 domain of hnRNP A1, the sequence KIPIK (SEQ ID NO: 464) in yeast transcription repressor Matα2 and the complex signals of U snRNPs. Most of these NLSs appear to be recognized directly by specific receptors of the importin β family.
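By way of non-limiting illustration, the monopartite consensus K-K/R-X-K/R recited above can be located in a protein sequence with a simple pattern search. The following Python sketch is an assumption-laden example only: the function name `find_monopartite_nls` is hypothetical, and a consensus match by itself does not establish that a sequence functions as an NLS.

```python
import re
from typing import List, Tuple

# Consensus K-K/R-X-K/R from the text; "." stands for X (any residue).
# A lookahead is used so that overlapping matches are also reported.
_MONOPARTITE_CONSENSUS = re.compile(r"(?=(K[KR].[KR]))")


def find_monopartite_nls(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for every consensus match.

    Hypothetical helper for illustration only; a match is a candidate,
    not evidence of nuclear import activity.
    """
    return [(m.start(), m.group(1))
            for m in _MONOPARTITE_CONSENSUS.finditer(protein.upper())]


if __name__ == "__main__":
    # SV40 large T antigen NLS as given in the text (SEQ ID NO: 439).
    print(find_monopartite_nls("PKKKRKV"))
```

Applied to PKKKRKV, the search reports two overlapping candidate windows, KKKR and KKRK, both of which conform to the consensus.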
[0201] The term “enhancer” as used herein, refers to a DNA sequence that increases transcription of, for example, a gene or coding sequence to which it is operably linked. Enhancers can be located many kilobases away from the coding sequence and can mediate the binding of regulatory factors, patterns of DNA methylation or changes in DNA structure. A large number of enhancers from a variety of different sources are well known in the art and available as or within cloned polynucleotides (from, e.g., depositories such as the ATCC as well as other commercial or individual sources). A number of polynucleotides comprising promoters (such as the commonly-used CMV promoter) also comprise enhancer sequences. Operably linked enhancers can be located upstream, within, or downstream of coding sequences. The term “Ig enhancers” refers to enhancer elements derived from enhancer regions mapped within the Ig locus (such enhancers include, for example, the heavy chain (mu) 5′ enhancers, light chain (kappa) 5′ enhancers, kappa and mu intronic enhancers, and 3′ enhancers; see generally Paul W E (ed) Fundamental Immunology, 3rd Edition, Raven Press, New York (1993) pages 353-363; U.S. Pat. No. 5,885,827).
[0202] “Terminator sequences” are those that result in termination of transcription. Termination sequences are known in the art and include, but are not limited to, poly A (e.g., Bgh Poly A and SV40 Poly A) terminators. A transcriptional termination signal will typically include a region of 3′ untranslated region (or “3′ ut”), an optional intron (also referred to as intervening sequence or “IVS”) and one or more poly adenylation signals (“p(A)” or “pA”). Terminator sequences may also be referred to as “IVS−pA,” “IVS+p(A),” “3′ ut+p(A)” or “3′ ut/p(A).” Natural or synthetic terminators can be used as a terminator region.
[0203] The terms “polyadenylation,” “polyadenylation sequence” and “polyadenylation signal”, “Poly A,” “p(A)” or “pA” refer to a nucleic acid sequence present in a RNA transcript that allows for the transcript, when in the presence of the polyadenyl transferase enzyme, to be polyadenylated. Many polyadenylation signals are known in the art. Non-limiting examples include the human variant growth hormone polyadenylation signal, the SV40 late polyadenylation signal and the bovine growth hormone polyadenylation signal.
[0204] The term “splice site” as used herein refers to polynucleotides that are capable of being recognized by the splicing machinery of a eukaryotic cell as suitable for being cut and/or ligated to a corresponding splice site. Splice sites allow for the excision of introns present in a pre-mRNA transcript. Typically the 5′ portion of the splice site is referred to as the splice donor and the 3′ corresponding splice site is referred to as the acceptor splice site. The term splice site includes, for example, naturally occurring splice sites, engineered splice sites, for example, synthetic splice sites, canonical or consensus splice sites, and/or non-canonical splice sites, for example, cryptic splice sites.
[0205] A “signal sequence” can be included before the coding sequence. This sequence encodes a signal peptide, N-terminal to the polypeptide, that communicates to the host cell to direct the polypeptide to the cell surface or secrete the polypeptide into the media, and this signal peptide is clipped off by the host cell before the protein leaves the cell. Signal sequences can be found associated with a variety of proteins native to prokaryotes and eukaryotes.
[0206] “Post-translational modification” can encompass any one of or a combination of modifications including covalent modification, which a protein undergoes after translation is complete and after being released from the ribosome or on the nascent polypeptide co-translationally. Posttranslational modification includes but is not limited to phosphorylation, myristylation, ubiquitination, glycosylation, coenzyme attachment, methylation, S-nitrosylation and acetylation. Posttranslational modification can modulate or influence the activity of a protein, its intracellular or extracellular destination, its stability or half-life, and/or its recognition by ligands, receptors or other proteins. Post-translational modification can occur in cell organelles, in the nucleus or cytoplasm or extracellularly.
[0207] The term “primer” as used herein refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH. The primer may be either single-stranded or double-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon many factors, including temperature, source of primer and use of the method. For example, for diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15-25 or more nucleotides, although it may contain fewer nucleotides. The polynucleotide primers can be prepared using any suitable method, such as, for example, the phosphotriester or phosphodiester methods (see Narang et al., Meth. Enzymol., 68:90, (1979); U.S. Pat. No. 4,356,270; and Brown et al., Meth. Enzymol., 68:109, (1979)).
[0208] The primers herein are selected to be “substantially” complementary to different strands of a particular target polynucleotide sequence. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5′ end of the primer, with the remainder of the primer sequence being complementary to the strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the strand to hybridize therewith and thereby form the template for the synthesis of the extension product.
[0209] As used herein, the terms “restriction endonucleases” and “restriction enzymes” refer to bacterial enzymes, each of which cut double-stranded DNA at or near a specific nucleotide sequence.
[0210] The term “multiple cloning site” as used herein, refers to a segment of a vector polynucleotide which can be recognized by one or more different restriction enzymes.
[0211] A “replicon” is any genetic element (e.g., plasmid, episome, chromosome, yeast artificial chromosome (YAC), or virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control, and containing autonomous replicating sequences.
[0212] A “vector” or “cloning vector” is a replicon, such as plasmid, phage or cosmid, into which another polynucleotide segment may be introduced so as to bring about the replication of the inserted segment. Vectors typically exist as circular, double stranded DNA, and range in size from a few kilobases (kb) to hundreds of kb. Preferred cloning vectors have been modified from naturally occurring plasmids to facilitate the cloning and recombinant manipulation of polynucleotide sequences. Many such vectors are well known in the art; see, for example, Sambrook (In: “Molecular Cloning: A Laboratory Manual,” second edition, edited by Sambrook, Fritsch, & Maniatis, Cold Spring Harbor Laboratory, (1989)); Maniatis, In: Cell Biology: A Comprehensive Treatise, Vol. 3, Gene Sequence Expression, Academic Press, NY, pp. 563-608 (1980).
[0213] The term “expression vector” as used herein, refers to an agent used for expressing certain polynucleotides within a host cell or in-vitro expression system. The term includes plasmids, episomes, cosmids, retroviruses or phages; the expression vector can be used to express a DNA sequence encoding a desired protein and in one aspect includes a transcriptional unit comprising an assembly of expression control sequences. The choice of promoter and other regulatory elements generally varies according to the intended host cell, or in-vitro expression system.
[0214] An “episomal expression vector” is able to replicate in the host cell, and persists as an extrachromosomal segment of DNA within the host cell in the presence of appropriate selective pressure. (See for example, Conese et al., Gene Therapy 11 1735-1742 (2004)). Representative commercially available episomal expression vectors include, but are not limited to, episomal plasmids that utilize Epstein Barr Nuclear Antigen 1 (EBNA1) and the Epstein Barr Virus (EBV) origin of replication (oriP). The vectors pREP4, pCEP4, pREP7 from Invitrogen, pcDNA3.1 from Invitrogen, and pBK-CMV from Stratagene represent non-limiting examples of an episomal vector that uses T-antigen and the SV40 origin of replication in lieu of EBNA1 and oriP.
[0215] An “integrating expression vector” may randomly integrate into the host cell's DNA, or may include a recombination site to enable the specific recombination between the expression vector and the host cell's chromosome. Such integrating expression vectors may utilize the endogenous expression control sequences of the host cell's chromosomes to effect expression of the desired protein. Examples of vectors that integrate in a site specific manner include, for example, components of the flp-in system from Invitrogen (e.g., pcDNA™5/FRT), or the cre-lox system, such as can be found in the pExchange-6 Core Vectors from Stratagene. Examples of vectors that integrate into host cell chromosomes in a random fashion include, for example, pcDNA3.1 (when introduced in the absence of T-antigen) from Invitrogen, pCI or pFN10A (ACT) Flexi® from Promega.
[0216] Representative commercially available viral expression vectors include, but are not limited to, the adenovirus-based Per.C6 system available from Crucell, Inc., the lentiviral-based pLP1 from Invitrogen, and the Retroviral Vectors pFB-ERV plus pCFB-EGSH from Stratagene.
[0217] Alternatively, the expression vector may be used to introduce and integrate a strong promoter or enhancer sequences into a locus in the cell so as to modulate the expression of an endogenous gene of interest (Capecchi M R. Nat Rev Genet. (2005); 6 (6):507-12; Schindehutte et al., Stem Cells (2005); 23 (1):10-5). This approach can also be used to insert an inducible promoter, such as the Tet-On promoter (U.S. Pat. Nos. 5,464,758 and 5,814,618), into the genomic DNA of the cell so as to provide inducible expression of an endogenous gene of interest. The activating construct can also include targeting sequence(s) to enable homologous or non-homologous recombination of the activating sequence into a desired locus specific for the gene of interest (see for example, Garcia-Otin & Guillou, Front Biosci. (2006) 11:1108-36). Alternatively, an inducible recombinase system, such as the Cre-ER system, can be used to activate a transgene in the presence of 4-hydroxytamoxifen (Indra et al. Nuc. Acid. Res. (1999) 27 (22) 4324-4327; Nuc. Acid. Res. (2000) 28 (23) e99; U.S. Pat. No. 7,112,715).
[0218] Expression vectors may also include anti-sense, ribozymes or siRNA polynucleotides to reduce the expression of target sequences. (See generally, Sioud M, & Iversen, Curr. Drug Targets (2005) 6 (6):647-53; Sandy et al., Biotechniques (2005) 39 (2):215-24).
[0219] As used herein, a “recombination system” refers to one which allows for recombination between a vector of the present application and a chromosome for incorporation of a gene of interest. Recombination systems are known in the art and include, for example, Cre/Lox systems and FLP-IN systems.
[0220] As used herein an “in-vitro expression system” refers to cell free systems that enable the transcription, or coupled transcription and translation of DNA templates. Such systems include for example the classical rabbit reticulocyte system, as well as novel cell free synthesis systems, (J. Biotechnol. (2004) 110 (3) 257-63; Biotechnol Annu. Rev. (2004) 10 1-30).
[0221] As used herein, a “Cre/Lox” system refers to one such as described by Abremski et al., Cell, 32: 1301-1311 (1983) for a site-specific recombination system of bacteriophage P1. Methods of using Cre-Lox systems are known in the art; see, for example, U.S. Pat. No. 4,959,317, which is hereby incorporated in its entirety by reference. The system consists of a recombination site designated loxP and a recombinase designated Cre. In methods for producing site-specific recombination of DNA in eukaryotic cells, DNA sequences having first and second lox sites are typically introduced into eukaryotic cells and contacted with Cre, thereby producing recombination at the lox sites.
[0222] As used herein, “FLP-IN” recombination refers to systems in which a polynucleotide activation/inactivation and site-specific integration system has been developed for mammalian cells. The system is based on the recombination of transfected sequences by FLP, a recombinase derived from Saccharomyces. In several cell lines, FLP has been shown to rapidly and precisely recombine copies of its specific target sequence. FLP-IN systems have been described in, for example, U.S. Pat. Nos. 5,654,182 and 5,677,177.
[0223] The term “transfection,” “transformation,” or “transduction” as used herein, refers to the introduction of one or more exogenous polynucleotides into a host cell by using one or more physical or chemical methods. Many transfection techniques are known to those of ordinary skill in the art including but not limited to calcium phosphate DNA co-precipitation (see Methods in Molecular Biology, Vol. 7, Gene Transfer and Expression Protocols, Ed. E. J. Murray, Humana Press (1991)); DEAE-dextran; electroporation; cationic liposome-mediated transfection; tungsten particle-facilitated microparticle bombardment (Johnston, S. A., Nature 346: 776-777 (1990)); and strontium phosphate DNA co-precipitation (Brash D. E. et al. Molec. Cell. Biol. 7: 2031-2034 (1987)). Phage or retroviral vectors can be introduced into host cells, after growth of infectious particles in packaging cells that are commercially available.
[0224] The terms “cells,” “cell cultures,” “cell line,” “recombinant host cells,” “recipient cells” and “host cells” are often used interchangeably and will be clear from the context in which they are used. These terms include the primary subject cells and any progeny thereof, without regard to the number of transfers. It should be understood that not all progeny are exactly identical to the parental cell (due to deliberate or inadvertent mutations or differences in environment). However, such altered progeny are included in these terms, so long as the progeny retain the same functionality as that of the originally transformed cell. For example, though not limited to, such a characteristic might be the ability to produce a particular recombinant protein. A “mutator positive cell line” is a cell line containing cellular factors that are sufficient to work in combination with other vector elements to effect hypermutation. The cell line can be any of those known in the art or described herein. A “clone” is a population of cells derived from a single cell or common ancestor by mitosis.
[0225] A “reporter gene” refers to a polynucleotide that confers the ability to be specifically detected (or detected and selected), typically when expressed within a cell of interest. Numerous reporter gene systems are known in the art and include, for example, alkaline phosphatase (Berger, J., et al., Gene 66 1-10 (1988); Kain, S R., Methods Mol. Biol. 63 49-60 (1997)), beta-galactosidase (U.S. Pat. No. 5,070,012), chloramphenicol acetyltransferase (Gorman et al., Mol. Cell. Biol. 2 1044-51 (1982)), beta glucuronidase, peroxidase, beta lactamase (U.S. Pat. Nos. 5,741,657, 5,955,604), catalytic antibodies, luciferases (U.S. Pat. Nos. 5,221,623; 5,683,888; 5,674,713; 5,650,289; 5,843,746) and naturally fluorescent proteins (Tsien, R Y, Annu. Rev. Biochem. 67 509-544 (1998)). The term “reporter gene” also includes any peptide which can be specifically detected based on the use of one or more antibodies, epitopes, binding partners, substrates, modifying enzymes, receptors, or ligands that are capable of, or desired to (or desired not to), interact with the peptide of interest to create a detectable signal. Reporter genes also include genes that can modulate cellular phenotype.
[0226] The term “selectable marker gene” as used herein, refers to polynucleotides that allow cells carrying the polynucleotide to be specifically selected for or against, in the presence of a corresponding selective agent. Selectable markers can be positive, negative or bifunctional. Positive selectable markers allow selection for cells carrying the marker, whereas negative selectable markers allow cells carrying the marker to be selectively eliminated. The selectable marker polynucleotide can either be directly linked to the polynucleotides to be expressed, or introduced into the same cell by co-transfection. A variety of such marker polynucleotides have been described, including bifunctional (i.e., positive/negative) markers (see, e.g., WO 92/08796, published May 29, 1992, and WO 94/28143, published Dec. 8, 1994), hereby incorporated in their entirety by reference herein. Specific examples of selectable markers of drug-resistance genes include, but are not limited to, ampicillin, tetracycline, blasticidin, puromycin, hygromycin, ouabain or kanamycin. Specific examples of selectable markers are those, for example, that encode proteins that confer resistance to cytostatic or cytocidal drugs, such as the DHFR protein, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad. Sci. USA, 77:3567 (1980); O'Hare et al., Proc. Natl. Acad. Sci. USA, 78:1527 (1981)); the GPF protein, which confers resistance to mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci. USA, 78:2072 (1981)), the neomycin resistance marker, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol., 150:1 (1981)); the Hygromycin protein, which confers resistance to hygromycin (Santerre et al., Gene, 30:147 (1984)); murine Na+, K+-ATPase alpha subunit, which confers resistance to ouabain (Kent et al., Science, 237:901-903 (1987); and the Zeocin™ resistance marker (available commercially from Invitrogen). 
In addition, the herpes simplex virus thymidine kinase (Wigler et al., Cell, 11:223 (1977)), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026 (1962)), and adenine phosphoribosyltransferase (Lowy et al., Cell, 22:817 (1980)) can be employed in tk-, hgprt- or aprt-cells, respectively. Glutamine synthetase permits the growth of cells in glutamine (GS)-free media (see, e.g., U.S. Pat. Nos. 5,122,464; 5,770,359; and 5,827,739). Other selectable markers encode, for example, puromycin N-acetyl transferase or adenosine deaminase.
[0227] “Homology” or “identity” or “similarity” refers to sequence similarity between two peptides or between two nucleic acid molecules. Homology and identity can each be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology/similarity or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. A sequence which is “unrelated” or “non-homologous” shares less than 40% identity, less than 35% identity, less than 30% identity, or less than 25% identity with a sequence of the present invention. In comparing two sequences, the absence of residues (amino acids or nucleic acids) or presence of extra residues also decreases the identity and homology/similarity.
[0228] The term “homology” describes a mathematically based comparison of sequence similarities which is used to identify genes or proteins with similar functions or motifs. The nucleic acid and protein sequences of the present invention may be used as a “query sequence” to perform a search against public databases to, for example, identify other family members, related sequences or homologs. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J. Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al., (1997) Nucleic Acids Res. 25(17):3389-3402. When utilizing BLAST and Gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and BLAST) can be used (See www.ncbi.nlm.nih.gov).
[0229] As used herein, “identity” means the percentage of identical nucleotide or amino acid residues at corresponding positions in two or more sequences when the sequences are aligned to maximize sequence matching, i.e., taking into account gaps and insertions. Identity can be readily calculated by known methods, including but not limited to those described in Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991; and Carillo, H., and Lipman, D., SIAM J. Applied Math., 48: 1073 (1988). Methods to determine identity are designed to give the largest match between the sequences tested. Moreover, methods to determine identity are codified in publicly available computer programs. Computer program methods to determine identity between two sequences include, but are not limited to, the GCG program package (Devereux, J., et al., Nucleic Acids Research 12(1): 387 (1984)), BLASTP, BLASTN, and FASTA (Altschul, S. F. et al., J. Molec. Biol. 215: 403-410 (1990) and Altschul et al. Nuc. Acids Res. 25: 3389-3402 (1997)). The BLAST X program is publicly available from NCBI and other sources (BLAST Manual, Altschul, S., et al., NCBI NLM NIH Bethesda, Md. 20894; Altschul, S., et al., J. Mol. Biol. 215: 403-410 (1990)). The well known Smith Waterman algorithm may also be used to determine identity.
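By way of non-limiting illustration, percent identity for two sequences that have already been aligned can be computed as sketched below in Python. The function name `percent_identity` is hypothetical, and the choice of denominator (every alignment column, including columns with a gap in one sequence) is an assumption of this sketch; the programs cited above may apply different conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Assumption for illustration: the denominator is the number of alignment
    columns, excluding columns in which both sequences carry a gap, so gaps
    in one sequence lower the identity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Drop columns where both sequences are gapped; they carry no information.
    columns = [(a, b)
               for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0

    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Four identical columns out of six (one mismatch, one gap).
    print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```

Under this convention a gap opposite a residue counts against identity, consistent with the statement that the absence of residues or presence of extra residues decreases identity.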
[0230] A “heterologous” region of the DNA construct is an identifiable segment of DNA within a larger DNA molecule that is not found in association with the larger molecule in nature. Thus, when the heterologous region encodes a mammalian gene, the gene will usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different than the native gene). Allelic variations or naturally-occurring mutational events do not give rise to a heterologous region of DNA as defined herein.

SHM Related Terminology

[0231] The term “activation-induced cytidine deaminase” or (“AID”) refers to members of the AID/APOBEC family of RNA/DNA editing cytidine deaminases capable of mediating the deamination of cytosine to uracil within a DNA sequence. (See generally Conticello et al., Mol. Biol. Evol. 22 No 2 367-377 (2005), Evolution of the AID/APOBEC Family of Polynucleotide (Deoxy)cytidine Deaminases; U.S. Pat. No. 6,815,194). Suitable AID enzymes include all vertebrate forms of the enzyme, including, for example, primate, rodent, avian and bony fish. Representative examples of AID enzymes include without limitation, human (accession No. NP_065712), rat, chicken, canine and mouse (accession No. NP_033775) forms. In one embodiment, AID enzymes include the mutation L198A.
[0232] The term “AID homolog” refers to the enzymes of the Apobec family and includes, for example, Apobec-1, Apobec3C or Apobec3G (described, for example, by Jarmuz et al., Genomics, 79: 285-296 (2002)). AID and AID homologs further include, without limitation, modified polypeptides, or portions thereof, which retain the activity of a native AID/APOBEC polypeptide (e.g. mutants or muteins that retain the ability to deaminate a polynucleotide sequence). The term “AID activity” includes activity mediated by AID and AID homologs.
[0233] The term “substrate for SHM” refers to a synthetic or semi-synthetic polynucleotide sequence which is acted upon by AID and/or error prone DNA polymerases to effect a change in the nucleic acid sequence of the synthetic or semi-synthetic polynucleotide sequence.
[0234] The term “transition mutations” refers to base changes in a DNA sequence in which a pyrimidine (cytidine (C) or thymidine (T)) is replaced by another pyrimidine, or a purine (adenosine (A) or guanosine (G)) is replaced by another purine.
[0235] The term “transversion mutations” refers to base changes in a DNA sequence in which a pyrimidine (cytidine (C) or thymidine (T)) is replaced by a purine (adenosine (A) or guanosine (G)), or a purine is replaced by a pyrimidine.
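By way of non-limiting illustration, the two definitions above reduce to a comparison of base classes, which can be sketched in Python as follows. The function name `classify_substitution` and its return labels are hypothetical and are chosen for this example only.

```python
PURINES = frozenset("AG")       # adenosine, guanosine
PYRIMIDINES = frozenset("CT")   # cytidine, thymidine


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single DNA base change as a transition or a transversion.

    Returns "transition" when a purine replaces a purine or a pyrimidine
    replaces a pyrimidine, "transversion" when the base class changes, and
    "none" when the two bases are the same.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        return "none"
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in [("C", "T"), ("A", "G"), ("C", "G"), ("A", "T")]:
        print(ref, "->", alt, classify_substitution(ref, alt))
```

For example, a C to T change is classified as a transition, whereas a C to G change is classified as a transversion.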
[0236] The term “base excision repair” refers to a DNA repair pathway that removes single bases from DNA such as uridine nucleotides arising by deamination of cytidine. Repair is initiated by uracil glycosylase that recognizes and removes uracil from single- or double-stranded DNA to leave an abasic site.
[0237] The term “mismatch repair” refers to the repair pathway that recognizes and corrects mismatched bases, such as those that typically arise from errors of chromosomal DNA replication.
[0238] As used herein, the term “SHM hot spot” or “hot spot” refers to a polynucleotide sequence, or motif, of 3-6 nucleotides that exhibits an increased tendency to undergo somatic hypermutation, as determined via a statistical analysis of SHM mutations in antibody genes (see Tables 2 and 3 which provide a relative ranking of various motifs for SHM, and Table 6 which lists canonical hot spots and cold spots). The statistical analysis can be extrapolated to analysis of SHM mutations in non-antibody genes as described elsewhere herein. For the purposes of graphical representations of hot spots in Figures, the first nucleotide of a canonical hot spot is represented by the letter “H.”
[0239] Likewise, as used herein, a “SHM coldspot” or “cold spot” refers to a polynucleotide or motif, of 3-6 nucleotides that exhibits a decreased tendency to undergo somatic hypermutation, as determined via a statistical analysis of SHM mutations in antibody genes (see Tables 2 and 3 which provide a relative ranking of various motifs for SHM, and Table 6 which lists canonical hot spots and cold spots). The statistical analysis can be extrapolated to analysis of SHM mutations in non-antibody genes as described elsewhere herein. For the purposes of graphical representations of cold spots in Figures, the first nucleotide of a canonical cold spot is represented by the letter “C.”
[0240] The term “somatic hypermutation motif” or “SHM motif” refers to a polynucleotide sequence that includes, or can be altered to include, one or more hot spots or cold spots, and which encodes a defined set of amino acids. SHM motifs can be of any size, but are conveniently based around polynucleotides of about 2 to about 20 nucleotides in size, or from about 3 to about 9 nucleotides in size. SHM motifs can include any combination of hot spots and cold spots, or may lack both hot spots and cold spots.
[0241] The term “preferred SHM motif” refers to an SHM motif that includes one or more preferred (canonical) SHM codons (See Table 6 and Table 9 infra).
[0242] The terms “preferred hot spot SHM codon,” “preferred hot spot SHM motif,” “preferred SHM hot spot codon” and “preferred SHM hot spot motif,” all refer to a codon including, but not limited to, codons AAC, TAC, TAT, AGT, or AGC. Such sequences, which may be embedded within the context of a larger SHM motif, recruit SHM mediated mutagenesis and generate targeted amino acid diversity at that codon.
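By way of non-limiting illustration, the in-frame occurrences of the codons AAC, TAC, TAT, AGT and AGC in an open reading frame can be tallied as sketched below in Python. The function names and the density measure (hot spot codons per codon) are hypothetical and are introduced only for this example; the sketch does not implement the statistical ranking of motifs referred to in the Tables.

```python
from typing import List

# Preferred hot spot codons as recited in the text (a non-exhaustive list).
PREFERRED_HOT_SPOT_CODONS = frozenset({"AAC", "TAC", "TAT", "AGT", "AGC"})


def hot_spot_codon_positions(orf: str, frame: int = 0) -> List[int]:
    """Return 0-based nucleotide offsets of in-frame preferred hot spot codons."""
    seq = orf.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in PREFERRED_HOT_SPOT_CODONS]


def hot_spot_codon_density(orf: str, frame: int = 0) -> float:
    """Fraction of complete codons in the given frame that are preferred hot spot codons."""
    n_codons = (len(orf) - frame) // 3
    if n_codons <= 0:
        return 0.0
    return len(hot_spot_codon_positions(orf, frame)) / n_codons


if __name__ == "__main__":
    orf = "ATGAGCTACGGTAAC"  # codons: ATG AGC TAC GGT AAC
    print(hot_spot_codon_positions(orf))   # offsets of AGC, TAC and AAC
    print(hot_spot_codon_density(orf))     # 3 of 5 codons
```

Reading the same sequence in a shifted frame yields different codons and therefore a different tally, which is one reason the reading frame is treated as a design parameter.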
[0243] As used herein, a polynucleotide sequence has been “optimized for SHM” if the polynucleotide, or a portion thereof, has been altered to increase or decrease the frequency and/or location of hot spots and/or cold spots within the polynucleotide. A polynucleotide has been made “susceptible to SHM” if the polynucleotide, or a portion thereof, has been altered to increase the frequency and/or location of hot spots within the polynucleotide or to decrease the frequency (density) and/or location of cold spots within the polynucleotide. Conversely, a polynucleotide sequence has been made “resistant to SHM” if the polynucleotide sequence, or a portion thereof, has been altered to decrease the frequency (density) and/or location of hot spots within the open reading frame of the polynucleotide sequence. In general, a sequence can be prepared that has a greater or lesser propensity to undergo SHM mediated mutagenesis by altering the codon usage, and/or the amino acids encoded by polynucleotide sequence.
[0244] Optimization of a polynucleotide sequence refers to modifying about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, 100% or any range therein of the nucleotides in the polynucleotide sequence. Optimization of a polynucleotide sequence also refers to modifying about 1, about 2, about 3, about 4, about 5, about 10, about 20, about 25, about 50, about 75, about 90, about 95, about 96, about 97, about 98, about 99, about 100, about 200, about 300, about 400, about 500, about 750, about 1000, about 1500, about 2000, about 2500, about 3000 or more, or any range therein of the nucleotides in the polynucleotide sequence such that some or all of the nucleotides are optimized for SHM-mediated mutagenesis. Reduction in the frequency (density) of hot spots and/or cold spots refers to reducing about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, 100% or any range therein of the hot spots or cold spots in a polynucleotide sequence. Increasing the frequency (density) of hot spots and/or cold spots refers to increasing about 1%, about 2%, about 3%, about 4%, about 5%, about 10%, about 20%, about 25%, about 50%, about 75%, about 90%, about 95%, about 96%, about 97%, about 98%, about 99%, 100% or any range therein of the hot spots or cold spots in a polynucleotide sequence.
[0245] The position or reading frame of a hot spot or cold spot is also a factor governing whether SHM mediated mutagenesis results in a mutation that is silent with regard to the resulting amino acid sequence, or causes conservative, semi-conservative or non-conservative changes at the amino acid level. As discussed below, these design parameters can be manipulated to further enhance the relative susceptibility or resistance of a nucleotide sequence to SHM. Thus both the degree of SHM recruitment and the reading frame of the motif are considered in the design of SHM susceptible and SHM resistant polynucleotide sequences.
[0246] As used herein, “somatic hypermutation” or “SHM” refers to the mutation of a polynucleotide sequence initiated by, or associated with the action of activation-induced cytidine deaminase, uracil glycosylase and/or error prone polymerases on that polynucleotide sequence. The term is intended to include mutagenesis that occurs as a consequence of the error prone repair of the initial lesion, including mutagenesis mediated by the mismatch repair machinery and related enzymes.
[0247] As used herein, the term “UDG” refers to uracil DNA glycosylase, one of several DNA glycosylases that recognize different damaged DNA bases and remove them before replication of the genome. Typically, DNA glycosylases remove DNA bases that are cytotoxic or cause DNA polymerase to introduce errors, and are part of the base excision repair pathway for DNA. Uracil DNA glycosylase recognizes uracil in DNA, a product of cytidine deamination, leading to its removal and potential replacement with a new base.
[0248] The term “pol eta” (also called PolH, RAD30A, XPV, XP-V) refers to a low-fidelity DNA polymerase that plays a role in replication through lesions, for instance, replication through UV-induced thymidine dimers. The gene for pol eta is defective in the variant type of Xeroderma pigmentosum (XPV). On non-damaged DNA, pol eta misincorporates nucleotides at a rate of approximately 3 per 100 bp, and is especially error-prone when replicating through templates containing WA dinucleotides (W=A or T) (Gearhart and Wood, 2001). Pol eta has been shown to play an important role as an A/T mutator during SHM in immunoglobulin variable genes (Zeng et al., 2001). Representative examples of pol eta include, without limitation, human (GenBank Accession No. BAA81666), rat (GenBank Accession No. XP_001066743), chicken (GenBank Accession No. NP_001001304), canine (GenBank Accession No. XP_532150) and mouse (GenBank Accession No. NP_109640) forms.
[0249] The term “pol theta” (also called PolQ) refers to a low-fidelity DNA polymerase that may play a role in crosslink repair (Gearhart and Wood, Nature Rev Immunol 1: 187-192 (2001)) and contains an intrinsic ATPase-helicase domain (Kawamura et al., Int. J. Cancer 109(1):9-16 (2004)). The polymerase is able to efficiently replicate through an abasic site by functioning both as a mispair inserter and as a mispair extender (Zan et al., EMBO Journal 24, 3757-3769 (2005)). Representative examples of pol theta include, without limitation, human (GenBank Accession No. NP_955452), rat (GenBank Accession No. XP_221423), chicken (GenBank Accession No. XP_416549), canine (GenBank Accession No. XP_545125), and mouse (GenBank Accession No. NP_084253) forms. Pol eta and pol theta are sometimes referred to collectively as “error prone polymerases.”

Phage Display Terminology

[0250] “Phage display” is a technique by which variant polypeptides are displayed as fusion proteins to at least a portion of a coat protein on the surface of phage, e.g., filamentous phage, particles. A utility of phage display lies in the fact that large libraries of randomized protein variants can be rapidly and efficiently sorted for those sequences that bind to a target molecule with high affinity. Display of peptide and protein libraries on phage has been used for screening millions of polypeptides for ones with specific binding properties. Polyvalent phage display methods have been used for displaying small random peptides and small proteins through fusions to either gene III or gene VIII of filamentous phage. Wells and Lowman, Curr. Opin. Struct. Biol., 3:355-362 (1992), and references cited therein. In monovalent phage display, a protein or peptide library is fused to gene III or a portion thereof, and expressed at low levels in the presence of wild type gene III protein so that phage particles display one copy or none of the fusion proteins. Avidity effects are reduced relative to polyvalent phage so that sorting is on the basis of intrinsic ligand affinity, and phagemid vectors are used, which simplify DNA manipulations. Lowman and Wells, Methods: A companion to Methods in Enzymology, 3:205-216 (1991).
[0251] A “phagemid” is a plasmid vector having a bacterial origin of replication, e.g., ColE1, and a copy of an intergenic region of a bacteriophage. Phagemids may be used on any known bacteriophage, including filamentous bacteriophage and lambdoid bacteriophage. Generally, the plasmid will also contain a selectable marker for antibiotic resistance. Segments of DNA cloned into these vectors can be propagated as plasmids. When cells harboring these vectors are provided with all genes necessary for the production of phage particles, the mode of replication of the plasmid changes to rolling circle replication to generate copies of one strand of the plasmid DNA and package phage particles. The phagemid may form infectious or non-infectious phage particles. This term includes phagemids, which contain a phage coat protein gene or fragment thereof linked to a heterologous polypeptide gene as a gene fusion such that the heterologous polypeptide is displayed on the surface of the phage particle.
[0252] The term “phage vector” means a double stranded replicative form of a bacteriophage containing a heterologous gene and capable of replication. The phage vector has a phage origin of replication allowing phage replication and phage particle formation. The phage is preferably a filamentous bacteriophage, such as an M13, f1, fd, Pf3 phage or a derivative thereof, or a lambdoid phage, such as lambda, 21, phi80, phi81, 82, 424, 434, etc., or a derivative thereof.
[0253] The term “coat protein” means a protein, at least a portion of which is present on the surface of the virus particle. From a functional perspective, a coat protein is any protein, which associates with a virus particle during the viral assembly process in a host cell, and remains associated with the assembled virus until it infects another host cell. The coat protein may be the major coat protein or may be a minor coat protein. A “major” coat protein is generally a coat protein which is present in the viral coat at preferably at least about 5, more preferably at least about 7, even more preferably at least about 10 copies of the protein or more. A major coat protein may be present in tens, hundreds or even thousands of copies per virion. An example of a major coat protein is the p8 protein of filamentous phage.
[0254] A “fusion protein” and a “fusion polypeptide” refer to a polypeptide having two portions covalently linked together, where each of the portions is a polypeptide having a different property. The property may be a biological property, such as activity in vitro or in vivo. The property may also be a simple chemical or physical property, such as binding to a target molecule, catalysis of a reaction, etc. The two portions may be linked directly by a single peptide bond or through a peptide linker containing one or more amino acid residues. Generally, the two portions and the linker will be in reading frame with each other.

II. Introduction to Somatic Hypermutation (SHM)

[0255] Natural mechanisms for generating antibody diversification have evolved utilizing the process of somatic hypermutation (SHM), which triggers diversification of the variable region of immunoglobulin genes, generating the secondary antibody repertoire thereby allowing affinity maturation of a humoral response. Thus, by directing hypermutation to defined hypervariable regions of an immunoglobulin (Ig) protein scaffold and applying selective pressure to identify improved antibodies, the immune system has developed a diversification strategy capable of rapidly evolving high affinity antibodies within about three weeks in response to antigen exposure.
[0256] AID is expressed within activated B cells and is an essential protein factor for SHM, as well as class switch recombination and gene conversion (Muramatsu et al., 2000; Revy et al., 2000). AID belongs to a family of enzymes, the APOBEC family, which shares certain features with the metabolic cytidine deaminases but differs from them in that AID deaminates nucleotides within single stranded polynucleotides, and cannot utilize free nucleotide as a substrate. Other enzymes of the AID/APOBEC family can also act to deaminate cytidine on single stranded RNA or DNA (Conticello et al. (2005)).
[0257] The human AID protein comprises 198 amino acids and has a predicted molecular weight of 24 kDa. The human AID gene is located at locus 12p13, close to APOBEC-1. The AID protein has a cytidine/deoxycytidine deaminase motif, is dependent on zinc, and can be inhibited by tetrahydrouridine (THU) which is a specific inhibitor of cytidine deaminases.
[0258] Even prior to the discovery of AID, it was noted that SHM occurs more frequently in cytidines that are within the context of WRCY motifs (W=A or T, R=A or G, Y=C or T). There is now accumulating evidence that this motif for SHM likely represents a composite of the hot spot motif for AID deamination and the hot spot motif for initiating error prone repair by the DNA polymerases pol eta and pol theta (Rogozin et al. (2004); Zan et al. (2005)).
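As a non-limiting illustration, the sketch below scans a sequence for WRCY motifs on the given strand and for RGYW, the reverse complement of WRCY, which marks a WRCY motif on the opposite strand. It relies on the standard IUPAC codes (W=A or T, R=A or G, Y=C or T). The function name, the output layout and the example sequence are hypothetical.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
WRCY = re.compile(r"(?=([AT][AG]C[CT]))")  # motif on the given strand
RGYW = re.compile(r"(?=([AG]G[CT][AT]))")  # reverse complement of WRCY


def find_wrcy_sites(seq):
    """Return sorted (position, strand, motif) tuples.

    `position` is the 0-based index of the targeted C ("+" strand) or of the
    G that pairs with the targeted C on the opposite strand ("-" strand).
    """
    seq = seq.upper()
    sites = []
    for m in WRCY.finditer(seq):
        sites.append((m.start() + 2, "+", m.group(1)))  # C is the 3rd base of WRCY
    for m in RGYW.finditer(seq):
        sites.append((m.start() + 1, "-", m.group(1)))  # G is the 2nd base of RGYW
    return sorted(sites)


# Hypothetical example: AGCT is palindromic, so it is reported on both strands.
print(find_wrcy_sites("TTAGCTGG"))
```

The sketch reports motif occurrences only; it makes no claim about how strongly any individual site recruits AID.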
[0259] High levels of DNA transcription have been shown to be necessary, but are not alone sufficient, for AID mediated mutagenesis. In vivo, SHM begins about 80 to about 100 nucleotides from the transcription start site, but decreases in frequency as a function of distance from the promoter. AID has been shown in vitro to interact directly with the transcriptional elongation complex, but not the transcriptional initiation complex, and this interaction may be dependent upon the dissociation of the initiation factors, which occurs as the transcriptional initiation complex converts to the fully processive, elongation-competent transcription elongation complex (Besmer et al., 2006).
[0260] Since AID is only able to deaminate cytidines on single stranded DNA, it is likely that the requirement for transcription reflects the generation of single stranded regions by transcription bubbles. Studies with purified AID in vitro, however, suggest that AID binding is sequence independent, potentially allowing a scanning mode for hot spot capture that is driven by active transcription of the gene. In vitro studies suggest that AID has an apparent Kd for single stranded DNA in the range of 0.3 to 2 nM, and that the complex has a half life of 4-8 minutes. The turnover number of purified AID on single stranded DNA is approximately one deamination every 4 minutes (Larijani et al. (2006)).
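For orientation only, the half life and turnover figures above can be related by a back-of-the-envelope calculation. The sketch below assumes simple first-order dissociation of the AID/single stranded DNA complex, so that k_off = ln 2 / t½ and the mean residence time is t½ / ln 2, and it assumes that turnover proceeds at a constant rate while AID remains bound. These assumptions, and the function names, are illustrative and are not taken from the cited studies.

```python
import math

# About one deamination every 4 minutes, as stated in the text.
TURNOVER_PER_MIN = 1 / 4


def off_rate_per_min(half_life_min):
    """First-order dissociation rate constant (per minute) from a half life."""
    return math.log(2) / half_life_min


def mean_residence_min(half_life_min):
    """Mean lifetime of the complex (minutes) under first-order decay."""
    return half_life_min / math.log(2)


# Half life range of 4 to 8 minutes, as stated in the text.
for t_half in (4.0, 8.0):
    k_off = off_rate_per_min(t_half)
    per_binding = TURNOVER_PER_MIN * mean_residence_min(t_half)
    print(f"t1/2 = {t_half:.0f} min: k_off = {k_off:.3f} per min, "
          f"about {per_binding:.1f} deaminations per binding event")
```

Under these assumptions a single binding event would be expected to yield on the order of one to three deaminations, which is an estimate and not a measured value.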
[0261] AID acts on DNA to deaminate cytidine residues to uracil residues on either strand of the transcribed DNA molecule. If the initial (C→U) lesion is not further modified prior to, or during, DNA replication, then an adenosine (A) can be inserted opposite the U nucleotide, ultimately resulting in C→T or G→A transition mutations. The significance of this change at the amino acid level depends upon the location of the nucleotide within the codon within the reading frame. If this mutation occurs in the first or second position of the codon, the result is likely to be a non-conservative amino acid substitution. By contrast, if the change occurs at the third position of the codon reading frame, within the wobble position, the practical effect of the mutation at the amino acid level will be slight because the effect of the nucleotide change will be silent or result in a conservative amino acid substitution.
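The dependence on codon position can be illustrated with the minimal sketch below, which applies each possible C→T or G→A transition to a single codon and reports whether the encoded amino acid changes. It uses the standard genetic code. The function name and output layout are hypothetical, and the sketch distinguishes only silent from amino acid changing mutations; it does not grade substitutions as conservative or non-conservative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# C->U on the coding strand reads as C->T; C->U on the other strand reads as G->A.
TRANSITION = {"C": "T", "G": "A"}


def transition_effects(codon):
    """Return (position, mutant_codon, original_aa, mutant_aa, is_silent) tuples.

    `position` is 1-based within the codon (3 is the wobble position).
    """
    codon = codon.upper()
    effects = []
    for pos, base in enumerate(codon):
        if base in TRANSITION:
            mutant = codon[:pos] + TRANSITION[base] + codon[pos + 1:]
            before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
            effects.append((pos + 1, mutant, before, after, before == after))
    return effects


# AGC (Ser): G->A at position 2 gives AAC (Asn); C->T at position 3 gives AGT (Ser).
for effect in transition_effects("AGC"):
    print(effect)
```

For the AGC codon, the second-position transition changes serine to asparagine while the third-position transition is silent.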
[0262] Alternatively, the C→U lesion, and potentially the neighboring bases, can be acted upon by DNA repair machinery, which in SHM leads to repair in an error prone fashion. Studies in knock out mice have established that base excision repair via uracil DNA glycosylase (UDG) plays a role in mediating the mutation of A and T residues close to hot spot motifs (Shen et al. (2006)). Additionally, there is increasing evidence that the creation of abasic sites by UDG recruits error prone polymerases, such as pol eta and pol theta, and that these polymerases introduce additional mutations at all base positions in the surrounding sequence (Watanabe et al. (2004); Neuberger et al. (2005)). It is believed that pol eta is central to the creation of A mutations during SHM and is particularly error prone for coding strand adenosines preceded by A or T (WA), which are preferentially mutated to G.
[0263] It has been observed that in antibody genes, codon usage and precise concomitant hot spot/cold spot targeting of AID activity and pol eta errors in the CDRs and FRs, respectively, have evolved under selective pressure to maximize mutations in the variable regions and minimize mutations in the framework regions. Zheng et al. (JEM 201(9): 1467-1478 (2005)), for example, observed that the precise alignment of C and G nucleotides within the codons preferentially used within an antibody gene causes most C to T and G to A mutations to be silent or conservative. Juxtaposed on the precise placement of Cs and Gs, Zheng et al. also observed the preferential placement of As and Ts in hot spots of pol eta in the variable regions and their exclusion from these sites in the framework regions.
[0264] The regulation of SHM in vivo and the determinants that direct and limit SHM to the Ig locus have been the subject of intense debate and experimental research. The rate of SHM observed in vivo has been shown to be at least partially dependent upon, for example, the following factors: 1) the AID expression levels and AID activity levels within a particular cell type (Martin et al. (2002); Rucci et al. (2006)); 2) the degree of AID post translational modification and degree of nuclear localization (McBride et al. (2006); Pasqualucci et al. (2006); Muto et al. (2006)); 3) the presence of immune locus specific enhancer regions, E-box motifs, or associated cis acting binding factors (Komori et al. (2006); Schoetz et al. (2006)); 4) the proximity of the targeted sequence to the transcriptional initiation site/promoter region (Rada et al. (2001)); 5) the rate of transcription of the target sequence (Storb et al. (2001)); 6) the degree of target gene methylation (Larijani et al. (2005)); 7) the genomic context of the target gene, if integrated into the cell's genomic DNA; 8) the presence or absence of auxiliary factors, such as pol eta and MSH2 (Shen et al. (2006)); 9) the existence of hotspot or coldspot sequences within the target sequence (Zheng et al. (2005)); 10) the existence of inhibitory factors (Santa-Marta et al. (2006)); 11) the rate of DNA repair within the cell type of interest (Poltoratsky (2006)); 12) the formation of local DNA or RNA hairpin structures (Steele et al. (2006)); and 13) the phosphorylation state of histone H2B (Odegard et al. (2005)).

III. Polynucleotides for Somatic Hypermutation

[0265] The degree to which a polynucleotide sequence or motif is a SHM “hot spot” or “cold spot” is derived from a statistical analysis of SHM mutations identified in antibody sequences, as described in priority U.S. application No. 60/902,414, and is shown in Tables 2 and 3 below. These Tables show the 3-mer, 4-mer, and 6-mer motifs ranked by z-score for their ability to attract SHM-mediated mutation.
[0266] 
  TABLE 2
 
  3-mer   z-score   4-mer   z-score   4-mer   z-score   4-mer   z-score   4-mer   z-score
 
 
  ATA   271.09   AATA   249.23   TACC   92.73   ACGA   19.69   CTGG   −55.05
 
  AGC   185.10   AGCA   225.50   GAAA   89.97   TTTT   17.21   CGGA   −56.07
 
  TAT   178.79   ATAT   224.06   CTGC   88.23   TTCT   16.95   ACGG   −58.65
 
  CAG   176.52   AACA   215.78   CCAA   87.55   GATC   16.55   GCCT   −61.62
 
  ACA   161.58   ATAA   213.14   TATC   86.83   TGTA   15.70   CGCC   −62.50
 
  CCA   156.43   ATCA   193.93   CCCA   86.81   CCCC   14.29   CTTG   −63.02
 
  ATT   128.07   TACA   190.78   GCTA   84.30   TTCC   8.07   AGTG   −64.08
 
  AAT   123.91   CACA   183.94   CTTA   83.60   CGCA   7.95   GGAC   −66.33
 
  CAC   113.31   ACAA   182.20   GCAA   83.41   CCTG   6.44   CCCG   −68.14
 
  CAT   106.72   ATTA   174.57   ATCC   82.88   AAGT   6.21   GTGA   −69.31
 
  GCT   99.04   CAGA   172.86   GAAT   82.09   GTTA   5.83   TTGT   −70.87
 
  TCA   92.35   AACT   171.38   ATTC   80.57   GTAA   5.54   GCGA   −71.78
 
  TAC   90.32   AGAT   167.36   AGCC   79.90   GACT   5.46   GTTT   −73.35
 
  ACT   84.63   ACAG   165.72   CTCA   78.97   TCCT   4.16   GGGA   −75.77
 
  ATC   82.30   CAAC   163.72   CCAG   78.46   GACC   2.64   CGTA   −76.30
 
  AGA   78.69   TATA   159.43   AGTA   78.05   GGAT   −0.62   TCGA   −76.40
 
  CTA   71.32   ATAC   157.31   TAGC   76.80   TCTG   −1.62   CGAG   −78.05
 
  GCA   70.80   ACTA   152.17   ATTT   74.50   GCTG   −2.06   AGGG   −81.46
 
  GAT   68.06   CAGC   148.78   ACTG   74.10   GATG   −2.19   GAGT   −82.94
 
  CTG   67.83   ACCA   146.54   TCAC   71.95   ACCG   −2.66   CCGG   −85.06
 
  ACC   65.99   AAGC   145.36   CTGA   68.58   TTTC   −4.30   GAGG   −85.74
 
  GAA   59.03   AGAA   144.62   CCTA   67.05   TAGT   −4.65   GTTG   −86.35
 
  TGA   56.50   AAAA   136.44   TCTA   66.67   CGCT   −5.54   TCCG   −88.86
 
  ATG   52.18   ACAT   135.69   AATG   66.07   AGCG   −5.58   GTTC   −89.62
 
  CAA   48.79   AGCT   134.58   GCAT   65.56   CCCT   −7.38   CGGC   −90.00
 
  AAA   39.39   CAAT   133.12   ACCC   62.47   CCTC   −7.50   GCGC   −91.60
 
  AAC   37.15   GATA   131.74   TCAT   61.22   TGGA   −8.79   CTCG   −92.05
 
  TTA   35.04   ACAC   130.35   TGCT   61.11   CTGT   −10.50   TGGC   −92.93
 
  TAA   31.78   ATCT   128.86   CTAG   59.03   GTAT   −10.53   TCGC   −96.14
 
  AAG   24.73   CACC   125.86   ACTT   58.98   TATG   −13.14   TGTG   −96.30
 
  CTT   17.61   CATA   125.75   AGAG   58.81   AAGG   −13.25   TTGG   −100.73
 
  TTC   16.92   ATAG   121.65   TTAC   57.51   CCGC   −13.98   GGTT   −102.17
 
  GTA   15.61   TAAT   121.29   TTTA   56.94   ATGG   −13.99   GCCG   −104.21
 
  TAG   13.84   CAAA   121.00   TCAG   56.45   CGAA   −14.21   CCGT   −105.94
 
  GGA   11.44   TATT   120.42   ATGC   54.70   TCTT   −15.45   GTCT   −108.78
 
  TTT   6.80   CTAA   119.93   AGAC   53.01   TGAC   −16.19   GGCC   −110.06
 
  AGT   2.60   CATC   118.61   TGAT   51.51   CCTT   −16.61   GACG   −112.93
 
  CTC   −1.47   TTCA   117.73   GCAC   51.04   CACG   −19.16   TGGT   −115.42
 
  TCC   −5.22   AAAC   116.35   AGGA   50.16   GGCA   −21.99   GTGC   −117.74
 
  CCT   −5.42   TTAT   114.64   TAAG   49.76   TCCC   −23.02   TTCG   −118.98
 
  CCC   −7.09   AAAT   114.43   CAGT   49.09   AACG   −26.20   ACGT   −121.92
 
  GAG   −8.26   CCAT   113.51   ACTC   46.69   CGAT   −27.41   GCGG   −124.24
 
  TGC   −14.70   ACCT   111.92   AGTT   45.47   AGGT   −29.09   TGCG   −126.58
 
  TCT   −18.88   TAAC   111.26   CAAG   43.20   TCTC   −29.53   TGGG   −127.63
 
  GAC   −23.11   CTAT   110.83   CTCC   43.07   TTGC   −29.86   GTCC   −128.75
 
  AGG   −27.85   TAAA   110.30   GTAC   42.84   CCGA   −32.32   GGGC   −132.40
 
  GCC   −38.10   CCAC   110.05   GAAC   42.62   TGAG   −34.69   GGGG   −133.41
 
  TGG   −40.97   AATT   109.92   GAGC   41.24   ATGT   −34.90   TCGT   −135.34
 
  TTG   −43.86   TGCA   107.12   GCCA   40.88   TAGG   −37.28   GGTG   −135.80
 
  ACG   −61.29   CATT   106.83   GCTT   39.88   GGCT   −38.30   CGTT   −136.77
 
  GTT   −62.25   TCAA   104.12   CAGG   37.16   GCCC   −40.66   TGTC   −137.57
 
  CGA   −62.60   AAAG   103.76   GATT   35.99   GGAG   −44.01   GTGT   −142.24
 
  TGT   −64.56   TACT   101.53   GACA   35.71   TGTT   −44.49   CGGT   −144.04
 
  GGC   −70.30   AAGA   100.90   CTTC   34.67   CGAC   −45.06   GTGG   −149.24
 
  CGC   −82.93   CACT   100.32   CTCT   33.87   GGTA   −46.07   CGTC   −155.95
 
  CCG   −85.43   AACC   99.86   GAAG   31.97   AGGC   −46.08   GGTC   −158.84
 
  GGG   −97.46   GCAG   99.17   TTGA   31.29   TACG   −46.78   TCGG   −159.56
 
  GTG   −110.90   ATGA   98.38   CTTT   28.94   AGTC   −46.82   CGGG   −159.99
 
  GGT   −112.41   CTAC   95.93   TTAG   27.86   ACGC   −47.10   GGGT   −162.17
 
  CGG   −116.32   TCCA   95.63   GGAA   26.38   ATCG   −48.15   GGCG   −171.27
 
  GCG   −118.80   AATC   95.61   ATTG   25.55   GTCA   −52.15   CGCG   −172.40
 
  TCG   −125.83   TGAA   93.81   CATG   24.39   TTTG   −52.48   CGTG   −180.34
 
  GTC   −126.67   TTAA   93.67   GCTC   22.00   GTAG   −53.73   GCGT   −194.57
 
  CGT   −130.10   TAGA   93.03   GAGA   21.55   TGCC   −54.56   GTCG   −207.74
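
As a non-limiting illustration of how the z-scores of Table 2 might be consulted, the sketch below slides a 3-nucleotide window along a sequence and looks up each 3-mer in a small lookup table. Only six 3-mer entries copied from Table 2 are included (the three highest and three lowest); a practical table would contain all 64. The function names and the example sequence are hypothetical, and the sketch merely annotates and ranks windows without combining scores in any way.

```python
# Six 3-mer z-scores copied from Table 2 (three highest, three lowest).
Z_SCORE_3MER = {
    "ATA": 271.09,
    "AGC": 185.10,
    "TAT": 178.79,
    "TCG": -125.83,
    "GTC": -126.67,
    "CGT": -130.10,
}


def annotate_3mers(seq, table=Z_SCORE_3MER):
    """Return (position, 3-mer, z-score) for every window found in `table`."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in table:
            hits.append((i, kmer, table[kmer]))
    return hits


def rank_3mers(seq, table=Z_SCORE_3MER):
    """Same hits as annotate_3mers, ordered from hottest to coldest."""
    return sorted(annotate_3mers(seq, table), key=lambda hit: hit[2], reverse=True)


# Hypothetical example sequence.
for position, kmer, z in rank_3mers("ATAGCGTC"):
    print(position, kmer, z)
```

Windows whose 3-mer is absent from the abbreviated lookup table are skipped, so the output is only as complete as the table supplied.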
 
[0267] 
    TABLE 3
   
    6-mer   z-score
   
 
    ACAGCT   266.45
   
    ATTAAT   248.7
   
    ATAATA   227
   
    CAGCTA   223.27
   
    AATATA   220.6
   
    AATACA   215.65
   
    AGCTAC   211.24
   
    AGATAT   211.07
   
    AGCTAA   210.24
   
    ATATAT   209.3
   
    AATACT   203.19
   
    ATATAC   192.44
   
    ATAACT   190.78
   
    ATATTA   189.76
   
    ATAGCA   186.89
   
    ATACCA   186.58
   
    ATACAA   181.41
   
    GCAGCT   180.69
   
    ATTACA   180.46
   
    CAGCTC   180.29
   
    ATAGCT   180.08
   
    AATAAT   179.41
   
    AGCTAT   178.14
   
    CAGCTT   176.31
   
    ATATCT   174.41
   
    AGCTGC   169
   
    CAGCTG   167.78
   
    AGCTGA   167.41
   
    AATAAA   167.35
   
    ACTACA   167.11
   
    AACAGC   167.08
   
    ATTATT   166.89
   
    AAGCTA   166.44
   
    ACTACT   164.71
   
    AATACC   164.29
   
    TATTAT   164.1
   
    ACAGCA   161.72
   
    AGCAGA   160.66
   
    AGCAAT   159.61
   
    TAATAC   159.28
   
    AATCCA   156.67
   
    AATAGA   156.3
   
    TATACA   155.5
   
    AGCTCC   153.55
   
    CATATA   152.22
   
    ATACAT   151.77
   
    TATATT   150.71
   
    TAATAT   150.37
   
    ATTACT   150.2
   
    TCAGCT   149.79
   
    AACTAC   149.11
   
    AAAGCT   148.88
   
    CAGCAT   147.47
   
    ATACAC   147.42
   
    ATAGAT   147.33
   
    ATCAGC   147.06
   
    AGATAC   146.34
   
    AGCACA   146.01
   
    CAGATA   145.75
   
    TAGCTA   145.22
   
    TTAGCT   144.8
   
    AAGCTG   143.55
   
    CACAGC   141.38
   
    ACAACT   140.89
   
    CATACA   139.87
   
    AGCAGC   139.64
   
    ACTATT   139.36
   
    CCAGCT   137.43
   
    GATACA   136.87
   
    AGCTTC   136.64
   
    AGCTCA   136.52
   
    ACCAGC   136.02
   
    AAATAC   135.35
   
    AGCTTA   135.22
   
    AGAGCT   134.71
   
    TAACTA   134.57
   
    TACTAC   134.52
   
    AACTAT   133.79
   
    ATAAAC   132.79
   
    TAGATA   132.74
   
    AACACA   131.7
   
    CTAATA   131.46
   
    AATAGC   130.99
   
    GAGCTA   130.78
   
    ATACTA   130.56
   
    ATATCA   130.47
   
    CTACTA   130.24
   
    ATACAG   129.95
   
    CCAGCA   129.73
   
    CAGCAG   129.37
   
    AATGCA   128.88
   
    ACTAAT   128.87
   
    AGCTTT   128.11
   
    ATCCAC   128.11
   
    GAAGCT   126.98
   
    CAGCAA   126.51
   
    ACCACC   126.44
   
    GCTACA   126.36
   
    AGCTGT   126.35
   
    ATAACA   126.34
   
    AGTTAT   125.56
   
    TTACTA   125.4
   
    AATTAC   124.76
   
    AATTCA   123.97
   
    CAGCAC   123.54
   
    ACAGCC   123.25
   
    TTAATA   122.8
   
    AGTATT   122.69
   
    CAACTA   122.15
   
    CAATAA   121.87
   
    AGCAAC   121.8
   
    ATCTAC   121.63
   
    TACACC   121.61
   
    AGCACC   121.59
   
    ATAGCC   120.05
   
    TAGCTG   119.3
   
    AAAACA   119.25
   
    ATTATA   119.17
   
    AGTACT   118.38
   
    CACCAT   117.87
   
    ATCTAT   116.19
   
    ACCATT   115.23
   
    TACTAT   115.17
   
    TCAGCA   115.13
   
    AGCATA   114.84
   
    TATTAA   114.69
   
    CAAGCT   113.83
   
    AGATGA   113.27
   
    GATATA   112.88
   
    TAGCTT   112.54
   
    TATTAC   111.72
   
    AGCTCT   111.46
   
    TCACCA   111.34
   
    ATAGTA   110.66
   
    ATACCT   110.48
   
    AGCATC   109.68
   
    TATCTA   109.46
   
    TACAAC   108.83
   
    GCAGCA   108.59
   
    AGTAAT   108.57
   
    TGCACA   108.53
   
    TTTATT   108.51
   
    ATGATA   108.34
   
    CAAATA   108.12
   
    ACAATA   107.6
   
    AATAGT   107.19
   
    AACAAC   107.08
   
    CACCAG   107.01
   
    TAGCTC   106.68
   
    TACAGC   106.65
   
    AACTGA   106.63
   
    GCATAT   106.63
   
    GAGCTG   106.39
   
    ATTCAC   106.22
   
    AAATAA   105.92
   
    TAGCAA   105.71
   
    CCAGAT   105.22
   
    ACCATC   105.14
   
    AATAAC   105.1
   
    TACCAT   104.92
   
    AGAACA   104.85
   
    ATCATA   104.56
   
    ATCACC   104.5
   
    AGAAAT   104.29
   
    ATATAA   104.19
   
    CATATC   103.97
   
    ATTCCA   103.78
   
    GGAGCT   102.99
   
    TACAGA   102.58
   
    TACTAA   102.18
   
    ATCACT   102.01
   
    ATATGA   101.89
   
    AAACAG   101.82
   
    ACACAG   101.77
   
    ACACCA   101.38
   
    ACAACC   101.23
   
    TAAGCT   100.84
   
    CAATAG   100.69
   
    CTATTA   100.61
   
    TTACCA   100.56
   
    AGTACA   100.42
   
    AACCAC   100.39
   
    CCACCA   100.19
   
    AAACAC   99.94
   
    ATAAAT   99.38
   
    GCTATA   99.35
   
    GTAGCT   99.14
   
    CAGCCA   99.11
   
    TTCAGC   99
   
    AGACAC   98.97
   
    AGCACT   98.85
   
    CCAATA   98.8
   
    AAACCA   98.68
   
    CAGCCT   98.34
   
    AAGCAC   98.34
   
    ACTGCA   98.25
   
    AGAAGC   98.23
   
    CCATCA   98.1
   
    CAACCA   97.53
   
    CAACTG   97.51
   
    ATTAGC   97.37
   
    AATATT   96.98
   
    ACCACA   96.82
   
    ATATGC   96.53
   
    GTATTA   96.49
   
    CATAGC   96.33
   
    GTATAT   96.2
   
    ACCAAC   96.14
   
    CAGATC   96.05
   
    AACATA   96.05
   
    AGATCC   95.89
   
    CTACCA   95.82
   
    GATCCA   95.8
   
    ATTGCT   95.61
   
    ACCATA   95.61
   
    CATCTA   95.61
   
    CCAGCC   95.4
   
    ACCTAC   95.39
   
    TCAACT   95.32
   
    ATGCAC   95.22
   
    GAAATA   95.07
   
    TATAGC   94.95
   
    TACCAC   94.81
   
    AGCTAG   94.59
   
    CCATAT   94.32
   
    TATATA   94.2
   
    CATATT   94.16
   
    TAATAA   94.05
   
    AGAACT   93.81
   
    TATCAC   93.66
   
    CACCAC   93.38
   
    AAAGCC   93.36
   
    CTACAG   93.16
   
    GCAGAT   93.16
   
    AGATCA   93.03
   
    ACTTCA   92.78
   
    ACACAC   91.91
   
    ACCACT   91.48
   
    AAGCTT   91.27
   
    ACCAAT   90.89
   
    CTAGCT   90.83
   
    ATTTAT   90.72
   
    CAGTTA   90.71
   
    CATAGA   90.61
   
    ATACTG   90.19
   
    ATTACC   90
   
    TATCAT   89.91
   
    ACTATA   89.16
   
    TACACA   89.01
   
    GCTGAA   88.67
   
    CCATTA   88.62
   
    TGCTAT   88.19
   
    TACATA   88.12
   
    CACCAA   88.08
   
    ATAGTT   87.88
   
    CACCTA   87.77
   
    GCACCA   87.64
   
    CTATCA   87.58
   
    GCTATT   87.58
   
    TATTAG   87.34
   
    CCACCT   87.28
   
    AGAACC   87.26
   
    ACTACC   87.25
   
    TATAAT   87.06
   
    ATTTCA   86.86
   
    TAGCAG   86.76
   
    AAGCTC   86.67
   
    AACCAA   86.61
   
    AATATC   86.37
   
    TAGTAA   86.29
   
    GCTGAT   86.25
   
    TATATC   86.21
   
    TAATTA   86.14
   
    AACCAT   86.06
   
    ATAGAC   86.03
   
    CCATCT   85.84
   
    TTATTA   85.75
   
    TCAGCC   85.73
   
    ACATAC   85.65
   
    ACATAG   85.6
   
    CACAAT   85.55
   
    GTAATA   85.54
   
    GAAGCA   85.45
   
    TCATAT   85.24
   
    CAGCCC   85.03
   
    ACCTAT   84.68
   
    AGCCAC   84.68
   
    CAGTAA   84.62
   
    CCAACA   84.17
   
    AAAAGC   84.12
   
    AACTGC   83.95
   
    CCAACT   83.78
   
    ATCATT   83.47
   
    AGAGCA   83.38
   
    GATACT   83.35
   
    CCACAG   83.35
   
    ATAATT   83.26
   
    TAAACA   83.21
   
    ACATAT   82.99
   
    GCTACT   82.86
   
    CAGTAT   82.76
   
    ATCACA   82.36
   
    TCAACA   82.34
   
    AGCCCA   82.25
   
    AATTAT   82.21
   
    ATCATC   82.17
   
    TGCTAC   81.84
   
    GCTTCA   81.55
   
    CCACTA   81.49
   
    GCTGCA   81.44
   
    TAGTTA   80.97
   
    AATCAA   80.92
   
    CAATTA   80.84
   
    CTGCTA   80.71
   
    ATATAG   80.66
   
    TGCACC   80.52
   
    AAGACA   80.5
   
    TAATAG   80.31
   
    TGCAGC   80.23
   
    CCTCCA   80.17
   
    GATGCA   80.15
   
    AACTCC   80.09
   
    TCCAGC   80.02
   
    ACACTG   79.79
   
    TATAAC   79.77
   
    TTATAA   79.58
   
    CAACAA   79.5
   
    GCTAAT   79.35
   
    TGATAC   79
   
    AGATCT   78.63
   
    ATAACC   78.57
   
    AGAAAC   78.2
   
    ATTGCA   78.18
   
    AACACC   78.06
   
    TGCATT   78
   
    CAACTC   77.9
   
    GTACTA   77.86
   
    ACTCCA   77.83
   
    CAGATG   77.71
   
    TGCAGA   77.69
   
    AAGAAA   77.67
   
    TCCACC   77.66
   
    TAACCA   77.39
   
    TAACAG   77.34
   
    TTATAT   77.04
   
    TCTATT   76.92
   
    ACACTA   76.75
   
    CACTAA   76.68
   
    GTAGCA   76.59
   
    AGCCAT   76.52
   
    TCATCT   76.5
   
    CACTAT   76.28
   
    CAATAT   76.05
   
    CACAGA   76.03
   
    AGTTAC   75.97
   
    ATACTC   75.91
   
    TATATG   75.77
   
    CACTAC   75.68
   
    ATTTCT   75.56
   
    TACCAA   75.44
   
    GCAATA   75.24
   
    ATCTCA   74.72
   
    ACAGAT   74.63
   
    TCACCT   74.58
   
    CATCAG   74.49
   
    TCAGAT   74.33
   
    AGTAAC   74.08
   
    CTACAC   73.7
   
    AATGAT   73.53
   
    ATTAGT   73.5
   
    TAGTAC   73.49
   
    TAACTG   73.35
   
    AAAATA   73.29
   
    AAAACT   73.19
   
    ATTTAC   72.97
   
    ATCTGA   72.97
   
    ATCCAT   72.95
   
    ATACCC   72.75
   
    AACTTC   72.62
   
    AATACG   72.39
   
    AAATCA   72.22
   
    TTCACA   72.18
   
    CAGATT   72.08
   
    CAGAAA   71.97
   
    ACACAT   71.91
   
    AAGATA   71.91
   
    CTGCAG   71.63
   
    GCAACT   71.57
   
    GATATT   71.57
   
    AGATTC   71.53
   
    ACCAGA   71.47
   
    CTATAT   71.38
   
    TGATAT   71.06
   
    AAGAGC   70.89
   
    ATACGC   70.65
   
    CTGATA   70.47
   
    GATAAA   70.39
   
    ACATCC   70.36
   
    AAACTA   70.26
   
    ATCAAT   70.13
   
    GAAACA   70.11
   
    CATCAT   70.01
   
    AGCTTG   70.01
   
    TGAGCT   69.96
   
    CTATAA   69.96
   
    ATTCAT   69.85
   
    TACTGC   69.83
   
    CAGAGA   69.69
   
    CATTTA   69.68
   
    AGCTGG   69.06
   
    GAATCA   68.99
   
    TTATTT   68.98
   
    ATCTGC   68.96
   
    TAGCAC   68.84
   
    ATGCTA   68.58
   
    TATACT   68.54
   
    TCATCA   68.5
   
    AGATGC   68.48
   
    ATAGCG   68.46
   
    CATACT   68.15
   
    TAGCAT   68.15
   
    TACAAA   68.02
   
    TACCTA   67.99
   
    CATCTT   67.88
   
    ATCAAC   67.83
   
    ACCTTC   67.82
   
    TTAGCA   67.82
   
    AGTAGC   67.72
   
    TTGCTA   67.61
   
    TAAGCA   67.57
   
    AATATG   67.49
   
    TCACTA   67.42
   
    CATTAA   67.2
   
    AGCAAA   67.17
   
    GGCTAT   67.15
   
    ATGCAA   67.06
   
    ACACCC   67.05
   
    GCAGTA   67.04
   
    AGTAAA   67
   
    TTCACC   66.71
   
    GATACC   66.69
   
    CTACAA   66.54
   
    CTGAAA   66.27
   
    ATGTAT   66.24
   
    CACCTT   66.08
   
    ACCCAG   65.77
   
    ATATCC   65.64
   
    CAAAGC   65.58
   
    ACAGTA   65.5
   
    CATACC   65.47
   
    TGAATT   65.43
   
    TATTCA   65.2
   
    GATATC   65.15
   
    ACAAAT   65.04
   
    CCATTT   64.91
   
    AAAAAC   64.81
   
    GCTCCA   64.64
   
    AAGCCA   64.61
   
    CCTTCA   64.45
   
    GAGCTT   64.45
   
    ATAGAA   64.31
   
    TGAAGC   64.22
   
    GAACCA   64.2
   
    ACAGAC   64.16
   
    ACAGAG   64.14
   
    TGTATA   64
   
    TGAACC   63.94
   
    TTATCA   63.94
   
    AACAGA   63.94
   
    GATTCA   63.93
   
    ATGAAT   63.83
   
    GCTGCT   63.71
   
    CACACA   63.58
   
    GCAGCC   63.54
   
    TAGCCA   63.4
   
    GAGCTC   63.35
   
    AACTCA   63.19
   
    GTATCA   63.01
   
    CATAAT   62.96
   
    TCCACA   62.68
   
    CAGAAG   62.65
   
    CCCAGC   62.57
   
    CGCTAT   62.55
   
    CCTACT   62.52
   
    CAATAC   62.45
   
    CAACTT   62.28
   
    AGAATC   62.21
   
    GAGCAC   62.17
   
    TCTGCA   62.09
   
    CAATCC   61.99
   
    AGAATT   61.72
   
    CATTAC   61.65
   
    ACTGCT   61.63
   
    AACACT   61.62
   
    GTAACA   61.62
   
    TATCAG   61.58
   
    ATGAAC   61.56
   
    CAACAT   61.55
   
    TCAATA   61.47
   
    TGCATC   61.37
   
    GCACAG   61.24
   
    AGAGCC   61.12
   
    AGTATA   61.1
   
    GTAGAT   60.86
   
    TACACT   60.8
   
    TATCCA   60.75
   
    AGCATT   60.65
   
    ATTAAA   60.65
   
    ACAAGC   60.61
   
    ACTGAT   60.54
   
    CAACAG   60.42
   
    ATGCTG   60.37
   
    TATCAA   60.3
   
    AGTTGA   60.16
   
    TTTACT   60.02
   
    CTTCAC   59.96
   
    GAAGAT   59.8
   
    CATCTG   59.68
   
    ATCCCA   59.65
   
    CAACAC   59.49
   
    AACATC   59.39
   
    AAGCAG   59.37
   
    CATCAC   59.3
   
    ACTAGC   59.24
   
    ACAACA   59.21
   
    CATAAC   59.02
   
    TATTTC   58.98
   
    CCATAA   58.89
   
    CACCCT   58.6
   
    ACACCG   58.31
   
    TACTAG   58.31
   
    TGAATA   58.12
   
    ACAATC   58.11
   
    AGGAGC   58.09
   
    TGAGCA   57.87
   
    TATGAT   57.78
   
    TATACC   57.77
   
    GATATG   57.64
   
    TCTGCT   57.47
   
    AGTAGT   57.38
   
    ACCAAA   57.17
   
    TGTAAT   57.16
   
    CAGCGA   57.12
   
    AAGCAT   57.06
   
    GATGCT   57.03
   
    CATTTC   56.98
   
    AAGATG   56.93
   
    ATCCAG   56.88
   
    CATATG   56.87
   
    TGGATT   56.83
   
    TGCAAC   56.76
   
    CACCTC   56.75
   
    CAGACT   56.73
   
    ATGCAG   56.72
   
    GTAACT   56.7
   
    AGTAGA   56.45
   
    TATGCA   56.42
   
    GGAATA   56.3
   
    AGTATC   56.23
   
    CATTAG   56.19
   
    CAGTAC   56.18
   
    TACATC   56.14
   
    AAAGCA   56.13
   
    TCTCCA   56.01
   
    ACAGAA   55.96
   
    GGAGCA   55.88
   
    CAGCCG   55.8
   
    CTGCAC   55.6
   
    AGCAGT   55.46
   
    CACATA   55.45
   
    TATCTG   55.37
   
    TACTCA   55.36
   
    CTTATA   55.34
   
    GACACA   55.17
   
    TGTATT   55.14
   
    GAATCT   55.12
   
    AACAGT   55.1
   
    ATCAGA   55.06
   
    GCATCT   54.8
   
    AACTAA   54.79
   
    CAGCGC   54.76
   
    ACACAA   54.74
   
    TAACAA   54.73
   
    TGCATA   54.73
   
    TTACAG   54.68
   
    GAAGCC   54.6
   
    AAGAAC   54.37
   
    TTACTG   54.36
   
    GTTTAT   54.25
   
    ACCAGT   54.25
   
    AATCCT   54.22
   
    ACAAAG   54.18
   
    TCACAG   54.18
   
    ACTATG   54.15
   
    GATGAT   54.08
   
    TGCAAT   54.03
   
    GTAATT   53.95
   
    TTAGTA   53.95
   
    CATGAA   53.93
   
    CATCTC   53.89
   
    AGCCTC   53.8
   
    CACATT   53.79
   
    AATTAA   53.78
   
    GCACAT   53.76
   
    ATTGAT   53.75
   
    AAAACC   53.75
   
    TACCAG   53.61
   
    ACTAGT   53.57
   
    AAAGAT   53.54
   
    CTCCAA   53.42
   
    CACACT   53.37
   
    CCACAA   53.24
   
    TACAAT   53.13
   
    CTATTG   53.01
   
    TAGTAG   52.94
   
    GATCAT   52.84
   
    AATCAT   52.81
   
    ATTCAG   52.71
   
    AGTACC   52.64
   
    AAAAAT   52.58
   
    CAGAAC   52.37
   
    ACAGTT   52.35
   
    TGAAAT   52.33
   
    GAGATC   52.3
   
    CATTCA   52.24
   
    CGAGCT   52.22
   
    GATAGC   52.17
   
    TCATTA   52.11
   
    CTCCAG   52.03
   
    CAGAGC   51.98
   
    TGCTGA   51.92
   
    CCAAGA   51.92
   
    ATAAGC   51.86
   
    TTACAC   51.85
   
    AGATGG   51.72
   
    TCTACT   51.69
   
    TTACAA   51.68
   
    TGCAAA   51.62
   
    TAGTAT   51.42
   
    TTTATC   51.26
   
    CCCAGA   51.25
   
    GACTAC   51.19
   
    ATTCTA   51.19
   
    CAAAAA   51.15
   
    ATACTT   51.15
   
    ATACGA   51.08
   
    ATCTTC   51.06
   
    ACATCA   51.04
   
    AACCCA   51
   
    CATAAA   50.95
   
    TGAAGA   50.88
   
    TAGATG   50.83
   
    CTGCAT   50.78
   
    CAAGCA   50.65
   
    AAATCC   50.5
   
    GAACTA   50.47
   
    CTATGA   50.36
   
    ACTTAT   50.3
   
    CCAAAT   50.25
   
    CCTGCA   50.24
   
    TACTCC   50.15
   
    GAGCAG   50.07
   
    TACCCA   50.02
   
    ACCTCC   49.97
   
    GTTATA   49.88
   
    CATCAA   49.87
   
    TGATAA   49.86
   
    AATCAC   49.84
   
    ATTAGA   49.71
   
    CATCCC   49.63
   
    GTATTT   49.61
   
    ACCTGA   49.59
   
    ACTGAA   49.51
   
    CATCCA   49.5
   
    TAACAC   49.46
   
    AGAGAT   49.39
   
    AGCATG   49.33
   
    CAACCC   49.27
   
    ACTTCT   49.23
   
    ATGATC   49.2
   
    GATAGA   49.19
   
    GAACAG   48.99
   
    CCAAAA   48.88
   
    GAAACT   48.8
   
    GACAGC   48.76
   
    CAATGA   48.7
   
    ACAAGA   48.64
   
    CTCAGA   48.55
   
    AGATAA   48.54
   
    CTAGCA   48.43
   
    ATCAAA   48.36
   
    TCTTCA   48.34
   
    GATGAA   48.34
   
    ATCCAA   48.27
   
    AACCAG   48.27
   
    CACATC   48.25
   
    TCCAAC   48.16
   
    TAAAGC   48.1
   
    AGACCC   48.09
   
    CAGGAA   48.07
   
    TTAACA   48.04
   
    TTATTG   48
   
    CATGGA   47.99
   
    CTTCCA   47.96
   
    CAGTTG   47.94
   
    ATATGG   47.86
   
    GTATCT   47.79
   
    CTTCAA   47.73
   
    GAGAAC   47.72
   
    TTCACT   47.71
   
    AAAGAA   47.71
   
    ACACCT   47.51
   
    AGTTCA   47.47
   
    ACCTGC   47.45
   
    TATGCT   47.44
   
    TTGTAT   47.43
   
    ACAGGC   47.42
   
    TCCATA   47.27
   
    TATTCC   47.17
   
    GGCTGA   47.15
   
    TGCTAA   47.05
   
    ACCCCA   46.96
   
    GTAGTA   46.89
   
    ATCCTA   46.79
   
    CGCATA   46.68
   
    AATTCT   46.54
   
    GGATCT   46.23
   
    TTATAG   46.2
   
    ACTAAA   46.2
   
    CAGACA   46.2
   
    GTACCA   46.16
   
    CAAAGA   46.13
   
    ACTCCT   46.11
   
    CACAGT   46.1
   
    AAACCT   46.05
   
    CGCTGA   46.02
   
    AATGAA   45.98
   
    GTTACT   45.95
   
    TACAAG   45.86
   
    AGGAAT   45.81
   
    ACTCAA   45.79
   
    ATGACA   45.7
   
    ACCATG   45.69
   
    CATAGT   45.61
   
    ATATTG   45.6
   
    AGGTAT   45.57
   
    CTCAGC   45.54
   
    ATATTC   45.46
   
    CTACTC   45.36
   
    TACAGG   45.33
   
    CCTCAG   45.33
   
    CACTGC   45.24
   
    GCACCT   45.13
   
    ACTATC   45.05
   
    CTGCTG   44.96
   
    AGCCTT   44.9
   
    GGTATT   44.89
   
    TAAATA   44.79
   
    TTCCAC   44.78
   
    CAAAAG   44.78
   
    TTTCAG   44.77
   
    TAATGA   44.74
   
    TTACAT   44.73
   
    AACCCC   44.73
   
    ATGGTA   44.66
   
    CACTGA   44.64
   
    CAAATC   44.64
   
    CATGCT   44.62
   
    GCTTCT   44.61
   
    TCCATC   44.59
   
    TCAGTT   44.56
   
    ACTGCC   44.54
   
    CTTCAT   44.49
   
    TGCTCA   44.45
   
    TGGAAT   44.41
   
    CTTCAG   44.4
   
    ACATCT   44.4
   
    CACCTG   44.39
   
    ATGCAT   44.36
   
    CCAACC   44.33
   
    CATTAT   44.25
   
    CTAGTA   44.22
   
    TACAGT   44.18
   
    TACTGA   44.12
   
    CTACTG   44.1
   
    TAGAAT   44.07
   
    ACAGCG   44.06
   
    ATGGAT   44.04
   
    TTCATA   43.92
   
    ATAAAA   43.84
   
    ACTCAG   43.83
   
    CTGCAA   43.65
   
    CAGGCT   43.52
   
    TGATAG   43.5
   
    AGAGAC   43.5
   
    CCATGA   43.49
   
    CTACTT   43.4
   
    ACATTA   43.36
   
    GAATAG   43.29
   
    GCAGTT   43.25
   
    CACAAA   43.25
   
    TGAACT   43.25
   
    TGAGAT   43.21
   
    CACTAG   43.13
   
    CCCCAT   43.06
   
    CTAACA   42.92
   
    CCAGTA   42.86
   
    CTCCAT   42.76
   
    CAAGAT   42.74
   
    GAACCC   42.71
   
    CCAGAA   42.65
   
    TTCATC   42.62
   
    AACCTG   42.6
   
    AGCCCC   42.52
   
    CCTACA   42.47
   
    GGATAT   42.47
   
    TCCACT   42.41
   
    ATTACG   42.39
   
    AAGATC   42.32
   
    AGCCTA   42.29
   
    ACACGG   42.21
   
    CTGAAT   42.18
   
    CTATTC   42.04
   
    ACAATG   42.01
   
    TCATAA   42
   
    TGAATC   41.89
   
    ATCAGT   41.74
   
    GATTTA   41.74
   
    AATCTG   41.72
   
    GCTGGA   41.71
   
    AGCGAT   41.68
   
    TATTTT   41.67
   
    GAATCC   41.64
   
    TTTACC   41.63
   
    AGCAGG   41.62
   
    AAATAT   41.58
   
    ATTATC   41.55
   
    GAGATA   41.47
   
    CCAGGA   41.41
   
    TCATAG   41.39
   
    GCTTTT   41.33
   
    ATGACT   41.26
   
    GAACTG   41.19
   
    CTGAAC   41.19
   
    GGCTAC   41.14
   
    AGCTCG   41.12
   
    ACCCAC   41.04
   
    CAATCA   41.01
   
    AGCGCA   40.99
   
    ACTCCC   40.96
   
    CTCCAC   40.95
   
    AATCTA   40.93
   
    GCATCA   40.9
   
    ATTTTT   40.87
   
    TGAAAA   40.84
   
    TCACAT   40.84
   
    ATTCCT   40.83
   
    TTGATA   40.69
   
    CACAAC   40.69
   
    TATTGA   40.61
   
    AGGCTG   40.57
   
    AATGCT   40.53
   
    TATTTG   40.53
   
    CAGGTA   40.51
   
    CATGCA   40.5
   
    AAACTG   40.46
   
    AACAAA   40.38
   
    CTTTCA   40.38
   
    CAAACT   40.38
   
    TATTTA   40.37
   
    GGAACA   40.37
   
    GCCACT   40.35
   
    CGCAGC   40.24
   
    TAAATC   40.2
   
    AGGTAC   40.19
   
    ACTGTA   40.17
   
    GAAGGA   40.16
   
    CAGTTC   40.09
   
    TTTTAC   40.04
   
    TGAACA   40
   
    GCTATC   39.99
   
    GCTTTA   39.98
   
    ATTAAC   39.98
   
    GAATAT   39.96
   
    CCATCC   39.94
   
    TACCTG   39.93
   
    CAAACC   39.91
   
    CACTTC   39.84
   
    TTATAC   39.76
   
    TTGCAT   39.73
   
    CTGTAT   39.67
   
    GAAACC   39.64
   
    AGTGAT   39.53
   
    CAAGCC   39.3
   
    AGGATT   39.29
   
    CAGTAG   39.29
   
    AGAATA   39.23
   
    ATGCCA   39.23
   
    GTGATA   39.2
   
    AATCCC   39.2
   
    AACAAT   39.16
   
    GAAGAA   39.02
   
    TAACAT   39
   
    CAAACA   38.97
   
    AGGATA   38.8
   
    AAATGG   38.8
   
    TTTAAT   38.75
   
    TTTACA   38.66
   
    GACACC   38.6
   
    CTTACT   38.54
   
    TAAAAC   38.52
   
    TCAGCG   38.41
   
    TTTGCA   38.37
   
    ACAAAC   38.35
   
    GATCTC   38.32
   
    TGGATC   38.23
   
    AAAAAA   38.16
   
    CACGAT   38.16
   
    TTTTCA   38.15
   
    AAACAA   38.11
   
    AATCAG   38.1
   
    ATGAGA   38.04
   
    CCAATT   38.03
   
    CTATAC   37.99
   
    AGGACA   37.98
   
    GAACAA   37.98
   
    TCCAAA   37.84
   
    TTTCCA   37.82
   
    ACTGGA   37.81
   
    AAGCAA   37.77
   
    ATGAAG   37.77
   
    ACAAGG   37.76
   
    AAGCCC   37.72
   
    GCTCCT   37.68
   
    ACACGA   37.64
   
    AGCCGA   37.6
   
    CCAGCG   37.57
   
    ATCCCC   37.48
   
    TGTAGC   37.33
   
    AGCCGC   37.29
   
    TCAGAA   37.28
   
    TAAAAA   37.16
   
    GATAAT   37.15
   
    TCCTAC   37.13
   
    TACTTC   37.09
   
    GAAATG   36.99
   
    ATATTT   36.91
   
    GAACTC   36.81
   
    CTAATG   36.79
   
    AACAGG   36.76
   
    AAGGCT   36.76
   
    TCCAAT   36.72
   
    TATGAC   36.67
   
    ACCTCA   36.63
   
    TGATGA   36.62
   
    AAGCCT   36.59
   
    GAGACA   36.59
   
    ATGATT   36.47
   
    CCACCC   36.46
   
    GCAATT   36.27
   
    CCCACA   36.26
   
    TACTTA   36.25
   
    TGACCA   36.23
   
    CCATAG   36.13
   
    ATTCCC   36.08
   
    CCCACT   36.08
   
    AAACCC   35.99
   
    GAACCT   35.97
   
    GTTATT   35.96
   
    CCATAC   35.9
   
    TTCTAC   35.9
   
    ATGAGC   35.85
   
    GATCAG   35.85
   
    TATGAA   35.79
   
    CAAGAA   35.7
   
    TATAAG   35.62
   
    ATCTCC   35.59
   
    ACTACG   35.54
   
    GAACAC   35.49
   
    TATTGC   35.48
   
    TAAATG   35.47
   
    ATGAAA   35.43
   
    GATCTG   35.38
   
    TATAAA   35.37
   
    ATACGG   35.34
   
    ATTATG   35.3
   
    CAAGGA   35.22
   
    AAATAG   35.19
   
    AAGACT   35.13
   
    ACCCCC   35.07
   
    AGATTT   35.05
   
    GAGCAT   35.02
   
    CCCCAA   35.02
   
    AAATGC   35
   
    TGATCA   34.95
   
    GAGCCC   34.9
   
    ATCTGG   34.82
   
    AGAAGT   34.81
   
    ACTAAC   34.76
   
    TGGAGA   34.73
   
    TAATCA   34.7
   
    CAACCT   34.69
   
    GACCAC   34.64
   
    GTAAAA   34.56
   
    TCTACC   34.54
   
    GATTAC   34.54
   
    CCAGTT   34.52
   
    ACCAGG   34.5
   
    GCAACC   34.48
   
    ACATTT   34.47
   
    ACTTCC   34.46
   
    AAGTAC   34.43
   
    ACCTTA   34.43
   
    TAATTG   34.26
   
    CACCCA   34.26
   
    ATCTTT   34.13
   
    TTAATT   34.07
   
    TTGCAC   34.06
   
    CACCCC   34.06
   
    CATGAT   34.02
   
    ATAGGT   33.92
   
    GCTACC   33.92
   
    ATAGAG   33.86
   
    AGTTCT   33.81
   
    TGCTTA   33.8
   
    GCTGTT   33.73
   
    AAGAAT   33.68
   
    GATTCT   33.67
   
    ACCGCC   33.57
   
    ACAGGG   33.56
   
    CAAGAC   33.52
   
    CCACTG   33.47
   
    AAGTAA   33.38
   
    TGTACT   33.36
   
    CTGAAG   33.36
   
    AGACCT   33.33
   
    ACTAGA   33.32
   
    AAATCT   33.23
   
    GCTATG   33.22
   
    TTGATT   33.18
   
    TGCTGC   33.18
   
    AGAAGA   33.16
   
    AATGGA   33.11
   
    TTCCCA   33.1
   
    AATGGT   33.08
   
    GTTACA   33.07
   
    TCAGGA   33.04
   
    TACACG   32.96
   
    TTACTT   32.93
   
    TAAAGA   32.93
   
    CACTTT   32.87
   
    AACTGG   32.82
   
    CTCACC   32.81
   
    ACATGC   32.79
   
    AGCCTG   32.79
   
    TCCCAG   32.78
   
    ACATGG   32.77
   
    CACTTA   32.69
   
    CCCCCA   32.63
   
    ATGATG   32.59
   
    GCAGAG   32.58
   
    ACATAA   32.53
   
    AAAGTA   32.47
   
    AAAAGA   32.46
   
    GAACAT   32.46
   
    CAATTC   32.4
   
    CCACTT   32.39
   
    GGCTTT   32.37
   
    TTCAAC   32.34
   
    GCTTAT   32.32
   
    CAGGAT   32.32
   
    AGCCCT   32.3
   
    CAATGC   32.26
   
    TGTATC   32.2
   
    TGATCT   32.2
   
    CTGTTA   32.12
   
    ACAATT   32.12
   
    TATCTT   32.05
   
    ATTCAA   32.04
   
    TTCAAA   32.03
   
    CAGACC   31.98
   
    ACATGA   31.9
   
    CTAAGC   31.75
   
    CTAAGA   31.7
   
    ATAAAG   31.69
   
    AACTAG   31.56
   
    GTACCT   31.55
   
    AGATAG   31.51
   
    CAAAAT   31.5
   
    GTGAAT   31.48
   
    AGCCAA   31.4
   
    GAGATG   31.33
   
    GGAGAA   31.29
   
    AATTGC   31.29
   
    ATGGCT   31.23
   
    GCAAAT   31.22
   
    TAGAAC   31.2
   
    ATGGAA   31.19
   
    GATGGA   31.15
   
    CTGCTC   31.09
   
    CCAGAC   31.09
   
    ACTCAT   31.09
   
    CGAACA   31.02
   
    AGCCAG   31.01
   
    GGATAC   31.01
   
    GCAGAA   30.98
   
    GTAAAT   30.95
   
    TTTATA   30.85
   
    TGCTTC   30.8
   
    CTCAAC   30.7
   
    AAAGAC   30.65
   
    GCTCAA   30.56
   
    ACAGTC   30.55
   
    CACAAG   30.53
   
    TGGATA   30.52
   
    GCATAG   30.51
   
    ACCTGG   30.5
   
    CTCCCA   30.43
   
    TGATTC   30.33
   
    GCTGTA   30.33
   
    GCATAC   30.26
   
    TCAAGC   30.25
   
    CAGAAT   30.22
   
    TCATAC   30.18
   
    CATCCT   30.14
   
    TGAAAC   30.04
   
    AAACTC   30
   
    GCATTT   29.91
   
    AAGGAC   29.86
   
    ACAAAA   29.84
   
    GAGTAT   29.79
   
    AAATGA   29.74
   
    AGCGGA   29.72
   
    GAATTA   29.71
   
    AGTGAA   29.7
   
    AACAAG   29.69
   
    TCAAGA   29.63
   
    AACCTT   29.53
   
    GAATAA   29.53
   
    CTCACA   29.49
   
    TCACAA   29.46
   
    CCCATC   29.46
   
    TGTGCA   29.41
   
    ATTGGA   29.27
   
    ATTGAA   29.23
   
    ATAATG   29.22
   
    CCTTTA   29.21
   
    GGAACT   29.21
   
    TTCAGA   29.18
   
    GCAACA   29.12
   
    ATAATC   29.11
   
    CTCATA   29.07
   
    GAATAC   29
   
    CTGATC   29
   
    ACCAAG   28.96
   
    CACAGG   28.94
   
    ATTTCC   28.86
   
    GCATAA   28.83
   
    TCCCAC   28.82
   
    GAGCAA   28.81
   
    TCCAGA   28.65
   
    TTCCAT   28.63
   
    GGCACA   28.6
   
    TTTCTT   28.55
   
    TAAACC   28.53
   
    AAATTA   28.46
   
    CTTGCA   28.46
   
    ACCTCT   28.41
   
    TCAGTA   28.39
   
    GAAGTT   28.37
   
    TACATT   28.33
   
    GACCCA   28.32
   
    GACCAT   28.29
   
    CCACAT   28.23
   
    CATTTT   28.22
   
    ATCGCT   28.15