HIV REVERSE TRANSCRIPTASE COMPOSITIONS AND METHODS

This PCT application claims priority to U.S. Provisional Application Serial Number 60/905,168, filed March 6, 2007, which is incorporated by reference in its entirety.

This invention was made with the support of the National Institutes of Health/NIAID Grant Nos.: NIH-NIAID R37 AI027690R01 (02/01/98-01/31/09) and NIH-NIGMS P01 GM066671 (08/01/02-07/31/07). The United States Government may have certain rights to this invention.

Throughout this application, various publications are referenced by name or by number. Full citations for these publications may be found listed at the end of the specification and preceding the claims. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art. A Sequence Listing is also provided.

FIELD OF THE INVENTION

The present invention relates to engineered novel variants of human immunodeficiency virus (HIV) reverse transcriptase (RT), a primary target for anti-HIV agents. The present invention provides novel HIV-RT constructs that can be expressed in large quantity and that provide polymerase and RNase H activity.

The present invention further provides RT in a form that facilitates crystallization and high-resolution structure determination by X-ray diffraction (better than 2.0 Å). Thus, the present invention facilitates high-resolution structure determination of RT in complexes with RT drugs and RT inhibitors, thereby facilitating structure-based design of new RT inhibitors.

BACKGROUND OF THE INVENTION

The production of effective and safe treatments against harmful viruses and other pathogens such as HIV continues to be a difficult endeavor. To date, even the most successful treatments are subject to viral breakthrough by drug-resistant virus strains.

HIV-1 reverse transcriptase (HIV-RT) is responsible for generating double-stranded DNA from the single-stranded RNA packaged in the HIV-1 virus. Twelve of the 25 approved anti-AIDS drugs target RT (hivinsite.ucsf.edu, 2007). The two classes of approved RT inhibitors are nucleoside/nucleotide RT inhibitors (NRTIs) and non-nucleoside RT inhibitors (NNRTIs). A high rate of viral turnover, combined with the lack of efficient proofreading activity in both RT and the human RNA polymerase II involved in HIV-1 replication, results in a pool of mutant viruses (Telesnitsky and Goff, 1997). The ability to mutate rapidly enables HIV-1 to develop resistance to anti-AIDS drugs, sometimes within days to a few months of treatment (Larder and Kemp, 1987). New anti-AIDS drugs must overcome the resistance that limits the efficacy of existing drugs. To overcome some of these complications, it is highly desirable to develop methodologies and reagents that will direct antiviral treatments that are stable and resistant to viral breakthrough.

Protein crystal engineering through mutagenesis has been used to determine crystal structures of previously intractable drug/HIV-1 RT complexes. Structures play an important role in designing inhibitors of RT. High-resolution structures can be critical in designing RT inhibitors, but RT complexes have usually been structurally determined at ~3.0 Å.

RT is a heterodimer consisting of p66 and p51 subunits with masses of 66 and 51 kDa, respectively. The p51 subunit is formed when the RNase H domain of p66 is proteolytically removed at residue 440 by HIV protease. RT crystallizes with different space groups, unit cells, and X-ray diffraction resolution depending on the complex (e.g., +/- nucleic acid, +/- NNRTI, etc.) and the RT construct. Three different RT constructs, varying in termini and strain sequence, have been used for RT/NNRTI complex crystallization, each crystallizing with characteristic space group symmetry: P2₁2₁2₁ (Ren et al., 1995); C2 (Kohlstaedt et al., 1992; Ding et al., 1995); and C222₁ (Hogberg et al., 2001).

NNRTIs are a diverse set of inhibitors first discovered at Janssen Pharmaceuticals in 1987 (Pauwels et al., 1990). Crystal structures have been instrumental in the development of NNRTIs. Some of the major discoveries from structural studies include: (1) all NNRTIs bind in the same NNRTI-binding pocket (NNIBP); (2) different classes of NNRTIs have distinct modes of binding, including the mechanism of entrance into the NNIBP; (3) the NNIBP exists in a closed form when an NNRTI is not bound; and (4) the NNIBP is elastic, and its conformation depends on the NNRTI bound (for review see Das et al., 2004 and 2005). The elastic nature of the NNIBP poses a challenge for computational modeling and molecular dynamics simulations, as both the target and the ligand are flexible.

NNRTIs do not affect binding of RT to the nucleic acid substrate or to dNTPs (Rittinger et al., 1995; Spence et al., 1995). Recently, evidence for the mechanism of NNRTI inhibition was shown through two crystal structures produced in the presence of ATP and Mn2+, with and without the NNRTI HBY 097. The structure without NNRTI bound contains an ATP coordinated by two Mn2+ ions at the polymerase active site. This coordination is not present in the NNRTI-bound crystal form. An NNRTI may restrict the flexibility of the YMDD active site loop and thereby prevent the catalytic aspartate residues (185 and 186) from binding the two Mn2+ ions (Das et al., 1998 and 2007).

TMC278, belonging to the diarylpyrimidine (DAPY) class of NNRTIs, is currently in advanced Phase-II clinical trials in the USA and Europe. It was developed by a multidisciplinary effort involving medicinal chemists, virologists, crystallographers, molecular modelers, toxicologists, analytical chemists, and pharmacologists (Janssen et al., 2005). Short-term results from a Phase-II clinical trial were recently published showing the efficacy of TMC278 in HIV-1 infected patients (Goebel et al., 2006). Antiretroviral-naïve patients were given a once-daily dosage of 25, 50, 100, or 150 mg of TMC278 for seven days. Their HIV-1 RNA viral loads were measured before the initial dosage and on day 8; results showed an average decrease of 1.199 log10 copies/ml versus an increase of 0.002 log10 copies/ml in the placebo group. Side effects were less frequent than in the placebo group, with headaches reported in 14% of the patients given TMC278 versus 18% in the placebo group. Bioavailability was also shown to be excellent, with plasma concentrations of TMC278 not falling below the target concentration of 13.5 ng/ml at any of the time points tested.
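To make the log10 figures above easier to read, the following minimal sketch converts a change in log10 copies/ml into a fold change and a percent reduction in viral load. The function names are illustrative and are not taken from the cited study; only the two log10 values come from the text.

```python
def fold_change_from_log10(delta_log10: float) -> float:
    """Multiplicative change in viral load for a given change in log10 copies/ml."""
    return 10.0 ** delta_log10


def percent_reduction(delta_log10: float) -> float:
    """Percent reduction in viral load (negative values mean an increase)."""
    return (1.0 - fold_change_from_log10(delta_log10)) * 100.0


# Values quoted in the text: -1.199 log10 (TMC278) and +0.002 log10 (placebo).
treated = percent_reduction(-1.199)
placebo = percent_reduction(+0.002)
print(f"TMC278 group: {treated:.1f}% reduction")
print(f"Placebo group: {placebo:.2f}% reduction (negative = increase)")
```

Under this conversion, a decrease of 1.199 log10 corresponds to a reduction of roughly 94% in viral RNA copies, while the placebo change is well under 1%.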

Structural studies have been utilized in determining the mechanism of NNRTI-resistance mutations. Tyr181Cys and Tyr188Cys directly alter the binding ability of the NNRTI to the NNIBP by elimination of π-π interactions between the protein and ligand. Leu100Ile and Gly103Ala cause steric hindrance for the NNRTI by altering the shape of the NNIBP. Lys103Asn uses an unexpected mechanism for resistance; it stabilizes the closed form of the NNIBP by coordinating a sodium atom with residues Lys101 and Tyr188. The stabilization of the NNIBP by the Lys103Asn mutation creates an additional energetic penalty for entry that the NNRTI must compensate for by additional interactions (Hsiou et al., 2001; Das et al., 2007).
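This specification writes resistance mutations both in three-letter form (e.g., Tyr181Cys) and in one-letter form (e.g., Y181C). As a reading aid, the sketch below converts the three-letter form to the one-letter form using the standard amino acid codes. The function name and the regular expression are illustrative assumptions, not part of the disclosed methods.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MUTATION_PATTERN = re.compile(r"([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def to_short_form(mutation: str) -> str:
    """Convert e.g. 'Tyr181Cys' to 'Y181C'; raise ValueError on malformed input."""
    match = MUTATION_PATTERN.fullmatch(mutation)
    if match is None:
        raise ValueError(f"not a three-letter mutation: {mutation!r}")
    wild_type, position, mutant = match.groups()
    if wild_type not in THREE_TO_ONE or mutant not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid in {mutation!r}")
    return f"{THREE_TO_ONE[wild_type]}{int(position)}{THREE_TO_ONE[mutant]}"


def to_short_form_multi(mutations: str) -> str:
    """Convert a slash-separated combination, e.g. 'Lys103Asn/Tyr181Cys'."""
    return "/".join(to_short_form(m) for m in mutations.split("/"))


print(to_short_form_multi("Leu100Ile/Lys103Asn"))
print(to_short_form_multi("Lys103Asn/Tyr181Cys"))
```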

Indeed, structural studies were instrumental in developing the DAPY class of NNRTIs, including TMC278/rilpivirine and TMC125/etravirine, which inhibit wild-type and drug-resistant HIV-1 viruses (Janssen et al., 2005). The DAPY NNRTIs have strategic flexibility, allowing them to inhibit NNRTI-resistant RT (Das et al., 2004; Das et al., 2008). In early attempts to crystallize the RT/TMC278 complex, the crystals failed to diffract beyond 6 Å resolution. The conformational flexibility of TMC278 potentially introduced heterogeneity in the arrangement of RT molecules in the crystal lattice (Das et al., 2005), which may have been responsible for the low-resolution diffraction obtained in many trials over five years. Indeed, during the development of the DAPY class of compounds, structural studies using X-ray crystallography were used to determine the modes of binding and the effects of resistance mutations on the potency of the inhibitor candidates. Numerous crystal structures, with and without resistance mutations, showed the existence of an NNRTI-binding pocket (NNIBP), which is not present in crystal structures where RT is not complexed with an NNRTI (Ding et al., 1995a; Ding et al., 1995b; Esnouf et al., 1995; Ren et al., 1995a; Ren et al., 1995b; Das et al., 1996; Hopkins et al., 1996; Esnouf et al., 1997; Hsiou et al., 1998; Ren et al., 1998; Hopkins et al., 1999; Ren et al., 1999; Hogberg et al., 2000; Ren et al., 2000a; Ren et al., 2000b; Chan et al., 2001; Hsiou et al., 2001; Ren et al., 2001; Chamberlain et al., 2002; Lindberg et al., 2002; Das et al., 2004; Hopkins et al., 2004; Pata et al., 2004; Ren et al., 2004; and Das et al., 2007). The formation of the NNIBP only in the presence of NNRTIs, and the binding of a chemically diverse set of inhibitors to the same pocket, is one of the major results of crystallographic studies of RT complexed with NNRTIs.
The flexible nature of the NNIBP (depending on the arrangement of the NNRTI bound) limits the usefulness of non-crystallographic structural studies (e.g., molecular dynamics modeling) due to the complexity of studying both a flexible ligand and a flexible target. Understanding the mechanism of binding and resistance to early DAPYs and other NNRTIs, by structural studies, led to the development of the current inhibitors including TMC125/etravirine and TMC278/rilpivirine (Das et al., 2004 and 2005). TMC278 is a potent inhibitor of NNRTI-resistant HIV-1 strains including the Leu100Ile/Lys103Asn and Lys103Asn/Tyr181Cys double mutants, which are resistant to all approved NNRTIs. The strategic flexibility of TMC278 may have been responsible for no diffraction-quality crystals being obtained in five years of trials.

According to the approach of the present invention, restriction of RT conformations in the crystal lattice through protein engineering was employed to improve diffraction quality. The present invention therefore provides a systematic protein crystal engineering approach to solve the problem of the prior art and to obtain improved crystal structures of the RT/TMC278 complex. There are three fundamental types of protein engineering approaches for crystallography: 1) alterations affecting the suitability of the protein for biochemical study, including mutagenesis and the addition of tags for expression, solubility, and purification; 2) engineering to increase the conformational homogeneity of the protein sample; and 3) modification of the protein to directly alter interactions at crystal contact interfaces (for reviews see Dale et al., 2003 and Derewenda, 2004). Examples of engineering to increase the homogeneity of the sample include addition and subsequent removal of purification tags; deletions of disordered regions including termini, loops, and domains; and replacement of highly entropic residues (e.g., lysines and glutamic acids) by the surface entropy reduction method. Rational alterations of the protein for crystallization include substitution of residues known to be required for crystallization of a homologous protein, systematic or random alteration of surface residues to create a library of potentially crystallizable proteins, and alteration of known crystal contacts to create potentially new crystal forms.

The present invention, for the first time, provides the crystal structures of TMC278 with and without the NNRTI-resistance mutations Leu100Ile/Lys103Asn and Lys103Asn/Tyr181Cys. The structure of TMC278 complexed with the provided engineered RT, RT52A, at 1.8 Å resolution, has the highest resolution ever obtained for any HIV-1 RT structure.

According to the present invention, engineered RTs were co-crystallized with TMC278 and screened for quality of X-ray diffraction data. Several iterative rounds of mutagenesis and crystallization with TMC278 were employed to produce a construct that gave improved diffraction with this important drug candidate. One construct, RT52A, provided by the present invention, is a product of multiple iterative rounds of design. RT52A produces crystals within hours to days of crystal drop generation, with and without microseeding. High-resolution datasets, some better than 2.0 Å, can now be produced quickly and reproducibly for most of the NNRTIs tested. Most notably, TMC278 was structurally solved to 1.8 Å resolution; thousands of crystallizations prior to this effort had yielded only 8 Å resolution crystal diffraction. This compares with previous RT/inhibitor crystals, which in favorable cases formed in days to weeks with microseeding and gave structural resolution of 2.5 to 3.0 Å. The previous highest-resolution RT structure was 2.2 Å. The swiftness of crystallization of this new construct allows for high-throughput structure-based design of new NNRTIs. Further protein engineering was carried out to obtain high-resolution structures of unliganded RT and RT/RNase H inhibitor complexes. The improved resolution enables a detailed understanding of drug resistance, the design of improved drugs against existing targets in RT, and the discovery of novel sites for new types of RT inhibitors.

The present invention utilized a co-expression system that facilitates subunit-specific mutagenesis at multiple positions and the addition of a purification tag on the C or N terminus of the subunit of choice for facile purification. In the initial co-expression construct, the p51 subunit consisted of 428 residues and a hexahistidine purification tag at the C terminus (Huang et al., 1998 and Sarafianos et al., 2003). The co-expression construct codes for the p66 Q258C mutant, which is used to produce homogeneous nucleic-acid cross-linked samples for X-ray crystallographic studies. This plasmid facilitates expression, purification, and crystallization of multiple RT constructs in parallel.

The present invention utilized two methods of expression and purification for RT. RT is a heterodimer consisting of a p66 and a p51 subunit. The p51 subunit is identical to p66 with the RNase H domain proteolytically removed at residue 440. According to the first method, p66 is expressed in E. coli and then purified using laborious chromatography techniques. A co-purifying E. coli protease cleaves the p66 to yield a p66:p51 heterodimer, which is then further purified to homogeneity (Clark et al., 1990). This protein, referred to as 1B1, has been extensively used for NNRTI structural studies. The second method uses co-expression of p66 and p51. In the co-expression construct, the p51 subunit terminates at residue 428, and a hexahistidine purification tag is appended at the C terminus (Sarafianos et al., 2004). The co-expression construct codes for a Q258C mutation that has been used in cross-linking experiments to link nucleic acid substrates; however, this construct has not been successfully used with NNRTIs. To produce large numbers of subunit-specific mutants and express and purify them in parallel, the co-expression method was used.

SUMMARY OF THE INVENTION

The present invention provides an isolated nucleic acid molecule encoding a peptide comprising the amino acid sequences of SEQ ID NO: 1 and SEQ ID NO: 2, representing the p66 and p51 subunits of HIV-RT. The present invention also provides an isolated nucleic acid molecule encoding at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT) wherein: (a) the amino-terminus of HIV-RT p66 comprises amino acid residues MVPISP (SEQ ID NO: 4); (b) the nucleic acid molecule encodes alanine at amino acid residue 172 of p66; (c) the nucleic acid molecule encodes alanine at amino acid residue 173 of p66; (d) the nucleic acid molecule encodes serine at amino acid residue 280 of p66; (e) the nucleic acid molecule encodes serine at amino acid residue 280 of p51; (f) the carboxy-terminus of p66 terminates at residue 555; and (g) the carboxy-terminus of HIV-RT p51 terminates at residue 428. The present invention also provides an isolated nucleic acid or portion thereof wherein the nucleic acid (a) encodes at least a portion of a human immunodeficiency virus (HIV) reverse transcriptase (RT); and (b) is capable of hybridizing under standard hybridization conditions to the provided nucleic acid sequence or complement thereof. According to specific embodiments of this invention, the provided nucleic acid is capable of hybridizing with a nucleic acid or its complement that is capable of encoding SEQ ID NO: 1 or SEQ ID NO: 2.

The present invention further provides a method for generating crystallization variants of an HIV-RT-NNRTI complex, comprising the steps of: (a) truncating at least one terminus of HIV-RT; (b) reducing surface lysine acid regions; and (c) mutating at least one amino acid residue, thereby altering lattice contact from the non-mutated residue.

Finally, the present invention provides a method for identifying HIV-RT inhibitor solvent molecules comprising the steps of: (a) soaking a small molecule fragment into a crystallization variant generated by the provided method, thereby forming an HIV-RT complex with the molecule; (b) determining the three-dimensional structure of the complex; and (c) determining HIV-RT enzyme activity.

BRIEF DESCRIPTION OF THE FIGURES

Figures 1A-1C. Molecular cloning of RT. (A) 6.7 kilo-base pair expression vector with unique restriction sites for p51 and p66. (B) Diagram of RT1A and RT52A. (C) Schematic showing binding sites of the 2'-O-methylated primers used in MOE-LIC.

Figures 2A-2B. Magnified images of RT crystals. (A) Images of crystals of round three mutant RT in complex with an NNRTI, CL32543. When a grid is present, one grid square is 0.12 mm on an edge. (B) Images of crystals from round four and five mutagenesis.

Figures 3A-3B. Enzymatic activities of engineered RT round four mutants. (A) DNA-dependent DNA polymerase processivity assay using a 5' end-labeled primer annealed to single-stranded M13mp18 DNA. RT is allowed to bind in the absence of dNTPs. dNTPs and a "cold trap" (poly[rC]·oligo[dG]) are added and the reaction is incubated at 37 °C. Presence of the "cold trap" limits the polymerase reaction to one cycle of extension. (B) RNase H activity assay. A 5' end-labeled RNA is annealed to a DNA primer. The template/primer is incubated with the various RTs for the indicated length of time. An untreated sample is included to show the size of the full-length RNA.

Figures 4A-4B. Structure of RT52A with TMC278 at 1.8 Å. (A) Cartoon of RT52A with the p66 domains labeled. The YMDD polymerase active site is labeled in azure while TMC278 is in gray. (B) 2Fo-Fc map of TMC278 in the NNRTI-binding pocket. The NNRTI-binding pocket residues mutated are shown.

Figure 5. Cartoon showing RT mutations.

Figure 6. Contacts in RT crystals. Residues within 4.5 Å of a symmetry-related residue are shown as spheres. The 1B1/NNRTI structure is PDB code 1S9E. Additional regions involved in crystal contacts in the RT52A and RT69A structures are from both the p66 and p51 subunits.
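The 4.5 Å criterion used for Figure 6 can be illustrated with a minimal sketch, assuming atom coordinates for one molecule and for an already-generated symmetry mate are available as plain tuples. The data layout, the function name, and the toy coordinates are hypothetical; generating symmetry mates from the space group operators is outside the scope of this sketch, and this is not necessarily the procedure used to prepare the figure.

```python
import math
from typing import List, Tuple

# (residue number, (x, y, z) in angstroms); a deliberately simple, hypothetical layout.
Atom = Tuple[int, Tuple[float, float, float]]


def contact_residues(molecule: List[Atom],
                     symmetry_mate: List[Atom],
                     cutoff: float = 4.5) -> List[int]:
    """Residues of `molecule` with any atom within `cutoff` of any symmetry-mate atom."""
    contacts = set()
    for residue, xyz in molecule:
        if residue in contacts:
            continue  # residue already flagged by another of its atoms
        for _, mate_xyz in symmetry_mate:
            if math.dist(xyz, mate_xyz) <= cutoff:
                contacts.add(residue)
                break
    return sorted(contacts)


# Toy coordinates for illustration only (not taken from any deposited structure).
molecule = [(10, (0.0, 0.0, 0.0)), (11, (10.0, 0.0, 0.0)), (12, (20.0, 1.0, 0.0))]
mate = [(200, (3.0, 0.0, 0.0)), (201, (24.0, 1.0, 0.0))]
print(contact_residues(molecule, mate))
```

The double loop is quadratic in the number of atoms; for a full RT heterodimer a spatial grid or k-d tree would normally be preferred, but the brute-force form keeps the criterion itself explicit.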

Figures 7A-7C. Resolution distribution of RTs. (A) RT52A and RT69A datasets compared to published RT datasets. The number of structures is plotted against resolution. Total unique reflections in 2.8, 2.2, and 1.8 Å datasets are shown. (B) Diagram of tested RTs. Shown are mutants that produced low-resolution diffracting crystals or no crystals, mutants producing crystals diffracting to medium (3-4 Å) resolution, constructs producing crystals diffracting to 2-3 Å resolution, and RTs producing crystals that diffracted to better than 2 Å. (C) Plot of inverse resolution of tested mutants, scaled by one minus the exponential of the inverse of the resolution (1 - exp(1/resolution in Å)). Actual resolution is indicated.

Figure 8. Ramachandran plot and statistics for RT52A/TMC278.

Figures 9A-9C. RT52A/TMC278 structure and p66 fingers crystal contacts. (A) Cartoon of RT52A/TMC278 with p66 subdomains labeled. TMC278 is colored in grey and the polymerase active site in cyan. (B) Cartoon viewed from below. (C) From the same viewing angle as B, the p66 subunit makes crystal contacts with the thumb subdomain of one symmetry molecule and the RNase H domain of another molecule.

Figures 10A-10B. Overlay of engineered RTs and wild-type RT. (A) Ribbon diagram of the RT structures from this study and wild-type RT/R129385 produced in MacPyMol. (B) Alignment with secondary structure overlay. Includes the sequence from the structure of the RT/nevirapine complex (PDB code: 1VRT).

Figures 11A-11D. TMC278 bound in the wild-type NNIBP. (A) Representation of TMC278 in the NNIBP of RT52A. Amino acids lining the NNIBP are labeled. (B) The cyanovinyl and Wing 1 reside inside the hydrophobic core of the NNIBP. (C) Electron density defines the position of TMC278 in the NNIBP. (D) TMC278 with the cyanovinyl and labeled torsion angle locations.

Figures 12A-12C. Structural comparison of the L100I and K103N double mutant to the wild-type NNIBP. (A) Omit map defines the position of the inhibitor. (B) Overlay showing the torsional flexibility of TMC278 (wiggling) when bound to the mutant RT. (C) A perpendicular view showing the spatial movement of TMC278 (jiggling).

Figures 13A-13B. Structural comparison of K103N and Y181C double mutant to the wild-type NNIBP. (A) Omit map defines the position of the inhibitor. (B) Overlay showing the adjustment of Y183 to compensate for the Y181C mutation.

Figure 14. Published molecular structures of TMC120 inhibitor used in RT engineering study.

Figure 15. Published molecular structures of TMC125 inhibitor used in RT engineering study.

Figure 16. Published molecular structures of TMC278 inhibitor used in RT engineering study.

Figure 17. Iterative approach to crystal engineering of the present invention.

Figures 18A-18E. Mutagenesis of RT of the present invention. (A) Schematic showing the binding sites (arrows) of the 2'-O-methylated primers used in MOE-LIC. (B) Annealing of the primer-terminated insert and vector; 2'-O-methyl nucleotides are indicated with Me. (C) Cartoon of RT color-coded by the p66 subdomains. All mutations made in this study are indicated as spheres. The beneficial mutations are labeled. (D) Flowchart of mutants coded by crystal X-ray diffraction resolution. Stars mark mutants with improved resolution unliganded. (E) Diagram of RT1A, RT52A, and RT69A.

Figures 19A-19C. Crystal structure of RT52A with TMC278 at 1.8 Å resolution. (A) Simulated-annealing Fo-Fc omit map (3σ contours) for TMC278. (B) Typical 1B1 RT arrangement in a crystal unit cell (PDB code: 1S9E). (C) A relatively compact packing of RT52A molecules in the crystal lattice of the RT52A/TMC278 complex.

Figure 20. Comparison of unit cell and X-ray diffraction resolution of mutants. Plot of unit cell (Matthews coefficient) and X-ray diffraction resolution (Å) of the mutants that produced crystals that diffracted X-rays to better than 4 Å resolution. The legend table indicates the mutations and the template for each of the mutants. RT69A and RT97A are plotted based on crystals with RNHIs bound; all others are plotted based on crystals with NNRTIs bound.

Figures 21A-21C. (A) Overall structure of the wild-type HIV-1 RT/TMC278 complex determined at 1.8 Å resolution. (B) The position and conformation of TMC278 were defined by the difference (|Fo| - |Fc|) electron density calculated at 1.8 Å resolution (3.5σ contours). (C) Chemical structure of TMC278. The τ angles define the torsional flexibility of TMC278.

Figures 22A-22B. (A) Interactions of TMC278 with NNRTI-binding pocket residues. (B) The molecular surface defines the hydrophobic tunnel that accommodates the cyanovinyl group of TMC278.

Figure 23. Superposition of the K103N/Y181C mutant RT/TMC278 complex on the wild-type RT/TMC278 complex. The YMDD motif in the mutant structure is repositioned closer to TMC278; this leads to an important interaction between the cyanovinyl group and the highly conserved Y183 residue. Despite the rearrangements in the inhibitor position and conformation and in the binding-pocket residues, the extent of the inhibitor-protein interactions remains almost unchanged.

Figures 24A-24B. Comparison of the L100I/K103N mutant RT/TMC278 structure with the wild-type RT/TMC278 structure reveals (A) wiggling and (B) jiggling of TMC278.

DETAILED DESCRIPTION

The present invention provides an isolated nucleic acid molecule encoding a peptide comprising the amino acid sequence of SEQ ID NO: 1. SEQ ID NO: 1 encodes the p66 subunit of HIV-RT. According to one embodiment of the invention, the provided nucleic acid molecule further comprises SEQ ID NO: 2. SEQ ID NO: 2 encodes the p51 subunit of HIV-RT. According to a preferred embodiment, the provided nucleic acid molecule is capable of expressing the p66/p51 heterodimer. According to a most preferred embodiment, the provided nucleic acid encodes the p66 subunit and the p51 subunit in different open reading frames. According to another preferred embodiment, separate promoters control expression of the p66 and p51 subunits. According to another embodiment, the invention provides an isolated nucleic acid molecule encoding a peptide comprising the amino acid sequence of SEQ ID NO: 2.

The present invention also provides an isolated nucleic acid molecule encoding at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT) wherein at least one terminal end of the protein is truncated. According to a preferred embodiment of this invention, truncation of an HIV-RT terminus facilitates resolution of the three-dimensional crystal structure. It is specifically contemplated by the invention that any combination of HIV-RT termini may be truncated so long as the truncations facilitate resolution of the three-dimensional crystal structure of the protein. According to a preferred embodiment of the invention, the HIV-RT is complexed with an NNRTI ligand. NNRTI ligands are well known in the art and include the DAPY compounds. According to a preferred embodiment of this invention, the present invention provides HIV-RT in complex with TMC278. According to an alternative embodiment, the invention provides HIV-RT in complex with TMC125.

The present invention further provides an isolated nucleic acid molecule encoding at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT) wherein: (a) the amino-terminus of HIV-RT p66 comprises amino acid residues MVPISP (SEQ ID NO: 4); (b) the nucleic acid molecule encodes alanine at amino acid residue 172 of p66; (c) the nucleic acid molecule encodes alanine at amino acid residue 173 of p66; (d) the nucleic acid molecule encodes serine at amino acid residue 280 of p66; (e) the nucleic acid molecule encodes serine at amino acid residue 280 of p51; (f) the carboxy-terminus of p66 terminates at residue 555; and (g) the carboxy-terminus of HIV-RT p51 terminates at residue 428. According to one embodiment, the amino-terminus of p51 further comprises a human rhinovirus subtype 14 3C (HRV-14 3C) protease cleavage site, wherein the HRV-14 3C protease cleavage site is situated between a hexaHIS purification tag and the p51 coding sequence, thereby facilitating generation of a post-protease amino-terminus of gPISP upon exposure to HRV-14 3C protease under standard conditions for HRV-14 3C protease activity. According to a preferred embodiment, the isolated nucleic acid molecule comprises the nucleic acid sequence of SEQ ID NO: 3. According to another embodiment, the invention provides a recombinant vector comprising the nucleic acid molecule of SEQ ID NO: 3. According to another embodiment, the present invention provides a nucleic acid molecule that encodes HIV-RT p66, wherein the amino-terminus of p66 begins with the amino acid residues MVPISP (SEQ ID NO: 121). According to yet another embodiment, the present invention provides a nucleic acid molecule that encodes at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT), wherein the nucleic acid molecule encodes alanine at amino acid residue 172 of p66.
According to yet another embodiment, the present invention provides a nucleic acid molecule that encodes HIV-RT p66, wherein the amino-terminus of p66 comprises amino acid residues MVPISP (SEQ ID NO: 121). According to still yet another embodiment, the present invention provides a nucleic acid molecule that encodes alanine at amino acid residue 173 of p66. Still according to another embodiment, the present invention provides a nucleic acid molecule that encodes serine at amino acid residue 280 of p66. According to a further embodiment, the present invention provides a nucleic acid molecule that encodes serine at amino acid residue 280 of p51. Further still, according to another embodiment, the present invention provides a nucleic acid molecule that encodes HIV-RT p66, wherein the carboxy-terminus of p66 terminates at residue 555. It is understood that the termination residue for the naturally occurring protein is 560. Still another embodiment provides that the nucleic acid molecule encodes HIV-RT p51, wherein the amino-terminus of p51 comprises a human rhinovirus subtype 14 3C protease (HRV-14 3C) cleavage site. According to a preferred embodiment of this invention, the HRV-14 3C protease cleavage site is situated between a hexaHIS purification tag and the p51 coding sequence, thereby facilitating generation of a post-protease amino-terminus of gPISP upon exposure to HRV-14 3C protease under standard conditions for HRV-14 3C protease activity. According to a still further embodiment, the nucleic acid molecule encodes p51 whose carboxy-terminus terminates at residue 428. It is understood that the termination residue for the naturally occurring protein is 440. The present invention also provides the HIV-RT product of the expression of the provided nucleic acid.
The present invention also provides an isolated nucleic acid or portion thereof wherein the nucleic acid (a) encodes at least a portion of a human immunodeficiency virus (HIV) reverse transcriptase (RT); and (b) is capable of hybridizing under standard hybridization conditions to the provided nucleic acid sequence or complement thereof. According to specific embodiments of this invention, the provided nucleic acid is capable of hybridizing with a nucleic acid or its complement that is capable of encoding SEQ ID NO: 1 or SEQ ID NO: 2. It is well understood and specifically contemplated by the present invention that the provided recombinant vector may be in the form of a replicon. According to a preferred embodiment of this invention, the vector is a plasmid. According to yet another embodiment of this invention, a host cell is transformed with the vector. According to one embodiment of this invention, the host cell is a prokaryotic cell. According to an alternative embodiment of this invention, the host cell is a eukaryotic cell. According to another embodiment, the present invention provides an isolated cell line comprising the provided nucleic acid. The present invention also provides a method for generating crystallization variants of an HIV-RT-NNRTI complex, comprising the steps of: (a) truncating at least one terminus of HIV-RT; (b) reducing surface lysine acid regions; and (c) mutating at least one amino acid residue, thereby altering lattice contact from the non-mutated residue. According to one embodiment, step b comprises reducing surface glutamic acid regions. According to another embodiment, step b comprises mutating lysine to alanine. According to still another embodiment, step b comprises mutating glutamic acid to alanine. According to a further embodiment, step c is systematic mutagenesis. According to a preferred embodiment, step c is achieved by methylated overlap extension ligation independent cloning (MOE-LIC).
According to still yet another embodiment, the provided method further comprises the step of selecting mutant HIV-RT for enzymatic activity. Still a further embodiment of the method comprises the step of crystallizing the mutant HIV-RT. It is understood that it is important to minimize mutation of conserved amino acid residues. According to a further embodiment, the method further comprises the step of determining the three dimensional crystal structure of the mutant HIV-RT-NNRTI complex. According to a preferred embodiment, the structure is determined to better than about 3.0 Å resolution. According to a most preferred embodiment, the structure is determined to better than about 2.0 Å resolution. The present invention also provides an HIV-RT-NNRTI complex produced by the provided method. According to a preferred embodiment, the NNRTI is a DAPY compound. According to a most preferred embodiment, the DAPY compound is selected from the group consisting of TMC278 and TMC125. The present invention also provides L100I/K103N and K103N/Y181C double mutants in the p66 subunit.

The present invention provides a method for identifying HIV-RT inhibitor solvent molecules comprising the steps of: (a) soaking a small molecule fragment into a crystallization variant generated by the provided method, thereby forming an HIV-RT complex with the molecule; (b) determining the three dimensional structure of the complex; and (c) determining HIV-RT enzyme activity.

The present invention provides a plasmid containing both the p66 and p51 subunits of RT under separate promoters. The plasmid was designed to allow facile manipulation of the subunits independently using standard molecular cloning techniques. Mutagenesis of HIV-RT was used to generate constructs capable of producing crystals that diffract X-rays to high resolution. The techniques of mutagenesis, expression, purification, crystallization, and X-ray diffraction data collection were performed in iterative cycles. The iterative search led to the invention of a construct of HIV-1 RT that is biologically active and diffracts X-rays to high resolution (1.8 Å resolution). The construct used for crystallization has a sequence beginning with GPISP after proteolytic removal of MAHHHHHHALEVLFQ using the HRV-14 3C protease. The crystals of this construct, RT52A, in complexes with several NNRTIs have diffracted X-rays to better than 2.0 Å resolution. The crystals have the symmetry of space group C2 with approximate cell parameters a = 160-165, b = 71-74, c = 107-114 Å, α = γ = 90° and β = 99-103°. This unit cell is novel when compared with all crystal structures of HIV-1 RT available in the Protein Data Bank. The present invention further provides drug resistance mutations that were introduced into the plasmid and high resolution structures of mutant RT. Double mutants are provided that develop high resistance to most NNRTIs. Protein RT51A contains two mutations in p66: L100I and K103N. Protein RT55A contains two mutations in p66: K103N and Y181C. These mutant RTs have also been successfully crystallized and structurally studied.
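As a worked illustration of the cell parameters quoted above, the volume of a monoclinic unit cell (α = γ = 90°) follows from V = a·b·c·sin β. The sketch below evaluates that expression at the ends of the approximate ranges given in the text; the function name is hypothetical and the result is only as precise as those approximate ranges.

```python
import math

def monoclinic_cell_volume(a, b, c, beta_deg):
    """Volume in cubic angstroms of a monoclinic cell (alpha = gamma = 90 degrees)."""
    return a * b * c * math.sin(math.radians(beta_deg))

# sin(beta) falls as beta moves away from 90 degrees, so the smallest volume
# pairs the shortest edges with beta = 103 and the largest pairs the longest
# edges with beta = 99.
v_low = monoclinic_cell_volume(160.0, 71.0, 107.0, 103.0)
v_high = monoclinic_cell_volume(165.0, 74.0, 114.0, 99.0)
print(f"approximate cell volume: {v_low:.3e} to {v_high:.3e} cubic angstroms")
```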

It is understood that the structures provided by the present invention using the provided methods are of highly improved quality compared to those available in the PDB and, therefore, provide reliable information on the inhibitor binding pocket and ligand-protein interactions. It is understood by those of skill in the art that the provided HIV-RT construct, mutants, described crystal form, and determined 3-D structure information can be used in molecular docking and other computational tools to generate new lead RT inhibitors targeting polymerization/RNase H activity and in the optimization of lead compounds. Moreover, high resolution crystal structures obtained using the construct(s) provided by the present invention provide a method to locate solvent molecules unambiguously, which was not feasible using the crystal forms and techniques available prior to this invention. The present invention thereby provides a fragment-based drug discovery method. According to the provided method, many small molecule fragments or cocktails of fragments (obtained commercially or through scientific collaborations) are soaked into the above described crystals of engineered RT. Binding of certain chemical fragments identifies novel drug binding sites or novel modes of inhibitor binding to existing drug-binding sites. The provided plasmid produces engineered RT which is enzymatically active and yields crystals that diffract X-rays to high resolution (better than 2 Å). The solution structures of RT and its complexes with different inhibitors are critical for the design of RT inhibitors, and the availability of this construct and crystal form dramatically accelerates the rate of successful drug identification.

The provided plasmid produces a novel heterodimeric protein consisting of the two subunits p66 and p51. The amino-terminus of p66 begins as MVPISP while the amino terminus of p51 contains a cleavable purification tag which after cleaving leaves GPISP as the amino-terminus.

The carboxy-terminus of p66 ends at residue 555 and the carboxy-terminus of p51 ends at residue 428. The following mutations are present in p66: K172A, K173A, and C280S. p51 also has the C280S mutation.
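The truncations and point mutations listed above can be expressed as simple string operations on a protein sequence. The following is a minimal sketch only: the helper names are hypothetical, and the input is a glycine placeholder of the natural lengths (560 and 440 residues) with the relevant residues set by hand, not the actual RT sequence.

```python
import re

def apply_substitutions(seq, substitutions):
    """Apply point substitutions written like 'K172A' (1-based residue numbering)."""
    residues = list(seq)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"cannot parse substitution {sub!r}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)

def truncate_c_terminus(seq, last_residue):
    """Keep residues 1..last_residue."""
    return seq[:last_residue]

def make_placeholder(length, fixed):
    """Hypothetical stand-in: glycine everywhere except the positions in `fixed`."""
    residues = ["G"] * length
    for pos, aa in fixed.items():
        residues[pos - 1] = aa
    return "".join(residues)

# Placeholders only; these are NOT the real p66 or p51 sequences.
placeholder_p66 = make_placeholder(560, {172: "K", 173: "K", 280: "C"})
placeholder_p51 = make_placeholder(440, {280: "C"})

p66 = truncate_c_terminus(
    apply_substitutions(placeholder_p66, ["K172A", "K173A", "C280S"]), 555)
p51 = truncate_c_terminus(apply_substitutions(placeholder_p51, ["C280S"]), 428)
```

Checking the expected wild-type residue before each substitution guards against numbering errors when the same helper is applied to a real sequence.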

These features in combination allow for improved resolution of X-ray crystal structures of the protein with inhibitors. Typical resolution of HIV-1 reverse transcriptase with nonnucleoside reverse transcriptase inhibitors (NNRTIs) is 2.5-3.0 Å. With the engineered protein, called RT52A, resolution of 1.8-2.4 Å is common. In addition to improved resolution, the provided protein crystallizes in a fraction of the time it would take the non-engineered protein to crystallize. It now takes hours to days to crystallize instead of days to weeks.

In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Sambrook et al., "Molecular Cloning: A Laboratory Manual" (1989); "Current Protocols in Molecular Biology" Volumes I-III [Ausubel, R. M., ed. (1994)]; "Cell Biology: A Laboratory Handbook" Volumes I-III [J. E. Celis, ed. (1994)]; "Current Protocols in Immunology" Volumes I-III [Coligan, J. E., ed. (1994)]; "Oligonucleotide Synthesis" (M.J. Gait ed. 1984); "Nucleic Acid Hybridization" [B.D. Hames & S.J. Higgins eds. (1985)]; "Transcription And Translation" [B.D. Hames & S.J. Higgins, eds. (1984)]; "Animal Cell Culture" [R.I. Freshney, ed. (1986)]; "Immobilized Cells And Enzymes" [IRL Press, (1986)]; B. Perbal, "A Practical Guide To Molecular Cloning" (1984). Therefore, if appearing herein, the following terms shall have the definitions set out below.

The amino acid residues described herein are preferred to be in the "L" isomeric form. However, residues in the "D" isomeric form can be substituted for any L-amino acid residue, as long as the desired functional property of immunoglobulin binding is retained by the polypeptide. NH2 refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide. In keeping with standard polypeptide nomenclature, abbreviations for amino acid residues are used as shown in the following Table of Correspondence:

It should be noted that all amino-acid residue sequences are represented herein by formulae whose left and right orientation is in the conventional direction of amino-terminus to carboxy-terminus. Furthermore, it should be noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino-acid residues. The above Table is presented to correlate the three-letter and one-letter notations, which may appear alternately herein. It should also be noted that, in addition to the standard IUPAC one-letter code for the nucleotides of DNA, the following code is used herein, including letters for ambiguity, as follows: M is A or C; R is A or G; W is A or T; S is C or G; Y is C or T; K is G or T; V is A, C or G; H is A, C or T; D is A, G or T; B is C, G or T; and N is G, A, T or C.
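The ambiguity code listed above can be captured as a lookup table. The sketch below is illustrative only; the names `AMBIGUITY` and `expand` are hypothetical, and it simply enumerates every unambiguous DNA sequence that a sequence written with ambiguity letters can stand for.

```python
from itertools import product

# Ambiguity letters exactly as listed in the text, plus the four plain bases.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def expand(seq):
    """Return every unambiguous sequence matched by an ambiguity-coded sequence."""
    choices = [AMBIGUITY[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

# "GAY" stands for the two DNA codons GAC and GAT.
print(expand("GAY"))
```

The number of expansions grows multiplicatively with each ambiguous position, so this enumeration is only practical for short sequences such as single codons or primers.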

A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication under its own control. A "vector" is a replicon, such as plasmid, phage or cosmid, to which another DNA segment may be attached so as to bring about the replication of the attached segment.

A "DNA molecule" refers to the polymeric form of deoxyribonucleotides (adenine, guanine, thymine, or cytosine) in either its single stranded form or a double-stranded helix. This term refers only to the primary and secondary structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of particular double-stranded DNA molecules, sequences may be described herein according to the normal convention of giving only the sequence in the 5' to 3' direction along the nontranscribed strand of DNA (i.e., the strand having a sequence homologous to the mRNA).

An "origin of replication" refers to those DNA sequences that participate in DNA synthesis.

A DNA "coding sequence" is a double-stranded DNA sequence which is transcribed and translated into a polypeptide in vivo when placed under the control of appropriate regulatory sequences. The boundaries of the coding sequence are determined by a start codon at the 5' (amino) terminus and a translation stop codon at the 3' (carboxyl) terminus. A coding sequence can include, but is not limited to, prokaryotic sequences, cDNA from eukaryotic mRNA, genomic DNA sequences from eukaryotic (e.g., mammalian) DNA, and even synthetic DNA sequences. A polyadenylation signal and transcription termination sequence will usually be located 3' to the coding sequence.

Another feature of this invention is the expression of the DNA sequences disclosed herein. As is well known in the art, DNA sequences may be expressed by operatively linking them to an expression control sequence in an appropriate expression vector and employing that expression vector to transform an appropriate unicellular host.

Such operative linking of a DNA sequence of this invention to an expression control sequence, of course, includes, if not already part of the DNA sequence, the provision of an initiation codon, ATG, in the correct reading frame upstream of the DNA sequence.

A wide variety of host/expression vector combinations may be employed in expressing the DNA sequences of this invention. Useful expression vectors, for example, may consist of segments of chromosomal, non-chromosomal and synthetic DNA sequences. Suitable vectors include derivatives of SV40 and known bacterial plasmids, e.g., E. coli plasmids ColE1, pCR1, pBR322, pMB9 and their derivatives, plasmids such as RP4; phage DNAs, e.g., the numerous derivatives of phage λ, e.g., NM989, and other phage DNA, e.g., M13 and filamentous single stranded phage DNA; yeast plasmids such as the 2μ plasmid or derivatives thereof; vectors useful in eukaryotic cells, such as vectors useful in insect or mammalian cells; vectors derived from combinations of plasmids and phage DNAs, such as plasmids that have been modified to employ phage DNA or other expression control sequences; and the like.

Any of a wide variety of expression control sequences (sequences that control the expression of a DNA sequence operatively linked to them) may be used in these vectors to express the DNA sequences of this invention. Such useful expression control sequences include, for example, the early or late promoters of SV40, CMV, vaccinia, polyoma or adenovirus, the lac system, the trp system, the TAC system, the TRC system, the LTR system, the major operator and promoter regions of phage λ, the control regions of fd coat protein, the promoter for 3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid phosphatase (e.g., Pho5), the promoters of the yeast α-mating factors, and other sequences known to control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and various combinations thereof. A wide variety of unicellular host cells are also useful in expressing the DNA sequences of this invention. These hosts may include well known eukaryotic and prokaryotic hosts, such as strains of E. coli, Pseudomonas, Bacillus, Streptomyces, fungi such as yeasts, and animal cells, such as CHO, R1.1, B-W and L-M cells, African Green Monkey kidney cells (e.g., COS 1, COS 7, BSC1, BSC40, and BMT10), insect cells (e.g., Sf9), and human cells and plant cells in tissue culture.

It will be understood that not all vectors, expression control sequences and hosts will function equally well to express the DNA sequences of this invention. Neither will all hosts function equally well with the same expression system. However, one skilled in the art will be able to select the proper vectors, expression control sequences, and hosts without undue experimentation to accomplish the desired expression without departing from the scope of this invention. For example, in selecting a vector, the host must be considered because the vector must function in it. The vector's copy number, the ability to control that copy number, and the expression of any other proteins encoded by the vector, such as antibiotic markers, will also be considered.

In selecting an expression control sequence, a variety of factors will normally be considered. These include, for example, the relative strength of the system, its controllability, and its compatibility with the particular DNA sequence or gene to be expressed, particularly with regard to potential secondary structures. Suitable unicellular hosts will be selected by consideration of, e.g., their compatibility with the chosen vector, their secretion characteristics, their ability to fold proteins correctly, and their fermentation requirements, as well as the toxicity to the host of the product encoded by the DNA sequences to be expressed, and the ease of purification of the expression products.

Considering these and other factors a person skilled in the art will be able to construct a variety of vector/expression control sequence/host combinations that will express the DNA sequences of this invention on fermentation or in large-scale animal culture.

Transcriptional and translational control sequences are DNA regulatory sequences, such as promoters, enhancers, polyadenylation signals, terminators, and the like, that provide for the expression of a coding sequence in a host cell. A "promoter sequence" is a DNA regulatory region capable of binding RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. For purposes of defining the present invention, the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence will be found a transcription initiation site (conveniently defined by mapping with nuclease S1), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAT" boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in addition to the -10 and -35 consensus sequences. An "expression control sequence" is a DNA sequence that controls and regulates the transcription and translation of another DNA sequence. A coding sequence is "under the control" of transcriptional and translational control sequences in a cell when RNA polymerase transcribes the coding sequence into mRNA, which is then translated into the protein encoded by the coding sequence.

A "signal sequence" can be included before the coding sequence. This sequence encodes a signal peptide, N-terminal to the polypeptide, that communicates to the host cell to direct the polypeptide to the cell surface or secrete the polypeptide into the media, and this signal peptide is clipped off by the host cell before the protein leaves the cell. Signal sequences can be found associated with a variety of proteins native to prokaryotes and eukaryotes.

The term "oligonucleotide," as used generally herein, such as in referring to probes prepared and used in the present invention, is defined as a molecule comprised of two or more ribonucleotides, preferably more than three. Its exact size will depend upon many factors, which, in turn, depend upon the ultimate function and use of the oligonucleotide.

The term "primer" as used herein refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product, which is complementary to a nucleic acid strand, is induced, i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase and at a suitable temperature and pH. The primer may be either single-stranded or double-stranded and must be sufficiently long to prime the synthesis of the desired extension product in the presence of the inducing agent. The exact length of the primer will depend upon many factors, including temperature, source of primer and use of the method. For example, for diagnostic applications, depending on the complexity of the target sequence, the oligonucleotide primer typically contains 15-25 or more nucleotides, although it may contain fewer nucleotides. The primers herein are selected to be "substantially" complementary to different strands of a particular target DNA sequence. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. For example, a non-complementary nucleotide fragment may be attached to the 5' end of the primer, with the remainder of the primer sequence being complementary to the strand. Alternatively, non-complementary bases or longer sequences can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the sequence of the strand to hybridize therewith and thereby form the template for the synthesis of the extension product.

As used herein, the terms "restriction endonucleases" and "restriction enzymes" refer to bacterial enzymes, each of which cuts double-stranded DNA at or near a specific nucleotide sequence.

A cell has been "transformed" by exogenous or heterologous DNA when such DNA has been introduced inside the cell. The transforming DNA may or may not be integrated (covalently linked) into chromosomal DNA making up the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the transforming DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transformed cell is one in which the transforming DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the transforming DNA. A "clone" is a population of cells derived from a single cell or common ancestor by mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro for many generations.

Two DNA sequences are "substantially homologous" when at least about 75% (preferably at least about 80%, and most preferably at least about 90 or 95%) of the nucleotides match over the defined length of the DNA sequences. Sequences that are substantially homologous can be identified by comparing the sequences using standard software available in sequence data banks, or in a Southern hybridization experiment under, for example, stringent conditions as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Maniatis et al., supra; DNA Cloning, Vols. I & II, supra; Nucleic Acid Hybridization, supra. By "degenerate to" is meant that a different three-letter codon is used to specify a particular amino acid. It is well known in the art that the following codons can be used interchangeably to code for each specific amino acid:

Phenylalanine (Phe or F)    UUU or UUC
Leucine (Leu or L)          UUA or UUG or CUU or CUC or CUA or CUG
Isoleucine (Ile or I)       AUU or AUC or AUA
Methionine (Met or M)       AUG
Valine (Val or V)           GUU or GUC or GUA or GUG
Serine (Ser or S)           UCU or UCC or UCA or UCG or AGU or AGC
Proline (Pro or P)          CCU or CCC or CCA or CCG
Threonine (Thr or T)        ACU or ACC or ACA or ACG
Alanine (Ala or A)          GCU or GCC or GCA or GCG
Tyrosine (Tyr or Y)         UAU or UAC
Histidine (His or H)        CAU or CAC
Glutamine (Gln or Q)        CAA or CAG
Asparagine (Asn or N)       AAU or AAC
Lysine (Lys or K)           AAA or AAG
Aspartic Acid (Asp or D)    GAU or GAC
Glutamic Acid (Glu or E)    GAA or GAG
Cysteine (Cys or C)         UGU or UGC
Arginine (Arg or R)         CGU or CGC or CGA or CGG or AGA or AGG
Glycine (Gly or G)          GGU or GGC or GGA or GGG
Tryptophan (Trp or W)       UGG
Termination codon           UAA (ochre) or UAG (amber) or UGA (opal)

It should be understood that the codons specified above are for RNA sequences. The corresponding codons for DNA have a T substituted for U.
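The codon assignments above, together with the U-to-T rule for DNA, can be written as a small lookup table. This is a minimal sketch with hypothetical names (`CODONS`, `translate`, `to_dna`); alanine is entered with its four standard codons GCU, GCC, GCA and GCG, and "*" is used here as an assumed marker for termination.

```python
# RNA codons for each one-letter amino acid code; "*" marks termination.
CODONS = {
    "F": ["UUU", "UUC"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "I": ["AUU", "AUC", "AUA"],
    "M": ["AUG"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "Y": ["UAU", "UAC"],
    "H": ["CAU", "CAC"],
    "Q": ["CAA", "CAG"],
    "N": ["AAU", "AAC"],
    "K": ["AAA", "AAG"],
    "D": ["GAU", "GAC"],
    "E": ["GAA", "GAG"],
    "C": ["UGU", "UGC"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "W": ["UGG"],
    "*": ["UAA", "UAG", "UGA"],
}

# Reverse lookup: codon -> amino acid.
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}

def to_dna(rna):
    """Convert an RNA codon or sequence to its DNA form (T substituted for U)."""
    return rna.upper().replace("U", "T")

def translate(rna):
    """Translate an RNA sequence codon by codon, stopping at a termination codon."""
    rna = rna.upper()
    protein = []
    for i in range(0, len(rna) - len(rna) % 3, 3):
        aa = CODON_TO_AA[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# One of many possible RNA sequences encoding the p66 amino terminus MVPISP.
print(translate("AUGGUUCCUAUUUCUCCU"))
```

Because most amino acids have several codons, many different nucleic acid sequences are degenerate to one another while encoding the same protein; the table makes that count explicit for each residue.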

Mutations can be made in the nucleotide sequence encoding SEQ ID NO: 1 or SEQ ID NO: 2 or other sequences described herein, such that a particular codon is changed to a codon which codes for a different amino acid. Such a mutation is generally made by making the fewest nucleotide changes possible. A substitution mutation of this sort can be made to change an amino acid in the resulting protein in a non-conservative manner (i.e., by changing the codon from an amino acid belonging to a grouping of amino acids having a particular size or characteristic to an amino acid belonging to another grouping) or in a conservative manner (i.e., by changing the codon from an amino acid belonging to a grouping of amino acids having a particular size or characteristic to an amino acid belonging to the same grouping). Such a conservative change generally leads to less change in the structure and function of the resulting protein. A non-conservative change is more likely to alter the structure, activity or function of the resulting protein. The present invention should be considered to include sequences containing conservative changes which do not significantly alter the activity or binding characteristics of the resulting protein. The following is one example of various groupings of amino acids:

Amino acids with nonpolar R groups

Alanine
Valine
Leucine
Isoleucine
Proline
Phenylalanine
Tryptophan
Methionine

Amino acids with uncharged polar R groups

Glycine
Serine
Threonine
Cysteine
Tyrosine
Asparagine
Glutamine

Amino acids with charged polar R groups (negatively charged at pH 6.0)

Aspartic acid
Glutamic acid

Basic amino acids (positively charged at pH 6.0)

Lysine
Arginine
Histidine (at pH 6.0)

Another grouping may be those amino acids with phenyl groups:

Phenylalanine
Tryptophan
Tyrosine

Another grouping may be according to molecular weight (i.e., size of R groups):

Glycine                  75
Alanine                  89
Serine                  105
Proline                 115
Valine                  117
Threonine               119
Cysteine                121
Leucine                 131
Isoleucine              131
Asparagine              132
Aspartic acid           133
Glutamine               146
Lysine                  146
Glutamic acid           147
Methionine              149
Histidine (at pH 6.0)   155
Phenylalanine           165
Arginine                174
Tyrosine                181
Tryptophan              204

Particularly preferred substitutions are:

- Lys for Arg and vice versa such that a positive charge may be maintained;

- Glu for Asp and vice versa such that a negative charge may be maintained;

- Ser for Thr such that a free -OH can be maintained; and

- Gln for Asn such that a free NH2 can be maintained.
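The groupings above lend themselves to a simple check of whether a given substitution is conservative in the sense used here, i.e., whether both residues fall in the same R-group class. The sketch below is illustrative; the names `GROUPS`, `group_of` and `is_conservative` are hypothetical, and only the four charge/polarity classes from the text are encoded.

```python
# R-group classes from the text, written with one-letter codes.
GROUPS = {
    "nonpolar": set("AVLIPFWM"),
    "uncharged polar": set("GSTCYNQ"),
    "negatively charged": set("DE"),
    "basic": set("KRH"),
}

def group_of(aa):
    """Return the name of the R-group class containing the residue."""
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid code {aa!r}")

def is_conservative(old, new):
    """True when both residues belong to the same R-group class."""
    return group_of(old) == group_of(new)

# The particularly preferred substitutions all stay within one class.
for old, new in [("K", "R"), ("E", "D"), ("S", "T"), ("Q", "N")]:
    print(old, "->", new, is_conservative(old, new))
```

Under this grouping, a lysine-to-alanine change such as K172A is non-conservative, while a cysteine-to-serine change such as C280S is conservative.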

Two amino acid sequences are "substantially homologous" when at least about 70% of the amino acid residues (preferably at least about 80%, and most preferably at least about 90 or 95%) are identical, or represent conservative substitutions.

A "heterologous" region of the DNA construct is an identifiable segment of DNA within a larger DNA molecule that is not found in association with the larger molecule in nature. Thus, when the heterologous region encodes a mammalian gene, the gene will usually be flanked by DNA that does not flank the mammalian genomic DNA in the genome of the source organism. Another example of a heterologous coding sequence is a construct where the coding sequence itself is not found in nature (e.g., a cDNA where the genomic coding sequence contains introns, or synthetic sequences having codons different than the native gene). Allelic variations or naturally-occurring mutational events do not give rise to a heterologous region of DNA as defined herein.

A DNA sequence is "operatively linked" to an expression control sequence when the expression control sequence controls and regulates the transcription and translation of that DNA sequence. The term "operatively linked" includes having an appropriate start signal (e.g., ATG) in front of the DNA sequence to be expressed and maintaining the correct reading frame to permit expression of the DNA sequence under the control of the expression control sequence and production of the desired product encoded by the DNA sequence. If a gene that one desires to insert into a recombinant DNA molecule does not contain an appropriate start signal, such a start signal can be inserted in front of the gene.
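Returning to the "substantially homologous" test for amino acid sequences defined above, the 70% criterion can be sketched as a position-by-position comparison. This is an illustration only, under the assumption that the two sequences have already been aligned to equal length without gaps; the function names are hypothetical and the conservative classes are the R-group groupings given earlier.

```python
# R-group classes used to decide whether a mismatch is a conservative substitution.
GROUPS = ["AVLIPFWM", "GSTCYNQ", "DE", "KRH"]

def same_group(a, b):
    """True when both residues fall in the same R-group class."""
    return any(a in group and b in group for group in GROUPS)

def fraction_matching(seq1, seq2):
    """Fraction of aligned positions that are identical or conservative."""
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be pre-aligned, non-empty and equal in length")
    hits = sum(1 for a, b in zip(seq1, seq2) if a == b or same_group(a, b))
    return hits / len(seq1)

def substantially_homologous(seq1, seq2, threshold=0.70):
    """Apply the 'at least about 70%' criterion; pass 0.80 or 0.90 for the preferred levels."""
    return fraction_matching(seq1, seq2) >= threshold

print(fraction_matching("GPISP", "GPITP"))  # Ser/Thr counts as conservative
```

A practical comparison would first align the sequences with dedicated software, since real homologs usually differ in length and contain insertions or deletions.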

The term "standard hybridization conditions" refers to salt and temperature conditions substantially equivalent to 5 x SSC and 65°C for both hybridization and wash. However, one skilled in the art will appreciate that such "standard hybridization conditions" are dependent on particular conditions including the concentration of sodium and magnesium in the buffer, nucleotide sequence length and concentration, percent mismatch, percent formamide, and the like. Also important in the determination of "standard hybridization conditions" is whether the two sequences hybridizing are RNA-RNA, DNA-DNA or RNA-DNA. Such standard hybridization conditions are easily determined by one skilled in the art according to well known formulae, wherein hybridization is typically 10-20°C below the predicted or determined Tm, with washes of higher stringency, if desired.
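As an illustration of the kind of formula referred to above, the sketch below uses one commonly cited empirical approximation for the Tm of long DNA-DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L. This particular expression is an assumption chosen for the example and is not taken from the present specification; published coefficients differ somewhat between references, and RNA-containing duplexes and short oligonucleotides require different treatments. The function names are hypothetical.

```python
import math

# 1x SSC is 0.15 M NaCl plus 0.015 M trisodium citrate, i.e. about 0.195 M Na+.
SSC_1X_SODIUM_M = 0.15 + 3 * 0.015

def estimate_tm_celsius(sodium_molar, percent_gc, length_bp, percent_formamide=0.0):
    """Empirical Tm estimate for a long DNA-DNA duplex (assumed coefficients)."""
    return (81.5
            + 16.6 * math.log10(sodium_molar)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / length_bp)

def hybridization_window(tm):
    """Temperature range 10-20 degrees C below Tm, as (low, high)."""
    return (tm - 20.0, tm - 10.0)

# Example inputs (hypothetical probe): 5x SSC, 50% GC, 1000 bp, no formamide.
tm = estimate_tm_celsius(5 * SSC_1X_SODIUM_M, 50.0, 1000)
low, high = hybridization_window(tm)
print(f"estimated Tm {tm:.1f} C; hybridize between {low:.1f} and {high:.1f} C")
```

Adding formamide lowers the estimated Tm, which is why it is often used to bring the hybridization temperature into a convenient range.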

Media useful for the preparation of these compositions are both well-known in the art and commercially available and include synthetic culture media, inbred mice and the like. An exemplary synthetic medium is Dulbecco's minimal essential medium (DMEM; Dulbecco et al., Virol. 8:396 (1959)) supplemented with 4.5 gm/l glucose and 20 mM glutamine.

It is contemplated that the proteins, peptides, nucleic acids, vectors and virus particles of this invention can be administered to a subject to impart a therapeutic or beneficial effect. Therefore, the proteins, peptides, nucleic acids, vectors and particles of this invention can be present in a pharmaceutically acceptable carrier. "Pharmaceutically acceptable" means a material that is not biologically or otherwise undesirable, i.e., the material may be administered to a subject, along with the nucleic acid or vector of this invention, without causing any undesirable biological effects or interacting in a deleterious manner with any of the other components of the pharmaceutical composition in which it is contained. The carrier would naturally be selected to minimize any degradation of the active ingredient and to minimize any adverse side effects in the subject, as would be well known to one of skill in the art (see, e.g., Remington's Pharmaceutical Science; latest edition).

Pharmaceutical formulations of the present invention, such as vaccines, can comprise an immunogenic amount of the virus particles as disclosed herein in combination with a pharmaceutically acceptable carrier. An "immunogenic amount" is an amount of the virus particles sufficient to evoke an immune response (humoral and/or cellular immune response) in the subject to which the pharmaceutical formulation is administered. Exemplary pharmaceutically acceptable carriers include, but are not limited to, sterile pyrogen-free water and sterile pyrogen-free physiological saline solution.

Pharmaceutical formulations for the present invention can include those suitable for parenteral (e.g., subcutaneous, intradermal, intramuscular, intravenous and intraarticular) administration. Alternatively, pharmaceutical formulations of the present invention may be suitable for administration to the mucous membranes of a subject (e.g., intranasal administration). The formulations may be conveniently prepared in unit dosage form and may be prepared by any of the methods well known in the art. Thus, the present invention provides a method for delivering nucleic acids and vectors (e.g., virus particles) encoding the proteins of this invention to a cell, comprising administering the nucleic acids or vectors to a cell under conditions whereby the nucleic acids are expressed, thereby delivering the proteins of this invention to the cell. The nucleic acids can be delivered as naked DNA or in a vector (which can be a viral vector) or other delivery vehicles and can be delivered to cells in vivo and/or ex vivo by a variety of mechanisms well known in the art (e.g., uptake of naked DNA, viral infection, liposome fusion, endocytosis and the like). The cell can be any cell which can take up and express exogenous nucleic acids.

As used herein, "pM" means picomolar, "nM" means nanomolar, "uM" means micromolar, "mM" means millimolar, "ul" means microliter, "ml" means milliliter, "l" means liter.

As used herein, the term "synthetic amino acid" means an amino acid which is chemically synthesized and is not one of the 20 amino acids naturally occurring in nature. As used herein, the terms "non-natural amino acid" and "unnatural amino acid" mean an amino acid which is not one of the 20 amino acids naturally occurring in nature. Thus, a synthetic amino acid is an unnatural amino acid.

As used herein, the term "biosynthetic amino acid" means an amino acid found in nature other than the 20 amino acids commonly described and understood in the art as "natural amino acids." Examples of "non-amide isosteres" include but are not limited to secondary amine, ketone, carbon-carbon, thioether, and ether moieties.

As used herein, the term "non-natural peptide analog" means a variant peptide comprising a synthetic amino acid. As used herein, "NMR" means nuclear magnetic resonance; "ESMS" means electrospray mass spectrometry; "CBD" means chitin binding domain; "SH2" means src homology type-2 domain; "Abl" means human Abelson protein tyrosine kinase; "GST" means glutathione S-transferase; "HSQC" means heteronuclear single-quantum correlation spectroscopy; "HPLC" means high pressure liquid chromatography; "PhSH" means thiophenol; "BzlSH" means benzyl mercaptan; standard single and triple letter codes for amino acids, and single letter codes for nucleic acids are used throughout.

A "segment" as the term is used herein, consists of a portion of a protein or peptide primary amino acid sequence. Such a segment as used herein may be generated by proteolytic cleavage, chemical cleavage or physical disruption. Alternatively, such a segment may be generated by an expression vector or by an in vitro translation of an RNA transcript or portion thereof. Such a segment may assume a structural conformation or folding pattern which is unique to the segment or which represents the conformation of the segment in the complete protein or peptide.

A "domain" as used herein, is a portion of a protein that has a tertiary structure. The domain may be connected to other domains in the complete protein by short flexible regions of polypeptide. Alternatively, the domain may represent a functional portion of the protein.

As used herein, amino acid residues are preferred to be in the "L" isomeric form. However, residues in the "D" isomeric form can be substituted for any L-amino acid residue, as long as the desired functional property of immunoglobulin-binding is retained by the polypeptide. NH2 refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide. Abbreviations for amino acid residues are used in keeping with standard polypeptide nomenclature delineated in J. Biol. Chem., 243:3552-59 (1969). It should be noted that all amino-acid residue sequences are represented herein by formulae whose left and right orientation is in the conventional direction of amino-terminus to carboxy-terminus. Furthermore, it should be noted that a dash at the beginning or end of an amino acid residue sequence indicates a peptide bond to a further sequence of one or more amino-acid residues.

Amino acids with nonpolar R groups include: Alanine, Valine, Leucine, Isoleucine, Proline, Phenylalanine, Tryptophan and Methionine. Amino acids with uncharged polar R groups include: Glycine, Serine, Threonine, Cysteine, Tyrosine, Asparagine and Glutamine. Amino acids with charged polar R groups (negatively charged at pH 6.0) include: Aspartic acid and Glutamic acid. Basic amino acids (positively charged at pH 6.0) include: Lysine, Arginine and Histidine (at pH 6.0). Amino acids with phenyl groups include: Phenylalanine, Tryptophan and Tyrosine. Particularly preferred substitutions are: Lys for Arg and vice versa such that a positive charge may be maintained; Glu for Asp and vice versa such that a negative charge may be maintained; Ser for Thr such that a free -OH can be maintained; and Gln for Asn such that a free NH2 can be maintained. Amino acids can be in the "D" or "L" configuration. Use of peptidomimetics may involve the incorporation of a non-amino acid residue with non-amide linkages at a given position.

Amino acid substitutions may also be introduced to substitute an amino acid with a particularly preferable property. For example, a Cys may be introduced as a potential site for disulfide bridges with another Cys. A His may be introduced as a particularly "catalytic" site (i.e., His can act as an acid or base and is the most common amino acid in biochemical catalysis). Pro may be introduced because of its particularly planar structure, which induces β-turns in the protein's structure. The detectable marker labels most commonly employed for these studies are radioactive elements, enzymes, chemicals which fluoresce when exposed to ultraviolet light, and others.

A number of fluorescent materials are known and can be utilized as labels. These include, for example, fluorescein, rhodamine, auramine, Texas Red, AMCA blue and Lucifer Yellow. A particular detecting material is anti-rabbit antibody prepared in goats and conjugated with fluorescein through an isothiocyanate.

The proteins and peptides of the present invention can also be labeled with a radioactive element or with an enzyme. The radioactive label can be detected by any of the currently available counting procedures. The preferred isotope may be selected from 3H, 13C, 15N, 14C, 32P, 35S, 35Cl, 51Cr, 57Co, 58Co, 59Fe, 90Y, 123I, 131I and 186Re.

Enzyme labels are likewise useful, and can be detected by any of the presently utilized colorimetric, spectrophotometric, fluorospectrophotometric, amperometric or gasometric techniques. The enzyme is conjugated to the selected particle by reaction with bridging molecules such as carbodiimides, diisocyanates, glutaraldehyde and the like. Many enzymes which can be used in these procedures are known and can be utilized. The preferred are peroxidase, β-glucuronidase, β-D-glucosidase, β-D-galactosidase, urease, glucose oxidase plus peroxidase and alkaline phosphatase. U.S. Patent Nos. 3,654,090; 3,850,752; and 4,016,043 are referred to by way of example for their disclosure of alternate labeling material and methods.

A basic description of nucleic acid amplification or PCR (polymerase chain reaction) is described in Mullis, U.S. Patent No. 4,683,202, which is incorporated herein by reference. The amplification reaction uses a template nucleic acid contained in a sample, two primer sequences and inducing agents. The extension product of one primer when hybridized to the second primer becomes a template for the production of a complementary extension product and vice versa, and the process is repeated as often as is necessary to produce a detectable amount of the sequence.

The inducing agent may be any compound or system which will function to accomplish the synthesis of primer extension products, including enzymes. Suitable enzymes for this purpose include, for example, E. coli DNA polymerase I, thermostable Taq DNA polymerase, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA polymerases, reverse transcriptase and other enzymes which will facilitate combination of the nucleotides in the proper manner to form amplification products. The oligonucleotide primers can be synthesized by automated instruments sold by a variety of manufacturers or can be commercially prepared based upon the nucleic acid sequence of this invention.

As used herein, the term "chip" means any solid support including, but not limited to silicon, glass, polypropylene, polystyrene, cellulose, plastic and paper. Accordingly, the term "protein chip" means a protein covalently bound to a solid support including, but not limited to silicon, glass, polypropylene, polystyrene, cellulose, plastic and paper. The "protein" component of a protein chip as used herein is the ligation product of an oligopeptide and a recombinantly expressed protein or portion thereof, the peptide being the component covalently bound to the solid support. Additionally, as used herein, the term "antibody chip" means an antibody or the antigen-binding portion thereof covalently bound to a solid support as the ligation product of an oligopeptide and a recombinantly expressed antibody protein or portion thereof, the peptide being the component covalently bound to the solid support. Furthermore, as used herein, the term "antigen chip" means an antigen covalently bound to a solid support as the ligation product of an oligopeptide and a recombinantly expressed antigenic protein or portion thereof, the peptide being the component covalently bound to the solid support. Moreover, the term "protein chip protein" refers to the protein component of the protein chip which is the ligation product produced by the methods disclosed by the present invention.

EXAMPLES

Material and Methods

Expression vector and mutation construction

The RT coding DNA from the Q258C-RT construct (Sarafianos et al., 2003) was ligation independent cloned into pCDF-2 Ek/LIC with the LIC Duet™ Minimal Adaptor (Novagen) according to manufacturer's recommendations. The termini of the p66 insert (ORF-2) contained the restriction sites for the enzymes NdeI and XhoI while the termini of the p51 insert (ORF-1) contained NcoI and SacI restriction sites (New England Biolabs). To remove any residues added to the expressed protein by the vector, NdeI and NcoI were used to remove all DNA between the start codon and the insert. The RT encoding dual expression vector is called pRT1.

Mutagenesis was completed using methylated overlap extension ligation independent cloning (MOE-LIC). The following methylated primers were used:

revcasLIC3-GCCCGAAGAGGAGC[2'OMeG]CCGGTTTCTTTACCAGACTCGAG (SEQ ID NO: 122);

forvectLIC3-CTCCTCTTCGGGC[2'OMeC]CGCCAGCACATGGACTCG (SEQ ID NO: 123);

RevvectLIC2-GGAGAAAGCCC[2'OMeG]GGTATGGCATGATAGCGCC (SEQ ID NO: 124);

orf2vrevLIC2-ACGCGGGCGGCCG[2'OMeU]GGATCCTTACGCCCCGC (SEQ ID NO: 125);

ForcesLIC-CGGGCTTTCTCCT[2'OMeC]CTCTCCCTTATGCGACTCC (SEQ ID NO: 126);

orfcasforLIC-CGGCCGCCCGCGTG[2'OMeG]TTGATCTCGATCCCGCG (SEQ ID NO: 127);

orf2vrevLIC-CACGCGGGCGGCCG[2'OMeT]GGATCCCCCCGGGTCC (SEQ ID NO: 128).

See Figure 1 for the location and pairing of these primers on pRT1.
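The pairing of terminator primers can be checked directly from the sequences. The following is a minimal sketch, assuming that the single-stranded overhang corresponds to the bases 5' of the 2'-O-methylated nucleotide; the function names are illustrative and not part of the disclosed method. It splits a primer at the 2'-O-methyl marker and tests whether the 5' portions of revcasLIC3 and forvectLIC3 are complementary.

```python
import re

# Matches the 2'-O-methyl marker in either bracket style, with or without
# the prime, e.g. [2'OMeC], (2'OMeG) or [2OMeG].
MARKER = re.compile(r"[\[(]2'?OMe([ACGTU])[\])]")


def split_primer(primer: str):
    """Return (five_prime_part, modified_base, three_prime_part)."""
    match = MARKER.search(primer)
    if match is None:
        raise ValueError("no 2'-O-methyl marker found")
    return primer[:match.start()], match.group(1), primer[match.end():]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unmodified DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


revcas = "GCCCGAAGAGGAGC[2'OMeG]CCGGTTTCTTTACCAGACTCGAG"  # SEQ ID NO: 122
forvect = "CTCCTCTTCGGGC[2'OMeC]CGCCAGCACATGGACTCG"       # SEQ ID NO: 123

insert_tail, _, _ = split_primer(revcas)
vector_tail, _, _ = split_primer(forvect)

# Lengths of the assumed 5' overhangs on insert and vector.
print(len(insert_tail), len(vector_tail))
# True when the vector overhang can anneal to the insert overhang.
print(reverse_complement(vector_tail) in insert_tail)
```

The same check can be applied to the other primer pairs in the list by substituting their sequences.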

To minimize false positive colonies, the vector was digested with the appropriate restriction enzymes to remove the ORF protein coding DNA that was to be replaced (NcoI and SacI for ORF-1 or NdeI and XhoI for ORF-2). For ORF-2, 5 μl of vector (250 ng/μl) were digested in a 20 μl volume with 1 μl NdeI (20,000 units/ml) and 1 μl XhoI (20,000 units/ml) for one hour at 37°C with NEBuffer 2 (New England Biolabs). For p66, mutagenesis overlap extension PCR was performed using mutated overlap segments with the 2'-O-methylated primers orf2vrevLIC and revcasLIC3 to amplify the full insert with PfuUltra™ II Fusion HS DNA Polymerase (Stratagene). A typical overlap extension PCR was performed with 1 μl of each template, 1 μl (20 pmols) of each primer, 39 μl water, 1 μl (25 mM each) dNTPs, 5 μl 10X PfuUltra buffer, and 1 μl PfuUltra™ II Fusion HS DNA Polymerase (Stratagene). The PCR program was: 3 minutes at 95°C; followed by 5 cycles of 1 minute at 95°C, 1 minute at 50°C, and 30 seconds at 72°C; 30 cycles of 30 seconds at 95°C, 30 seconds at 53°C, and 45 seconds at 72°C; ending with a final extension step of 10 minutes at 72°C.

In a separate reaction tube, the digested vector was PCR amplified with oligonucleotides orf2vrevLIC2 and forvectLIC3. The vector PCR was performed with 0.5 μl of the template (50 ng digested vector), 1 μl (20 pmols) of each primer, 40.5 μl water, 1 μl (25 mM each) dNTPs, 5 μl 10X PfuUltra buffer, and 1 μl PfuUltra™ II Fusion HS DNA Polymerase (Stratagene). The PCR program was: 3 minutes at 95°C; followed by 30 cycles of 30 seconds at 95°C, 30 seconds at 54°C, and 90 seconds at 72°C; ending with a final extension step of 10 minutes at 72°C.
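As a cross-check of the two PCR programs and the vector reaction mix, the following sketch encodes each program as a list of stages and sums the hold times. It is an illustration only: the class and variable names are hypothetical, and ramp times between temperatures are ignored, so actual run times on a thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # (temperature in deg C, hold time in seconds) pairs

    def seconds(self) -> int:
        """Total hold time of this stage, excluding temperature ramps."""
        return self.cycles * sum(hold for _, hold in self.steps)


def program_seconds(stages) -> int:
    """Total hold time of a whole program."""
    return sum(stage.seconds() for stage in stages)


# Overlap extension (insert) PCR program as listed in the text.
insert_pcr = [
    Stage(1, ((95, 180),)),
    Stage(5, ((95, 60), (50, 60), (72, 30))),
    Stage(30, ((95, 30), (53, 30), (72, 45))),
    Stage(1, ((72, 600),)),
]

# Vector PCR program as listed in the text.
vector_pcr = [
    Stage(1, ((95, 180),)),
    Stage(30, ((95, 30), (54, 30), (72, 90))),
    Stage(1, ((72, 600),)),
]

# Vector PCR components in microliters.
vector_mix_ul = {
    "template": 0.5,
    "primer 1": 1,
    "primer 2": 1,
    "water": 40.5,
    "dNTPs": 1,
    "10X buffer": 5,
    "polymerase": 1,
}

print(program_seconds(insert_pcr) / 60)  # minutes of hold time, insert PCR
print(program_seconds(vector_pcr) / 60)  # minutes of hold time, vector PCR
print(sum(vector_mix_ul.values()))       # total reaction volume in microliters
```

The component volumes of the vector PCR add up to 50 μl, which is consistent with 5 μl of 10X buffer giving a 1X final concentration.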

The PCR products were then gel purified from a 0.5 or 1% agarose gel using the QIAquick gel extraction kit (see Appendix). The concentrations were determined by UV absorbance and 0.04 pmols of vector and insert were mixed at a 1:1 insert to vector molar ratio in a buffer containing 25 mM Tris pH 8.0, 5 mM MgCl2, 0.025 mg/ml BSA, and 2.5 mM DTT in a 20 μl volume. The mixture was heated to 70°C and cooled slowly over two hours in a water bath. Once cooled to ~40°C, 1 μl of 25 mM EDTA was added and the mixture incubated at room temperature for 5 minutes before being desalted by Centri-Sep column (Princeton Separations) or ethanol precipitation (Donahue et al., 2002). 5 μl of desalted annealed DNA was added to electrocompetent NovaBlue cells (Novagen) and electroporated according to manufacturer's recommendations. The resulting colonies were tested by colony PCR and miniprepped using QIAprep Spin Miniprep kit (Qiagen) according to manufacturer's recommendations.

Expression and purification of HRV14 3C protease

HRV14 3C protease was expressed in BL21-CodonPlus®-RIL competent cells and grown on Luria broth (LB) agar plates containing 35 mg/liter streptomycin and 0.1% glucose. A single colony was grown overnight in LB + 35 mg/liter streptomycin and 0.5% glucose at 37°C with shaking. The overnight culture was then inoculated in a 100-fold dilution, and the solution was incubated at 37°C with shaking. Typical LB volume was 0.5 - 1.0 liters. When an OD600 of 0.9 was reached, the cells were cooled to room temperature, induced with 0.5 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) and incubated for 17 hours at 17°C prior to pelleting and storage at -80°C. Nickel column purification was performed exactly as described for RT. Concentration was measured using the Bio-Rad Protein Assay (Bio-Rad) and the protein was stored at 1 mg/ml in a 50% glycerol solution.

Expression and purification of RT

Plasmids were transformed into BL21-CodonPlus®-RIL competent cells and grown on LB-agar plates containing 35 mg/liter streptomycin and 0.1% glucose. A single colony was grown overnight in LB + 35 mg/liter streptomycin and 0.5% glucose at 37°C with shaking. The overnight culture was then inoculated in a 100 fold dilution, and the solution was incubated at 37°C with shaking. Typical LB volume was 250 ml to 4 liters. When an OD600 of 0.9 was reached, the cells were induced with 1 mM IPTG and incubated for three hours prior to pelleting and storage at -80°C.

Nickel column purification was performed according to the manufacturer's recommendations (Qiagen) with the following modifications: no lysozyme was added to the lysate, 600 mM NaCl instead of 300 mM was used in each of the standard buffers, 0.1% Triton X-100 was added to the lysate and wash buffers, and an extra high-salt wash step was performed with 1.2 M NaCl. Following elution, the yield of RT was checked by OD280 (OD280/3.1 × dilution factor) and a 1:100 by weight ratio of HRV14 3C protease to RT was added. The protease treated solution was incubated at 4°C overnight. The solution was buffer exchanged 20-fold into buffer A (50 mM diethanolamine pH 8.9) using an Amicon Ultra-15 Centrifugal unit with an Ultracel-30 membrane (Millipore). The solution was filtered with a 0.22 micron filter, and 10-20 mg was loaded onto a monoQ column (Amersham Biosciences) equilibrated with buffer A. The column was washed with buffer A and the samples eluted during a one hour gradient from 0 to 25% buffer B (buffer A + 1 M NaCl) with a flow rate of 4 ml/minute. The monoQ column purification step effectively removed the protease. The RT was buffer exchanged and concentrated to 20 mg/ml in 10 mM Tris pH 8.0 and 75 mM NaCl. The concentrated RT was aliquoted and stored at -80°C or 4°C for immediate crystallization.
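The arithmetic in this purification step can be sketched as follows. This is an illustration under stated assumptions: the OD280/3.1 rule is taken to give the RT concentration in mg/ml, the gradient is taken to be linear, and all function names are hypothetical.

```python
def rt_mg_per_ml(od280: float, dilution_factor: float) -> float:
    """RT concentration from the rule in the text: OD280 / 3.1 x dilution factor.

    Assumes the result is in mg/ml.
    """
    return od280 / 3.1 * dilution_factor


def protease_mg(rt_mg: float, ratio: float = 100.0) -> float:
    """Mass of HRV14 3C protease for a 1:ratio (w/w) protease-to-RT digest."""
    return rt_mg / ratio


def nacl_mM(minutes: float) -> float:
    """NaCl concentration during the 60 minute, 0-25% buffer B gradient.

    Buffer B is buffer A + 1 M NaCl; a linear gradient is assumed.
    """
    if not 0 <= minutes <= 60:
        raise ValueError("gradient runs from 0 to 60 minutes")
    percent_b = 25.0 * minutes / 60.0
    return percent_b / 100.0 * 1000.0


# Hypothetical reading: OD280 of 0.62 measured on a 10-fold dilution.
print(rt_mg_per_ml(0.62, 10))
# Protease needed for a hypothetical 40 mg of RT.
print(protease_mg(40))
# NaCl concentration at the end of the gradient, and total gradient volume in ml.
print(nacl_mM(60), 60 * 4)
```

Under these assumptions the gradient ends at 250 mM NaCl after 240 ml of buffer.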

Crystallization

The RT was screened unliganded, with a 2.5-fold molar excess of NNRTI, or with a 5-fold molar excess of RNHI using the hanging drop vapor diffusion method (0.17 mM RT with 0.425 mM NNRTI or 0.85 mM RNHI). Depending on the number of samples being screened, EasyXtal DG-Tools (Qiagen) or Linbro Plates (Hampton Research) crystallization trays were used for screening. Drop size for initial screening was 1 μl protein plus 1 μl reservoir solution. The well contained 500 μl of solution for Linbro Plates and 750 μl for EasyXtal DG-Tools. For the initial screening an RT reference screen was used. Based on visually identified crystal hits, further optimization was performed. RT52A crystals were produced in a matrix of 24 conditions from 9-12% PEG 8000, 50 mM imidazole pH 6.0-6.8, 10 mM spermine, 15 mM MgSO4, and 100 mM ammonium sulfate. The origin and age of the PEG 8000 used was found to be very important. PEGs react to light and oxygen, which results in small PEG products and a drop in pH. The change in pH can have a dramatic effect on the final pH of the crystallization solution (e.g. pH of 6.5 to 6.0). With new lots of PEG 8000, 4% PEG 400 was used as an additive, together with an appropriate decrease in the pH of the buffer, to reproduce the crystals obtained with old lots of PEG 8000. In addition to the matrix, the Additive Screen (Hampton Research) was used with the base solution 11% PEG 8000, 50 mM imidazole pH 6.4, 10 mM spermine, 15 mM MgSO4, and 100 mM ammonium sulfate. All successful crystallization experiments were performed at 4°C with protection from light and temperature fluctuations by placement inside a Styrofoam box.
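The ligand concentrations and the 24-condition matrix can be laid out programmatically. The sketch below is only one possible layout: the text gives the ranges (9-12% PEG 8000, pH 6.0-6.8) and the total of 24 conditions, but not the individual grid points, so the 4 × 6 split and the evenly spaced pH values are assumptions, as are the function names.

```python
from itertools import product


def ligand_mM(protein_mM: float, fold_excess: float) -> float:
    """Ligand concentration for a given molar excess over the protein."""
    return protein_mM * fold_excess


def screen_matrix(peg_percent=(9, 10, 11, 12), ph_min=6.0, ph_max=6.8, ph_steps=6):
    """Hypothetical 4 x 6 grid of PEG 8000 percentage against imidazole pH."""
    step = (ph_max - ph_min) / (ph_steps - 1)
    ph_values = [round(ph_min + i * step, 2) for i in range(ph_steps)]
    return [
        {"PEG 8000 (%)": peg, "imidazole pH": ph}
        for peg, ph in product(peg_percent, ph_values)
    ]


print(ligand_mM(0.17, 2.5))  # NNRTI concentration in mM
print(ligand_mM(0.17, 5))    # RNHI concentration in mM

matrix = screen_matrix()
print(len(matrix))
print(matrix[0], matrix[-1])
```

The remaining components (10 mM spermine, 15 mM MgSO4, 100 mM ammonium sulfate, 50 mM imidazole) are constant across the matrix and are therefore not varied in the grid.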

Drops that did not contain crystals after three days were microseeded with crushed RT52A/NNRTI crystals. Microseeding was performed by crushing several, preferably otherwise unusable, RT52A/NNRTI crystals on a glass plate followed by use of the Seed Bead kit (Hampton Research) according to manufacturer's recommendations. Total volume of well solution used was typically 100 μl. Several dilutions of this seed stock were made and tested on a subset of the crystallization drops that were to be seeded. A 30-gauge needle was used to streak seed the drops. The seed solutions were stored overnight in a drawer at 4°C and the seeded drops were checked after 24 hours. Based on the number of crystals in the seeded drops, further seeding was performed. After all the drops were seeded the seed stock was stored for future use at -80°C.

NNRTI/RT52A crystals were found to be very stable, with no loss in X-ray diffraction quality after four months at 4°C. Unliganded and RNHI/RT52A crystals, however, were found to deteriorate in both X-ray diffraction and visual quality within one week of appearing. RT69A/RNHI crystals were stable for weeks.

Data collection

Crystals of RT52A were flash-cooled by immersion into liquid nitrogen after briefly dunking the crystal into cryoprotection solution containing well solution plus 27% ethylene glycol and the inhibitor at the same concentration as in the hanging drop. Best results were found when using MicroMounts (MiTeGen) for mounting the crystals. Data for screening and data set collection were obtained at Advanced Photon Source (APS) at Argonne National Laboratory (ANL), SER-CAT beamline 19ID, Cornell High Energy Synchrotron Source (CHESS) F1 and A1 beamlines, and National Synchrotron Light Source (NSLS) beamlines X25 and X29. The diffraction data were indexed, processed, scaled and merged using HKL2000 (Otwinowski et al., 1999). The resolution of the data was estimated using the last resolution shell with preferred values for completeness, R-merge, and the ratio of I to σ(I).

Dynamic light scattering

Samples were tested using the DynaPro-MS800 dynamic light scattering/molecular sizing instrument (Protein Solutions, Inc.). 20 μl of 1 mg/ml RT in 10 mM Tris pH 8.0 and 75 mM NaCl was tested after centrifugation at 14,000 g for 2 minutes. Each experiment consisted of no fewer than 25 measurements. Data analyses were performed using DynaPro Instrument Control Software for Molecular Research DYNAMICS (version 5.26.60).

CD spectroscopy

RT samples were diluted in 2 mM HEPES (pH 8.2) and 75 mM NaCl to a final concentration of 0.12 mg/ml (0.5 ml total volume) and centrifuged at 15,000 g for 2 minutes before measurements. CD spectra were recorded before and after melt from 200 to 260 nm on an AVIV Circular Dichroism Spectrometer Model 215. The thermal stability assay was performed starting at 4°C and increased in 0.2°C increments to 70°C with 5 second measurements at 222 nm taken at each increment.

RT activity assays

DNA-dependent DNA polymerase (DDDP) processivity assay

The DDDP processivity assay was done by Paul Boyer (laboratory of Stephen Hughes, NCI-Frederick Cancer Research and Development Center, Frederick, Maryland) as previously described (Boyer et al., 2002). The WT HIV-1 RT was produced as described previously (Boyer et al., 1999). The primer-47 (New England Biolabs) was 5'-end labeled and then annealed to single-strand M13mp18 DNA (New England Biolabs). The final concentration of template-primer (T/P) in each reaction mixture was approximately 2.5 nM; the RT was in molar excess (85 nM). The cold trap, poly(rC) oligo(dG), was added in excess relative to RT (300 nM) after the RT was allowed to bind to the labeled T/P. The extension products were suspended in 2x gel loading buffer (Ambion) and heated at 65°C to denature the samples. A 15 hour electrophoresis of loaded samples was performed on an alkaline agarose gel (Sambrook et al., 1989). The products were visualized by exposure to X-ray film.

RNase H assay

Activity measurements were done by Paul Boyer (NCI-Frederick Cancer Research and Development Center, Frederick, Maryland) as in Boyer et al. (2004). Briefly, the RNA oligonucleotides were 5'-end labeled and then annealed to synthetic DNA oligonucleotides by heating and slow cooling. A 0.2 μM concentration of T/P was suspended in a total reaction volume of 12 μl containing 25 mM Tris (pH 8.0), 50 mM NaCl, 5.0 mM MgCl2, 100 μg of bovine serum albumin/ml, 10 mM CHAPS, and 1 U of Superasin (Ambion)/μl. The reactions were initiated by the addition of 75 ng of the indicated RT and were incubated at 37°C. Aliquots were removed at the indicated time points, and the reaction was halted by addition of 2x gel loading buffer. The reaction products were fractionated on a 15% polyacrylamide sequencing gel. Products were visualized by exposure to X-ray film.

Expression/purification/crystallization

RT52A was expressed and purified as described. The NNIBP mutants were produced by site-directed mutagenesis of RT52A as described. The NNIBP mutants were expressed and purified in the same manner as RT52A. Crystallization was performed using the hanging drop vapor diffusion method. RT52A/TMC278 was crystallized by adding 1 μl of RT52A/TMC278 complex at 20 mg/ml to an equal volume of well solution (11% PEG 8000, 15 mM MgSO4, 10 mM spermine, 100 mM ammonium sulfate, 50 mM imidazole, pH 6.8, and 60 mM sodium formate). L100I-K103N/TMC278 was crystallized by adding 1 μl of L100I-K103N/TMC278 complex at 20 mg/ml to an equal volume of well solution (8% PEG 8000, 15 mM MgSO4, 10 mM spermine, 100 mM ammonium sulfate, and 50 mM imidazole, pH 6.2). K103N-Y181C/TMC278 was crystallized by adding 1 μl of K103N-Y181C/TMC278 complex at 20 mg/ml to an equal volume of well solution (12% PEG 8000, 15 mM MgSO4, 10 mM spermine, 100 mM ammonium sulfate, and 50 mM sodium citrate, pH 5.0).

Data collection and structure solution

Crystals of RT52A were flash-cooled by immersion into liquid nitrogen after briefly dunking the crystal into cryoprotection solution containing well solution plus 27% ethylene glycol and the inhibitor at the same concentration as in the hanging drop. X-ray diffraction data were collected at the Cornell High Energy Synchrotron Source (CHESS) F1 and Advanced Photon Source (APS) at Argonne National Laboratory, SER-CAT beamline 19ID. The diffraction data were indexed, processed, and scaled using HKL2000 (Otwinowski et al., 1999). Structure determination was performed by Kalyan Das. The previously reported HIV-1 RT/R129385 (PDB code 1S9E) structure was used as a template in obtaining molecular replacement solutions for the RT52A/TMC278 complex structure. Rigid body refinement of the initial model, broken into 13 separate segments (one fragment per subdomain except for two fragments each from p66 and p51 fingers and palm), reduced the starting R-factors by about 4-5%, indicating significant interdomain rearrangements in the RT52A/TMC278 structure compared to that in the starting model. The final model for the complex was obtained after cycles of model building in O (Jones et al., 1991) and COOT (Emsley and Cowtan, 2004), and restrained refinements using CNS 1.1 (Brunger et al., 1998) and REFMAC (Murshudov et al., 1997). Similar molecular replacement and refinement steps were used in obtaining the remaining two structures, in which the RT52A/TMC278 structure was used as the template. The X-ray data, refinement statistics, and unit cell statistics are listed in Table 7.

Gel Purification Protocol

Gel purification was found to give the highest yields when done with the QIAquick Gel Extraction Kit (Qiagen). A modified protocol based on the manufacturer's recommendations was used: (1) Excise the DNA from the agarose gel with a clean, sharp razor. Minimize the size of the slice; (2) Weigh the gel slice in a colorless tube. Add 3 volumes of Buffer QG (where volume in μl = mass in mg; for example, 100 mg of gel corresponds to 100 μl) if the gel slice mass is more than 200 mg; otherwise add 0.6 ml; (3) Incubate at 50°C for 10 minutes and vortex every 2-3 minutes. The gel slice must be completely dissolved; (4) Add 10 μl of 3 M sodium acetate, pH 5.0, and vortex. The solution should be yellow unless dye from the gel was in the slice; (5) Add 200 μl (or one gel volume) of isopropanol to the sample and vortex; (6) Place up to 750 μl of the solution into a QIAquick column. Centrifuge using a table-top centrifuge for 30 seconds. The flow-through in the bottom reservoir can be discarded and any additional solution can be added and centrifuged; (7) To wash, add 0.75 ml of Buffer PE to the column and let sit for 3 minutes. Then centrifuge for 30 seconds; (8) Discard the flow-through and centrifuge for 1 minute; (9) Place the QIAquick column into a clean 1.5 ml Eppendorf tube. Elute by adding 50 μl of Buffer EB (10 mM Tris-Cl, pH 8.5, 50°C) and letting the column sit at room temperature for one minute. A second elution will improve the yield by ~30%. The second elution is done with 30 μl Buffer EB; (10) Reduce the volume of the solution by speed-vacuuming the eluate to the desired volume.
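The volume rules in steps (2) to (9) can be written out as a small helper. This is a sketch under assumptions: one gel volume is taken as 1 μl per mg of gel, and "200 μl (or one gel volume)" of isopropanol is read as one gel volume for slices above 200 mg and 200 μl otherwise. The function name is hypothetical.

```python
def gel_extraction_volumes(slice_mg: float) -> dict:
    """Reagent volumes in microliters for one gel slice of the given mass.

    Assumes 1 mg of gel corresponds to 1 microliter (one gel volume).
    """
    if slice_mg <= 0:
        raise ValueError("gel slice mass must be positive")
    large_slice = slice_mg > 200
    return {
        # Step 2: 3 gel volumes above 200 mg, otherwise a fixed 0.6 ml.
        "Buffer QG": 3 * slice_mg if large_slice else 600,
        # Step 4: fixed amount of 3 M sodium acetate, pH 5.0.
        "sodium acetate": 10,
        # Step 5: assumed reading of "200 ul (or one gel volume)".
        "isopropanol": slice_mg if large_slice else 200,
        # Step 7: wash.
        "Buffer PE": 750,
        # Step 9: first and second elution.
        "Buffer EB (first elution)": 50,
        "Buffer EB (second elution)": 30,
    }


for mass in (100, 300):
    print(mass, gel_extraction_volumes(mass))
```

For a 100 mg slice the fixed minimum volumes apply, while a 300 mg slice scales Buffer QG and isopropanol with the slice mass.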

EXAMPLE 1: Engineering HIV-1 Reverse Transcriptase for Improved Crystallization

1. Co-expression and mutant cloning

A co-expression system was utilized to facilitate high-throughput subunit specific mutagenesis of RT. Figure 1 shows the modified pCDF vector with p51 in open reading frame (ORF) 1 and p66 in ORF-2. Both ORFs have unique restriction sites for subunit specific cloning. This expression system allowed for high yield expression (~40 mg/liter) under standard expression conditions (Figure 1A). In the expectation of creating many RT mutants, a rapid and inexpensive mutagenesis system was sought. Donahue et al. (2002) proposed a ligation independent cloning technique, which uses terminator primers to create 12-15 nucleotide overhangs on the insert and vector. The insert and vector are annealed and transformed into bacteria, thereby avoiding any post-PCR enzymatic steps. The terminating residue in the primer is a 2'-O-methylated nucleotide, which causes early termination of thermostable polymerases Taq or Pfu. There are two major problems with this technique: 1. The 2'-O-methylated primers cost ~$100 per pair; and 2. The site of 2'-O-methylation has a 20% mutation rate. A modified version of the terminator primer technique was developed for rapid mutagenesis of RT called methylated overlap-extension ligation independent cloning (MOE-LIC). MOE-LIC uses overlap-extension mutagenesis (Ho et al., 1989) and terminator primers outside the ORF to avoid unwanted mutagenesis of the coding or regulatory regions. Overlap-extension PCR can also be used to insert a completely new insert or modify the termini of a previously constructed insert (Horton et al., 1989). For the co-expression system a total of four terminator primer pairs were required at a cost of approximately $400, which could be used for over a thousand reactions (Figure 1C).

2. Mutagenesis and crystallization

A mutagenesis strategy to alter the crystallization of RT was developed to combine several methodologies: 1. Disrupt or enhance common crystal contacts in the known RT crystal forms; 2. Remove high B-factor patches, primarily disordered termini; 3. Reduce surface entropy by mutagenesis of lysine and glutamic acid patches to alanine; 4. Make use of the wealth of information on RT crystallization from the multiple research groups that have studied RT; and 5. Avoid mutating conserved residues. The starting template was chosen due to its successful application to DNA cross-linking studies (RT1A depicted in Figure 1B). Table 3 shows the list of RT variants that were made for crystallization trials and the diffraction resolution of the crystals. Table 4 describes the 18 crystallization conditions used as a starting screen for each mutant. The 18 conditions were chosen for their successful use in previous RT crystallization trials (Clark et al., 1995, Chan et al., 2001, Rodgers et al., 1995, Hogberg et al., 1999, and unpublished data).

Table 3. List of all RT constructs and crystallization results.

Table 4. Crystallization Trial Kit developed for mutant RTs.

The first round of mutagenesis/crystallization produced constructs RT1-10. None of these proteins produced crystals diffracting to beyond 10 Å resolution. The termini were then optimized for crystallization based on the notion of trimming the termini to the residues visible in the electron density. The hexahistidine (6XHis) purification tag on the C terminus of p51 was repositioned to the N and C termini of p66 and p51 in constructs RT12-14. Expression results showed that RT13A, with an N-terminal HRV14 3C cleavable 6XHis-tag, gave the highest yield of monodispersed protein, as measured by dynamic light scattering. The use of HRV14 3C protease to remove the 6XHis-tag post-purification resulted in an N terminus with only an extra glycine (from the proteolytic cleavage site) compared to the natural terminus of the protein. The C terminus of p66 was terminated at residue 555 based on tandem mass spectroscopy results of RT crystals, which indicated that the last five residues were proteolytically cleaved. The C terminus of p51 was truncated at 428 based on the indicated importance of this terminus in crystallization. RT13A was then used as the template in a third round of mutagenesis/crystallization.

3. Third round of mutagenesis/crystallization

Constructs RT21-35 were then produced and crystallized unliganded and complexed with the NNRTIs CL32543 and TMC278. The new termini allowed for superior diffraction in several of the third round mutants compared to the first round of mutants. TMC278 co-crystals diffracted X-rays to higher resolution with RT24A than had been achieved with any previous RT construct. The diffraction resolution reached 3.3 Å, but the diffraction was very anisotropic and the crystals were twinned, which did not allow for structure determination. RT22A contains a serendipitous PCR mutation, F160S, but was used for crystallization because accidental mutations have historically been a source of improved crystallization (Braig et al., 1994, Pautsch et al., 1999). Figure 2A shows the different crystal forms from the third round mutants.

A surprising result gave an important clue as to how to proceed: Q258C-RT that had been cross-linked to an RNA/DNA substrate, but had then lost its substrate, crystallized unliganded with diffraction to 2.5 Å resolution. Crystals of Q258C-RT without a crosslinked substrate had never diffracted to better than 4 Å before, and the importance of the Q258C mutation was then considered. It was decided to revert residue 258 to glutamine in the fourth round of mutagenesis, which focused on RT24A, the construct that gave crystals with the highest resolution diffraction. It was also hypothesized that the anisotropy and twinning of the RT24A/TMC278 crystal's diffraction originated from TMC278's ability to wiggle and jiggle in evasion of the resistance mutations. In order to limit TMC278's flexibility in the NNRTI-binding pocket, two NNRTI resistance mutants were designed. RT51A contains mutations L100I and K103N while RT55A encodes K103N and Y181C. Both of the NNRTI double mutants are clinically significant and develop high resistance to NNRTIs, yet only marginally resist inhibition by TMC278, with respective EC50s of 2.70 and 1.70 nM compared to 0.51 nM with wild-type RT (de Bethune et al. online poster 2005).
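The degree of resistance implied by these EC50 values can be expressed as a fold change relative to wild-type RT. The sketch below uses only the three values quoted above; the dictionary keys and function name are illustrative.

```python
# TMC278 EC50 values in nM as quoted in the text.
EC50_NM = {
    "wild-type": 0.51,
    "L100I/K103N": 2.70,  # mutations carried by RT51A
    "K103N/Y181C": 1.70,  # mutations carried by RT55A
}


def fold_change(ec50_nM: dict, reference: str = "wild-type") -> dict:
    """EC50 of each mutant divided by the EC50 of the reference."""
    ref = ec50_nM[reference]
    return {
        name: value / ref
        for name, value in ec50_nM.items()
        if name != reference
    }


for name, fold in fold_change(EC50_NM).items():
    print(f"{name}: {fold:.1f}-fold")
```

Both double mutants shift the EC50 by less than an order of magnitude, which is what "marginally resist" refers to here.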

4. Diffraction of RT52A crystals and its derivatives

RT52A, which is mutant RT24A without the Q258C mutation, was found to crystallize quickly (hours to days) and when complexed with NNRTIs could give high-resolution diffraction. Table 5 displays the data sets collected with RT52A, RT51A, and RT55A. The resolution of many of the NNRTI complexed data sets is without precedent for RT. RT52A/NNRTI crystals have symmetry of space group C2 with approximate cell parameters a=160-165, b=71-74, c=107-114 Å, α=γ=90° and β=99-103°. This unit cell is novel when compared with all crystal structures of HIV-1 RT in the Protein Data Bank (Berman et al., 2000). Impressively, mutagenesis of the C terminus of p51 to delta 447 (which is present in IBl) alone changes the unit cell to that seen with IBl complexed with NNRTIs, but with a loss in diffraction to 2.7 Å resolution (RT52B in Table 5).
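For a monoclinic cell such as the C2 cell reported here, the unit cell volume follows from V = a·b·c·sin β, where β is the one angle that is not 90°. The sketch below evaluates this for the extremes of the quoted parameter ranges; it assumes the 99-103° range refers to β, and the function name is illustrative.

```python
import math


def monoclinic_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit cell volume in cubic angstroms for a monoclinic cell."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Smallest axes with the angle furthest from 90 degrees give the lower bound;
# largest axes with the angle closest to 90 degrees give the upper bound.
smallest = monoclinic_volume(160, 71, 107, 103)
largest = monoclinic_volume(165, 74, 114, 99)

print(f"{smallest:.0f} to {largest:.0f} cubic angstroms")
```

The spread between the two bounds reflects the variation among RT52A/NNRTI data sets rather than uncertainty in any single crystal.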

Table 5. Collected X-ray diffraction datasets of RT with inhibitors.

5. Enzymatic assays of RT52A

Proteins RT35A, RT51A, RT52A, and RT55A were tested for DNA-dependent DNA polymerase processivity and RNase H activity. Figure 3A shows that RT52A has similar processivity to WT HIV-1 RT (RT co-expressed with HIV-1 protease), with RT51A having a diminished processivity and RT55A an increase. These results show that mutations K172A/K173A do not cause dramatic changes in the polymerase activity of RT. RT35A does not contain the lysine patch mutation and appears to have significantly increased processivity when compared to WT. The cause of this increased processivity is not clear, but it appears that the lysine patch mutation (K172A/K173A) causes a shift back to the processivity of the WT RT. Each of the mutants has similar RNase H activities (Figure 3B).

6. RT52A structure

The electron density of TMC278 with RT52A is shown in Figure 4. The crystal structures of TMC278 with and without NNRTI-resistance mutations unambiguously verify the mechanism of inhibition when normally very effective resistance mutations are present.

7. RT52A limitations

RT52A and its derivatives were very successful for NNRTI structure determination but did not achieve the same quality of diffraction with RNase H inhibitors (RNHIs) or when unliganded. For the fifth round of mutagenesis, constructs were made to test the importance of each of the changes made to produce RT52A, and constructs that gave new crystal forms in round three were updated with the C258Q reversion and tested with RNHIs as well as NNRTIs to find a superior construct for RNase H studies. Constructs RT66A-RT69A were designed based on the electron density seen in RT52A structures. The mature RT52A termini were all found to be essential for diffraction, and mutation of three residues at a crystal contact (I135A/N136A/E138A) was found to create a new crystal form.

Notably, the construct RT69A produced crystals that gave high-resolution diffraction with two RNHIs while giving only 3.0 Å resolution diffraction with NNRTIs (Table 5). RT69A contains the accidental mutation F160S, which is required for the improved diffraction of the crystals. RT69A produces crystals within days but on average does not crystallize as quickly as RT52A. Thermal stability assays using circular dichroism have not shown significant changes in the stability of the mutants that would lead to the observed improvement in diffraction quality (Table 6). Tm is the apparent melting temperature of the protein. RT samples were diluted in 2 mM HEPES (pH 8.2) and 75 mM NaCl to a final concentration of 0.12 mg/ml (0.3 ml total volume) and centrifuged at 15,000 g for 2 minutes before measurements. The thermal stability assay was started at 4°C and the temperature was increased in 0.2°C increments to 70°C, with a 5 second measurement at 222 nm taken at each increment. The rate of temperature increase was 4.5°C per 10 minutes.

Thermal melting was not reversible.
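The melting protocol above fixes the size of each melting curve. The following minimal sketch works out the number of temperature increments and the nominal ramp duration from the stated values; it assumes that a reading is also taken at the 4°C starting point and that the stated ramp rate already accounts for the time spent measuring.

```python
def ramp_summary(start_c, end_c, step_c, rate_c_per_10_min):
    """Return (increments, readings, total minutes) for a stepped thermal ramp."""
    n_steps = round((end_c - start_c) / step_c)
    n_readings = n_steps + 1  # assumption: one reading at the start temperature too
    total_minutes = (end_c - start_c) / rate_c_per_10_min * 10.0
    return n_steps, n_readings, total_minutes


steps, readings, minutes = ramp_summary(4.0, 70.0, 0.2, 4.5)
print(f"{steps} increments, {readings} readings, about {minutes:.0f} min per melt")
```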

Table 6. Thermal stability as measured by circular dichroism of RT mutants.

Discussion

A mutagenesis/expression/purification system was created to allow for rapid testing of RT variants engineered for crystallization. The location of the 48 residues that were mutated is shown in Figure 5. The distribution of the mutations was chosen to give the greatest variation to crystallization.

The exact mechanism of the improvement in crystallization has not been fully identified. It is clear from testing various mutants with reversions of the changes to Q258C-RT constituting RT52A that each of the mutations is required. Most of the reversions caused either a loss in crystallization or diminished diffraction quality. The Q258C mutation causes a change in space group to P6222 and a loss in diffraction quality, while reverting the C terminus of p51 to delta 447 (the proteolytic cleavage site in 1B1 RT) causes the unit cell to be similar to that seen with 1B1 RT. Examination of the RT52A/NNRTI electron density shows that the crystal packing does not allow for extension of the p51 C terminus far beyond 428. The mutations K172A/K173A are near the symmetry-related p51 N terminus in the crystal lattice; however, the N terminus of p51 is disordered and probably does not form a crystal contact. The importance of the lysine patch mutation has been experimentally verified, but the mechanism of its importance is not clear. Figure 6 demonstrates the dramatic change in crystal contacts from 1B1/NNRTI crystals to RT52A/NNRTI and RT69A/RNHI; a considerable increase in the number of RT regions involved in crystal contacts can be seen. The similarity in crystal contacts with RT52A and RT69A may indicate a possible mechanism for the improved crystallization. If the mutations bias a conformation without affecting the activities of the protein, then this conformation may be responsible for the improved diffraction quality. Thermal stability assays using circular dichroism have not shown significant changes in the stability of the mutants that would lead to the observed improvement in diffraction quality (Table 6).

Protein engineering for crystallization, when prior structural knowledge is not available, is primarily based on reducing disorder. The disorder can be in the form of long side chains, non-organized termini, flexible linkers, and other regions of high thermal energy. Recombinant technology adds another tool that can be used with purification technology to increase the homogeneity of a protein sample (decrease the disorder). When prior crystal structure knowledge exists, it is possible to add an additional form of engineering in which known crystal contacts are enhanced or disrupted. Crystal contact disruption by mutagenesis can be shown through this and other work to be a very powerful technique for finding new crystal forms (Camara-Artigas et al., 2001, Charron et al., 2002, Honegger et al., 2005, Johnson et al., 2003, and Oubridge et al., 1995). Figure 7 summarizes the X-ray diffraction resolution of crystals for each of the mutants tested.

A flexible protein like RT exists in many conformations in solution and can become relatively homogeneous upon the addition of a ligand, which may favor the stability of a single conformation or a subset of conformations. Different types of ligands induce different RT conformations and therefore different crystal forms. In the process of protein engineering for crystallization, it became clear that the engineering must be specific to the ligand type. The ligand specificity of crystallization has led to protein engineering being applied to other types of RT complexes that have been resistant to structural studies in the past. RT69A is the first of the successful constructs tested with ligand specificity in mind. Unfortunately, the mutation F160S affects a residue involved directly with nucleotide binding, Y115. RT69A may not be the optimal construct for studying RNHIs, but it does show the utility of this approach. Further work with current constructs as well as further mutagenesis is being carried out to study currently intractable RT complexes.

Thus the present invention identified an RT mutant that gave diffraction quality crystals in the presence of TMC278. The superior crystallizability and diffraction quality allowed by crystal engineering show the usefulness of a systematic reiterative mutagenesis approach for crystallization of important drug targets. This success has made high-throughput crystallization of RT with NNRTIs possible. It is now possible to produce high-resolution diffraction within days of starting crystal trials with new NNRTIs. This provides, therefore, the long-needed, effective method for structure-based drug design through drug candidate co-crystallization studies as well as fragment screening (Hartshorn et al., 2005).

EXAMPLE 2: Structural Studies of Engineered RT with the Potent NNRTI TMC278

1. Increased order in the polymerase region of p66 permits higher resolution crystal structures

Crystals of wild-type RT/TMC278 had never diffracted to better than 8 Å after 5 years and thousands of crystallization experiments. The provided crystals of RT52A/TMC278 diffracted to better than 1.8 Å resolution. Table 7 and Figure 8 show the statistical quality of structures of RT52A/TMC278. The new crystal form of RT52A/TMC278 is altered in many ways from wild-type crystals, though both have C2 space group symmetry. As described in Table 7B, the unit cell dimensions have decreased, with a 9% decrease in solvent content. The smaller unit cell reflects the tighter packing of RT.

In the new crystal form, the p66 thumb and fingers subdomains are constrained in the crystal lattice, resulting in increased order in the crystals. The increased order produces higher resolution X-ray diffraction. As depicted in Figure 9, the p66 fingers subdomain is bounded by the RNase H domain and a p51 subdomain of two different symmetry-related molecules. Tighter packing restricts the structural heterogeneity of the cleft-open form of RT, and therefore the crystals of RT52A/TMC278 are able to diffract to a surprisingly high 1.8 Å resolution.

2. Validity of engineered RT structures

The structure of RT52A in comparison to other NNRTI/RT structures is shown in Figure 10. The RT52A/TMC278 structure has an RMSD of 2.44 Å compared to a non-engineered RT structure with the NNRTI Janssen-R129385 (Das et al., 2004). A large shift of 6.6 Å in a p66 palm subdomain loop (near residue 222) is responsible for 0.2 Å of the RMSD between the TMC278 and R129385 structures. The shift of the palm subdomain loop is seen in other NNRTI structures in the Protein Data Bank, and the RMSD of the engineered RT is similar to what is seen between structures from different strains of HIV-1 (~3.0 Å). Analyses of the secondary structure of the TMC278 and R129385 crystals show small differences (Figure 10B). The improved electron density due to the new crystal contacts allows for a clearer delineation of secondary structure in these important regions.

Table 7. Data collection and refinement statistics of RT with TMC278

3. Binding of TMC278 to the WT NNIBP

The high-resolution electron density maps precisely define the position of each non-hydrogen atom of the inhibitor (Figure 11). The mode of binding is the "horseshoe" mode that has been seen for other DAPY compounds (Das et al., 2004). The "wings" of the NNRTI make π-π stacking interactions with Tyr181, Tyr188, and Tyr318. A feature that distinguishes TMC278 from the other DAPY compounds is a cyanovinyl group on "Wing 1." The cyanovinyl group is positioned in a hydrophobic tunnel composed of the side chains of Tyr188, Phe227, Trp229, and Leu234. The hydrophobic tunnel opens toward the nucleic acid binding cleft near the polymerase active site. The interaction of the cyanovinyl group and the tunnel explains the improved potency of TMC278 compared to other DAPY NNRTIs. The torsional flexibility of the cyanovinyl group should allow TMC278 to bind RT with mutations in the tunnel, such as the Tyr188Leu mutation.

4. Binding of TMC278 to Leu100Ile/Lys103Asn double mutant

TMC278 overcomes all resistance mutations that it has been tested against. The mutant that had the greatest effect on the EC50 (the 50% effective concentration) was the double mutant Leu100Ile/Lys103Asn. The EC50 of the double mutant was 7 nM versus 0.4 nM for wild-type RT. The crystal structure was determined at 2.9 Å resolution to elucidate the mechanism that TMC278 uses to overcome this very potent resistance double mutation. The RMSD of RT52A/TMC278 with the 2.9 Å Leu100Ile/Lys103Asn structure is 0.82 Å.
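The loss of potency against a mutant is commonly expressed as a fold change, the mutant EC50 divided by the wild-type EC50. The short sketch below applies that ratio to the two values quoted above; the function name is illustrative only.

```python
def ec50_fold_change(mutant_ec50_nm, wild_type_ec50_nm):
    """Fold change in EC50 of a mutant relative to wild type (same units for both)."""
    if wild_type_ec50_nm <= 0:
        raise ValueError("wild-type EC50 must be positive")
    return mutant_ec50_nm / wild_type_ec50_nm


# EC50 values quoted in the text for Leu100Ile/Lys103Asn and wild-type RT.
fold = ec50_fold_change(7.0, 0.4)
print(f"Leu100Ile/Lys103Asn: {fold:.1f}-fold higher EC50 than wild type")
```

Even with this fold change, the EC50 against the double mutant stays in the single-digit nanomolar range.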

Figure 12 shows the clear electron density defining the binding of TMC278 to the mutant RT. One of the interesting features of the structure is that TMC278 forms a hydrogen bond with Asn103 instead of the hydrogen bond with the Ile101 main-chain carbonyl. The interaction with Asn103 should help overcome the resistance of this mutation because TMC278 disrupts the hydrogen bond network that Asn103 normally forms in the unliganded structure. The Leu100Ile mutation causes a steric hindrance in the NNIBP. TMC278 "wiggles" by altering its torsional angles and "jiggles" by translating 1.3 Å in the pocket to adjust to the steric hindrance. By being able to bind in the NNIBP in multiple conformations, TMC278 is able to inhibit multiple variations of the NNIBP. The wiggling and jiggling phenomenon was first described from a single mutant structure of RT/TMC125 (Das et al., 2004). This is the first study to directly show multiple conformations of the same inhibitor with different RT mutants.

5. Binding of TMC278 to Lys103Asn/Tyr181Cys double mutant

Lys103Asn and Tyr181Cys mutants were present in more than 10% of the patients failing antiretroviral therapy in one study (Cheung et al., 2004). The Lys103Asn/Tyr181Cys double mutant is resistant to all available NNRTIs, but TMC278 has an EC50 of 1.0 nM against it (Guillemont et al., 2005; Janssen et al., 2005). To show how TMC278 avoids the resistance mutations, we solved a 2.1 Å resolution structure of it with the double mutant Lys103Asn/Tyr181Cys. The RMSD of RT52A/TMC278 with the Lys103Asn/Tyr181Cys structure is 0.61 Å. Figure 13 depicts an overlay of RT52A/TMC278 and Lys103Asn/Tyr181Cys with TMC278. As with the other double mutant, TMC278 makes a hydrogen bond with Asn103. Loss of Tyr181 permits a shift in Tyr183, which partially compensates for the lost interaction with Tyr181. This latter observation is especially interesting because Tyr183 is part of the "YMDD motif," which is highly conserved in all HIV-1, HIV-2, and SIV RTs, and is even present in HBV polymerase. The cyanovinyl moiety of TMC278 makes a favorable interaction with the aromatic side chain of Y183, essentially "recruiting" a portion of the polymerase active site to help in binding the NNRTI to compensate for the loss of stabilizing interactions caused by the cysteine replacement of Tyr181.
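The RMSD values quoted in this Example summarize how far equivalent atoms lie from one another in two structures. The text does not state which atoms were compared or how the structures were superposed, so the sketch below only shows the final step under the assumption that the two coordinate sets are already superposed and listed in matching order.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) tuples.

    Assumes the coordinates are already superposed and paired atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical two-atom example: every atom displaced by 1 angstrom along z.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```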

6. Summary of torsional flexibility

Table 8 summarizes the torsional flexibility of TMC278 with the two double mutants structurally determined in this study compared to the wild-type NNIBP protein. It is clear from the change in angles that the torsional flexibility of the cyanovinyl and "Wing 1" allows TMC278 to overcome the resistance mutation Leu100Ile/Lys103Asn. This is the first study to directly demonstrate strategic flexibility in a series of mutants with the same inhibitor, providing a dramatic confirmation that wiggling and jiggling of an inhibitor can permit activity against a broad range of drug-resistant variants of a target such as HIV-1 RT.
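A torsional angle of the kind tabulated for TMC278 is defined by four consecutive bonded atoms. The following is a minimal sketch of the standard calculation from four Cartesian positions; the coordinates in the example are hypothetical and are not taken from any of the structures described here.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees) about the p1-p2 bond for atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Hypothetical coordinates: the two outer atoms are a quarter turn apart.
print(torsion_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Comparing the same torsion across the wild-type and mutant complexes gives the change in angle that the text describes as "wiggling."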

Table 8. TMC278 torsional angles.

EXAMPLE 3: Crystal Engineering of HIV-1 Reverse Transcriptase for High-Throughput Crystallography

1. Expression vector and mutation construction

The RT coding DNA from the Q258C-RT construct (Sarafianos et al., 2003) was cloned by ligation-independent cloning (LIC), with all vector-encoded amino acid sequence eliminated by restriction digestion post-LIC, into pCDF-2 Ek/LIC with the LIC Duet™ Minimal Adaptor (Novagen) according to the manufacturer's recommendations. The RT-encoding dual expression vector is designated pRT1. Mutagenesis was completed using MOE-LIC. See Figure 17A for the location and pairing of the primers on pRT1. The methylated and non-methylated primers are listed in Table 11. To minimize false positive colonies, the vector was digested with the appropriate restriction enzymes to remove the ORF protein-coding DNA that was to be replaced (NcoI and SacI for ORF-1 or NdeI and XhoI for ORF-2). For ORF-2, 3 μl of vector (250 ng/μl) was digested in a 20 μl volume with 1 μl NdeI (20,000 units/ml) and 1 μl XhoI (20,000 units/ml) for one hour at 37°C with NEBuffer 2 (New England Biolabs). For p66 mutagenesis, overlap extension PCR was performed using mutated overlap segments with the 2'-O-methylated primers to amplify the full insert with PfuUltra™ II Fusion HS DNA Polymerase (Stratagene). A typical overlap extension PCR was performed with 1 μl of each template, 1 μl (20 pmols) of each primer, 39 μl water, 1 μl (25 mM each) dNTPs, 5 μl 10X PfuUltra buffer, and 1 μl PfuUltra™ II Fusion HS DNA Polymerase (Stratagene). The PCR program was as follows: 3 min at 95°C; 5 cycles of 1 min at 95°C, 1 min at 50°C, and 30 s at 72°C; 30 cycles of 30 s at 95°C, 30 s at 53°C, and 45 s at 72°C; and a final extension step of 10 min at 72°C. The digested vector was amplified in a separate reaction tube with complementary methylated primers.
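The cycling program and reaction recipe above can be written down as data and checked with simple arithmetic. The sketch below totals the programmed hold times (ramping between temperatures is not included) and the reaction volume; it assumes two templates and two primers per overlap extension reaction, which the text does not state explicitly.

```python
# Each entry: (number of cycles, hold times in seconds for the steps of one cycle).
PCR_PROGRAM = [
    (1, [180]),          # 3 min at 95 C
    (5, [60, 60, 30]),   # 1 min at 95 C, 1 min at 50 C, 30 s at 72 C
    (30, [30, 30, 45]),  # 30 s at 95 C, 30 s at 53 C, 45 s at 72 C
    (1, [600]),          # final extension, 10 min at 72 C
]


def total_hold_seconds(program):
    """Sum of all programmed hold times; excludes thermocycler ramp time."""
    return sum(cycles * sum(steps) for cycles, steps in program)


def reaction_volume_ul(n_templates, n_primers):
    """Total volume in microliters: 1 ul per template and per primer, plus the
    water (39), dNTPs (1), 10X buffer (5), and polymerase (1) quoted in the text."""
    return n_templates * 1 + n_primers * 1 + 39 + 1 + 5 + 1


print(total_hold_seconds(PCR_PROGRAM) / 60, "min of programmed holds")
print(reaction_volume_ul(n_templates=2, n_primers=2), "ul total (assumed 2 + 2)")
```

Under the two-template, two-primer assumption the components sum to 50 μl, which is consistent with 5 μl of a 10X buffer.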

The PCR products were then gel purified, and 0.04 pmols of vector and insert were mixed at a 1:1 insert to vector molar ratio in a buffer containing 25 mM Tris pH 8.0, 5 mM MgCl2, 0.025 mg/ml BSA, and 2.5 mM DTT in a 20 μl volume. The mixture was heated to 70°C and cooled slowly over 2 h in a water bath. Once cooled to ~40°C, 1 μl of 25 mM EDTA was added and the mixture was incubated at room temperature for 5 min before being desalted using a Centri-Sep column (Princeton Separations) or ethanol precipitation (Donahue et al., 2002). Five μl of desalted annealed DNA was added to electrocompetent NovaBlue cells (Novagen) and electroporated according to the manufacturer's recommendations.
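Because the annealing step is specified in pmols, the mass of each gel-purified fragment to pipette depends on its length. The sketch below converts pmols of double-stranded DNA to nanograms using the common approximation of 650 g/mol per base pair; the fragment lengths in the example are hypothetical placeholders, not the actual sizes of the pRT1 vector or insert.

```python
AVG_G_PER_MOL_PER_BP = 650.0  # common approximation for double-stranded DNA


def dsdna_ng(pmol, length_bp):
    """Mass in nanograms of `pmol` picomoles of a dsDNA fragment of `length_bp`."""
    # pmol * (g/mol) gives picograms; divide by 1000 to get nanograms.
    return pmol * length_bp * AVG_G_PER_MOL_PER_BP / 1000.0


# Hypothetical lengths, for illustration only.
for name, length_bp in [("vector", 5000), ("insert", 1700)]:
    print(f"0.04 pmol of a {length_bp} bp {name}: {dsdna_ng(0.04, length_bp):.0f} ng")
```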

2. Expression and purification of RT

BL21-CodonPlus®-RIL cells containing pRT were induced with 1 mM IPTG at an OD600 of 0.9, followed by expression at 37°C for three hours. Ni-NTA purification was performed according to the manufacturer's recommendations (Qiagen) with the following modifications: no added lysozyme, 600 mM NaCl in each of the standard buffers, 0.1% Triton X-100 added to the lysate and wash buffers, and a high-salt wash step performed with 1.2 M NaCl added to the standard wash buffer. After elution, the HRV14 3C protease was added (1:100 ratio of protease:RT) and incubated at 4°C overnight. Mono Q was performed as described (Clark et al., 1995).

The RT was buffer exchanged and concentrated to 20 mg/ml in 10 mM Tris pH 8.0 and 75 mM NaCl. The concentrated RT was aliquoted and stored at -80°C or placed at 4°C for immediate crystallization.

3. Crystallization

The RT was screened unliganded, with a 2.5-fold molar excess of NNRTI, or with a 5-fold molar excess of RNHI using the hanging-drop vapor diffusion method. Depending on the number of samples being screened, EasyXtal DG-Tools (Qiagen) or Linbro Plates (Hampton Research) crystallization trays were used for screening. Crystal hits identified visually were optimized further. RT52A and RT69A crystals were produced in a matrix of 24 conditions from 9-12% PEG 8000, 50 mM imidazole pH 6.0-6.8, 10 mM spermine, 15 mM MgSO4, and 100 mM ammonium sulfate. All successful crystallization experiments were performed at 4°C.

4. Data collection

Crystals of RT52A were flash-cooled by immersion into liquid nitrogen after being dipped briefly into a cryoprotective solution containing well solution plus 27% ethylene glycol and the inhibitor at the same concentration as in the hanging drop. Best results were found when using MicroMounts (MiTeGen) for mounting the crystals. Data for screening and data set collection were obtained at the Cornell High Energy Synchrotron Source (CHESS) F1 and A1 beamlines, National Synchrotron Light Source (NSLS) beamlines X25 and X29, and Advanced Photon Source (APS) at Argonne National Laboratory (ANL), SER-CAT beamline 19ID. The diffraction data were indexed, processed, scaled, and merged using HKL2000 (Otwinowski et al., 1999). The resolution of the data was estimated using the last resolution shell values for completeness, R-merge, and the ratio of I to σ(I).

5. RT activity assays

The DDDP processivity assay was done as previously described (Boyer et al., 2002). The RNase H activity assay was performed as described (Boyer et al., 2004).

6. Results

Engineered RTs were mutagenized using the novel, flexible, and cost-effective method of the present invention, known as methylated overlap-extension ligation independent cloning (MOE-LIC). The new RT constructs facilitate fast, high-resolution structure determination that is enhancing the understanding of the enzyme's mechanisms and accelerating the design of improved drugs targeting RT.

The present Example used a co-expression system that allows subunit-specific mutagenesis at multiple positions and the addition of a purification tag on the C or N terminus of the subunit of choice for facile purification. In the initial co-expression construct, the p51 subunit consisted of 428 residues and a hexahistidine purification tag at the C terminus (Huang et al., 1998 and Sarafianos et al., 2003). The co-expression construct codes for the p66 Q258C mutant, which is used to produce homogeneous nucleic-acid cross-linked samples for X-ray crystallographic studies. This plasmid facilitates expression, purification, and crystallization of multiple RT constructs in parallel.

To produce diffraction quality crystals of RT with TMC278, a crystal engineering technique was developed that employs an iterative high-throughput approach to create and test RT mutants for crystallization. The approach examines many RT mutants in parallel for cloning, expression, purification, and crystallization. Based on the quality of the X-ray diffraction from the crystals, the next round of mutagenesis uses the best construct from the previous round as a template and other information obtained from the experiment for optimization. It was thereby attempted to artificially evolve RT for improved crystallization in a cyclic process in which the fittest RT construct from the previous cycle is the parental template for the next cycle (Figure 17).
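The selection step of this cyclic process can be sketched as choosing, after each round, the construct whose crystals diffracted to the best (numerically lowest) resolution and carrying it forward as the next template. The construct names and resolutions below are hypothetical placeholders, and the sketch ignores the "other information obtained from the experiment" that also guided the design of each round.

```python
def pick_template(round_results):
    """Return the construct with the best (lowest) resolution, or None.

    `round_results` maps construct name -> resolution in angstroms, with None
    for constructs that gave no crystals or no usable diffraction.
    """
    scored = {name: res for name, res in round_results.items() if res is not None}
    if not scored:
        return None
    return min(scored, key=scored.get)


# Hypothetical round of results, for illustration only.
results = {"variant_a": 4.1, "variant_b": 3.3, "variant_c": None}
print(pick_template(results))
```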

Co-expression and mutant cloning

A modular co-expression system was chosen to allow high-throughput subunit-specific mutagenesis of RT (Figure 18A). The system allowed for high expression yield (~40 mg/liter) under standard expression conditions. In the expectation of creating many RT mutants, a rapid, high-yield, and inexpensive mutagenesis system was sought. Donahue et al. (2002) proposed a ligation independent cloning technique, which uses terminator primers to create 12-15 nucleotide complementary overhangs on the insert and vector. The insert and vector are annealed and transformed into bacteria, thereby avoiding any post-PCR enzymatic steps. The terminating residue in the primer is a 2'-O-methylated nucleotide, which causes early termination of the thermostable polymerases Taq or Pfu (Figure 18B). There are two major limitations with this technique: 1) the 2'-O-methylated primers cost ~$100 per pair and 2) the site of 2'-O-methylation has a 20% mutation rate.

A novel terminator primer technique for rapid mutagenesis of RT called methylated overlap-extension ligation independent cloning (MOE-LIC) was developed by the present invention. MOE-LIC uses overlap-extension mutagenesis (Ho et al., 1989) and terminator primers outside the open reading frame (ORF) to avoid unwanted mutagenesis of the coding or regulatory regions. Overlap-extension PCR provides the flexibility for inserting a completely new sequence or mutagenizing a previously constructed insert (Horton et al., 1989). For the co-expression system, a total of three terminator primer pairs (Figure 18A) were required at a cost of approximately $300, which could be used for over a thousand reactions with no additional enzyme cost besides the PCR polymerase. Error rates were found to be extremely low, with one unintended mutation found per 30 mutants produced.

Mutagenesis and crystallization

A protein engineering methodology for the crystallization of RT was developed by combining several strategies as follows: 1) disrupt or enhance common crystal contacts in the existing crystal forms of RT; 2) remove high B-factor patches, primarily disordered termini in the parent C2 RT/NNRTI crystal form; 3) reduce surface entropy by mutagenesis of lysine and glutamic acid patches to alanine (for review, Derewenda and Vekilov, 2006); 4) use the wealth of information about multiple crystal forms of RT (e.g., sequence variations, different sets of crystal contacts, ordered/disordered regions, etc.); 5) avoid mutating conserved residues; and 6) use multiple iterative rounds of mutagenesis/crystallization to improve the X-ray diffraction quality (Figure 17). Figure 18C shows the location of the mutations that were made for crystallization trials (see Table 9 for a complete list of the 59 RT variants and the diffraction resolution of the crystals). Eighteen crystallization conditions, chosen from previously reported crystallographic studies of HIV-1 RT (Clark et al., 1995, Chan et al., 2001, Rodgers et al., 1995, Hogberg et al., 1999, and unpublished data), were used for the initial crystal screening of each RT variant (Supplemental Table 10). Crystallization of individual RT samples was attempted unliganded, with TMC278, and with other NNRTIs in parallel.

The first round of mutagenesis/crystallization produced constructs RT1-10 and crystals of RT/TMC278 complexes that diffracted to very poor resolution (Figure 18D). Although none of the constructs produced improved X-ray diffraction quality, one construct in which p66 is terminated at residue 555 produced larger crystals than those terminated at residue 560. In the next cycle, the termini of both the p66 and p51 subunits were optimized. Based on the notion that disordered terminal residues hinder tight packing in the crystal lattice, any disordered residues at the termini, including purification tags, were removed prior to crystallization. The C termini were truncated at residue 555 for p66 and 428 for p51 based on knowledge from existing RT crystal forms. Of the three round two constructs, RT13A, with an N-terminal HRV14 3C cleavable 6XHis-tag, gave the highest yield of monodisperse protein, as measured by dynamic light scattering, suggesting that it was the optimal candidate for crystallization trials. RT13A became the template for the third round of mutagenesis, resulting in constructs RT21-35. The crystals of the RT24A/TMC278 complex diffracted X-rays to 3.3 Å resolution, which was the best achieved (with TMC278) compared with any previous RT construct. The 3.3 Å diffraction dataset was anisotropic and produced multiple diffraction patterns, which did not permit collection of the reliable complete data set necessary for structure determination. All of the above constructs of RT had a p66 Q258C mutation that was used for cross-linking RT to nucleic acid (Huang et al., 1998 and Sarafianos et al., 2003). It was decided to revert residue 258 to glutamine in the fourth round of mutagenesis to reduce any disorder resulting from having a surface cysteine residue not cross-linked to nucleic acid.

New crystal form and high-resolution diffraction from RT52A/NNRTI crystals

RT52A (Figure 18E), which is RT24A with a C258Q reversion, could produce crystals within 1-3 days when complexed with TMC278 and other NNRTIs. The crystals of the RT52A/NNRTI complexes diffracted X-rays to high resolution (often better than 2.0 Å resolution). The quality of the 1.8 Å RT52A/TMC278 structure (Das et al., 2008) is evident from the high-resolution electron density map for the inhibitor shown in Figure 19A. The structures of RT52A/NNRTI complexes revealed a new crystal form of RT. This new crystal form has preserved the symmetry of its parent crystal space group C2, but with distinctly different unit cell parameters and crystal contacts (Figure 19B-C). Tighter crystal packing of RT52A molecules is evident from a 14% decrease in solvent content and a 19% decrease in unit cell volume compared to NNRTI complexed with non-engineered RT (construct designated 1B1) (Clark et al., 1995). There is also a near doubling in the number of residues involved in crystal packing (within 4.5 Å of each other), from 97 residues to 194, and in the surface area involved in crystal contacts, from 1556 Å² to 2707 Å² (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html).
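The size of the packing change can be expressed as simple ratios of the numbers quoted above. The sketch below computes them; it uses only the values stated in the text, and the function name is illustrative.

```python
def fold_increase(before, after):
    """Ratio of a quantity after engineering to its value before."""
    if before <= 0:
        raise ValueError("the starting value must be positive")
    return after / before


# Values quoted in the text for 1B1 RT/NNRTI versus RT52A/NNRTI crystals.
residue_ratio = fold_increase(97, 194)    # residues within 4.5 angstroms
area_ratio = fold_increase(1556, 2707)    # contact surface area, square angstroms
print(f"contact residues: {residue_ratio:.2f}x, contact area: {area_ratio:.2f}x")
```

The residue count doubles, while the contact surface area grows by a somewhat smaller factor.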

For 1B1-RT/NNRTI crystallization studies, RT was expressed as a single-chain p66 that produces a p51 chain via cleavage at residue 447 by a co-purifying bacterial protease, ultimately yielding the p66/p51 heterodimer (Clark et al., 1995). Strikingly, altering RT52A at the C terminus of p51 to terminate at 447 alone changes the unit cell to that seen with 1B1-RT complexed with NNRTIs, but with a significant drop in diffraction resolution to 2.7 Å (Table 9). Mutants were then constructed to test each of the changes made to produce RT52A, and each was found to be required for high-resolution X-ray diffraction (Table 9). These results provide clear evidence of the benefit of crystal engineering through RT mutations at multiple sites.

Table 9

TABLE 10

The use of drug fragment cocktail screening (Bosch et al., 2006) is a potentially powerful technique for finding new inhibitors and sites for inhibition, but this approach was less feasible with the earlier, lower resolution RT crystals. Drug fragment cocktails are usually dissolved in DMSO prior to soaking of crystals in the DMSO plus crystallization solution. To determine whether RT52A crystals could be used for fragment cocktail screening, crystals were soaked in 10-20% DMSO before and during cryoprotection. No loss in diffraction quality was found with 10% DMSO, and only a moderate decline in diffraction quality was found when 20% DMSO was used (2.0 Å versus 1.8 Å, data not shown). This result indicates that RT52A is a suitable construct for structure-based drug design through screening for binding of drug-like small chemical fragments and lead optimization at both existing and novel sites.

TABLE 11

The Tm field contains three melting temperature values calculated in the program AmplifX (http://ifrir.nord.univ-mrs.fr/AmplifX). The melting temperature of the oligonucleotide on the template is calculated in three different ways. The first is the standard simple and approximate method:

\[ T_M = 81.5 + 0.41 \times (\%GC) - 675/N \]

The second takes the salt concentration in a PCR reaction into account (with default values [Na+]+[K+] = 0.05, i.e., 50 mM):

\[ T_M = 81.5 + 16.6 \times \log_{10}([\mathrm{Na^+}] + [\mathrm{K^+}]) + 0.41 \times (\%GC) - 675/N \]

The third, the most precise, is called the base-stacking method:

\[ T_M = \frac{\sum \Delta H}{\sum \Delta S + 0.368 \times N \times \ln[\mathrm{Na^+}] + R \times \ln[\mathrm{Primer}]/4} \]

with R = 1.987 and the different ΔH and ΔS values taken from Santalucia (1998).

Santalucia J. PNAS 95, pp. 1460-1465 (1998)
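The first two melting temperature formulas are simple enough to evaluate directly. The sketch below is an independent illustration of those two formulas and is not code from AmplifX; the primer sequence is a hypothetical placeholder rather than an entry from Table 11, and the base-stacking method is omitted because it requires the nearest-neighbor ΔH and ΔS tables.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in an oligonucleotide sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def tm_basic(seq):
    """First formula: 81.5 + 0.41 * (%GC) - 675 / N."""
    return 81.5 + 0.41 * gc_percent(seq) - 675.0 / len(seq)


def tm_salt_adjusted(seq, monovalent_molar=0.05):
    """Second formula: adds 16.6 * log10([Na+] + [K+]) to the first formula."""
    return tm_basic(seq) + 16.6 * math.log10(monovalent_molar)


primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, 50% GC
print(f"basic: {tm_basic(primer):.2f}  salt-adjusted: {tm_salt_adjusted(primer):.2f}")
```

At the default 50 mM monovalent cation concentration the logarithmic term is negative, so the salt-adjusted value is always lower than the basic one.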

Validation of RT52A and derivatives

Comparison of the RT52A/TMC278 structure with 1B1 RT/NNRTI structures showed that the fold of RT, the distribution of secondary structure elements, and the mode of NNRTI binding (Das et al., 2008) are very similar, suggesting no significant impact of the crystal engineering mutations on the structure and functions of RT. Proteins RT35A (RT52A without the K172A/K173A mutation), RT51A (RT52A+L100I/K103N), RT52A, and RT55A (RT52A+K103N/Y181C) were tested for DNA-dependent DNA polymerase processivity and RNase H activity. Figure 18A shows that RT52A has processivity similar to that of wild-type HIV-1 RT (RT p66 co-expressed with HIV-1 protease), with RT51A having diminished processivity and RT55A increased processivity. Each of the mutants has similar RNase H activity (Figure 18B).

Engineering of high-resolution apo-RT crystals

While RT52A successfully produced crystals of RT/NNRTI complexes diffracting to high resolution, the unliganded RT52A crystals diffracted only to ~3 Å resolution (Table 9). The apo-form of 1B1 RT (Hsiou et al., 1996) crystallizes with different unit cell parameters compared to those of RT/NNRTI complexes. The difference in the unit cell parameters between RT and RT/NNRTI crystals is a consequence of the packing of two structurally distinct (thumb up vs. down) conformations of RT. This may explain why RT52A, which is optimized to produce RT/NNRTI crystals diffracting to high resolution, fails to do so if crystallized without an NNRTI. A different set of mutations may therefore be necessary to obtain a high-resolution apo-RT crystal form.

Subsequent rounds of mutagenesis were focused on obtaining high-resolution crystals of the apo enzyme and of RT bound to an RNase H inhibitor (RNHI) or to DNA. One of the successful mutants for apo and RNHI-bound crystals is RT69A, which contains an adventitious mutation, F160S; this construct yields crystals that diffract X-rays to 1.8 Å resolution. Another mutant with improved crystals for RNHIs is RT97A, which contains the mutations P468T/N471D in addition to the RT52A mutations. RT97A produces crystals that diffract X-rays to 2.1 Å resolution. Thermal stability assays of various mutants using circular dichroism did not show any significant in-solution stability changes that would lead to the observed improvement in diffraction quality (Figure 19).

Discussion

In general, there is no rationale for crystallization of proteins and improvement of the diffraction quality of crystals, although it is highly desirable and remains challenging. Our successful approach of using protein engineering to improve the resolution of a very important HIV-1 drug complex from ~6 Å to 1.8 Å has implications for designing anti-AIDS drugs and also provides a rare example of the use of rational approaches in enhancing the diffraction quality of macromolecular crystals. Reverting each of the mutations of RT52A caused either a loss in crystallization or diminished diffraction quality. Further mutagenesis showed that the unit cell of RT52A/NNRTI complexes is primarily defined by the termini mutations, whereas the other mutations have additive effects in increasing the X-ray diffraction resolution (Figure 20 and Table 9). We propose that the unit cell of RT52A and the very similar unit cell of RT69A (except for a ~4 Å difference in b) are determined primarily by the termini of the construct, and that the residue substitutions have a stabilizing effect on the crystallized conformation of RT. The stabilization of a crystallized conformation within the confines of tighter crystal packing is thereby responsible for the improved diffraction (Figure 20). A flexible enzyme like RT that exhibits hinge movements may assume different conformations in solution and becomes relatively homogeneous when complexed with a ligand that may favor a single conformation or a subset of conformations. Different types of ligands enrich specific subsets of RT conformations and therefore favor formation of different crystal forms. In the process of protein engineering for crystallization, it became clear that the engineering protocol must be applied and optimized separately for the different conformations of RT induced by binding of distinctive types of ligands and substrates. The ligand specificity of crystallization has led to protein engineering being applied to other types of RT complexes that have been resistant to structural studies in the past. RT69A is the first of the successful constructs tested with non-NNRTI ligand specificity in mind.

RT69A contains the mutation F160S, which is located adjacent to the binding cleft for nucleic acid near the polymerase site. Therefore, RT69A may not be the optimal construct for studies near the polymerase active site; however, it is a suitable construct for structural studies of RT in complexes with RNHIs. RT97A does not contain the F160S loss-of-function mutation and, with an X-ray diffraction resolution of 2.1 Å, can be readily used for studies of RNHIs. Further work with current constructs as well as further mutagenesis should provide high-resolution structures of RT in different functional states, especially those with bound nucleic acid template-primers.

The approach of the present invention was successful in finding an RT mutant that gave diffraction quality crystals in the presence of TMC278. The superior crystallizability and diffraction quality obtained by crystal engineering demonstrate the usefulness of a systematic iterative mutagenesis approach for improving crystallization of critical drug targets and functionally important macromolecules. This success has made high-throughput crystallization of RT in complex with NNRTIs feasible. It is now possible to obtain high-resolution diffraction within days of starting crystal trials with a new inhibitor. This opens up new possibilities for structure-based drug design through drug candidate co-crystallization studies as well as fragment screening (Hartshorn et al., 2005).

EXAMPLE 4: High resolution structures of HIV-1 RT/TMC278 complexes: Strategic flexibility explains potency against resistance mutations

1. Expression, Purification, and Crystallization

The IBl RT used in earlier crystallographic studies of RT/NNRTI complexes produced RT/TMC278 crystals that diffracted to only 6 Å resolution. To overcome this obstacle and obtain suitable diffraction data for structural studies, a systematic crystal engineering approach that improved resolution was employed. The RT used in the crystallographic analyses reported here was developed using this strategy. The RT/TMC278 complexes were crystallized from an engineered RT at 20 mg/ml in 10 mM Tris pH 8.0, 75 mM NaCl containing TMC278 at a 5:1 molar ratio of TMC278 to RT. Crystals were obtained in hanging drop vapor diffusion setups at 4°C. The well solution contained 12% PEG 8000, 100 mM ammonium sulfate, 10 mM MgCl2, 15 mM spermine, and 50 mM imidazole buffer at pH 6.8. The crystals grew to an appropriate size for diffraction within one week. Crystals of the RT/TMC278 complexes were dipped for 10 seconds in their respective mother liquors containing 25% ethylene glycol for cryoprotection. The cryoprotected crystals were flash-cooled in liquid N2 and transported to synchrotron sources.

2. Structure Solution

Diffraction data were collected from one crystal of each type of RT/NNRTI complex at the Cornell High Energy Synchrotron Source (CHESS) F1 beam line. The data were processed using HKL2000. The engineered RT/TMC278 complexes crystallized in a new crystal form. The previously reported structure of the RT/R147681 complex was used as a starting model for obtaining molecular replacement solutions for the structure of the wild-type RT/TMC278 complex. The 1.8 Å resolution structure of the wild-type RT/TMC278 complex was used as the starting model for obtaining the structures of the L100I/K103N mutant and K103N/Y181C mutant RTs in complexes with TMC278. The final models for the three structures were obtained after cycles of model building in COOT and restrained refinement using REFMAC and CNS 1.1. The high resolution structure of the RT/TMC278 complex revealed no metal binding at the polymerase active site. Also, no metal ion with clear coordination geometry could be located at the RNase H active site. An electron density peak that is nearly the positional equivalent of a metal cation at the RNase H active site was, however, assigned as a water molecule because it lacks the proper metal coordination. Coordinates and structure factors for the structures of the wild-type RT/TMC278, K103N/Y181C/TMC278, and L100I/K103N/TMC278 complexes are available from the Protein Data Bank with PDB IDs 2ZD1, 3BGR, and ZZZ, respectively.

HIV-1 RT/TMC278 complex

The present invention describes the structure of wild-type HIV-1 RT complexed with TMC278 at 1.8 Å resolution, using a new RT crystal form engineered by systematic RT mutagenesis. This high resolution structure reveals that the cyanovinyl group of TMC278 is positioned in a hydrophobic tunnel connecting the NNRTI-binding pocket to the nucleic acid-binding cleft. The crystal structures of TMC278 in complexes with the double mutant K103N/Y181C (2.1 Å) and L100I/K103N HIV-1 RTs (2.9 Å) demonstrated that TMC278 adapts to bind mutant RTs. In the K103N/Y181C RT/TMC278 structure, loss of the aromatic ring interaction caused by the Y181C mutation is counterbalanced by new interactions between the cyanovinyl group of TMC278 and the aromatic side chain of Y183, which is facilitated by an ~1.5 Å shift of the conserved Y183MDD motif. In the L100I/K103N RT/TMC278 structure, the binding mode of TMC278 is significantly altered so that the drug conforms to changes in the binding pocket primarily caused by the L100I mutation. The flexible binding pocket acts as a molecular "shrink wrap" that makes a shape complementary to the optimized TMC278 in wild-type and drug-resistant forms of HIV-1 RT. The crystal structures provide a better understanding of how the flexibility of an inhibitor can compensate for drug resistance mutations.

A systematic protein engineering approach according to the present invention was used to obtain a mutant form of RT that yielded better diffracting crystals of the RT/TMC278 complex. Successful protein engineering included: (i) truncating the termini of the protein; (ii) removing surface lysine and glutamic acid patches; and (iii) altering amino acid residues to make new lattice contacts and/or remove some of the lattice contacts seen in earlier crystal forms. This mutated RT produced crystals of the HIV-1 RT/TMC278 complex in a new crystal form that is distinct from the reported crystals of RT/NNRTI complexes. One of the new crystal forms of the engineered RT/NNRTI complexes diffracted X-rays to 1.8 Å, significantly better than any of the reported structures of HIV-1 RT. The L100I/K103N and K103N/Y181C double mutant (in the p66 subunit only) RTs were designed based on the above construct, and their structures in complexes with TMC278 were determined at 2.9 and 2.1 Å resolution, respectively.

Results

Engineering RT for High Resolution Diffraction

Numerous earlier attempts to obtain a crystal structure of the HIV-1 RT/TMC278 complex failed, and the best crystals diffracted X-rays to only 6 Å resolution. In contrast, the engineered RT/TMC278 complex crystallized in a new form and the crystals diffracted X-rays to 1.8 Å resolution (Table 2). The structure of wild-type HIV-1 RT/TMC278 was determined by molecular replacement using the structure of RT/R147681 (PDB ID 1S6Q) as the starting model and refined at 1.8 Å resolution to R-work and R-free of 0.221 and 0.248, respectively. This high resolution structure of HIV-1 RT has excellent stereochemistry (> 91% of amino acid residues are in the most favored regions of the Ramachandran plot with no outliers; Procheck G-factor = 0.25) and a reliable solvent model. The form of recombinant RT (IBl) used in previous structural studies of RT/NNRTI complexes crystallized with the symmetry of space group C2 with a unit cell volume of ~1.6 × 10⁶ Å³, one molecule per asymmetric unit, an approximate solvent content of 64%, and a Matthews coefficient of 3.4. The p66 fingers and thumb subdomains are flexible and not involved in any significant crystal contacts. Earlier structural studies have shown that NNRTI binding is accompanied by repositioning of the thumb and fingers subdomains, resulting in a conformation of RT with a wide cleft between these mobile subdomains. Comparing the crystal structures of a number of RT/NNRTI complexes revealed that individual NNRTIs have both short-range and long-range effects on the conformation of RT, and affect the precise positioning of the p66 thumb and fingers subdomains. Several DAPY inhibitors, by virtue of their structural flexibility and compactness, are predicted to have the ability to bind RT in more than one conformation. These different binding modes for a single NNRTI may also lead to differences in the positions of the fingers and thumb.
In the context of a crystal lattice, this heterogeneity in the arrangement of RT molecules would reduce the resolution of X-ray diffraction from the crystal. An engineered RT variant (RT52A) crystallized in a new form with the symmetry of the C2 space group and a unit cell volume of ~1.3 × 10⁶ Å³. The unit cell volume and the solvent content are reduced by ~18% and 28%, respectively, compared to the C2 crystal form obtained with the parental RT. The lower solvent content of ~56% with a Matthews coefficient of 2.8, compared to 64% solvent for the old C2 unit cell with a Matthews coefficient of 3.4, reflects significantly tighter packing of the RT molecules in the crystal lattice. The new crystal form involves new protein contacts; of the new contacts, a set of back-to-back interactions between the p66 thumb and p66 fingers of symmetry-related RT molecules may be critical in stabilizing the positions of the p66 thumb and fingers subdomains in the new crystal lattice (Supplementary Fig. S1). The tighter packing of the engineered RT molecules and the specific intermolecular interactions seen with this form of RT may have contributed to the higher order and high resolution (1.8 Å) diffraction. A total of 113,072 unique reflections were used to refine the structure of one RT molecule, which is about 2.3 times the number of observations (49,347 reflections) used in refining the published highest resolution (2.2 Å) structure (PDB ID 1VRT) of an HIV-1 RT/NNRTI complex. The substantial increase in experimental measurements leads to higher accuracy and greater overall reliability of the current structure.
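The packing figures quoted above can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes the standard Matthews relation (solvent fraction ≈ 1 − 1.23/V_M, with V_M in Å³/Da), which is not stated in the text, and it interprets the 28% figure as the reduction in solvent volume per unit cell. All variable and function names are hypothetical.

```python
# Illustrative consistency check of the crystal packing numbers in the text.
# Assumption (not from the source): the Matthews relation
#   solvent fraction ~= 1 - 1.23 / V_M, with V_M in cubic angstroms per dalton.

def solvent_fraction(matthews_coefficient: float) -> float:
    """Approximate solvent fraction for a given Matthews coefficient."""
    return 1.0 - 1.23 / matthews_coefficient

def fractional_decrease(old: float, new: float) -> float:
    """Relative decrease going from old to new."""
    return (old - new) / old

# Values quoted in the text.
old_cell_volume = 1.6e6   # cubic angstroms, parental C2 form
new_cell_volume = 1.3e6   # cubic angstroms, RT52A C2 form
old_solvent = solvent_fraction(3.4)
new_solvent = solvent_fraction(2.8)

cell_decrease = fractional_decrease(old_cell_volume, new_cell_volume)
# Solvent volume per cell = cell volume x solvent fraction.
solvent_volume_decrease = fractional_decrease(
    old_cell_volume * old_solvent, new_cell_volume * new_solvent
)
reflection_ratio = 113072 / 49347

print(f"solvent content: {old_solvent:.0%} -> {new_solvent:.0%}")
print(f"unit cell volume decrease: {cell_decrease:.1%}")
print(f"solvent volume decrease: {solvent_volume_decrease:.1%}")
print(f"reflection ratio: {reflection_ratio:.2f}")
```

Under these assumptions the Matthews coefficients of 3.4 and 2.8 reproduce the stated solvent contents of about 64% and 56%, and the ratio of reflections is about 2.3, as stated.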

Individual subdomains of the engineered wild-type (RT52A) RT/TMC278 structure and the IBl RT/TMC120 structure are highly similar. The overall Cα atom superposition of the structures had an rmsd of 1.6 Å, primarily due to small differences in the relative positioning of the subdomains. The Cα superposition of the binding pocket regions of both structures (residues 98-110, 178-190, and 226-240 of the p66 subunit) showed an rmsd of 0.85 Å. The overall similarity in the binding of the two DAPY compounds also suggests only subtle or modest effects of the crystal-engineered mutations on the inhibitor binding. The engineered RT52A also exhibited both DNA polymerization and RNase H activities similar to IBl RT.

1.8 Å resolution structure of the wild-type HIV-1 RT/TMC278 complex

Overall, the structure of the p66/p51 RT heterodimer (Figure 21A) in the HIV-1 RT/TMC278 complex resembles the open-cleft conformation seen in the previous structures of RT/NNRTI complexes. The electron density maps (Figure 21B) unambiguously defined the position and conformation of TMC278 in the structure of the HIV-1 RT/TMC278 complex. TMC278 has a conformation that is similar to the horseshoe conformation seen with other DAPY inhibitors, with the three aromatic rings connected by two linking amino groups, and a cyanovinyl (acrylonitrile) substituent that is unique to TMC278 (Figure 21C). The torsion angles of the rotatable bonds (τ1-τ4) of TMC278 have values similar to those of the prototype DAPY analog TMC120 (R147681/dapivirine) bound to RT, although the two structures were determined in two different crystal forms using two different RT constructs.
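For readers unfamiliar with how torsion angles such as τ1-τ4 are obtained from atomic coordinates, the following is a minimal sketch of a generic dihedral-angle calculation. It is not the procedure used in this work, the sign of the returned angle depends on the convention chosen, and the function names and example coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond for four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def torsion_change(angle_a, angle_b):
    """Signed change from angle_a to angle_b, wrapped into [-180, 180)."""
    return (angle_b - angle_a + 180.0) % 360.0 - 180.0

# Hypothetical coordinates (in angstroms) for illustration only.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(abs(cis), 1), round(abs(trans), 1))
```

The `torsion_change` helper wraps differences across the ±180° boundary, which matters when comparing the same rotatable bond between two structures.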

TMC278 makes important contacts with a number of key amino acids in the NNRTI-binding pocket (Figure 22). The hydrogen bond between a linker nitrogen atom of TMC278 and the main-chain carbonyl oxygen of K101 is conserved in the binding of many NNRTIs. The second linker nitrogen is involved in a water-mediated hydrogen bond network with the main-chain carbonyl group of E138 of the p51 subunit (Figure 22A). The dimethylphenyl ring and its attached 4-cyanovinyl group interact with the hydrophobic core of the binding pocket. The cyanovinyl group is positioned to fit into a hydrophobic tunnel formed by the side chains of amino acid residues Y188, F227, W229, and L234; this tunnel opens toward the nucleic acid-binding cleft (Figure 22B). A similar tunnel was seen in the binding of a cyanovinyl-containing iodo-pyridinone (IOPY) NNRTI (PDB ID: 2B5J). In the free TMC278 molecule, the cyanovinyl group is expected to be coplanar with the dimethylphenyl ring. However, in the RT-bound conformation, the plane of the cyanovinyl group is inclined 45° to the plane of the dimethylphenyl ring. The extensive interactions of the cyanovinyl group with the hydrophobic tunnel may explain why TMC278 is the most potent of the DAPY analogs.

The high resolution structure provides a reliable solvent model. The amino acid residues K101 and K103 are solvent exposed (Figure 22A) and, if mutated, each can confer NNRTI resistance. In the RT/TMC278 structure, the Nζ atom of K103 interacts with two water molecules, whereas the corresponding Nζ of K101 interacts with four oxygen atoms: the carbonyl oxygen of G99, both carboxyl oxygen atoms of E138 (of the p51 subunit), and a water molecule. The location of the K101-Nζ atom in the TMC278 complex is similar to that in the recently published structure of the HIV-1 RT/GW420867X complex; however, the identification of the interaction between K101-Nζ and four surrounding oxygen atoms, including one from a solvent water molecule, defines a novel polar environment for K101. The different environments for and interactions of K101 and K103 may help account for the differences in resistance seen when these two lysines are mutated, even though both of their side chains point toward a common putative entrance to the NNRTI-binding pocket.

Structure of the K103N/Y181C double mutant RT/TMC278 complex

K103N and Y181C are the two resistance mutations most frequently observed in patients treated with NNRTIs, and viruses carrying these mutations show high levels of resistance to existing NNRTIs. However, TMC278 inhibits K103N, Y181C, and K103N/Y181C RT mutants at an EC50 < 1 nM. The crystal structure of the K103N/Y181C mutant RT/TMC278 complex was determined at 2.1 Å resolution with R-work and R-free of 0.228 and 0.269, respectively. Superposition of this structure onto the wild-type RT/TMC278 structure revealed no major conformational changes for the bound TMC278 (Figure 23). The number of distances < 4.5 Å between pairs of atoms, one from RT and the other from TMC278, was used as an indicator of the extent of the hydrophobic interactions between RT and TMC278. In the K103N/Y181C mutant RT/TMC278 complex, the number of such distances is 51, which is almost the same as the number in the wild-type RT/TMC278 complex. A slight tilt (5°) of τ3 results in displacement of the dimethylphenyl-4-cyanovinyl group away from the mutated Y181C side chain. The interaction between the dimethylphenyl ring of TMC278 and the aromatic side chain of Y181 is lost, and a void is created by the mutation. In the structure of the mutant RT, the amino acid residue Y183, which is part of the conserved Y183MDD motif at the polymerase active site, is shifted by ~1.5 Å toward the NNRTI-binding pocket, permitting it to participate in the binding of TMC278 by binding the cyanovinyl group (Figure 23).
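The contact metric described above, the number of protein-ligand atom pairs separated by less than 4.5 Å, can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the software used in this work; the function name and the toy coordinates are hypothetical.

```python
from math import dist

def count_contacts(protein_atoms, ligand_atoms, cutoff=4.5):
    """Count protein-ligand atom pairs closer than `cutoff` angstroms.

    Each atom is an (x, y, z) tuple. Every pair is counted once, so a
    single protein atom may contribute several contacts.
    """
    return sum(
        1
        for p in protein_atoms
        for l in ligand_atoms
        if dist(p, l) < cutoff
    )

# Toy coordinates in angstroms (hypothetical, for illustration only).
protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (6.0, 4.0, 0.0)]
ligand = [(3.0, 0.0, 0.0), (4.0, 4.0, 0.0)]

n_contacts = count_contacts(protein, ligand)
print(n_contacts)
```

Restricting `protein_atoms` to the atoms of a single residue gives a per-residue count of the kind used to compare residues 100 and 103 across the structures.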

The ability of the cyanovinyl group of TMC278 to recruit Y183 helps to compensate for the loss of interactions due to the Y181C mutation; the involvement of Y183 in this compensatory interaction is particularly fortuitous and significant because Y183 is completely conserved in all HIV-1 sequences. This mode of compensatory interaction is different from that observed for another NNRTI, HBY 097, which developed a hydrogen bond with the thiol group of the mutated C181 side chain; in the K103N/Y181C mutant RT structure, the thiol group of C181 has a hydrogen bond with a water molecule at the equivalent position of the Oη atom of Y181 in the wild-type RT/TMC278 structure. Subtle conformational changes (as reflected by the torsion angles τ) of TMC278 in the K103N/Y181C mutant RT/TMC278 complex enhance the interactions of the cyanovinyl group with the modified hydrophobic tunnel (Figure 22B), supplementing the contributions of the novel interactions with Y183. The extent of the interaction between the other mutated amino acid, N103, and TMC278 is comparable to the interaction between K103 of wild-type RT and TMC278: the number of distances < 4.5 Å between the atoms of TMC278 and the amino acid K/N103 is 16 and 17, respectively, in the wild-type and the mutant structures.

Structure of the L100I/K103N mutant RT/TMC278 complex

Among the known NNRTI-resistance mutations, the L100I/K103N double mutation has the greatest effect on the potency of TMC278. However, TMC278 still inhibits the double mutant at ~8 nM EC50 (Table 1). The crystal structure of the L100I/K103N mutant RT/TMC278 complex was determined at 2.9 Å resolution. The refined structure has R and R-free of 0.240 and 0.299, respectively. In the wild-type RT/TMC278 structure, L100 is near the center of the pocket and primarily interacts with the central pyrimidine ring of TMC278; K103 is located on the other side of the pyrimidine ring. Comparison of the structures of the L100I/K103N mutant RT/TMC278 and wild-type RT/TMC278 complexes (Figure 24) shows that β-branching of I100 in the L100I mutant would lead to steric conflict with the inhibitor if TMC278 were to bind in a conformation similar to that seen in the wild-type RT/TMC278 complex; in the mutant structure the Cγ2 atom of I100 would be only ~2 Å away from the position of the central pyrimidine ring of TMC278 when it is bound to wild-type RT. However, when TMC278 binds to the L100I/K103N mutant RT, the drug undergoes significant conformational (wiggling) and positional (jiggling) rearrangements compared to the position in which it binds to wild-type RT (Figure 24). To avoid steric conflict with the L100I mutation, TMC278 shifts away from I100 and towards N103 (Figure 24A); the position of the entire inhibitor molecule is displaced by ~1.5 Å in the pocket. The number of distances < 4.5 Å between TMC278 and I100 is 13 in the complex with the L100I/K103N mutant RT, which is considerably less than the 28 and 30 distances < 4.5 Å in the complexes with wild-type RT and the K103N/Y181C mutant; however, in compensation, the number of protein-ligand distances < 4.5 Å for residue 103 increases from 16 and 17 in the wild-type and K103N/Y181C mutant structures, respectively, to 27 in the L100I/K103N mutant RT/TMC278 structure.
In the L100I/K103N complex, the rotatable torsion angles τ1-τ5 of TMC278 are changed by 18, 18, 5, 22, and 45°, respectively, with respect to the wild-type RT/TMC278 complex. Unlike its configuration in the wild-type RT/TMC278 and K103N/Y181C mutant RT/TMC278 structures, the cyanovinyl group is almost coplanar with the dimethylphenyl ring in the L100I/K103N mutant RT/TMC278 structure. In the L100I/K103N complex structure, the amino acid residues in the NNRTI-binding pocket are rearranged to optimize the inhibitor-protein interactions. This contrasts with an earlier proposal that the basis of the effects of the L100I mutation was a loss of interactions with Y181 and Y188. However, analysis of all of the structural results shows that L100I introduces a significant distortion in the NNRTI-binding pocket. NNRTIs that do not have the ability to wiggle and jiggle and adapt their shape to the various pockets found in the NNRTI-resistant RTs fail against the known mutants either because their binding is susceptible to steric hindrance, because they lose key hydrophobic interactions, or because mutations like K103N interfere with entry of the NNRTIs into the pocket.

Role of the cyanovinyl group of TMC278

The cyanovinyl group of TMC278 is not present in the other DAPY analogs. Analysis of the crystal structures suggests that the cyanovinyl group contributes to the enhanced potency of TMC278 relative to the other DAPY analogs, and that this moiety helps TMC278 to retain potency against NNRTI-resistance mutations. As has already been discussed, the cyanovinyl group is positioned in a cylindrical tunnel connecting the NNRTI-binding pocket to the nucleic acid-binding cleft that resembles a "piston and ring" structure (Figure 22B). The extent of the interactions between the cyanovinyl group and the hydrophobic tunnel is conserved despite rearrangements in RT and TMC278 that accompany the pocket mutations (Supplementary Table S1). Apparently, the maintenance of cyanovinyl group interactions with RT is critical for retaining the potency of TMC278 against a broad range of NNRTI-resistance mutations. Analysis of the torsional flexibility clearly demonstrates how TMC278 is resilient in overcoming the effects of drug-resistance mutations. A 2D infrared spectroscopic study of TMC278 complexed with the engineered RT52A HIV-1 RT revealed that the conformational distribution of drug-protein complexes relaxes on the tens of picoseconds timescale; i.e., TMC278 loses structural "memory" of its binding mode within tens of picoseconds. These motions are consistent with the concept that TMC278 is flexible even when bound to HIV-1 RT and can change its conformation to adapt to the elastic NNRTI-binding pocket.

Implications for Drug Design

High resolution structures of RT provide opportunities for understanding inhibitor-protein interactions with greater accuracy, for more reliable determination of the structural effects of resistance mutations, and for systematic structure-based drug design targeting the NNRTI-binding pocket. The opening of the tunnel to the nucleic acid-binding site suggests the possibility of extending NNRTIs so that they could interact directly with the conserved residues involved in dNTP and/or nucleic acid binding, a concept that has been previously proposed. The interactions of the TMC278 cyanovinyl group with the hydrophobic tunnel enhance the binding of the inhibitor, and the group is also important for the potency of the inhibitor against drug-resistance mutations. Comparison of structures of TMC278 in complexes with L100I/K103N and wild-type RT clearly demonstrated the importance of strategic flexibility (wiggling) and repositioning (jiggling).

The RT-bound conformations of TMC278 are somewhat different from each other and from its free-state low-energy conformation obtained using the molecular modeling software Schrödinger (http://www.schrodinger.com/). However, the total energies calculated for the different conformations of TMC278 are not significantly different from that of its free-state low-energy conformation. It is expected that a small molecule would bind to a receptor approximately at its low-energy conformation. The fact that TMC278 can achieve near-low-energy conformations when bound to different forms of HIV-1 RT explains why TMC278 maintains its high potency against the mutant RTs.

The HIV-1 RT binding pocket for NNRTIs is flexible and can accommodate a diverse range of small molecule chemotypes. The binding pocket flexibility can be described as a "molecular shrink wrap" phenomenon in which the protein structure adapts and can form a complementary shape to surround the bound inhibitor. Analysis of the K103N/Y181C mutant RT/TMC278 structure reveals how TMC278 can take advantage of the structural flexibility of RT, inducing localized changes in the protein that lead to new interactions with Y183 that compensate for the loss of the hydrophobic interaction caused by the Y181C mutation. The fact that compensatory changes can occur both in the protein and in the drug suggests that optimal drug design strategies should carefully consider and take advantage of the flexibility of both the inhibitor and the protein. The potential flexibility of both the protein and the drug should be a strategic consideration in the early stages of programs to design drugs that are intended to be broadly effective against targets that readily mutate and develop drug resistance.

This invention may be embodied in other forms or carried out in other ways without departing from the spirit or essential characteristics thereof. The present disclosure is therefore to be considered as in all respects illustrative and not restrictive, the scope of the invention being indicated by the appended Claims, and all changes which come within the meaning and range of equivalency are intended to be embraced therein.

The documents cited and listed herein are related to the above disclosure, and particularly to the experimental procedures and discussions. The documents should be considered as incorporated by reference in their entirety.

We claim:

1. An isolated nucleic acid molecule encoding a peptide comprising the amino acid sequence of SEQ ID NO: 1.

2. An isolated nucleic acid molecule encoding a peptide comprising the amino acid sequence of SEQ ID NO: 2.

3. The nucleic acid molecule of claim 1, further comprising SEQ ID NO: 2.

4. An isolated nucleic acid molecule encoding at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT) wherein at least one terminal end of the protein is truncated.

5. An isolated nucleic acid molecule encoding at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT) wherein: a. the amino-terminus of HIV-RT p66 comprises amino acid residues MVPISP (SEQ ID NO: 4); b. the nucleic acid molecule encodes alanine at amino acid residue 172 of p66; c. the nucleic acid molecule encodes alanine at amino acid residue 173 of p66; d. the nucleic acid molecule encodes serine at amino acid residue 280 of p66; e. the nucleic acid molecule encodes serine at amino acid residue 280 of p51; f. the carboxy-terminus of p66 terminates at residue 555; and g. the carboxy-terminus of HIV-RT p51 terminates at residue 428.

6. The nucleic acid molecule of claim 5, wherein the amino-terminus of p51 comprises a human rhinovirus subtype 14 3C (HRV-14 3C) protease cleavage site, wherein the HRV-14 3C protease cleavage site is situated between a hexaHIS purification tag and the p51 coding sequence, thereby facilitating generation of a post-protease amino-terminus of gPISP upon exposure to HRV-14 3C protease under standard conditions for HRV-14 3C protease activity.

7. The isolated nucleic acid molecule of claim 5, wherein the nucleic acid molecule comprises the nucleic acid sequence of SEQ ID NO: 3.

8. A recombinant vector comprising the nucleic acid molecule of claim 5.

9. The nucleic acid molecule of claim 5, wherein the nucleic acid molecule encodes HIV-RT p66 and the amino-terminus of p66 begins with the amino acid residues MVPISP (SEQ ID NO: 121).

10. An isolated nucleic acid molecule encoding at least a portion of the amino acid sequence of human immunodeficiency virus reverse transcriptase (HIV-RT), wherein the nucleic acid molecule encodes alanine at amino acid residue 172 of p66.

11. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes HIV RT p66 and wherein the amino terminus of p66 comprises amino acid residues MVPISP (SEQ ID NO: 121).

12. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes alanine at amino acid residue 173 of p66.

13. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes serine at amino acid residue 280 of p66.

14. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes serine at amino acid residue 280 of p51.

15. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes HIV RT p66 and wherein the carboxy-terminus of p66 terminates at residue 555.

16. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes HIV RT p51 and wherein the amino-terminus of p51 comprises a human rhinovirus subtype 14 3C (HRV-14 3C) protease cleavage site.

17. The isolated nucleic acid molecule of claim 16, wherein the HRV-14 3C protease cleavage site is situated between a hexaHIS purification tag and the p51 coding sequence, thereby facilitating generation of a post-protease amino-terminus of gPISP upon exposure to HRV-14 3C protease under standard conditions for HRV-14 3C protease activity.

18. The isolated nucleic acid molecule of claim 10, wherein the nucleic acid molecule encodes HIV RT p51 and wherein the carboxy-terminus of p51 terminates at residue 428.

19. A composition comprising the HIV-RT product of the expression of the nucleic acid of claim 3.

20. An isolated nucleic acid or portion thereof wherein the nucleic acid: a. encodes at least a portion of a human immunodeficiency virus (HIV) reverse transcriptase (RT); and b. is capable of hybridizing under standard hybridization conditions to a nucleic acid sequence or complement thereof, of claim 1.

21. The isolated nucleic acid of claim 20 wherein the nucleic acid: a. encodes at least a portion of a human immunodeficiency virus (HIV) reverse transcriptase (RT); and b. is capable of hybridizing under standard hybridization conditions to a nucleic acid sequence or complement thereof, of claim 2.

22. The recombinant vector of claim 8, wherein the vector is a plasmid.

23. A prokaryotic host cell transformed with the vector of claim 22.

24. A eukaryotic host cell transformed with the vector of claim 22.

25. An isolated cell line comprising the nucleic acid of claim 3.

26. A method for generating crystallization variants of an HIV-RT-NNRTI complex, comprising the steps of: a. truncating at least one terminus of HIV-RT; b. reducing surface lysine regions; and c. mutating at least one amino acid residue, thereby altering lattice contact from the non-mutated residue.

27. The method of claim 26, wherein step b comprises reducing surface glutamic acid regions.

28. The method of claim 26, wherein step b comprises mutating lysine to alanine.

29. The method of claim 27, wherein step b comprises mutating glutamic acid to alanine.

30. The method of claim 26, wherein step c is systematic mutagenesis.

31. The method of claim 26, wherein step c is achieved by methylated overlap extension ligation independent cloning.

32. The method of claim 26, further comprising the step of selecting mutant HIV-RT for enzymatic activity.

33. The method of claim 26, further comprising the step of crystallizing the mutant HIV-RT.

34. The method of claim 26, further comprising the step of minimizing mutation of conserved amino acid residues.

35. The method of claim 31, further comprising the step of determining the three dimensional crystal structure of the mutant HIV-RT-NNRTI complex.

36. The HIV-RT-NNRTI complex produced by the method of claim 26.

37. The method of claim 26, wherein the NNRTI is a DAPY compound.

38. The method of claim 27, wherein the DAPY compound is selected from the group consisting of TMC278 and TMC125.

39. A method for identifying HIV-RT inhibitor solvent molecules comprising the steps of: a. soaking a small molecule fragment into a crystallization variant generated by the method of claim 26, thereby forming an HIV-RT complex with the molecule; b. determining the three dimensional structure of the complex; and c. determining HIV-RT enzyme activity.
