Pathogenic Protein


This invention relates to pathogenic proteins, especially pathogenic proteins in bacteria, and to methods for identifying agents that modulate their activity or function.

The listing or discussion of a prior-published document in this specification should not necessarily be taken as an acknowledgement that the document is part of the state of the art or is common general knowledge.

Tuberculosis, caused by Mycobacterium tuberculosis (M. tuberculosis), has been declared a global emergency and is the most frequent infectious cause of mortality in the world after HIV. The emergence of many strains of M. tuberculosis resistant to the currently available chemotherapeutics, particularly rifampicin and isoniazid, is an additional cause of alarm. There is therefore an urgent need for additional drugs against tuberculosis and, consequently, for the identification of new targets for drug discovery in M. tuberculosis.

The success of M. tuberculosis as a pathogen is largely attributed to its ability to adapt to the host environment. In particular, M. tuberculosis has proved most effective at adapting to long-term residence in macrophage phagosomes, where the bacteria are believed to enter a non-replicating dormant phase which is resistant to antibiotics. Such adaptation involves the precise coordination of gene expression via the regulation of transcription. About 190 transcriptional regulators are believed to be encoded by the M. tuberculosis genome (Cole et al., 1998; Camus et al., 2002). Insights into this regulation and how it translates into bacterial adaptation may aid in developing new strategies and drugs for treating tuberculosis.

The application of microarray technology to the study of bacterial gene expression during infection has allowed genome-wide analysis of the M. tuberculosis genes that are induced in models of infection. For example, Schnappinger et al (2003) demonstrated that 601 M. tuberculosis genes were differentially expressed upon infection of murine bone marrow macrophages. Of these, 454 were induced and 147 were repressed. For a selected subset of 21 of these genes, the results were confirmed using quantitative reverse transcriptase polymerase chain reaction (RTq-PCR). This subset was selected on the basis of known function or predicted importance: some of the genes were predicted or known to be involved in transcriptional regulation, lipid metabolism, intermediary metabolism or iron acquisition; others were strongly up-regulated in macrophages activated with IFN-γ compared to naïve macrophages. The results were confirmed both in cell culture in vitro and in infected tissues of a mouse model in vivo, and were found to be highly correlated. Thus, the transcriptional adaptation of M. tuberculosis to the phagosomal environment of bone marrow derived macrophages was thought to reflect the adaptations that occur in murine tuberculosis. However, other than the few genes specifically mentioned, Schnappinger et al does not provide any indication of which of the 601 differentially-expressed M. tuberculosis genes may have a major regulatory role in the pathogenesis of tuberculosis.

Talaat et al (2004) used microarrays to measure gene expression across the M. tuberculosis genome in bacteria recovered from infected BALB/c (immunocompetent) mice and SCID (immunodeficient) mice, and in bacteria grown in broth in vitro. A set of 67 genes was identified as being activated in M. tuberculosis from the BALB/c mice but not from the SCID mice at 21 days after infection. Of these, a subset of 33 genes had previously been found to be up-regulated in macrophages in vitro (Schnappinger et al., 2003). However, other than the genes specifically mentioned, Talaat et al does not provide any indication of which of the M. tuberculosis genes may have a major regulatory role in the pathogenesis of tuberculosis.

Comprehensive mutagenesis schemes have also provided insights into M. tuberculosis genes that may be essential for survival and infection. For example, Rengarajan et al (2005) identified 126 M. tuberculosis genes required for survival by screening for transposon mutants that fail to grow within primary macrophages. Many of these were found to be arranged in putative operons involved in a diversity of functions including transport, secretion apparatus and lipid degradation. However, other than the few genes specifically mentioned, Rengarajan et al does not provide any indication of which of the 126 essential M. tuberculosis genes may have a major regulatory role in the pathogenesis of tuberculosis.

Sassetti et al (2003) used a mutagenesis approach and determined that 194 genes were specifically required for mycobacterial growth in mice. Many of these were unique to mycobacteria and closely related species, indicating that many of the strategies employed by this group of organisms are fundamentally different from those of other pathogens. However, other than the few genes specifically mentioned, Sassetti et al does not provide any indication of which of the 194 essential mycobacterial genes may have a major regulatory role in pathogenesis. Van der Geize et al (2007) conducted a transcriptomic analysis of Rhodococcus sp. strain RHA1 grown on cholesterol and delineated a range of genes necessary for steroid degradation, most of which are conserved in M. tuberculosis.

Although microarray analyses and comprehensive mutagenesis studies have advanced our understanding of gene regulation and function in mycobacteria, such as M. tuberculosis, these techniques have limitations which prevent many of the genes which are important for pathogenesis from being identified. For example, microarray analysis only measures gene expression, whereas for many genes regulation occurs at the level of translation or protein activity. Such regulated genes would not be identified in a microarray analysis. Moreover, although studying gene expression lends itself well to assessing bacterial responses to a host, it may only reflect short-term adaptations to changing host environments. Therefore, for those genes which are important in pathogenesis and whose expression is only briefly altered, the associated change in expression may not be detected depending on the sampling intervals chosen. A limitation of mutagenesis studies is that the survival measured during these experiments represents the cumulative effect of a mutation over time during prolonged intracellular growth. Mutants that are complemented in trans by either host or bacterial factors would not be detected. In addition, Rengarajan et al (2005) reported that there is little correlation between gene expression in macrophages and genes that have been identified as essential.

For each of these reasons, the only reliable and scientifically acceptable way to confirm whether any particular gene is involved in transcriptional regulation in mycobacteria, and in pathogenesis, is to undertake further extensive experiments to ascertain the role of that gene. Only after a gene has had its function elucidated, and its importance established, can the considerable time, effort and cost required to identify agents that regulate the activity or function of that gene, or of the protein that it encodes, be justified.

As part of a study to identify and assess the transcriptional regulators important for pathogenesis of M. tuberculosis, we conducted comparative genomic analyses which revealed that the Rv3574 gene is highly conserved within the Mycobacteria, as well as in closely related Nocardia. In all of these bacteria, Rv3574 and its orthologues are transcribed divergently from a putative acyl-CoA-dehydrogenase named fadE34 (in M. tuberculosis and in several other mycobacteria). This conservation indicates the potential importance of Rv3574.

Using the fast-growing non-pathogen M. smegmatis as an experimental model, we inactivated the Rv3574 orthologue MSMEG_6042 and examined the effect of this inactivation on genome-wide transcription using microarray analysis. Surprisingly, we identified a large number of genes that were de-repressed in the mutant and that have orthologues in M. tuberculosis, indicating the importance of MSMEG_6042 (Rv3574) in transcriptional regulation.

We have also inactivated Rv3574 in M. tuberculosis and have examined the effect of this inactivation on genome-wide transcription using microarray analysis. Similarly, we identified a large number of genes that were de-repressed in the M. tuberculosis mutant, confirming the importance of Rv3574 in M. tuberculosis.

We have characterised the Rv3574 DNA binding site by alignment of the Rv3574/fadE34 intergenic region from M. tuberculosis with the orthologous regions from other species, revealing a highly conserved 14 base pair region. Examination of the sequence showed that it is palindromic with the general consensus TAGAACNNGTTCTA (SEQ ID No: 1 ). We used this consensus sequence to search for additional genes controlled by Rv3574, and identified many instances similar to this consensus sequence in the control regions of M. tuberculosis and M. smegmatis genes, which we predicted to be genes that are repressed by Rv3574. The predictions were confirmed using RTq-PCR and electrophoretic mobility shift assays (EMSAs).
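By way of illustration only, a search of the kind described above can be sketched in Python as follows. The sketch assumes the target sequence is available as a plain string; it is not the search procedure used by the inventors, and the function names are hypothetical. Because TAGAACNNGTTCTA is its own reverse complement, a single pass over one strand also finds sites on the opposite strand.

```python
import re

# Consensus from the text: TAGAACNNGTTCTA (SEQ ID No: 1), N = any nucleotide.
CONSENSUS = "TAGAACNNGTTCTA"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (N maps to N)."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus: str) -> str:
    """Replace each N with a character class matching one nucleotide."""
    return "".join("[ACGT]" if base == "N" else base for base in consensus)


def find_sites(seq: str, consensus: str = CONSENSUS) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every match, including overlaps.

    Because the default consensus is its own reverse complement, any site on
    the bottom strand also matches the top strand at the same position, so
    one pass over the top strand is sufficient.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


# Sanity check: the consensus is palindromic in the double-stranded sense.
assert reverse_complement(CONSENSUS) == CONSENSUS
```

For example, `find_sites("GGCTAGAACGTGTTCTAGG")` reports a single site, TAGAACGTGTTCTA, starting at position 3. A genome-scale search would additionally need sequence-file parsing and gene annotation, which are omitted here.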

By combining these data, we have defined the Rv3574 regulon in M. smegmatis and M. tuberculosis as containing 83 genes listed in Table 4. Our microarray analysis of gene expression in M. tuberculosis defined the Rv3574 regulon in M. tuberculosis as containing 49 genes listed in Table 5. Many of these genes are involved in lipid metabolism.

From our experiments, the importance of Rv3574 in the transcriptional regulation of M. tuberculosis has become clearly apparent. Since transcriptional regulation is an important mediator of pathogenesis, allowing bacteria to adapt to their host environment, we now consider that Rv3574 may be central to the pathogenesis of tuberculosis. This view is strengthened by the fact that many of the genes in the M. tuberculosis regulon are involved in lipid metabolism, and it has been suggested that M. tuberculosis uses fatty acids as a carbon source during in vivo growth (Schnappinger et al., 2003; McKinney et al., 2000).

Rv3574 is an M. tuberculosis gene that encodes a member of the TetR family (Cole et al., 1998; Camus et al., 2002), a family of transcriptional regulators with a helix-turn-helix DNA-binding motif that is well represented and widely distributed among bacteria (Ramos et al., 2005). Members of the TetR family are usually repressors and control genes with a diverse range of functions, including multidrug resistance, biosynthesis of antibiotics, osmotic stress, breakdown of metabolic intermediates and pathogenicity in Gram-negative and Gram-positive bacteria (Ramos et al., 2005). The prototype of the TetR family is TetR from the Tn10 transposon of E. coli, which regulates the expression of a tetracycline efflux pump in Gram-negative bacteria and thereby confers resistance to tetracycline (Orth et al., 2000).

The binding of TetR repressors to DNA is regulated by an interaction between the TetR repressor and a small ligand. For example in the TetR system, expression of the efflux pump is repressed by TetR itself. In the presence of the antibiotic tetracycline, tetracycline binds to TetR inducing a cascade of conformational changes. As a result, the DNA-binding contacts of TetR are disrupted and repression is lifted; the efflux pump is subsequently expressed and tetracycline is pumped out of the cell. Similar on-off mechanisms allowing bacteria to respond to their environments are believed to be used in other TetR family members.

The natural ligand that regulates the transcriptional activity of Rv3574 is currently unknown. Identifying this ligand, or another agent that modulates the activity or function of Rv3574 (e.g. as a mimic or inhibitor of the natural ligand), would have significant implications for drug discovery and for the potential therapeutic treatment of tuberculosis using inhibitors of the ligand. Similarly, identifying agents which modulate the activity of an Rv3574 orthologue in a non-pathogenic bacterium would also have industrial importance. For example, Rhodococci are soil-dwelling bacteria, closely related to the Mycobacteria, which degrade a broad range of organic compounds; accelerating or manipulating this degradation may have significant commercial value. Accordingly, a first aspect of the invention provides a method of identifying an agent which modulates at least one activity or function of the M. tuberculosis protein Rv3574 or an orthologue thereof from an actinomycete, the method comprising: providing M. tuberculosis Rv3574 or an orthologue thereof from an actinomycete; providing double stranded DNA (dsDNA) comprising the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) under conditions which allow the Rv3574 or orthologue thereof to bind to the dsDNA, wherein, independently,

X1 is C, G or T, X2 is A, C, G or T,

X3 is A, C, G or T, X4 is A, C, G or T, X5 is A, C, G or T, X6 is G or T, X7 is A, C, G or T,

X8 is A, C, G or T, and X9 is A, G or T; providing a test agent; and determining whether the test agent modulates at least one activity or function of the Rv3574 or the orthologue thereof.
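The positional definitions of SEQ ID No: 2 given above amount to a position-specific set of permitted nucleotides. As an illustrative sketch only (the names are hypothetical and not part of the claimed method), they can be encoded and tested in Python as follows.

```python
import re

# Permitted nucleotides at each of the 14 positions of
# X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2), transcribed from the text.
SEQ_ID_2 = [
    "CGT",   # X1
    "ACGT",  # X2
    "ACGT",  # X3
    "A", "A", "C",
    "ACGT",  # X4
    "ACGT",  # X5
    "G", "T",
    "GT",    # X6
    "ACGT",  # X7
    "ACGT",  # X8
    "AGT",   # X9
]


def positions_to_regex(positions: list[str]) -> re.Pattern:
    """Build a regex with one character class per position."""
    return re.compile("".join(f"[{options}]" for options in positions))


MOTIF = positions_to_regex(SEQ_ID_2)


def matches_seq_id_2(site: str) -> bool:
    """True if a 14-nt site satisfies every positional constraint."""
    # fullmatch requires the whole site, and nothing more, to match.
    return MOTIF.fullmatch(site.upper()) is not None
```

With this encoding, TAGAACGTGTTCTA satisfies every constraint, whereas a site beginning with A (not permitted at X1) or carrying A at X6 is rejected.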

Preferably, the Rv3574 protein is M. tuberculosis Rv3574 (strain H37Rv) which has the sequence listed in Figure 2 (SEQ ID No: 3), and can be found in the TubercuList database (http://genolist.pasteur.fr/TubercuList).

It is well known that certain polypeptides are polymorphic, and it is appreciated that some natural variation of this sequence may occur. Thus, in an embodiment, the invention is not limited to the M. tuberculosis Rv3574 protein having the sequence listed in Figure 2 (SEQ ID No: 3), but includes naturally occurring variants thereof in which one or more of the amino acid residues have been replaced with another amino acid. In particular, the invention includes M. tuberculosis Rv3574 protein from strains other than H37Rv, in particular strain CDC1551 and strain Erdmann. The sequence of Rv3574 from M. tuberculosis strain CDC1551 is identical to that from H37Rv, but it is annotated as starting 132 bp upstream of the H37Rv Rv3574 start. By M. tuberculosis Rv3574 we also include functional variants thereof. By a "functional" variant of Rv3574 we include a variant that has the ability to bind to dsDNA comprising the sequence TAGAACNNGTTCTA (SEQ ID No: 1). Numerous methods of determining protein binding to DNA are well known in the art, some of which, e.g. EMSA, are described herein.

The skilled person will readily understand that it is possible to vary the amino acid residues at non-essential positions within the Rv3574 sequence without affecting its DNA-binding activity. Variations include insertions, deletions and substitutions, either conservative or non-conservative. By "conservative substitutions" is intended combinations such as Gly, Ala; Val, Ile, Leu; Asp, Glu; Asn, Gln; Ser, Thr; Lys, Arg; and Phe, Tyr. Thus the invention also includes the use of a modified Rv3574 variant in the screening methods.
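As an illustration, the conservative groups listed above can be encoded directly. The following Python sketch (hypothetical helper names; one-letter amino acid codes; toy sequences rather than Rv3574) classifies point substitutions between a reference and a variant of equal length. Insertions and deletions would require an alignment first.

```python
# Conservative groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"G", "A"},        # Gly, Ala
    {"V", "I", "L"},   # Val, Ile, Leu
    {"D", "E"},        # Asp, Glu
    {"N", "Q"},        # Asn, Gln
    {"S", "T"},        # Ser, Thr
    {"K", "R"},        # Lys, Arg
    {"F", "Y"},        # Phe, Tyr
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if two different residues fall within the same group."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(original in group and replacement in group
               for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str,
                           variant: str) -> list[tuple[int, str, str, str]]:
    """List (1-based position, from, to, kind) for each differing position.

    Assumes the two sequences are the same length (substitutions only).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length")
    result = []
    for pos, (a, b) in enumerate(zip(reference, variant), start=1):
        if a != b:
            kind = "conservative" if is_conservative(a, b) else "non-conservative"
            result.append((pos, a, b, kind))
    return result
```

For the toy pair MKVDS and MRLWS, the K to R and V to L changes are reported as conservative and the D to W change as non-conservative.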

It is preferred if the functional variant of Rv3574 has at least 90% sequence identity, more preferably at least 95%, or at least 96% sequence identity, and still more preferably at least 97%, or at least 98%, or at least 99% sequence identity with the M. tuberculosis Rv3574 amino acid sequence listed in Figure 2 (SEQ ID No: 3).

If the functional variant of Rv3574 has any sequence variation within the putative DNA-binding domain illustrated in Figure 2, it is preferred if the variant has at least 95%, or at least 96% sequence identity, and still more preferably at least 97%, or at least 98%, or at least 99% sequence identity with the M. tuberculosis Rv3574 amino acid sequence over this region. It is particularly preferred that the 7 residues of Rv3574 that are indicated in Figure 2 as being in direct contact with the DNA are unchanged from the Rv3574 sequence in Figure 2 (SEQ ID No: 3).

M. tuberculosis is a Gram-positive bacterium in the genus Mycobacterium, class Actinobacteria, order Actinomycetales, and family Mycobacteriaceae. Other than M. tuberculosis, many bacteria within the order Actinomycetales are involved in pathogenesis or are otherwise commercially important. Therefore, it would be useful to identify agents which modulate at least one activity or function of Rv3574 orthologues within these related bacteria. Even in non-pathogenic bacteria, the ability to modulate transcriptional regulation may have industrial application, for example in the degradation of organic compounds by M. smegmatis and Rhodococcus sp. strain RHA1. Thus the invention includes the use of Rv3574 orthologues from other actinomycetes in a method of identifying an agent that modulates the activity or function of the orthologue.

Examples of suitable Rv3574 orthologues include M. bovis Mb3605 (SEQ ID No: 4), M. marinum MM5069 (SEQ ID No: 5), M. ulcerans MUL4145 (SEQ ID No: 6), M. avium subsp. paratuberculosis MAP0491c (SEQ ID No: 7), M. smegmatis MSMEG_6042 (SEQ ID No: 8) and Nocardia farcinica nfa4470 (SEQ ID No: 9), the sequences of which are listed in Figure 2. These orthologues of Rv3574 in M. bovis, M. marinum, M. ulcerans, M. avium subsp. paratuberculosis, M. smegmatis and N. farcinica have 100%, 95%, 95%, 84%, 89% and 70% amino acid sequence identity with Rv3574, respectively.

Another suitable orthologue is the Rhodococcus sp. strain RHA1 kstR protein, the sequence of which is listed in Figure 12 (SEQ ID No: 10) and which has 69% sequence identity with M. tuberculosis Rv3574.

By an M. tuberculosis Rv3574 orthologue, we also include functional variants thereof. By a "functional" variant of an M. tuberculosis Rv3574 orthologue we include a variant that has the ability to bind to dsDNA comprising the sequence TAGAACNNGTTCTA (SEQ ID No: 1).

In the embodiment when the orthologue is one from a bacterium mentioned above with a sequence listed in Figure 2, it is preferred if the functional variant of the orthologue has at least 90% sequence identity, more preferably at least 95%, or at least 96% sequence identity, and still more preferably at least 97%, or at least 98%, or at least 99% sequence identity with the orthologue sequence listed in Figure 2. Moreover, if the functional variant of an M. tuberculosis Rv3574 orthologue listed in Figure 2 has any sequence variation within the illustrated putative DNA-binding domain, it is preferred if the variant has at least 95%, or at least 96% sequence identity, and still more preferably at least 97%, or at least 98%, or at least 99% sequence identity with the orthologue sequence over this region.

Due to the exceptionally high level of sequence identity between Rv3574 orthologues across the order Actinomycetales, further orthologues can readily be identified by a person of skill in the art. By an orthologue of Rv3574 we include orthologues in actinomycetes which have at least 60% sequence identity with the M. tuberculosis Rv3574 sequence listed in Figure 2 (SEQ ID No: 3). Preferably, the orthologous protein has at least 65%, 70%, 75%, 80%, 85% or 90% sequence identity with the M. tuberculosis Rv3574 sequence (SEQ ID No: 3), and more preferably at least 95% sequence identity. It is still further preferred if the Rv3574 orthologue has an amino acid sequence which has at least 90% sequence identity with the DNA-binding domain of M. tuberculosis Rv3574 indicated in Figure 2, and more preferably at least 95%.

It is preferred if the orthologue of Rv3574 is an orthologue from a Mycobacterium, a Nocardia (Nocardioides), a Rhodococcus or a Streptomycete.

In an embodiment, the orthologue of Rv3574 is not one from a Corynebacterium.

In the embodiment when the orthologue is from a bacterium other than those mentioned above whose Rv3574 sequence is listed in Figure 2, it is preferred if the functional variant of the orthologue has at least 90% sequence identity, more preferably at least 95%, or at least 96% sequence identity, and still more preferably at least 97%, or at least 98%, or at least 99% sequence identity with the naturally occurring orthologue sequence. Moreover, if the sequence variation is within the putative DNA-binding domain of the M. tuberculosis Rv3574 orthologue, it is preferred if the variant has at least 95%, or at least 96% sequence identity, and still more preferably at least 97%, or at least 98%, or at least 99% sequence identity with the naturally occurring orthologue sequence over this region.

The percent sequence identity between two polypeptides may be determined using any suitable computer program, for example the GAP program of the University of Wisconsin Genetics Computer Group (GCG), and it will be appreciated that percent identity is calculated in relation to polypeptides whose sequences have been aligned optimally. The alignment may alternatively be carried out using the Clustal W program (Thompson et al., 1994). The parameters used may be as follows. Fast pairwise alignment parameters: K-tuple (word) size, 1; window size, 5; gap penalty, 3; number of top diagonals, 5; scoring method, x percent. Multiple alignment parameters: gap open penalty, 10; gap extension penalty, 0.05. Scoring matrix: BLOSUM.
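The identity thresholds given above (for example, at least 60%, 90% or 95%) are applied to an optimal alignment produced by a program such as GAP or Clustal W. Once such an alignment is available, the percentage itself is simple arithmetic. The Python sketch below assumes one particular convention (identical positions divided by aligned columns, ignoring columns gapped in both sequences), which may differ from the denominator used by a given program; the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumed convention: identical residues divided by the number of
    alignment columns, ignoring columns in which both sequences have a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """True if the alignment reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

For the toy alignment MKT-A against MKTLA, four of the five columns are identical, giving 80% identity; such a pair would fall below a 90% threshold.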

The M. tuberculosis Rv3574 or orthologue thereof may be produced using recombinant technology. The recombinant M. tuberculosis Rv3574 or orthologue thereof used in the method may comprise a GST portion or may be biotinylated or otherwise tagged, for example with a 6His, HA, myc or other epitope tag, as known to those skilled in the art. This may be useful in purifying and/or detecting the M. tuberculosis Rv3574 or orthologue thereof. Techniques for cloning, manipulation, modification and expression of nucleic acids, including protein engineering, site-directed mutagenesis and purification of expressed proteins, are very well known in the art and are described, for example, in Sambrook et al (2001).

Alternatively, the Rv3574 or orthologue thereof may be produced by extracting endogenous Rv3574 or the orthologue thereof from bacteria.

A comparison of the Rv3574 binding sites from the M. tuberculosis and M. smegmatis genes in the Rv3574 regulon identified a number of preferences for the nucleotides within the core dsDNA sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2).

Accordingly, in an embodiment, each independently, X1 is preferably T, X6 is preferably T, and X9 is preferably A.

In an embodiment, the dsDNA may comprise the consensus sequence TX2X3AACX4X5GTTX7X8A (SEQ ID No: 11 ), wherein X2, X3, X4, X5, X7, and X8 are as defined above.

In another embodiment, the dsDNA comprises the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 12), wherein, independently,

X1 is C, G or T, X2 is A or G,

X3 is A, C, G or T,

X4 is A, C, G or T, but preferably A, C or G,

X5 is A, C, G or T, but preferably C, G or T,

X6 is G or T, X7 is A, C, G or T,

X8 is C or T, and

X9 is A, G or T.

In a more specific embodiment, each independently, X1 is preferably T, X2 is preferably A, X3 is preferably G, X4 is preferably A or G, X5 is preferably T, X6 is preferably T, X7 is preferably C, and X9 is preferably A.

It is preferred if the nucleotide preferences are such that they retain the palindromic format of the consensus sequence.

Preferably when X1 is T, X9 is A.

Preferably when X2 is A, X8 is T.

Preferably when X2 is G, X8 is C.

Preferably when X3 is G, X7 is C.

However, the inventors consider that the Rv3574 binding motif may not always need to be exactly palindromic since the two halves of the motif are likely to be bound by separate proteins in an Rv3574 dimer. Thus when the dsDNA is not exactly palindromic, it is sufficient that each half of the sequence allows binding to M. tuberculosis Rv3574 or the orthologue thereof.
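One way to express the palindrome preferences above, and the tolerance for imperfect symmetry, is to compare the left half-site with the reverse complement of the right half-site. The Python sketch below does this for a 14-nt core; it is an illustrative assumption rather than a scoring scheme described by the inventors, it uses hypothetical names, and it leaves the two central positions (X4 and X5) unscored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (converted to uppercase)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def half_site_mismatches(site: str) -> int:
    """Count departures from dyad symmetry in a 14-nt core site.

    The left half-site (X1X2X3AAC, positions 1-6) is compared with the
    reverse complement of the right half-site (GTX6X7X8X9, positions 9-14).
    The central X4X5 positions are not scored.
    """
    if len(site) != 14:
        raise ValueError("expected a 14-nt core site")
    left = site[:6].upper()
    right_rc = reverse_complement(site[8:])
    return sum(1 for a, b in zip(left, right_rc) if a != b)


def halves_are_symmetric(site: str) -> bool:
    """True if the two half-sites are exact reverse complements."""
    return half_site_mismatches(site) == 0
```

For example, TAGAACGTGTTCTA scores 0, while TAGAACATGTTCCA (SEQ ID NO: 13) scores 1 because X2 is A but X8 is C; under the reasoning above, such a site could still be bound provided each half-site is recognised.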

The dsDNA may comprise a sequence selected from the group consisting of TAGAACATGTTCCA (SEQ ID NO: 13), TAGAACATGTTCTA (SEQ ID No: 14), TAGAACGTGTTCCA (SEQ ID NO: 15) and TAGAACGTGTTCTA (SEQ ID No: 16).

In another embodiment, the dsDNA may comprise the sequence TAGAACX4X5GTTCTA (SEQ ID No: 1 ), or a variant thereof which contains one or two alternative nucleotides at any of the positions other than X4 and X5.

Further analysis of the Rv3574 binding motifs located within the M. tuberculosis genome identified a number of preferences for the nucleotides within the core dsDNA sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2). Accordingly, in an embodiment the Rv3574 is from M. tuberculosis and the dsDNA comprises the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 18), wherein, independently,

X1 is G or T, X2 is A or G,

X3 is A, G or T,

X4 is A, C, G or T, but preferably A, C or G,

X5 is A, C, G or T, but preferably C or T,

X6 is G or T, X7 is A, C or G,

X8 is C or T, and

X9 is A, G or T.

In a more specific embodiment the Rv3574 is from M. tuberculosis and, each independently, X1 is preferably T, X2 is preferably A, X3 is preferably G, X4 is preferably G, X5 is preferably T, X6 is preferably G or T, X7 is preferably C, X8 is preferably C or T, and X9 is preferably A.

Typically when the Rv3574 is from M. tuberculosis the dsDNA may comprise the sequence TAGAACGTGTTCCA (SEQ ID No: 15) or the sequence TAGAACGTGTTCTA (SEQ ID No: 16).

Further analysis of the Rv3574 binding motifs within the M. smegmatis genome identified a number of preferences for the nucleotides within the core dsDNA sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2).

Accordingly, in an embodiment the Rv3574 orthologue is from M. smegmatis and the dsDNA comprises the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 19), wherein, independently, X1 is preferably T,

X2 is preferably A,

X3 is preferably G,

X4 is A, C, G or T, but preferably A or G, X5 is A, C, G or T, but preferably C or T,

X6 is preferably T,

X7 is preferably C,

X8 is preferably C or T, and

X9 is preferably A.

Typically when the Rv3574 orthologue is from M. smegmatis the dsDNA may comprise a sequence selected from the group consisting of TAGAACACGTTCCA (SEQ ID No: 20), TAGAACACGTTCTA (SEQ ID NO: 21 ), TAGAACATGTTCCA (SEQ ID No: 13), TAGAACATGTTCTA (SEQ ID NO: 14), TAGAACGCGTTCCA (SEQ ID No: 22), TAGAACGCGTTCTA (SEQ ID NO: 23), TAGAACGTGTTCCA (SEQ ID No: 15) and TAGAACGTGTTCTA (SEQ ID NO: 16).

Further analysis of the Rv3574 binding motifs within the M. tuberculosis and M. smegmatis genomes identified a number of preferences for the nucleotides immediately adjacent to the core X1X2X3AACX4X5GTX6X7X8X9 motif (SEQ ID No: 2). Thus, preferably, the dsDNA comprises the sequence X10X1X2X3AACX4X5GTX6X7X8X9X11X12 (SEQ ID NO: 24), wherein X1-X9 are as defined above and wherein, independently, X10 is A, C or T, X11 is A, C, T or G, and X12 is A, C or T.

In an embodiment, each independently, X10 is preferably C, X11 is preferably G and X12 is preferably T.

More preferably, when X10 is C, X11 is G.

Thus, in an embodiment, the dsDNA may comprise a sequence selected from the group consisting of CTAGAACACGTTCCAGT (SEQ ID No: 25), CTAGAACACGTTCTAGT (SEQ ID No: 26), CTAGAACATGTTCCAGT (SEQ ID No: 27), CTAGAACATGTTCTAGT (SEQ ID No: 28), CTAGAACGCGTTCCAGT (SEQ ID No: 29), CTAGAACGCGTTCTAGT (SEQ ID NO: 30), CTAGAACGTGTTCCAGT (SEQ ID No: 31) and CTAGAACGTGTTCTAGT (SEQ ID NO: 32).

We have also identified a preference for further adjacent nucleotides. Thus, preferably, the dsDNA comprises the sequence X13X10X1X2X3AACX4X5GTX6X7X8X9X11X12 (SEQ ID No: 33), wherein X1-X12 are as defined above and wherein X13 is A, G or T.

In an embodiment, X13 is preferably A.

Preferably, when X13 is A, X12 is T.

Thus in an embodiment, the dsDNA comprises the sequence selected from the group consisting of:

ACTAGAACACGTTCCAGT (SEQ ID NO: 34)

ACTAGAACACGTTCTAGT (SEQ ID NO: 35)

ACTAGAACATGTTCCAGT (SEQ ID NO: 36)

ACTAGAACATGTTCTAGT (SEQ ID NO: 37)

ACTAGAACGCGTTCCAGT (SEQ ID NO: 38)

ACTAGAACGCGTTCTAGT (SEQ ID NO: 39)

ACTAGAACGTGTTCCAGT (SEQ ID NO: 40) and

ACTAGAACGTGTTCTAGT (SEQ ID NO: 41 ).

In a more specific embodiment, the dsDNA comprises the sequence TGCCCACTAGAACGTGTTCTAATAGTGCT (SEQ ID NO: 42).

In another embodiment the dsDNA comprises the sequence CNNNC[N0-6]TAGAACGTGTTCTAATANNNNNNNNGNNCNNNNGTCAAGNNNNNNNNNNTNNNNNC (SEQ ID NO: 43), wherein each N independently may be A, C, G or T.

Where the Ns in the above sequence fall between the -10 and -35 positions upstream of the transcription start site, the precise number of Ns may be important to maintain the spacing between the -10 and -35 elements required for prokaryotic transcription. In other circumstances, the actual number of Ns may be less important. In another embodiment the dsDNA comprises a consecutive nucleotide sequence of at least 14 base pairs which is part of the control region of a gene selected from the 83 genes listed in Table 4. By control region we mean a 100 base pair section of DNA immediately upstream (5') of the protein coding region of the gene.
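The control-region definition above (the 100 bp immediately 5' of the protein coding region) can be sketched in Python as follows. The coordinate convention (1-based, inclusive, start not greater than end on both strands) and the function names are assumptions; wrap-around on a circular chromosome is not handled, and the region is simply truncated at a sequence end.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def control_region(genome: str, cds_start: int, cds_end: int,
                   strand: str, length: int = 100) -> str:
    """Return up to `length` bp immediately 5' of a coding region.

    Coordinates are 1-based and inclusive with cds_start <= cds_end on
    both strands. For a minus-strand gene the upstream region lies beyond
    cds_end and is returned reverse-complemented, so that it reads 5' to 3'
    in the same direction as the gene.
    """
    if strand == "+":
        begin = max(0, cds_start - 1 - length)
        return genome[begin:cds_start - 1]
    if strand == "-":
        return reverse_complement(genome[cds_end:cds_end + length])
    raise ValueError("strand must be '+' or '-'")
```

The returned region could then be scanned for a 14 bp match to the Rv3574 motif.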

It is also preferred if the dsDNA is at least 15 base pairs in length, which is the minimum length of dsDNA that is bound by TetR. The dsDNA may be 16 or 17 or 18 or 19 base pairs in length. More preferably the dsDNA is at least 20 base pairs in length. It is widely appreciated in the art that the DNA fragment used in certain DNA binding assays should be 20 base pairs (bp) or longer, with the recognition sequence at least 4 base pairs from each end of the fragment (Sambrook et al., 2001). Thus the dsDNA may be at least 21 bp or at least 22 bp or at least 23 bp or at least 24 bp or at least 25 bp or at least 30 bp or at least 40 bp or at least 50 bp in length. In one specific embodiment, the dsDNA is 29 bp in length.
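The two length rules quoted above can be checked mechanically when designing a probe. The following Python sketch is illustrative: the function name is hypothetical, only the first occurrence of the site on the given strand is considered, and the default thresholds (20 bp overall, 4 bp from each end) are taken from the text.

```python
def check_probe(probe: str, site: str,
                min_length: int = 20, min_flank: int = 4) -> bool:
    """Apply the length and end-distance rules to a probe containing a site.

    Only the first occurrence of `site` on the given strand is considered.
    Returns False if the site is absent.
    """
    probe, site = probe.upper(), site.upper()
    start = probe.find(site)
    if start == -1:
        return False
    left_flank = start                               # bp 5' of the site
    right_flank = len(probe) - (start + len(site))   # bp 3' of the site
    return (len(probe) >= min_length
            and left_flank >= min_flank
            and right_flank >= min_flank)
```

The 29 bp sequence of SEQ ID NO: 42, TGCCCACTAGAACGTGTTCTAATAGTGCT, places the 14 bp core TAGAACGTGTTCTA 7 bp from the 5' end and 8 bp from the 3' end, so it passes both checks.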

In an embodiment, it is preferred if the dsDNA is no more than 100 base pairs in length, and preferably no more than 50 bp in length. Typically, the length of dsDNA used in some types of DNA binding assays ranges from 20-100 base pairs (Sambrook et al., 2001 ).

It is appreciated that the dsDNA of the present invention can be produced by a variety of methods. The dsDNA may be generated by de novo synthesis of two complementary strands which are subsequently annealed together (Sambrook et al., 2001). Alternatively, the dsDNA may be produced by the polymerase chain reaction (PCR), whereby the specific DNA sequence is enzymatically amplified using two specific primers which flank the target DNA and which themselves become incorporated into the amplified DNA (Saiki et al., 1988). In this method, it is appreciated that the target DNA can be contained within a vector derived from a genomic clone. Examples of such vectors include yeast artificial chromosomes, bacterial artificial chromosomes, bacteriophage P1 vectors, P1 artificial chromosomes and cosmids (Sambrook et al., 2001). Alternatively, the dsDNA may be a restriction fragment obtained by treating a vector containing the specific DNA sequence with appropriate restriction enzymes. Alternatively, the dsDNA may be contained within a vector itself. Vectors can be propagated by replication in a suitable host and subsequently isolated. Suitable hosts for the propagation of particular vectors, and methods of isolating such vectors, are very well known in the art.

Conditions suitable for binding of a protein such as Rv3574 to dsDNA are well known in the art and are described by Sambrook et al. (2001) and in Example 1. A suitable buffering composition might include 20 mM HEPES-potassium hydroxide (pH 7.9)/50 mM potassium chloride/2 mM magnesium chloride/0-4 mM spermidine/0-0.2 mM zinc acetate/0.1 μg/ml bovine serum albumin/10% (v/v) glycerol and 0.5 mM dithiothreitol (Sambrook et al., 2001). Another suitable composition would include 15 mM HEPES-sodium hydroxide (pH 7.9)/50 mM potassium chloride/50 mM potassium glutamate/50 mM potassium acetate/5 mM magnesium chloride/20 μM zinc sulphate/2 μg/ml bovine serum albumin/5% (v/v) glycerol and 0.1% (w/v) NP-40 (Greisman and Pabo, 1997; Wolfe et al., 2000).
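In practice such a buffer is usually made up from concentrated stocks, with the volume of each stock following from C1V1 = C2V2. The sketch below applies this to a subset of the first composition above. The stock concentrations are hypothetical placeholders, not values from this specification, and the remaining components would be added the same way.

```python
# Final concentrations from the first binding buffer in the text (mM, or % v/v for glycerol).
FINAL = {
    "HEPES-KOH pH 7.9": 20.0,
    "KCl": 50.0,
    "MgCl2": 2.0,
    "dithiothreitol": 0.5,
    "glycerol": 10.0,
}
# Hypothetical stock concentrations in the same units as FINAL; replace with the stocks actually held.
STOCK = {
    "HEPES-KOH pH 7.9": 1000.0,
    "KCl": 2000.0,
    "MgCl2": 1000.0,
    "dithiothreitol": 1000.0,
    "glycerol": 50.0,
}


def stock_volumes(final_volume_ul: float) -> dict[str, float]:
    """Volume (microlitres) of each stock needed, from C1 * V1 = C2 * V2."""
    volumes = {name: FINAL[name] * final_volume_ul / STOCK[name] for name in FINAL}
    # Whatever volume is left over is made up with water.
    volumes["water"] = final_volume_ul - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, vol in stock_volumes(1000.0).items():
        print(f"{name}: {vol:.1f} ul")
```

For 1 ml of buffer with these assumed stocks, 200 μl of 50% glycerol and 752.5 μl of water are required.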

Since many of the genes we identified as being regulated by M. tuberculosis Rv3574 are known or predicted to be involved in lipid metabolism, in one embodiment, the test agent may be a lipid. The lipid may be selected from the group consisting of a fatty acid, a mycolic acid, a trehalose-6,6-dimycolate, a glycerophospholipid, a triglyceride, a phosphatidylinositol mannoside, a phospholipid, a sphingolipid or a steroid, e.g. a sterol. It is appreciated that the lipid may be saturated or unsaturated and may be subject to one or more modifications including methylation.

Examples of suitable saturated fatty acids include butyric acid (C4:0), caproic acid (C6:0), caprylic acid (C8:0), capric acid (C10:0), lauric acid (C12:0), myristic acid (C14:0), palmitic acid (C16:0), stearic acid (C18:0), arachidic acid (C20:0) and behenic acid (C22:0), or derivatives thereof such as acyl-CoA derivatives.

Examples of suitable unsaturated fatty acids include oleic acid (C18:1), linoleic acid (C18:2), alpha-linolenic acid (C18:3), arachidonic acid (C20:4), eicosapentaenoic acid (C20:5), docosahexaenoic acid (C22:6) and erucic acid (C22:1), or derivatives thereof such as acyl-CoA derivatives. It is appreciated that each double bond may be in either the cis or the trans configuration.

We have demonstrated that the expression of the M. smegmatis gene MSMEG_6038 is induced by cholesterol. Thus, in one embodiment, the test agent may be cholesterol or a derivative thereof.

In another particular embodiment the test agent may be palmitate/palmitic acid, or a derivative thereof such as an acyl-CoA derivative.

The test agent may also be a breakdown product of a fatty acid, such as acetyl-CoA or propionyl-CoA.

The test agent may also be a ketosteroid, such as 4-androstene-3,17-dione (AD) or 3-hydroxy-9,10-secoandrosta-1,3,5(10)-triene-9,17-dione (3-HSA).

We have shown that one of the activities or functions of Rv3574 and its orthologues in other mycobacteria is the ability to bind to the dsDNA sequences defined above. Thus the step of determining whether the test agent modulates at least one activity or function of M. tuberculosis Rv3574 or the orthologue thereof may comprise determining whether, and optionally to what extent, the test agent modulates binding of the Rv3574 or the orthologue thereof to the dsDNA.

TetR proteins are normally repressors of gene expression, with their ligands usually causing de-repression (Ramos et al., 2005). Therefore, the test agent may be one that causes de-repression by preventing or disrupting binding of the M. tuberculosis Rv3574 or orthologue thereof to the dsDNA. For example, the test agent may prevent binding to the dsDNA by directly blocking the DNA binding site of the M. tuberculosis Rv3574 or orthologue thereof. Alternatively, the test agent may bind to the M. tuberculosis Rv3574 or orthologue thereof and induce a conformational change which prevents or decreases binding to the dsDNA.

In order to distinguish between these two possibilities, the Rv3574 or orthologue thereof, the dsDNA and the test agent may be combined in different orders. For example, to determine whether the test agent prevents binding of the Rv3574 or the orthologue thereof to the dsDNA, the test agent is typically contacted with the dsDNA before the Rv3574 or orthologue thereof. Alternatively, the test agent may be provided after the Rv3574 or orthologue thereof has been allowed to bind to the dsDNA. In this instance, the test agent may disrupt or decrease binding between the Rv3574 or orthologue thereof and the dsDNA, or may enhance the binding between the Rv3574 or orthologue thereof and the dsDNA.

Several techniques are available in the art to detect and measure DNA-protein binding which are suitable for use in the present invention. One such technique, used in Example 1, is the electrophoretic mobility shift assay (EMSA) or gel-shift assay. It is routinely used to follow the purification of DNA-binding proteins, to establish affinity binding constants and to study protein-protein assemblies on gene sequences (Sambrook et al., 2001; Murphy et al., 2001). The assay relies on the premise that protein-DNA complexes migrate more slowly through a non-denaturing polyacrylamide gel than free DNA fragments; that is, they have a different electrophoretic mobility. Typically, the M. tuberculosis Rv3574 or orthologue thereof is incubated with the dsDNA comprising the sequence defined above. Typically, the DNA is labelled with a marker (e.g. 32P), whereas the M. tuberculosis Rv3574 or orthologue thereof is unlabelled. The reaction products are then analysed by electrophoresis, and a difference in electrophoretic mobility is indicative of a protein-DNA interaction. To determine the effect of the test agent on this interaction, the agent is either incubated with the M. tuberculosis Rv3574 protein or orthologue thereof before addition to the DNA, or incubated with the DNA before addition of the protein. The affinity and specificity of this protein-DNA interaction can be further determined by conducting competition experiments using DNA fragments containing a binding site for M. tuberculosis Rv3574 or orthologues thereof or other unrelated DNA sequences.
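Band intensities from an EMSA titration can be converted into an apparent dissociation constant. The sketch below assumes a simple single-site model in which the protein is in large excess over the labelled probe, so the fraction of probe bound is θ = [P]/(Kd + [P]). It fits Kd by a plain grid search rather than a fitting library. The titration values are hypothetical and do not come from Example 1.

```python
import math


def fraction_bound(shifted: float, free: float) -> float:
    """Fraction of labelled probe found in the shifted (protein-DNA) band."""
    return shifted / (shifted + free)


def fit_kd(protein_nm: list[float], theta: list[float]) -> float:
    """Least-squares apparent Kd (nM) for theta = P / (Kd + P).

    Assumes free protein ~ total protein (protein in large excess over probe).
    Searches Kd on a log grid in 0.01-decade steps from 1e-3 to 1e5 nM.
    """
    best_kd, best_err = math.nan, math.inf
    for i in range(-300, 501):
        kd = 10 ** (i / 100)
        err = sum((t - p / (kd + p)) ** 2 for p, t in zip(protein_nm, theta))
        if err < best_err:
            best_kd, best_err = kd, err
    return best_kd


# Hypothetical titration: (shifted, free) band intensities at each protein concentration (nM).
protein = [5, 10, 25, 50, 100, 250, 500]
bands = [(9, 91), (17, 83), (33, 67), (50, 50), (67, 33), (83, 17), (91, 9)]
theta = [fraction_bound(s, f) for s, f in bands]
print(f"apparent Kd ~ {fit_kd(protein, theta):.0f} nM")
```

The single-site model is a simplification. It does not, for example, account for cooperative binding of a dimer, so the value obtained is best treated as an apparent constant for comparing conditions with and without a test agent.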

A variant of the EMSA is the supershift-EMSA, which can also be used to detect protein-DNA binding by using specific antibodies. The supershift-EMSA is the same as the EMSA but with the addition of an antibody against the specific protein (i.e. M. tuberculosis Rv3574 or orthologue thereof) to the reaction mixture before, during, or after formation of the protein-DNA complex (Sambrook et al., 2001; Kako et al., 1998). Following electrophoretic separation, binding of the antibody to the protein further reduces the mobility of the complex, so that it runs at a larger apparent size ("supershift"), owing to the formation of a ternary complex between the antibody, the DNA-binding protein and the DNA probe.

Enzyme-linked immunosorbent assay (ELISA) techniques have also been used to allow the detection of protein-DNA binding (Shen et al., 2002). Briefly, DNA fragments comprising the sequence defined above are immobilised onto a solid phase such as the wells of a 96-well polystyrene plate. The sample containing a purified protein, or a complex mixture of proteins (such as nuclear or whole cell extract preparations), is then incubated in the well and non-bound components of the sample are removed by washing. Finally, an antibody specific for the putative bound protein is added and the protein-antibody complex detected. Binding of the antibody can be accomplished and detected using standard ELISA techniques with colorimetric, fluorescent, or chemiluminescent detection (Sambrook et al., 2001).

Another technique useful for determining protein-DNA binding is DNase footprinting (Sambrook et al., 2001). This is based on the observation that a protein bound to DNA will often protect the DNA from enzymatic cleavage. Radiolabeled DNA is cut by the enzyme deoxyribonuclease (DNase) and the fragments analysed by electrophoresis to detect the resulting cleavage pattern. The cleavage pattern of the DNA in the presence of the Rv3574 or orthologue thereof is compared in the presence and absence of the test agent. If the test agent affects binding of the M. tuberculosis Rv3574 or the orthologue to the dsDNA, the detected "footprint" will be different.

Yet another technique that can be used to determine protein-DNA binding is a surface plasmon resonance (SPR) assay as described in Plant et al. (1995). In this approach, the dsDNA comprising the sequence defined above is secured to a flat sensor chip in a flow chamber, after which a solution containing the DNA-binding protein, i.e. M. tuberculosis Rv3574 or an orthologue thereof, is passed over the DNA in a continuous flow. Light is directed at a defined angle across the chip and the resonance angle of reflected light measured. A protein-DNA interaction causes this angle to change. By measuring the change of this angle over time, in the presence and absence of the M. tuberculosis Rv3574 or orthologue thereof, equilibrium constants can be determined and on and off rates estimated. The test agent can then be added, either to the DNA before the M. tuberculosis Rv3574 or the orthologue, or to the M. tuberculosis Rv3574 or the orthologue before its addition to the DNA, and its effect on DNA binding can be determined.
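One common way of extracting on and off rates from SPR data is to assume a 1:1 (Langmuir) interaction. Under that model, each association trace rises as R(t) = Req(1 - e^(-k_obs t)), and the observed rate k_obs = kon·C + koff is linear in protein concentration C. The sketch below fits that line to obtain kon, koff and KD = koff/kon. The 1:1 model, the function names and all numbers are illustrative assumptions, not the analysis of Plant et al.

```python
import math


def association_response(t_s: float, conc_m: float, kon: float, koff: float, r_max: float) -> float:
    """1:1 Langmuir association phase: R(t) = Req * (1 - exp(-k_obs * t))."""
    k_obs = kon * conc_m + koff
    r_eq = r_max * kon * conc_m / k_obs      # equals r_max * C / (C + KD)
    return r_eq * (1 - math.exp(-k_obs * t_s))


def fit_rates(conc_m: list[float], k_obs: list[float]) -> tuple[float, float, float]:
    """Fit k_obs = kon * C + koff by ordinary least squares; return (kon, koff, KD)."""
    n = len(conc_m)
    mean_c = sum(conc_m) / n
    mean_k = sum(k_obs) / n
    sxx = sum((c - mean_c) ** 2 for c in conc_m)
    sxy = sum((c - mean_c) * (k - mean_k) for c, k in zip(conc_m, k_obs))
    kon = sxy / sxx                  # association rate constant, M^-1 s^-1
    koff = mean_k - kon * mean_c     # dissociation rate constant, s^-1 (the intercept)
    return kon, koff, koff / kon     # KD in M


# Hypothetical k_obs values (s^-1) measured at four protein concentrations (M).
concs = [10e-9, 20e-9, 50e-9, 100e-9]
k_obs_values = [0.0020, 0.0030, 0.0060, 0.0110]
print(fit_rates(concs, k_obs_values))
```

Because the intercept is often poorly determined, koff is frequently measured directly from the dissociation phase instead. The same fit repeated with a test agent present shows which rate constant the agent alters.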

It is appreciated that the test agent may enhance, reduce or eliminate binding of the M. tuberculosis Rv3574 or an orthologue thereof to the DNA. Since Rv3574 belongs to the TetR family, whose members are considered to be repressors of gene expression (Ramos et al., 2005), enhanced binding of the M. tuberculosis Rv3574 or an orthologue thereof to the DNA may enhance repression of gene expression. Thus, in an embodiment, the method comprises determining whether the test agent enhances Rv3574 binding to the dsDNA, wherein an agent that enhances Rv3574 binding to the dsDNA may act as a permanent repressor of a gene controlled by Rv3574 or the orthologue, such as those genes listed in Table 4.

By contrast, if the test agent reduces or eliminates binding of the M. tuberculosis Rv3574 or the orthologue thereof to the DNA, gene expression may be de-repressed.

In another embodiment, the method comprises determining whether the test agent decreases or eliminates Rv3574 binding to the dsDNA, wherein an agent that decreases or eliminates Rv3574 binding to the dsDNA may act as a de-repressor of a gene controlled by Rv3574 or the orthologue, such as those genes listed in Table 4.

Another activity or function of Rv3574 or the orthologue thereof that may be affected by a test agent is the expression of a reporter gene operably linked to the dsDNA. Thus, in one embodiment, the step of determining whether the test agent modulates at least one activity or function of M. tuberculosis Rv3574 or an orthologue thereof comprises determining whether the test agent modulates expression of a reporter gene operably linked to the dsDNA.

In those embodiments it is appreciated that the dsDNA may be longer than 100 bp to allow for the length of the reporter gene.

By a reporter gene we include genes which encode a reporter protein whose activity may easily be assayed, for example β-galactosidase, chloramphenicol acetyltransferase (CAT), luciferase or Green Fluorescent Protein (see, for example, Tan et al., 1996).

The reporter gene may be fatal to the cells, or alternatively may allow cells to survive under otherwise fatal conditions. Cell survival can then be measured, for example using colorimetric assays for mitochondrial activity, such as reduction of WST-1 (Boehringer). WST-1 is a tetrazolium salt that is reduced to a formazan dye, with a change in absorbance, on receiving electrons via succinate dehydrogenase. By a reporter gene we also include a gene whose expression is controlled by M. tuberculosis Rv3574 or an orthologue thereof, such as those genes listed in Table 4 and Table 5.
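Colorimetric survival readouts such as WST-1 are commonly reported as background-corrected absorbance of treated cells relative to untreated cells. The sketch below illustrates this arithmetic. It assumes replicate wells for treated cells, untreated cells and a reagent-only blank, all read at the wavelength recommended by the reagent supplier. The function name and the readings are hypothetical.

```python
from statistics import mean


def percent_survival(sample_wells: list[float], untreated_wells: list[float],
                     blank_wells: list[float]) -> float:
    """Background-corrected signal of treated cells as a percentage of untreated cells."""
    blank = mean(blank_wells)                     # reagent-only background
    return 100.0 * (mean(sample_wells) - blank) / (mean(untreated_wells) - blank)


# Hypothetical absorbance readings from triplicate wells.
print(percent_survival([0.61, 0.59, 0.60], [1.12, 1.08, 1.10], [0.10, 0.11, 0.09]))
```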

Several techniques are available in the art to detect and measure expression of a reporter gene which would be suitable for use in the present invention. Many of these are available in kits both for determining expression in vitro and in vivo.

For example, levels of mRNA transcribed from a reporter gene can be assayed using RT-PCR. The specific mRNA is reverse transcribed into DNA which is then amplified such that the final DNA concentration is proportional to the initial concentration of target mRNA.
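Reporter mRNA levels are usually normalised to a reference gene, as in Figure 5, where expression is given relative to sigA. One widely used calculation is the comparative Ct (2^-ΔΔCt) method, sketched below. It assumes that target and reference amplify with roughly 100% efficiency, that is, doubling each cycle. The specification does not state which calculation was used, and the Ct values are hypothetical.

```python
def relative_expression(ct_target_test: float, ct_ref_test: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of a target in a test sample versus a control by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    delta_test = ct_target_test - ct_ref_test    # normalise the test sample to the reference gene
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalise the control sample to the reference gene
    return 2.0 ** -(delta_test - delta_ctrl)


# Hypothetical Ct values: a target gene in a mutant versus wild type, each normalised to sigA.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0, i.e. 8-fold higher in the mutant
```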

Levels of expression can also be determined by measuring the concentration of protein encoded by the reporter gene. Assaying protein levels in a biological sample can occur using any suitable method. For example, protein concentration can be studied by a range of antibody based methods including immunoassays, such as ELISAs and radioimmunoassays. In one such assay, a protein-specific monoclonal antibody can be used both as an immunoadsorbent and as an enzyme-labelled probe to detect and quantify a specific protein. The amount of the protein present in the sample can be calculated by reference to the amount present in a standard preparation using a linear regression computer algorithm. In another ELISA assay, two distinct specific monoclonal antibodies can be used to detect the specific protein. In this assay, one of the antibodies is used as the immunoadsorbent (primary antibody) and the other as the enzyme- labelled probe (secondary antibody).
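The standard-curve step described above can be illustrated with an ordinary least-squares line through the standards, followed by back-calculation of the unknown. This assumes the assay response is linear across the range of the standards. The concentrations and absorbances below are hypothetical.

```python
def fit_standard_curve(conc: list[float], signal: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of signal = slope * conc + intercept."""
    n = len(conc)
    mx, my = sum(conc) / n, sum(signal) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, signal))
    slope = sxy / sxx
    return slope, my - slope * mx


def concentration_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Back-calculate a concentration from its signal; only valid inside the standard range."""
    return (signal - intercept) / slope


# Hypothetical standard preparation (ng/ml) and absorbance readings.
standards = [0.0, 10.0, 20.0, 40.0]
absorbance = [0.05, 0.25, 0.45, 0.85]
slope, intercept = fit_standard_curve(standards, absorbance)
print(f"unknown: {concentration_from_signal(0.65, slope, intercept):.1f} ng/ml")
```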

Suitable enzyme labels include those from the oxidase group, which catalyze the production of hydrogen peroxide by reacting with substrate. Glucose oxidase is particularly preferred as it has good stability and its substrate (glucose) is readily available. Activity of an oxidase label may be assayed by measuring the concentration of hydrogen peroxide formed by the enzyme-labeled antibody/substrate reaction. Besides enzymes, other suitable labels include radioisotopes such as iodine (125I, 121I), carbon (14C), sulfur (35S), tritium (3H), indium (112In), and technetium (99mTc), and fluorescent labels such as fluorescein and rhodamine, and biotin.

The concentration of a specific protein expressed by a marker gene may also be detected in vivo by imaging, for example when testing an agent in a mouse model of tuberculosis.

Many of the genes that we have shown or predicted to be regulated by M. tuberculosis Rv3574, listed in Table 4 or Table 5, encode enzymes. Therefore, determining the expression of a reporter gene may comprise measuring the activity of an enzyme encoded by a suitable reporter gene, such as those listed in Table 4 or Table 5. Enzyme assays typically measure either the consumption of substrate or the production of product over time. It is appreciated that a large range of methods exist for determining the concentrations of substrates and products, such that many enzymes can be assayed in several different ways, as is well known in the art (e.g. Bergmeyer (1974)).
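When substrate consumption or product formation is followed spectrophotometrically, the initial rate follows from the slope of absorbance against time and the Beer-Lambert law, A = εcl. The sketch below uses NADH at 340 nm (ε ≈ 6220 M⁻¹ cm⁻¹) purely as an illustrative chromophore. The specification does not name a particular assay, and the readings are hypothetical.

```python
def initial_rate(times_s: list[float], absorbances: list[float],
                 epsilon_per_m_per_cm: float, path_cm: float = 1.0) -> float:
    """Initial rate (M/s) from the least-squares slope of absorbance against time.

    Beer-Lambert: A = epsilon * c * l, so dc/dt = (dA/dt) / (epsilon * l).
    The sign is dropped, so the same function serves for substrate loss or product gain.
    """
    n = len(times_s)
    mt, ma = sum(times_s) / n, sum(absorbances) / n
    slope = (sum((t - mt) * (a - ma) for t, a in zip(times_s, absorbances))
             / sum((t - mt) ** 2 for t in times_s))
    return abs(slope) / (epsilon_per_m_per_cm * path_cm)


# Hypothetical readings of NADH consumption at 340 nm over the initial linear phase.
times = [0, 30, 60, 90]
a340 = [1.000, 0.981, 0.963, 0.944]
print(f"{initial_rate(times, a340, 6220.0):.2e} M/s")
```

Only the initial, linear part of the progress curve should be used, because the rate falls as substrate is depleted.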

We have demonstrated that M. tuberculosis Rv3574 is involved in the regulation of a set of 49 genes listed in Table 5, and we therefore believe it may be central to pathogenesis. Modulating this regulation may therefore have therapeutic benefits for the treatment of tuberculosis. Modulating the gene regulation by orthologues of Rv3574 may also have industrial application, for example in the degradation of organic compounds by M. smegmatis. It is therefore desirable to identify agents which affect the ability of M. tuberculosis Rv3574 or an orthologue thereof to regulate gene expression.

Since Rv3574 encodes a TetR-type repressor, it is appreciated that an agent which causes a decrease in expression of a reporter gene may be one that enhances the repression in vivo. Thus in one embodiment the method involves identifying a test agent which decreases the expression of a reporter gene, wherein a test agent that decreases expression of the reporter gene may act as a permanent repressor of a gene whose expression is controlled by M. tuberculosis Rv3574 or an orthologue thereof, such as those genes listed in Table 4 or Table 5.

It is further appreciated that an agent which causes an increase in expression of a reporter gene may be one that decreases the repression in vivo. Thus in an alternative embodiment the method involves identifying a test agent which enhances the expression of a reporter gene, and wherein a test agent that enhances the expression of the reporter gene may act as a de-repressor of a gene whose expression is controlled by M. tuberculosis Rv3574 or an orthologue thereof, such as those genes listed in Table 4 or Table 5.

In one embodiment the method is performed in vitro. By in vitro we include both cell-free assays and cell-based assays. For example, the method may be performed in isolated macrophages. The method could be performed in an actinomycete, preferably selected from the Mycobacteria, the Nocardia (Nocardioides), the Rhodococci and the Streptomycetes. In a further example, the method could be performed in any bacterium or cell line that can be easily manipulated within a laboratory, e.g. Escherichia coli, Streptomyces and Corynebacteria.
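Whichever format is used, the reporter-gene readout from the two embodiments above can be scored by comparing expression with and without the test agent. Decreased expression points to a possible permanent repressor, and increased expression to a possible de-repressor. In the sketch below, the 2-fold cut-off is a hypothetical choice and is not taken from this specification.

```python
def classify_agent(reporter_with_agent: float, reporter_without_agent: float,
                   threshold: float = 2.0) -> str:
    """Score a test agent from the fold change in reporter expression it causes.

    threshold is a hypothetical fold-change cut-off applied symmetrically.
    """
    fold = reporter_with_agent / reporter_without_agent
    if fold <= 1.0 / threshold:
        return "possible permanent repressor"   # expression reduced: repression enhanced
    if fold >= threshold:
        return "possible de-repressor"          # expression increased: repression relieved
    return "no clear effect"


# Hypothetical reporter signals (arbitrary units) for three test agents against an untreated control.
for signal in (10.0, 120.0, 300.0):
    print(signal, classify_agent(signal, 100.0))
```

In a screen, replicate measurements and a statistical test would normally replace a single fixed cut-off.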

In an alternative embodiment, the method is performed in vivo, for example in a mouse model of tuberculosis.

The above aspect of the invention includes screening methods to identify drugs or lead compounds of use in treating a disease or condition caused by the bacterium (e.g. tuberculosis). It is appreciated that screening assays which are capable of high throughput operation are particularly preferred.

It is appreciated that in the methods described herein, which may be drug screening methods (a term well known to those skilled in the art), the test agent may be a drug-like compound or a lead compound for the development of a drug-like compound.

The term "drug-like compound" is well known to those skilled in the art, and may include the meaning of a compound that has characteristics that may make it suitable for use in medicine, for example as the active ingredient in a medicament. Thus, for example, a drug-like compound may be a molecule that may be synthesised by the techniques of organic chemistry, less preferably by techniques of molecular biology or biochemistry, and is preferably a small molecule, which may be of less than 5000 daltons and which may be water-soluble. A drug-like compound may additionally exhibit features of selective interaction with a particular protein or proteins and be bioavailable and/or able to penetrate target cellular membranes or the blood-brain barrier, but it will be appreciated that these features are not essential.

The term "lead compound" is similarly well known to those skilled in the art, and may include the meaning that the compound, whilst not itself suitable for use as a drug (for example because it is only weakly potent against its intended target, non-selective in its action, unstable, poorly soluble, difficult to synthesise or has poor bioavailability) may provide a starting-point for the design of other compounds that may have more desirable characteristics.

Thus in one embodiment the method further comprises modifying a test agent which has been shown to modulate at least one activity or function of the Rv3574 protein or the orthologue thereof, and testing the ability of the modified agent to modulate at least one activity or function of the M. tuberculosis Rv3574 protein or the orthologue thereof.

The method may further comprise testing the ability of the modified agent to modulate at least one activity or function of M. tuberculosis Rv3574 or an orthologue thereof in a Mycobacterium or a Nocardia (Nocardioides) or a Rhodococcus or a Streptomycete.

The method may further comprise testing the ability of the modified agent to modulate at least one activity or function of M. tuberculosis Rv3574 or an orthologue thereof in M. tuberculosis or M. smegmatis.

Once a candidate agent which modulates at least one activity or function of Rv3574 or the orthologue thereof has been identified, it may be desirable to test its effect in a suitable model in vivo. For example, the method may comprise testing the ability of the test agent or modified agent to modulate at least one activity or function of M. tuberculosis Rv3574 or an orthologue thereof in an in vivo model of a disease or condition caused by a pathogenic actinomycete. Preferably, the actinomycete is M. tuberculosis. A suitable experimental model of tuberculosis is the experimental infection of a mouse with M. tuberculosis administered intravenously, intranasally or by aerosol. Bacterial numbers are measured in the lung, liver and spleen. Details vary according to the route of administration, the mouse strain and the bacterial dose. Flynn (2006) provides a review of this and other suitable animal models.

The method may further comprise determining whether the test agent or modified agent modulates or affects disease severity, duration or progression in the in vivo model of the disease or condition caused by a pathogenic actinomycete, such as tuberculosis.

In a further embodiment, the method may also comprise the step of formulating an agent which has the ability to modulate at least one activity or function of M. tuberculosis Rv3574 or an orthologue thereof into a pharmaceutically acceptable composition. Therefore, the invention also includes a pharmaceutical composition comprising an agent which has the ability to modulate at least one activity or function of M. tuberculosis Rv3574 or an orthologue thereof that has been identified as described above, and a pharmaceutically acceptable carrier, diluent or excipient.

While it is possible for an agent to be administered alone, it is preferable to present it as a pharmaceutical formulation, together with one or more acceptable carriers. The carrier(s) must be "acceptable" in the sense of being compatible with the agent of the invention and not deleterious to the recipients thereof. Typically, the carriers will be water or saline which will be sterile and pyrogen free.

The aforementioned agents or a formulation thereof may be administered by any conventional method, including oral administration, which is preferred, as well as parenteral (e.g. subcutaneous or intramuscular) injection. A suitable method of administration is intranasal or inhalation administration. Here, the agent or formulation is conveniently delivered in the form of a dry powder inhaler or an aerosol spray presentation from a pressurised container, pump, spray or nebuliser with the use of a suitable propellant, e.g. dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, a hydrofluoroalkane such as 1,1,1,2-tetrafluoroethane (HFA 134A) or 1,1,1,2,3,3,3-heptafluoropropane (HFA 227EA), carbon dioxide or other suitable gas. The pressurised container, pump, spray or nebuliser may contain a solution or suspension of the active compound, e.g. using a mixture of ethanol and the propellant as the solvent, which may additionally contain a lubricant, e.g. sorbitan trioleate. Capsules and cartridges (made, for example, from gelatin) for use in an inhaler or insufflator may be formulated to contain a powder mix of a formulation in accordance with the present invention and a suitable powder base such as lactose or starch. The aforementioned agents or formulation thereof can also be delivered using ultrasonic nebulisation techniques.

The treatment may consist of a single dose or a plurality of doses over a period of time.

A second aspect of the invention provides a kit of parts for identifying an agent which modulates at least one activity or function of M. tuberculosis Rv3574 protein or an orthologue thereof comprising: an isolated M. tuberculosis Rv3574 protein or orthologue thereof, and an isolated dsDNA molecule comprising the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) wherein, independently,
X1 is C, G or T;
X2 is A, C, G or T;
X3 is A, C, G or T;
X4 is A, C, G or T;
X5 is A, C, G or T;
X6 is G or T;
X7 is A, C, G or T;
X8 is A, C, G or T; and
X9 is A, G or T.
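Because SEQ ID No: 2 allows a defined set of bases at each position, it maps directly onto a regular expression. The sketch below uses Python's standard `re` module to scan both strands of a sequence for the 14 bp motif. A lookahead is used so that overlapping sites are also reported. The function names are hypothetical, and the sketch recognises only the unambiguous bases A, C, G and T.

```python
import re

# SEQ ID No: 2, X1X2X3AACX4X5GTX6X7X8X9, written as a regular expression:
# X1=[CGT]; X2, X3, X4, X5, X7, X8=[ACGT]; X6=[GT]; X9=[AGT].
# The lookahead lets finditer report overlapping sites as well.
MOTIF = re.compile(r"(?=([CGT][ACGT]{2}AAC[ACGT]{2}GT[GT][ACGT]{2}[AGT]))")
MOTIF_LEN = 14
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str) -> list[tuple[int, str, str]]:
    """Return (0-based forward-strand start, strand, site read 5'->3' on that strand)."""
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in MOTIF.finditer(seq)]
    for m in MOTIF.finditer(reverse_complement(seq)):
        # A site starting at m.start() on the reverse complement covers these forward coordinates.
        hits.append((len(seq) - m.start() - MOTIF_LEN, "-", m.group(1)))
    return sorted(hits)


print(find_sites("ttttCATAACGGGTGACAtttt"))  # [(4, '+', 'CATAACGGGTGACA')]
```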

Preferences for the M. tuberculosis Rv3574, the orthologue, and the dsDNA, including both its length and sequence, are each as defined above with respect to the first aspect of the invention. The M. tuberculosis Rv3574 or orthologue thereof and dsDNA can be obtained using any of the techniques described above.

It is appreciated that the kit of parts may further comprise a reporter gene operably linked to the dsDNA, to assess the test agent's effect on expression. The kit may also comprise a substrate for a protein encoded by the reporter gene. For example, when the reporter gene encodes an enzyme whose activity can be measured by an enzyme assay, the kit may comprise a substrate for the enzyme. Suitable reporter genes include those described above for the first aspect of the invention.

Additional, optional, components for the kit of parts include labels for the isolated DNA and M. tuberculosis Rv3574 or orthologues thereof. For example, the DNA may be radiolabeled with 32P and the M. tuberculosis Rv3574 or orthologues may comprise a GST portion or may be biotinylated or otherwise tagged, for example with a 6His, HA, myc or other epitope tag.

The kit of parts may also comprise instructions for use in a method of identifying an agent which modulates at least one activity or function of the M. tuberculosis Rv3574 protein or the orthologue thereof, as described above.

In an embodiment, the M. tuberculosis Rv3574 protein or orthologue thereof may be bound to the isolated dsDNA. By an isolated protein, we mean one which has been purified from a cellular host and/or which is not substantially associated with any other protein. Preferably, the protein is recombinant. Proteins can be purified from the host cell using standard techniques including gel filtration, affinity chromatography and ion exchange chromatography (Sambrook et al., 2001). A normal level of purity, as assessed by SDS-PAGE, is 80-95%. Therefore, preferably the isolated protein is at least 80% pure, or at least 85% pure, and still more preferably at least 90%, or at least 93%, or at least 95% pure with respect to other proteins. As is known in the art, higher levels of purity, e.g. at least 99%, can be achieved using additional purification techniques.

By an isolated dsDNA, we mean one which has been purified from a cellular host and/or which is not substantially associated with other nucleic acid molecules. A range of well known methods and kits are available for DNA isolation and purification using, for example, ion-exchange chromatography or agarose gel electrophoresis (Sambrook et al., 2001).

We have demonstrated that M. tuberculosis Rv3574 binds to its consensus motif as a dimer. Accordingly, in a further embodiment, the isolated M. tuberculosis Rv3574 protein or orthologue thereof is bound to the dsDNA as a dimer.

As described in detail in Example 1, we have demonstrated that it is possible to identify genes controlled by M. tuberculosis Rv3574 and by M. smegmatis MSMEG_6042 by searching for the consensus DNA motif defined above in the first aspect of the invention. Since we consider that many of these genes are involved in lipid metabolism and in pathogenesis, identifying these genes will increase our understanding of lipid metabolism in these bacteria as well as of how the pathogen adapts to its host environment and evades host immune responses.

Accordingly, a third aspect of the invention is a method of identifying a gene in an actinomycete whose expression is regulated by M. tuberculosis Rv3574 or an orthologue thereof from an actinomycete, the method comprising: identifying, within the control region of a gene, the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) wherein, independently,
X1 is C, G or T;
X2 is A, C, G or T;
X3 is A, C, G or T;
X4 is A, C, G or T;
X5 is A, C, G or T;
X6 is G or T;
X7 is A, C, G or T;
X8 is A, C, G or T; and
X9 is A, G or T.

Preferences for the dsDNA sequence are as defined above for the first aspect of the invention.

By control region we mean the 100 base pair section of DNA upstream (5') of the protein coding region of the gene, as defined for the first aspect of the invention. However, it is appreciated that the gene may be part of an operon, in which case the control region is typically the section of DNA upstream (5') of the protein coding region of the first gene in the operon.
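A minimal sketch of the search step is shown below. For each gene, it takes the 100 bp immediately 5' of the coding region on the gene's own strand and looks for SEQ ID No: 2 in either orientation. The coordinate convention (0-based, end-exclusive, forward-strand coordinates), the gene-tuple layout and the function names are assumptions made for illustration. For a gene within an operon, the coordinates of the first gene in the operon would be supplied.

```python
import re

# SEQ ID No: 2 as a regular expression (lookahead so that overlapping sites are all reported).
MOTIF = re.compile(r"(?=([CGT][ACGT]{2}AAC[ACGT]{2}GT[GT][ACGT]{2}[AGT]))")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def control_region(genome: str, start: int, end: int, strand: str, length: int = 100) -> str:
    """Return the `length` bp immediately 5' of a coding region, read 5'->3' on the gene's strand.

    start/end are 0-based, end-exclusive forward-strand coordinates of the coding region.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    return reverse_complement(genome[end:end + length])


def genes_with_sites(genome: str, genes: list[tuple[str, int, int, str]]) -> dict[str, list[str]]:
    """Map each gene name to the motif sites (either orientation) found in its control region."""
    genome = genome.upper()
    found = {}
    for name, start, end, strand in genes:
        region = control_region(genome, start, end, strand)
        sites = [m.group(1) for m in MOTIF.finditer(region)]
        sites += [m.group(1) for m in MOTIF.finditer(reverse_complement(region))]
        if sites:
            found[name] = sites
    return found


# Hypothetical toy genome: geneA has a site in its control region, geneB does not.
genome = "T" * 86 + "CATAACGGGTGACA" + "ATG" + "GC" * 30 + "G" * 100 + "ATG" + "C" * 20
print(genes_with_sites(genome, [("geneA", 100, 163, "+"), ("geneB", 263, 286, "+")]))
```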

The method may also include the further step of determining the effect of the M. tuberculosis Rv3574 or the orthologue thereof on the expression of the gene containing the identified sequence.

Testing the effect of M. tuberculosis Rv3574 or an orthologue thereof on the expression of the gene involves measuring expression of the gene in the presence and absence of the M. tuberculosis Rv3574 or the orthologue thereof. Suitable techniques for determining the expression of a gene are as described for the first aspect of the invention.

In a particular embodiment, the control regions of an actinomycete are identified by searching a sequence database for that actinomycete in silico. This may involve searching a partial or complete genome. For example, the complete genome sequences of M. tuberculosis (GenBank No. AL123456), M. smegmatis (GenBank No. CP000480) and Nocardia farcinica (GenBank No. AP006618) are publicly available. A further example involves searching the TubercuList database (http://genolist.pasteur.fr/TubercuList). A suitable program for searching genomes for particular motifs is MAST (Bailey et al., 1998). It is appreciated that the motif consensus may be refined using the motif identification program MEME (Bailey et al., 1994) and a second MEME/MAST search performed, as illustrated in Example 1.
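MEME and MAST represent a motif as a position-specific matrix learnt from aligned sites rather than as a fixed consensus. The sketch below shows only the underlying idea, not the MEME or MAST algorithms or their statistics. It builds a log2-odds position weight matrix from aligned sites, using a uniform background and a pseudocount, and then reports the best-scoring window in a sequence. The training sites, pseudocount and function names are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites: list[str], pseudocount: float = 0.5,
              background: float = 0.25) -> list[dict[str, float]]:
    """Log2-odds position weight matrix from equal-length aligned sites (A/C/G/T only)."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def score(pwm: list[dict[str, float]], window: str) -> float:
    """Sum of per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm: list[dict[str, float]], seq: str) -> tuple[int, float]:
    """Highest-scoring window on the forward strand only, as (start, score)."""
    w = len(pwm)
    return max(((i, score(pwm, seq[i:i + w])) for i in range(len(seq) - w + 1)),
               key=lambda hit: hit[1])


# Hypothetical aligned training sites, each consistent with SEQ ID No: 2.
sites = ["CATAACGGGTGACA", "GCTAACTTGTGCCA", "TATAACAAGTTACT"]
pwm = build_pwm(sites)
print(best_hit(pwm, "GGGGCATAACGGGTGACAGGGG"))
```

Refining the matrix from newly found sites and searching again corresponds loosely to the second MEME/MAST round described above.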

In an alternative embodiment the control regions of an actinomycete are identified by hybridisation of a genomic library from the actinomycete with an oligonucleotide probe comprising the DNA sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) as defined above.

It is appreciated that the hybridisation can be conducted with pools of degenerate oligonucleotides comprising the sequence defined above, or with any of the specific sequences referred to above. The stringency of the hybridisation may be adjusted appropriately, as is well known in the art.
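Using the standard IUPAC nucleotide ambiguity codes, SEQ ID No: 2 can be written as the single degenerate oligonucleotide BNNAACNNGTKNND (B = C/G/T, K = G/T, D = A/G/T, N = any base). The complexity of such a pool is the product of the number of bases allowed at each position. The sketch below computes that number and can expand the pool into its individual sequences. The IUPAC transcription is our own illustration and not a probe design taken from this specification.

```python
import itertools
import math

# IUPAC nucleotide codes needed to write SEQ ID No: 2 as one degenerate oligonucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "B": "CGT",   # not A  (X1)
         "K": "GT",    # G or T (X6)
         "D": "AGT",   # not C  (X9)
         "N": "ACGT"}  # any base

# X1 X2 X3 A A C X4 X5 G T X6 X7 X8 X9
DEGENERATE = "BNNAACNNGTKNND"


def pool_size(code: str) -> int:
    """Number of distinct sequences represented by a degenerate code."""
    return math.prod(len(IUPAC[c]) for c in code)


def expand(code: str):
    """Yield every specific sequence represented by a degenerate code."""
    for combo in itertools.product(*(IUPAC[c] for c in code)):
        yield "".join(combo)


print(pool_size(DEGENERATE))  # 73728
```

With a pool of this complexity, any individual sequence makes up only a small fraction of the probe. This is one reason why the hybridisation stringency, or the use of specific sequences, may need to be adjusted.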

The DNA sequence may be radiolabeled, typically with 32P and used to probe genomic DNA from an actinomycete. The genomic DNA is preferably derived from a library of genomic clones.

It is appreciated that if a gene whose expression is known to be regulated by M. tuberculosis Rv3574 or an orthologue thereof is identified in one species, then the orthologue of that gene can be found in a related species. For example, using a combination of homology and synteny, an orthologue can be identified in a related species and the presence of the Rv3574 binding sequence in the control region can be verified, for example, by sequencing.

Thus, in an alternative embodiment, the control regions of an actinomycete are identified by comparison to an orthologous gene in another actinomycete.

In a further embodiment, the method is performed in a Mycobacterium, a Nocardia (Nocardioides), a Rhodococcus or a Streptomycete, or in a genomic library obtained from a bacterial species within these genera.

Preferably, the method is performed in M. tuberculosis or M. smegmatis.

All of the documents referred to herein are incorporated herein, in their entirety, by reference. The invention will now be described in more detail with the aid of the following Figures and Examples.

Figure 1. Comparative genomics of the Rv3574 region in the mycobacteria and other closely related actinomycetes.

In each actinomycete, a fadE gene encoding an acyl-CoA dehydrogenase was found adjacent to, but divergently transcribed from, Rv3574 and its orthologues. (a) indicates Rv3574 orthologues; (b) indicates an acyl-CoA dehydrogenase; (c) indicates a transcriptional regulator of the LacI family; and (d) indicates a trehalose-6-phosphate phosphatase.

Figure 2. Sequence alignment and secondary structure prediction of Rv3574 orthologues.

The sequences were aligned using CLUSTALW (M. tuberculosis Rv3574 (MTB), SEQ ID No: 3; M. bovis Mb3605 (MB), SEQ ID No: 4; M. marinum MM5069 (MM), SEQ ID No: 5; M. ulcerans MUL4145 (MUL), SEQ ID No: 6; M. avium subsp. paratuberculosis MAP0491C (MAP), SEQ ID No: 7; M. smegmatis MSMEG_6042 (MSMEG), SEQ ID No: 8; and Nocardia farcinica nfa4470 (NFA), SEQ ID No: 9). The N-terminal regions of the proteins show clear similarity to the N-terminal part of the TetR family of bacterial regulatory proteins (TetR_N; PFAM00440). The consensus sequence of the TetR family (SEQ ID No: 44) and the amino acids identical or similar (+) to the consensus are shown above the alignment. Below the alignment, invariant positions are indicated by asterisks (*), while highly conserved and weakly conserved positions are indicated by colons and periods, respectively. The secondary structure prediction (h=helix) is also shown below the alignment and is derived from an atomic model of Rv3574 produced using Modeller v9.1 as described in materials and methods. The residues that are involved in DNA contact (as determined from the crystal structures of E. coli TetR and S. aureus QacR) are shown in bold. Amino acid identities in Mycobacterium bovis, Mycobacterium marinum, Mycobacterium ulcerans, Mycobacterium avium subsp. paratuberculosis, Mycobacterium smegmatis and Nocardia farcinica are 100%, 95%, 95%, 84%, 89% and 70%, respectively.

Figure 3. Deletion of kstRMsm in M. smegmatis mc2155.

(A) Genomic organisation of kstRMsm. A 646 bp deletion was made in kstRMsm and is represented by the hatched lines. ΔkstRMsm forward and reverse primers were used in the analysis of the mutant by colony PCR. (B) Colony PCR of wild-type (mc2155) and mutant (ΔkstRI) genomic DNA using ΔkstRMsm forward and reverse primers, showing the 646 bp deletion in the mutant strain.

Figure 4. Growth curve of M. smegmatis mc2155 wild-type and ΔkstRI. Cultures were grown at 37°C with shaking in Middlebrook (Difco) 7H9 broth containing 10% OADC and 0.05% Tween 80. This is a representative experiment, which was performed four times (twice with each of two independently generated mutants). The black line indicates mc2155 wild-type; the dotted line indicates the ΔkstRI mutant.

Figure 5. Expression levels of genes adjacent to kstRMsm in M. smegmatis mc2155 wild-type and ΔkstRI.

The expression levels were measured in mid-log phase aerated cultures using RTq-PCR. The results are expressed relative to sigA, the expression of which was not significantly different in the mutant compared with the wild-type. Both the fadE34 orthologue (MSMEG_6041) and otsB (MSMEG_6043) were significantly up-regulated (de-repressed) in the mutant compared with the wild-type (unpaired Student's t-test; p < 0.05). Error bars represent ± 1 standard deviation. Filled bars indicate mc2155 wild-type; empty bars indicate the ΔkstRI mutant.
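The legend reports significance by an unpaired Student's t-test at p < 0.05. As an illustration only, that test can be sketched with the Python standard library: a pooled-variance t statistic, with the two-sided p-value obtained by numerically integrating the t density using Simpson's rule (an implementation choice of this sketch, not something stated in the source). This is not the software used in the study, and the expression values in the example are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 20000) -> float:
    """P(|T| >= |t|), integrating the density over [0, |t|] by Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central_area)


def students_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Unpaired two-sample Student's t-test with pooled (equal) variances.

    Each group needs at least two values and the pooled variance must be > 0.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df, two_sided_p(t, df)


# Hypothetical relative-expression values (not data from this study)
wild_type = [1.0, 1.1, 0.9]
mutant = [5.0, 5.2, 4.8]
t, df, p = students_t_test(wild_type, mutant)
print(f"t = {t:.2f}, df = {df}, p = {p:.2g}")
```

The equal-variance form is used because the legend names Student's (not Welch's) test; with unequal group variances a Welch test would be the usual alternative.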

Figure 6. Alignment of the kstR/fadE34 intergenic region in the mycobacteria and other closely related actinomycetes.

Intergenic regions were aligned using ClustalW. Shown are the kstR/fadE34 intergenic regions from M. tuberculosis (SEQ ID No: 45), M. bovis (SEQ ID No: 46), M. marinum (SEQ ID No: 47), M. smegmatis (SEQ ID No: 48), M. avium subsp. paratuberculosis (SEQ ID No: 49) and Nocardia farcinica (SEQ ID No: 50). Putative -35 and -10 regions are shaded in grey. Asterisks indicate residues conserved in all genomes. The alignment shows the presence of a highly conserved inverted palindromic repeat in all species. The repeat sequence is shown in bold and the direction of the palindrome is indicated with arrows. Species abbreviations are as in the legend to Figure 2.

Figure 7. Purified KstRMtb binds to a 29 bp sequence containing the putative regulatory motif.

(A) KstRMtb was expressed as a recombinant His6-tagged protein and purified using Ni2+-affinity chromatography. F indicates flow-through, W indicates wash, M indicates markers and lanes 1-8 correspond to different elution fractions. (B) The protein was purified further by size exclusion chromatography before being used in the EMSA. M indicates markers and lanes 1-13 correspond to different elution fractions.

(C) EMSA of purified KstRMtb with the 29 bp fragment containing the highly conserved palindromic region. Lane 1, labelled probe only; Lane 2, labelled probe with protein in a 1:1 ratio; Lane 3, labelled probe with protein and a 100-fold excess of unlabelled probe; Lane 4, labelled probe with protein and a 150-fold excess of poly(dI-dC).

Figure 8. Purified KstRMtb binds to the 29 bp sequence as a dimer. The standard curve of ve/v0 versus log Mr was derived from the peak elution volumes (ve) of standard proteins. The void volume (v0) was determined using blue dextran 2000. The molecular masses of His6-tagged KstRMtb, the 29 bp fragment and the complex formed by them were calculated from the standard curve.

Figure 9. Sequence logo of the kstR motif.

Sequence logos (Crooks et al., 2004) show the relative frequency of each base at each position of the motif. The y axis shows the information content and error bars indicate an approximate, Bayesian 95% confidence interval.
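The y axis of a sequence logo is information content in bits. A minimal sketch of the per-position calculation for DNA, assuming the simple form IC = 2 - H (H being the Shannon entropy of the observed base frequencies) with no small-sample correction, is given below. WebLogo's own estimate also accounts for the limited number of sites and reports confidence intervals, so its values would differ; the sites used here are hypothetical matches to the consensus, not the sites behind Figure 9.

```python
import math
from collections import Counter


def information_content(sites: list[str]) -> list[float]:
    """Per-position information content (bits) of aligned DNA sites.

    Uses IC = 2 - H, where H is the Shannon entropy of the observed base
    frequencies at that position; no small-sample correction is applied.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    ic = []
    for i in range(width):
        counts = Counter(s[i].upper() for s in sites)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


# Hypothetical sites matching TAGAAC(N2)GTTCTA, for illustration only
sites = ["TAGAACTTGTTCTA", "TAGAACGAGTTCTA", "TAGAACCTGTTCTA"]
print([round(x, 2) for x in information_content(sites)])
```

Fully conserved positions score the maximum of 2 bits, while the two variable N positions in the middle of the palindrome score less, which is the pattern a logo of this motif would be expected to show.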

Figure 10. Expression levels of genes flanking the predicted motifs in M. smegmatis mc2155 wild-type and ΔkstRI.

The expression levels were measured in mid-log phase aerated cultures using RTq-PCR. The results are expressed relative to sigA, the expression of which was not significantly different in the mutant compared with the wild-type. All genes measured were significantly up-regulated (de-repressed) in the mutant compared with the wild-type (unpaired Student's t-test; p < 0.05). Error bars represent ± 1 standard deviation. Filled bars indicate mc2155 wild-type; empty bars indicate the ΔkstRI mutant.

Figure 11. Genomic organisation of the Rv3492c to Rv3574 (kstR) region in M. tuberculosis and M. smegmatis.

The figure illustrates a number of operons within the main region that were de-repressed in the M. smegmatis mutant ΔkstRI and contain motifs in both M. tuberculosis (MTB) and M. smegmatis (MSMEG). The gene numbers (as annotated) are given below the M. smegmatis genes. The Rv numbers are given for the M. tuberculosis genes unless the gene is named. The genes de-repressed in the microarray analysis of M. smegmatis ΔkstRI are shown in black. kstRMsm, the gene that was knocked out, is indicated by hatched lines. Genes for which we have no data are white. The presence of the motif is indicated by *, and ** denotes separate motifs. Binding of KstRMtb was confirmed for a number of these motifs, which are indicated by bold asterisks (see also Table 3). The genes that are part of the predicted regulon in M. tuberculosis are indicated with bold outlines. This region contains a number of genes that are essential in vivo, up-regulated in macrophages, up-regulated by palmitic acid and up-regulated by cholesterol (see Table 4). The region has been broken into four parts for convenience. The first two parts are contiguous, whereas the gaps between the other parts contain genes that are not part of the region.

Figure 12. Amino acid sequence of Rhodococcus sp. strain RHA1 Rv3574 orthologue (SEQ ID No: 10).

Figure 13. Induction of MSMEG_6038 gene expression by growth on cholesterol. The figure illustrates the cholesterol induction of the expression of the M. smegmatis gene MSMEG_6038 and shows that this induction is to a large degree dependent on the presence of KstR. M. smegmatis strains mc2155 were grown in liquid media containing either glycerol (white bars) or cholesterol (bars with black vertical lines).

Example 1: A highly conserved actinomycete TetR-type transcriptional repressor controls a large lipid metabolism regulon that is essential for in vivo survival of Mycobacterium tuberculosis

SUMMARY

Mycobacterium tuberculosis survival in vivo has been shown to be dependent on its ability to use lipids as an energy source. The TetR-type regulator Rv3574 is highly conserved in the actinomycetes, is essential for growth in mice, and its expression is induced in macrophages. We constructed a Mycobacterium smegmatis mutant lacking its Rv3574 orthologue (MSMEG_6042) and showed that this regulator represses its own transcription. We identified a palindromic sequence motif upstream of Rv3574 orthologues, and used it to identify further intergenic copies of the motif in both M. smegmatis and M. tuberculosis. The M. tuberculosis Rv3574 protein was shown to bind to these motifs as a dimer. Combining these data with microarray experiments carried out with the M. smegmatis null mutant, we identified a direct regulon of 83 genes in M. smegmatis and 74 genes in M. tuberculosis. Many of these genes are predicted to be involved in lipid metabolism, and other work has shown many of them to be induced by growth on cholesterol in rhodococci and on palmitate in M. tuberculosis. We conclude that this regulator, designated elsewhere as KstR, controls the expression of genes used for the utilization of diverse lipids as energy sources, and possibly also anabolically; these lipids are probably imported through the Mce4 system. There is evidence that at least 19 of the genes in the M. tuberculosis kstR regulon are required for in vivo growth, illustrating the importance of this system in pathogenesis. Recently, a paper was published in which the Rhodococcus sp. strain RHA1 Rv3574 orthologue is referred to as kstR (Van der Geize et al., 2007). By Rv3574 we also mean kstR, and the terms are used interchangeably.

INTRODUCTION

The success of Mycobacterium tuberculosis as a pathogen (Corbett et al., 2003) lies partly in its ability to adapt to varying conditions within the host. This adaptation depends on the co-ordination of gene expression via the regulation of transcription; in M. tuberculosis this is achieved by the collective action of the approximately 190 transcriptional regulators encoded by the genome (Camus et al., 2002; Cole et al., 1998). The importance of these genes in pathogenesis is illustrated by the observation that, in many cases, inactivation of genes encoding sigma factors (Ando et al., 2003; Calamita et al., 2005; Chen et al., 2000; Sun et al., 2004) or two-component regulatory systems (Malhotra et al., 2004; Martin et al., 2006; Parish et al., 2003; Perez et al., 2001; Rickman et al., 2004; Walters et al., 2006; Zahrt and Deretic, 2001) causes severe attenuation in vivo. However, the identity of the genes controlled by the majority of the transcription factors in M. tuberculosis, and the functional roles of these genes in vivo, remain largely unknown.

The application of microarray technology to the study of bacterial gene expression during infection has allowed genome-wide analyses of genes important in pathogenesis. We previously reported a meta-analysis (Kendall et al., 2004) of the data from studies in M. tuberculosis and showed that there was (perhaps surprisingly) very little correlation between the lists of genes that were induced during infection (Schnappinger et al., 2003; Talaat et al., 2004) and those that were essential for infection (Rengarajan et al., 2005; Sassetti and Rubin, 2003).

Rv3574 is a member of the TetR family of transcriptional regulators. These proteins are often repressors, are widely distributed among bacteria and regulate a number of diverse processes (Ramos et al., 2005). The prototype for this group is TetR from the Tn10 transposon of E. coli, which regulates the expression of a tetracycline efflux pump in Gram-negative bacteria (Orth et al., 2000). Other members of the TetR family include Staphylococcus aureus QacR, which regulates the expression of a multi-drug transporter (Schumacher et al., 2001), and M. tuberculosis EthR, which regulates the expression of ethA, a monooxygenase that catalyses two steps in the activation of ethionamide, an antibiotic used in tuberculosis treatment (Baulard et al., 2000; Dover et al., 2004).

In this work we have examined the function of Rv3574 in order to clarify its role in M. tuberculosis. Our bioinformatic analyses indicate that the Rv3574 gene is highly conserved within the mycobacteria, and accordingly we have studied the function of orthologues in both M. tuberculosis and the fast-growing non-pathogen Mycobacterium smegmatis. We have inactivated Rv3574 in M. tuberculosis and the Rv3574 orthologue in M. smegmatis, and have identified a large number of genes that are de-repressed in the mutants. We have identified a conserved regulatory motif present in the upstream regions of the genes in the M. smegmatis regulon and also describe the same motif in M. tuberculosis. We show that recombinant M. tuberculosis Rv3574 binds to this motif as a dimer and describe the regulon for Rv3574 in both M. tuberculosis and in M. smegmatis. The functional relevance of the regulon in pathogenesis is discussed.

MATERIALS AND METHODS

Bacterial strains and culture conditions

Cultures of M. smegmatis mc2155 were grown at 37°C with shaking in Middlebrook (Difco) 7H9 broth containing 10% oleic acid-albumin-dextrose-catalase supplement (Becton Dickinson) and 0.05% Tween 80. For growth of M. smegmatis mc2155 on cholesterol, cultures were grown at 37°C with shaking in Dubos (Difco) broth containing 10% acid-albumin-dextrose-catalase supplement (Becton Dickinson), 0.05% Tween 80 and 1 mM cholesterol. Cultures of M. tuberculosis H37Rv were grown at 37°C in the same media as above with rolling. Hygromycin (50 μg/ml), kanamycin (20 μg/ml), 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal, 50 μg/ml) and sucrose (2% w/v) were used for selection as appropriate. Escherichia coli DH5α was used as a host for cloning and E. coli BL21(DE3) (Stratagene) was used as a host for expression of recombinant Rv3574. Both E. coli strains were grown in LB, and kanamycin (50 μg/ml) was used for plasmid selection and maintenance.

Table 1A. Bacterial plasmids used

Deletion of kstRMsm and kstRMtb by homologous recombination

A 646 bp deletion in MSMEG_6042 (kstRMsm) was made in M. smegmatis mc2155 by homologous recombination (Parish and Stoker, 2000). Briefly, a 3.5 kb fragment containing the entire kstRMsm gene and flanking regions was PCR amplified from mc2155 genomic DNA using ΔkstRMsm forward and reverse primers (Table 2). BamHI and HindIII sites (shown in upper case in Table 2) were introduced into the primers to enable cloning of the 3.5 kb fragment into p2NIL, resulting in plasmid pCS1. A deletion was made in pCS1 by inverse PCR using inv_kstRMsm forward and reverse primers (Table 2), followed by re-ligation of the BglII-digested PCR fragment. One of the BglII sites was present in the genome and the other was introduced in the inv_kstRMsm reverse primer. The deletion in the resulting plasmid pCS2 was confirmed by sequencing (MWG) across the junction (data not shown). Finally, the PacI cassette was inserted into pCS2, resulting in the suicide delivery vector pCS3.

A 322 bp deletion in Rv3574 (kstRMtb) was made in M. tuberculosis H37Rv by homologous recombination (Parish and Stoker, 2000). Briefly, a 3.0 kb fragment containing the entire kstRMtb gene and flanking regions was PCR amplified from H37Rv genomic DNA using ΔkstRMtb forward and reverse primers (Table 2), and the resulting product was cloned into p2NIL using BamHI sites within the genomic fragment, resulting in plasmid pSK28. A deletion was made in pSK28 by inverse PCR using inv_kstRMtb forward and reverse primers (Table 2), resulting in plasmid pSK36. The deletion in pSK36 was confirmed by sequencing (MWG) across the junction (data not shown). Finally, the PacI cassette was inserted into pSK36, resulting in the suicide delivery vector pSK37.

pCS3 and pSK37 were electroporated into competent mc2155 and H37Rv, respectively (Parish and Stoker, 1998), and single crossovers were selected on medium containing hygromycin, kanamycin and X-gal. For both mc2155 and H37Rv, a single blue, kanamycin- and hygromycin-resistant colony was streaked onto fresh medium without any selective markers and incubated at 37°C for 3-5 days to allow the second crossover to occur. Serial dilutions were plated onto medium containing sucrose and X-gal to select for double crossovers. Potential double crossovers (white, sucrose-resistant colonies) were screened for kanamycin sensitivity and confirmed by colony PCR. The resulting mutants were called ΔkstRI (deletion of kstRMsm in M. smegmatis) and ΔkstRI Mtb (deletion of kstRMtb in M. tuberculosis).

RNA extraction

RNA for microarray analysis and RTq-PCR was extracted from both wild-type mc2155 and ΔkstRI by direct sampling into guanidinium thiocyanate (GTC). Briefly, 10 ml of culture was sampled into 40 ml of 5M GTC to prevent further transcription. The cells were pelleted by centrifugation (20 min, 4,000 rpm, 4°C) and resuspended in 200 μl of water. The suspensions were transferred to screw-cap tubes containing 0.5 ml of 0.1 mm zirconia/silica beads (BioSpec), and 700 μl of buffer RLT (Qiagen) was added. The bacteria were lysed using a Mini-BeadBeater™ (BioSpec) and cell lysates were recovered by centrifugation (5 min, 13,000 rpm, 4°C). RNA was purified from the lysate using an RNeasy kit (Qiagen) and treated with DNase (Qiagen) according to the manufacturer's instructions. Finally, the samples were eluted in 30 μl of RNase-free water, and RNA quantity was assessed using a NanoDrop (NanoDrop Technologies).

Reverse transcription reactions for RTq-PCR

RTq-PCR was used for the analysis of the expression of single genes. Prior to reverse transcription, RNA was treated with DNase (Invitrogen) for 30 min at 37°C followed by heat inactivation. Reverse transcription took place in a total volume of 20 μl containing 100 ng total RNA, 300 ng random primers (Invitrogen), 10 mM DTT, 0.5 mM each of dCTP, dATP, dGTP and dTTP, and 200 units SuperScript III (Invitrogen). For primer annealing, RNA and random primers were heated to 65°C for 10 min in a volume of 13 μl and then snap-cooled on ice prior to the addition of the remaining components. For reverse transcription, the reactions were incubated at 55°C for 50 min. One microlitre of cDNA (equivalent to 5 ng of RNA) was used in each RTq-PCR reaction.

RTq-PCR

RTq-PCR reactions were set up using the DyNAmo SYBR Green qPCR kit (MJ Research), and RTq-PCR was performed using the DNA Engine Opticon® 2 System (GRI). Reactions (20 μl) were set up on ice containing 1x DNA Master SYBR Green I mix, 1 μl of cDNA product and 0.3 μM of each primer. Primer sequences are given in Table 2. Reactions were heated to 95°C for 10 min before 35 cycles of 95°C for 30 seconds, 62°C for 20 seconds and 72°C for 20 seconds. Fluorescence was captured at the end of each cycle after heating to 80°C to ensure the denaturation of primer-dimers. At the end of the PCR, melting curve analysis was performed and PCR products were analysed on an agarose gel to ensure product specificity. The experiment was performed in triplicate and each gene was measured in duplicate, giving a total of 6 data points per gene.
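The text states that expression was expressed relative to sigA but does not give the calculation. A common approach, assumed here rather than taken from the source, is the comparative Ct method with roughly 100% amplification efficiency, in which expression relative to the reference gene is 2^-(Ct_gene - Ct_sigA); the mutant/wild-type ratio of these values then gives a fold change. The function names and the Ct values below are hypothetical.

```python
from statistics import mean


def relative_expression(ct_gene: float, ct_reference: float) -> float:
    """Expression relative to a reference gene (e.g. sigA), as 2**-dCt.

    Assumes ~100% amplification efficiency for both primer pairs.
    """
    return 2.0 ** -(ct_gene - ct_reference)


def fold_change(mutant: list[tuple[float, float]],
                wild_type: list[tuple[float, float]]) -> float:
    """Mutant / wild-type ratio of mean relative expression.

    Each list holds (Ct_gene, Ct_reference) pairs, one per data point
    (e.g. 3 biological replicates x 2 technical replicates = 6 points).
    """
    mut = mean(relative_expression(g, r) for g, r in mutant)
    wt = mean(relative_expression(g, r) for g, r in wild_type)
    return mut / wt


# Hypothetical Ct values: target gene 5 cycles earlier in the mutant,
# reference gene unchanged, giving 2**5 = 32-fold de-repression
mutant_points = [(22.0, 18.0)] * 6
wild_type_points = [(27.0, 18.0)] * 6
print(fold_change(mutant_points, wild_type_points))
```

If primer efficiencies were measured and found to differ from 100%, an efficiency-corrected model would be more appropriate than this simple form.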

Expression and purification of recombinant Rv3574 (KstRMtb)

KstRMtb was PCR amplified from M. tuberculosis H37Rv genomic DNA using pET_KstRMtb forward and reverse primers (Table 2). NcoI and HindIII sites were introduced into these primers to allow cloning into the pET30a expression vector. The nucleotide sequences corresponding to the restriction sites are shown in upper case in Table 2 and the start site of kstRMtb is underlined. The resulting plasmid, pSK35, was sequence-verified using the T7 promoter primer (MWG) and used for expression and purification of C-terminally His-tagged KstRMtb. For expression, BL21(DE3) cultures containing plasmid pSK35 were grown at 37°C until mid-logarithmic phase. Cultures were induced with 1 mM IPTG (isopropyl-β-D-thiogalactopyranoside) for 2 h at 37°C and harvested by centrifugation (10 min, 4000 rpm, 4°C). The cell pellet was resuspended in 5 ml of lysis buffer (20 mM HEPES pH 8.0, 150 mM NaCl, 1 mM β-mercaptoethanol, 10 mM imidazole) and lysed by passage through a French press cell. The lysate was centrifuged (25 min, 16,000 rpm, 4°C) and His6-KstRMtb was purified by immobilised metal (Ni2+) affinity chromatography using a HiTrap IMAC column (Amersham), followed by size exclusion chromatography using a Superdex 200 10/30 column.

Electrophoretic mobility shift assays (EMSAs)

Primers (Table 2) were annealed by heating to 95°C for 10 min and allowing them to cool slowly to room temperature. The resulting 29 bp probes were end-labelled with DIG-11-ddUTP using the DIG Gel Shift Kit, 2nd Generation (Roche) according to the manufacturer's instructions. For the binding reaction, 15 pmol of purified His-tagged Rv3574 was incubated with 15 pmol of labelled fragment in binding buffer (20 mM Tris-HCl pH 8.0, 75 mM NaCl, 10 mM MgCl2, 0.1 μg of poly-L-lysine, 1 μg of poly(dI-dC)). Specific and non-specific competitors were added for the control reactions: specific competition reaction mixtures contained a 100-fold excess of unlabelled probe, and non-specific competition reaction mixtures contained a 150-fold excess of poly(dI-dC). Incubations were carried out for 30 min at room temperature and reaction mixtures were loaded onto 8% polyacrylamide gels containing 0.5x Tris-borate-EDTA. Gels were run with cooling at 80 to 100 V for 1.5 to 2 h. The DNA-protein complexes were contact-blotted onto positively charged Hybond-N+ nylon membranes (Amersham) and detected with anti-DIG-alkaline phosphatase and the chemiluminescent substrate CSPD as described by the manufacturer (Roche). Membranes were exposed to X-ray film at room temperature for 10 to 30 min.

Molecular weight determination of the protein-DNA complex by size exclusion chromatography (SEC)

The molecular weight of His6-KstRMtb was determined by analytical SEC on a Superdex 200 10/30 column. A standard curve of ve/v0 against log Mr was constructed using the peak elution volumes (ve) of the following standards: ribonuclease A (13.7 kDa), chymotrypsinogen A (25.0 kDa), ovalbumin (43.0 kDa), albumin (67.0 kDa) and catalase (232.0 kDa). The void volume (v0) of the column was determined with blue dextran 2000. All SEC experiments were performed at a flow rate of 0.5 ml/min in 20 mM HEPES pH 8.0, 75 mM NaCl, 10 mM MgCl2 and 1 mM β-mercaptoethanol. His6-KstRMtb was used at a concentration of 15 μM. Samples containing His6-KstRMtb and the 29 bp annealed primers (Table 2) were incubated on ice for 15 min prior to analysis. Collected fractions were analysed by SDS-PAGE and stained with Coomassie Blue and ethidium bromide to confirm the presence of protein and DNA.
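The standard-curve calculation can be sketched as a least-squares line of ve/v0 against log10(Mr), which is then inverted to estimate the mass of an unknown peak; dividing that estimate by the monomer mass suggests the oligomeric state. Only the standard masses below come from the text. The void volume, the standard elution volumes (generated from an invented line), the sample peak and the monomer mass are all hypothetical placeholders, not measurements from this study.

```python
import math

# Calibration standards named in the text (molecular mass, kDa)
STANDARDS_KDA = {
    "ribonuclease A": 13.7,
    "chymotrypsinogen A": 25.0,
    "ovalbumin": 43.0,
    "albumin": 67.0,
    "catalase": 232.0,
}


def fit_standard_curve(masses_kda, elution_volumes, v0):
    """Least-squares line of ve/v0 against log10(Mr); returns (slope, intercept)."""
    xs = [math.log10(m) for m in masses_kda]
    ys = [ve / v0 for ve in elution_volumes]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_mass_kda(ve, v0, slope, intercept):
    """Invert the standard curve for a sample's peak elution volume."""
    return 10 ** ((ve / v0 - intercept) / slope)


# Hypothetical column: v0 and the standards' elution volumes are invented so
# that ve/v0 = 2.5 - 0.5 * log10(Mr) exactly; they are not measured values.
V0 = 8.0
masses = list(STANDARDS_KDA.values())
volumes = [V0 * (2.5 - 0.5 * math.log10(m)) for m in masses]
slope, intercept = fit_standard_curve(masses, volumes, V0)

# Hypothetical sample peak and monomer mass, for illustration only
sample_ve = V0 * (2.5 - 0.5 * math.log10(50.0))
monomer_kda = 25.0
mass = estimate_mass_kda(sample_ve, V0, slope, intercept)
print(f"estimated {mass:.1f} kDa, ~{mass / monomer_kda:.1f} x monomer")
```

Because SEC separates by hydrodynamic size rather than mass, non-globular proteins and protein-DNA complexes can deviate from a curve built with globular standards, so the ratio is best read as supporting evidence for a dimer rather than as proof.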

Microarray analysis of ΔkstRI and ΔkstRI Mtb

Microarrays for genome-wide expression analysis of ΔkstRI were obtained from the Pathogen Functional Genomics Resource Centre at TIGR (http://pfgrc.tigr.org/). The arrays consist of 6746 different 70-mer single-stranded oligonucleotides spotted onto glass slides. The oligonucleotides represent the entire M. smegmatis genome and each oligonucleotide is spotted four times. Microarrays for genome-wide expression analysis of ΔkstRI Mtb were obtained from the Bacterial Microarray Group at St George's Hospital, London (http://www.bugs.sgul.ac.uk/). The arrays consist of 4410 gene-specific PCR-amplified products.

Wild-type RNA was competitively hybridised against mutant RNA, and the design included a dye swap. For the labelling reactions, 2-10 μg of RNA was labelled with either Cy3-dCTP or Cy5-dCTP (Amersham Pharmacia Biotech). In each case, 3 μg of random primers (Invitrogen™ Life Technologies) were annealed to the RNA by heating to 95°C for 5 min followed by snap-cooling on ice. The labelling reaction contained 0.5 mM each of dATP, dGTP and dTTP, 0.2 mM dCTP, 10 mM DTT, 60 μmol of Cy3-dCTP (or Cy5-dCTP) and 500 units of SuperScript II (Invitrogen™ Life Technologies) in a final volume of 25 μl. The samples were incubated for 10 min at 25°C, followed by a 90 min incubation at 42°C in the dark.

The slides were pre-hybridised by incubation in pre-hybridisation buffer (3.5x SSC, 0.1% SDS, 10 mg/ml BSA) at 65°C for 20 min. They were then washed in 400 ml of water followed by 400 ml of isopropanol for 1 min each. The slides were dried by centrifugation (1500 rpm, 5 min, room temperature) and stored in the dark until hybridisation (< 1 h).

Microarray hybridisation

Labelled wild-type samples were combined with the corresponding labelled mutant samples and purified using a MinElute PCR Purification Kit (Qiagen). Samples were eluted in 25 μl of water and hybridised onto the array in hybridisation buffer (4x SSC, 40% formamide, 0.1% SDS). The samples were denatured by heating to 95°C for 2 min before being added to the array. Hybridisation took place under glass coverslips in a humidified slide chamber (Corning) submerged in a 65°C water bath for approximately 16 h. Coverslips were removed in wash buffer I (1x SSC, 0.05% SDS) pre-warmed to 65°C, and slides were washed sequentially in buffer I at 65°C for 2 min, followed by 2 washes in buffer II (0.06x SSC) at room temperature for 2 min each. Slides were dried by centrifugation (1500 rpm, 5 min, room temperature) and scanned using an Affymetrix 418 scanner. The image files were quantitated using ImaGene 7.0 software (BioDiscovery Inc). The experiment was performed in duplicate and two arrays were used per experiment. As the oligonucleotides were spotted four times on the slides, this gave a total of 8 data points per ORF.

Microarray data analysis

Data analysis was performed using functions from the limma (linear models for microarray data analysis) and yasma (yet another statistical microarray analysis) (Wernisch et al., 2003) software packages. Differentially expressed genes were identified by a linear model, using an experimental design for two-colour arrays that incorporated biological replicates with dye-swapped technical replicates. Data for control spots, and for spots with expression levels in the lowest 10% quantile, were discarded. This was followed by background correction and rank normalisation. Duplicate spots within the arrays were averaged before performing the linear model fit. False discovery rate adjustment was made using Benjamini and Hochberg's method. Genes were considered significantly up- or down-regulated if their adjusted P-value was less than 0.05.
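The Benjamini and Hochberg false discovery rate adjustment mentioned above can be illustrated with a short standalone re-implementation of the step-up procedure: each p-value is scaled by m/rank and the results are made monotone from the largest p-value downwards. The analysis itself used the R packages named above; this Python version is only an illustrative sketch, and the raw p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never increase
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values: list[float], alpha: float = 0.05) -> list[int]:
    """Indices of genes whose adjusted p-value is below alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(p_values)) if q < alpha]


# Hypothetical raw p-values for five genes
raw = [0.01, 0.045, 0.02, 0.005, 0.2]
print(benjamini_hochberg(raw))
print(significant(raw))
```

Here the gene with raw p = 0.045 passes an unadjusted 0.05 cut-off but not the adjusted one, which is the kind of borderline call the adjustment is designed to remove when thousands of genes are tested at once.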

Bioinformatic analysis

Orthologues of kstRMtb were identified using ACT (Carver et al., 2005). Sequence alignments were performed using ClustalW (Thompson et al., 1994). Motif analysis was done using MEME (Bailey and Elkan, 1994) and MAST (Bailey and Gribskov, 1998). WebLogo version 3 beta (Crooks et al., 2004) (http://weblogo.berkeley.edu/) was used to derive the image in Figure 9. An atomic model of KstRMtb was produced as follows: a BLAST search of the PDB with the KstRMtb sequence was used to select two similar proteins of known structure, both TetR-family transcription factors, from Rhodococcus sp. RHA1 (PDB ID: 2GFN) and Bacillus subtilis (PDB ID: 1SGM). A structure-based sequence alignment of these proteins was produced using SPDBV (Guex and Peitsch, 1997) and was then used to produce a profile-based multiple sequence alignment in ClustalW. This alignment was used, along with the two template structures, as input to Modeller 9v1 (Sali and Blundell, 1993) using the automodel procedure.
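MEME and MAST use probabilistic motif models that are not reproduced here. As a much simpler stand-in (explicitly not MEME or MAST), the sketch below builds a log-odds position weight matrix from aligned sites and scans both strands of a sequence for windows above a score threshold. The pseudocount, the uniform 0.25 background, the threshold and the example sites and sequence are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds (bits) position weight matrix from equal-length ACGT sites."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i].upper()] += 1  # sites must contain only A/C/G/T
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def scan(sequence, pwm, threshold):
    """Return (strand, start, score) for windows scoring >= threshold.

    Starts on the '-' strand are positions in the reverse-complemented sequence.
    """
    width = len(pwm)
    hits = []
    for strand, seq in (("+", sequence.upper()), ("-", reverse_complement(sequence))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            if any(b not in BASES for b in window):
                continue  # skip windows containing N or other symbols
            score = sum(column[b] for column, b in zip(pwm, window))
            if score >= threshold:
                hits.append((strand, start, round(score, 2)))
    return hits


# Hypothetical training sites and test sequence, for illustration only
pwm = build_pwm(["TAGAACTTGTTCTA", "TAGAACGAGTTCTA", "TAGAACCTGTTCTA"])
print(scan("GGGGTAGAACAAGTTCTACCCC", pwm, threshold=10.0))
```

Because the consensus is palindromic, a true site is reported on both strands at the same start position; a genome-wide scan would normally merge such pairs and estimate significance against a background model, which MAST does and this sketch does not.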

Lipid extraction and analyses

Polar and apolar lipids were extracted according to established procedures (Burguiere et al., 2005). Cells (50 mg) were stirred in 2 ml of methanolic 0.3% saline (100:10, v/v) and 2 ml of petroleum ether for 15 min. The cells were centrifuged at 3000 x g for 5 min. The resulting biphasic solution was separated and the upper layer, containing apolar lipids, was recovered. An additional 2 ml of petroleum ether was added, mixed, and processed as described above. The two upper petroleum ether fractions were combined and dried under reduced pressure. To extract polar lipids, 2.3 ml of chloroform/methanol/0.3% NaCl (9:10:3, v/v/v) was added to the lower aqueous methanol layer and the solution stirred for 1 h. This mixture was filtered and the filter cake re-extracted twice with 0.75 ml of chloroform/methanol/0.3% NaCl (5:10:4, v/v/v). Chloroform (1.3 ml) and 0.3% NaCl (1.3 ml) were added to the combined filtrates. This mixture was briefly vortexed and allowed to settle, and the lower layer, containing the polar lipids, was recovered and dried under reduced pressure.

The apolar and polar lipid fractions were then each resuspended in chloroform/methanol (2:1, v/v), and 50 μg of crude lipid extract was applied to 6.6 x 6.6 cm pieces of Merck 5554 aluminium-backed TLC plates. Plates were developed using several solvent systems, designed to cover the whole range of lipid polarities, as detailed previously (Dobson, 1985). Apolar lipids were analysed using TLC run in systems A-D, whilst the polar lipids were analysed using TLC run in systems D-E. System A TLCs were run three times in direction 1 (petroleum ether 60-80:ethyl acetate, 98:2) and once in direction 2 (petroleum ether 60-80:acetone, 98:2). System B TLCs were run three times in direction 1 (petroleum ether 60-80:acetone, 92:8) and once in direction 2 (toluene:acetone, 95:5). Systems C, D and E TLCs were run once in each direction using, for system C, chloroform:methanol (96:4) in the first direction and toluene:acetone (80:20) in the second; for system D, chloroform:methanol:water (100:14:0.8) in the first direction and chloroform:acetone:methanol:water (50:60:2.5:3) in the second; and for system E, chloroform:methanol:water (60:30:6) in the first direction and chloroform:acetic acid:methanol:water (40:25:3:6) in the second. Lipids were visualised by staining with 5% ethanolic molybdophosphoric acid followed by careful charring with a heat gun.

The cell-wall-bound mycolic acids from the above delipidated extracts were released by the addition of a 5% aqueous solution of tetra-butyl ammonium hydroxide (TBAH), followed by overnight incubation at 100°C, and methylated as described previously (Alderwick et al., 2005). Mycolic acid methyl esters (MAMEs) were analysed by TLC using silica gel plates (5735 silica gel 60F254, Merck) developed in petroleum ether/acetone (95:5, v/v). TLCs were visualised by charring with 5% molybdophosphoric acid in ethanol at 100°C to reveal α-, α'- and epoxy-MAMEs.

RESULTS

Rv3574 is a member of the TetR family of transcriptional regulators and is highly conserved in the mycobacteria.

Orthologues of Rv3574 were identified through a combination of homology and synteny. Comparative genomic analyses showed that the Rv3574 region is highly conserved within the mycobacteria, and is also conserved in the closely related species Nocardia farcinica (Figure 1). In all cases, Rv3574 and its orthologues are transcribed divergently from a putative acyl-CoA dehydrogenase gene (M. tuberculosis fadE34). No convincing orthologue was found, however, in the corynebacteria. In Streptomyces coelicolor a homologous gene was found (SCO2319, 32% amino acid identity over the whole length of the protein), but with no conservation of synteny. In Mycobacterium leprae, Rv3574 is present as a pseudogene. Rv3574 belongs to the TetR family of transcriptional regulators, and its helix-turn-helix DNA-binding motif is located at the N-terminal end of the protein. An alignment of the seven mycobacterial and nocardial orthologues shows that the degree of similarity is high over the whole length of the proteins (all ≥70% amino acid identity) and even higher over the putative DNA-binding domain (91% identity) (Figure 2).
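Percentage identities such as those quoted above depend on how gapped alignment columns are treated, which the text does not specify. The sketch below assumes identity is counted over columns where neither sequence has a gap, and also marks fully invariant columns with "*" as ClustalW does; ClustalW's ":" and "." marks, which depend on its conserved-substitution groups, are not reproduced. The aligned fragments are hypothetical and are not the real Rv3574 orthologues.

```python
from typing import Sequence


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns excluded (one of several conventions)
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def invariant_marks(aligned: Sequence[str]) -> str:
    """ClustalW-style '*' under columns identical in every sequence.

    ClustalW's ':' and '.' (conserved substitution groups) are not reproduced.
    """
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must be the same length")
    marks = []
    for i in range(width):
        column = {s[i].upper() for s in aligned}
        marks.append("*" if len(column) == 1 and "-" not in column else " ")
    return "".join(marks)


# Hypothetical aligned fragments, not the real Rv3574 orthologues
print(percent_identity("MKT-LA", "MKSALA"))
print(invariant_marks(["MKTL", "MKSL", "MKAL"]))
```

Dividing by the full alignment length or by the shorter sequence instead would give somewhat lower figures for gapped pairs, so the convention should be stated alongside any reported identity.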

In M. smegmatis, there is an additional gene immediately downstream of Rv3574Msm and transcribed in the same direction. This gene shows homology to otsB, a trehalose-6-phosphate phosphatase. There is a gap of only 3 bp between the end of Rv3574Msm and the start of otsB, suggesting that these genes form an operon. The gene downstream of Rv3574Nfa, the orthologue in N. farcinica, is larger than otsB and has a larger intergenic region, but is also predicted to encode a trehalose-6-phosphate phosphatase.

Deletion of kstRMsm causes a defect in growth in vitro

We initially focused our studies on the fast-growing non-pathogenic bacterium M. smegmatis. A 646 bp deletion, removing the entire N-terminal DNA-binding domain, was made in kstRMsm (Figure 3), producing strain ΔkstRI. As this was an unmarked mutation, it should not affect expression of downstream genes through polar effects. Axenic growth of ΔkstRI was compared with that of the wild-type strain (Figure 4): although the mutant grew at a similar rate to the wild-type, a slight increase in the lag phase was repeatedly observed. To confirm that this phenotype was not caused by a second-site mutation, the experiment was repeated with an independently derived mutant, with similar results (data not shown).

Deletion of kstRMsm leads to up-regulation of adjacent genes

It is difficult to predict which genes a regulator controls, but it is not uncommon for regulators to modulate the expression of adjacent genes. To examine whether kstRMsm controls the expression of adjacent genes, the expression levels of the fadE34 orthologue (MSMEG_6041) and otsB (Figure 1) were measured in both wild-type and ΔkstRI using RTq-PCR. The results (Figure 5) show that both MSMEG_6041 and otsB are strongly up-regulated in the mutant strain (36-fold and 10-fold, respectively). The experiment was repeated with the independently generated mutant and confirmed the up-regulation of MSMEG_6041 and otsB in the mutant (data not shown). These observations suggest that kstRMsm represses transcription of MSMEG_6041 and of otsB; because otsB appears to be co-transcribed with kstRMsm, they also suggest that kstRMsm represses itself.

KstRMtb binds to a conserved motif within its own promoter region.

TetR-like proteins normally bind to short palindromic DNA sequences (Grkovic et al., 1998; Orth et al., 2000; Ramos et al., 2005). Because protein binding constrains the evolution of these nucleotides, regulatory motifs may be identifiable through their conservation relative to neighbouring DNA sequences. To identify possible KstRMtb binding sites, the kstRMtb/fadE34 intergenic region (Figure 1) from M. tuberculosis was therefore aligned with the orthologous regions from other species. Figure 6 shows an 18 bp region that is very highly conserved. This region contains a 14 bp palindrome with the general format TAGAAC(N2)GTTCTA (SEQ ID NO: 1). Putative -10 and -35 regions were also identified, with the start of the palindrome overlapping the end of the -10 sequence. The binding motif lies upstream of, but partially overlaps, the -10 region, an arrangement that would efficiently block binding of RNA polymerase. A similar arrangement has been shown for TetR, QacR and AcnR (Bertrand et al., 1983; Grkovic et al., 1998; Krug et al., 2005). These observations suggest that this motif represents the KstRMtb binding site and, taken together with the results on the expression of adjacent genes, that kstRMsm represses its own expression.
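The palindromic character of the consensus can be checked mechanically: a DNA "palindrome" is a sequence identical to its own reverse complement. The Python sketch below uses hypothetical helper names and a made-up 20 bp test sequence; it tests this property for TAGAAC(N2)GTTCTA and scans a sequence for matches, handling only the IUPAC wildcard N.

```python
import re

# Complement table for uppercase DNA; N (any base) complements to N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_reverse_palindrome(pattern: str) -> bool:
    """True if the pattern reads the same 5'->3' on both strands."""
    return pattern.upper() == reverse_complement(pattern)


def motif_to_regex(pattern: str) -> str:
    """Translate a consensus containing N wildcards into a regular expression."""
    return pattern.upper().replace("N", "[ACGT]")


def find_sites(seq: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for every match on the given strand.

    The lookahead reports overlapping matches too. For a palindromic
    consensus, a forward-strand scan also covers the reverse strand.
    """
    regex = re.compile("(?=(" + motif_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq.upper())]


if __name__ == "__main__":
    print(is_reverse_palindrome("TAGAACNNGTTCTA"))  # True
    print(find_sites("GGGTAGAACAAGTTCTAGGG", "TAGAACNNGTTCTA"))  # [(3, 'TAGAACAAGTTCTA')]
```

Because the consensus is its own reverse complement, scanning one strand is sufficient to find sites on both.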

To determine whether KstRMtb binds directly to this motif (rather than acting indirectly, for example by controlling an intermediate protein), the protein was expressed in His6-tagged form and used in electrophoretic mobility shift assays (EMSAs). His6-KstRMtb was purified by Ni2+-affinity chromatography (Figure 7a) followed by size exclusion chromatography (SEC) to 95% purity, as judged by SDS-PAGE. The purified protein bound clearly to the entire kstRMtb/fadE34 intergenic region but not to a random DNA fragment of the same size (data not shown). The purified protein also bound specifically to several 29 bp DNA probes and to a 24 bp DNA probe (Table 2) containing the highly conserved palindromic region identified above, but not to an 18 bp probe. Binding to the 30 bp DNA probes listed in Table 2 is also expected. A clear retardation of the 29 bp fragment was seen in the presence of the protein (Figure 7b, lane 2). The retardation was abolished by a 100-fold excess of unlabelled probe as a specific competitor (Figure 7b, lane 3), but not by a non-specific competitor (Figure 7b, lane 4). These observations show that His6-KstRMtb binds directly, within its own promoter region, to a sequence containing a highly conserved palindrome.

KstRMtb binds to the motif as a dimer

E. coli TetR binds to DNA as a homodimer (Orth et al., 2000), S. aureus QacR binds as a dimer of dimers (Grkovic et al., 2001) and M. tuberculosis EthR octamerizes on its operator (Engohang-Ndong et al., 2004). To study the stoichiometry of His6-KstRMtb binding to the motif, the elution of the protein alone and in the presence of the 29 bp fragment was analysed by SEC and compared with a standard curve of Ve/V0 versus log Mr (Figure 8). The elution of KstRMtb was consistent with a molecular mass of 60.2 kDa, suggesting that the protein forms a dimer (the predicted monomeric mass is 27.7 kDa). The apparent molecular mass of the 29 bp fragment alone was 58.9 kDa; this substantially exceeds its actual mass of 18.0 kDa because of the rigid rod-like structure of DNA compared with the globular shape of the protein standards (Reuter et al., 1998). Only one species of KstRMtb-DNA complex, with a molecular mass of 118.7 kDa, was detected at protein:DNA ratios of 1:4, 1:1 and 4:1. This is close to the sum of the apparent masses of one protein dimer and one DNA fragment (60.2 + 58.9 = 119.1 kDa) and is therefore consistent with a complex of dimeric His6-KstRMtb bound to one 29 bp fragment of DNA.
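The stoichiometric reasoning above rests on two steps: reading apparent masses off a standard curve of Ve/V0 against log Mr, and comparing them with candidate assemblies. A minimal Python sketch of both follows, with hypothetical function names. The calibration used in the test is synthetic, since the standards behind Figure 8 are not given here; the only real inputs are the masses quoted in the text (60.2, 27.7, 58.9 and 118.7 kDa). Adding apparent SEC masses is a rough approximation, not a rigorous model.

```python
import math


def fit_standard_curve(standards):
    """Least-squares fit of Ve/V0 = slope * log10(Mr) + intercept.

    `standards` holds (Mr in kDa, Ve/V0) pairs for globular calibration proteins.
    """
    xs = [math.log10(mr) for mr, _ in standards]
    ys = [ratio for _, ratio in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def apparent_mass_kda(ve_over_v0, slope, intercept):
    """Invert the standard curve to obtain an apparent Mr (kDa)."""
    return 10 ** ((ve_over_v0 - intercept) / slope)


def nearest_oligomer(apparent_kda, monomer_kda):
    """Number of subunits implied by an apparent mass, rounded to an integer."""
    return round(apparent_kda / monomer_kda)


def best_stoichiometry(complex_kda, protein_kda, dna_kda, max_each=2):
    """(free-protein species, DNA fragments) whose summed apparent masses best match.

    Here the free-protein species is the dimer. Summing apparent SEC masses is
    only a rough check, because non-globular complexes do not migrate strictly by mass.
    """
    candidates = [(p, d) for p in range(1, max_each + 1) for d in range(1, max_each + 1)]
    return min(candidates,
               key=lambda pd: abs(pd[0] * protein_kda + pd[1] * dna_kda - complex_kda))


# Apparent masses reported in the text (kDa); 27.7 kDa is the predicted monomer mass.
PROTEIN_KDA, MONOMER_KDA, DNA_KDA, COMPLEX_KDA = 60.2, 27.7, 58.9, 118.7

if __name__ == "__main__":
    print(nearest_oligomer(PROTEIN_KDA, MONOMER_KDA))            # 2
    print(best_stoichiometry(COMPLEX_KDA, PROTEIN_KDA, DNA_KDA))  # (1, 1)
```

With these numbers, one dimer plus one DNA fragment (119.1 kDa) is far closer to the observed 118.7 kDa than any alternative combination (178.0 kDa or more).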

The motif is present in the upstream regions of other genes in both M. tuberculosis and M. smegmatis

The experiments described above show that His6-KstRMtb binds as a dimer to a 29 bp fragment from within its own promoter, and that this region contains a highly conserved palindromic sequence overlapping the putative -10 region. To determine whether the palindrome is also found within the upstream regions of other genes in the M. smegmatis and M. tuberculosis genomes, we searched for the motif on a genome-wide scale. First, the promoter regions of the kstR orthologues were used as a training set for the motif identification program MEME (Bailey and Elkan, 1994). This generated a motif profile that was used to search a database of intergenic regions from M. tuberculosis and M. smegmatis with the sister program MAST (Bailey and Gribskov, 1998), revealing 12 other instances of the motif in M. tuberculosis and 23 in M. smegmatis at the default E-value cut-off of 10. To refine the motif consensus sequence, a second MEME/MAST search was performed, this time using a training set consisting of the promoter regions of all 35 genes identified above from both species. This predicted 16 motif instances in M. tuberculosis and 31 in M. smegmatis (Table 3). Many of these are situated between divergently transcribed genes, and the region between Rv3570c and Rv3571 (and between the orthologous genes MSMEG_6038 and MSMEG_6039) contains two copies of the motif. Comparative genomics of M. smegmatis and M. tuberculosis showed that a number of the motif-associated genes are orthologues; these are indicated in Table 3. This information was used to generate a sequence logo of the kstR motif (Figure 9), the core of which is the palindrome TnnAACnnGTTnnA (SEQ ID NO: 51).
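MEME discovers motifs by expectation maximisation and MAST scores sequences with calibrated E-values; neither is reproduced here. As a simplified stand-in for the scanning step and for the logo in Figure 9, the Python sketch below builds a log-odds position weight matrix from pre-aligned sites, computes per-position information content (the stack heights of a sequence logo, without the small-sample correction that logo tools usually apply) and scores every window on both strands. The training sites in the test are hypothetical variants of the TAGAAC(N2)GTTCTA consensus, and the score threshold is arbitrary.

```python
import math
from collections import Counter

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def column_frequencies(sites, pseudocount):
    """Per-position base frequencies from equal-length, pre-aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + 4 * pseudocount
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return freqs


def build_pwm(sites, pseudocount=0.5):
    """Log2-odds matrix against a uniform (0.25 per base) background."""
    return [{b: math.log2(col[b] / 0.25) for b in BASES}
            for col in column_frequencies(sites, pseudocount)]


def information_content(sites, pseudocount=0.0):
    """Bits per position (2 minus Shannon entropy), i.e. logo stack heights."""
    ic = []
    for col in column_frequencies(sites, pseudocount):
        entropy = -sum(p * math.log2(p) for p in col.values() if p > 0)
        ic.append(2.0 - entropy)
    return ic


def scan(seq, pwm, threshold):
    """Score every window on both strands; return hits sorted by decreasing score."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        for strand, text in (("+", window), ("-", reverse_complement(window))):
            score = sum(col[base] for col, base in zip(pwm, text))
            if score >= threshold:
                hits.append((start, strand, round(score, 3)))
    return sorted(hits, key=lambda hit: -hit[2])
```

For a palindromic motif, a true site typically scores highly on both strands at the same position, so the two hits are reported together.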

Genes flanking the predicted motifs are also regulated by KstR

Two approaches were used to obtain experimental evidence for the motif predictions in M. smegmatis and M. tuberculosis: RTq-PCR was used to compare the expression of the flanking genes in the ΔkstR1 mutant (ΔkstRMsm) with that in wild-type M. smegmatis, and EMSAs were used to demonstrate binding of His6-KstRMtb to the predicted M. tuberculosis motifs. Expression of genes flanking 11 of the predicted motifs in M. smegmatis was measured; if a predicted motif is biologically relevant, the flanking genes should be de-repressed in the ΔkstR1 mutant. RTq-PCR analysis showed that all the genes tested were significantly de-repressed in the mutant compared with the wild type, with de-repression ranging from 6-fold (MSMEG_5932) to 155-fold (MSMEG_6038) (Figure 10). EMSAs were carried out to test binding of His6-KstRMtb to 13 of the predicted M. tuberculosis motifs using 29 bp probes, a 24 bp probe and an 18 bp probe (Table 2). Binding was observed to 12 of the motifs with the 29 bp probes (Table 3) and with the 24 bp probe (Table 2).
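The de-repression values above are ratios of mutant to wild-type expression. The text does not state how they were calculated from the RTq-PCR data; the Python sketch below assumes the widely used comparative Ct (2^-ΔΔCt) method, with a reference gene such as sigA (the housekeeping gene mentioned in connection with Figure 10) and a perfect per-cycle amplification efficiency of 2. Function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct for one strain (replicate wells)."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(target_mut, ref_mut, target_wt, ref_wt, efficiency=2.0):
    """Mutant/wild-type expression ratio by the comparative Ct (2^-ddCt) method.

    `efficiency` is the per-cycle amplification factor; 2.0 assumes perfect doubling.
    A lower Ct means more template, so a negative ddCt gives a ratio above 1.
    """
    ddct = delta_ct(target_mut, ref_mut) - delta_ct(target_wt, ref_wt)
    return efficiency ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values: target gene and sigA reference in each strain.
    print(fold_change([25.0, 25.2], [20.0, 20.2], [30.0, 30.2], [20.0, 20.2]))
```

In this example the target crosses threshold five cycles earlier (relative to the reference) in the mutant, which corresponds to roughly 32-fold de-repression.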

Microarray analysis indicates that a large number of genes are de-repressed in an M. smegmatis ΔkstR1 mutant.

The above experiments have shown that the mycobacterial kstR is a highly conserved transcriptional repressor that represses gene expression by binding to an intergenic palindromic motif. To obtain a genome-wide picture of the genes controlled by kstR, we carried out competitive hybridizations between cDNA from wild-type M. smegmatis and from the mutant strain ΔkstR1 using M. smegmatis microarrays (full results not shown).

Using a p-value threshold of 0.05 corrected for multiple testing, a total of 132 genes were significantly upregulated (6- to 1771-fold) and 27 were downregulated (6- to 18-fold).
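The correction for multiple testing is not named in the text. The Python sketch below assumes the Benjamini-Hochberg false discovery rate procedure, a common choice for microarray data, and then splits significant genes by direction using a mutant/wild-type fold change. Function names, gene names and values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def split_by_direction(results, alpha=0.05):
    """results maps gene -> (fold change mutant/wild type, raw p-value).

    Returns (up, down): genes whose adjusted p-value is below alpha,
    separated by whether the fold change is above or below 1.
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g][1] for g in genes])
    up = sorted(g for g, q in zip(genes, adjusted) if q < alpha and results[g][0] > 1)
    down = sorted(g for g, q in zip(genes, adjusted) if q < alpha and results[g][0] < 1)
    return up, down


if __name__ == "__main__":
    demo = {"gene_up": (36.0, 0.001), "gene_down": (0.1, 0.002), "gene_ns": (1.5, 0.5)}
    print(split_by_direction(demo))  # (['gene_up'], ['gene_down'])
```

A fold change below 1 (for example 0.1) corresponds to what the text reports as 10-fold downregulation.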

The microarray analysis showed de-repression of genes flanking 27 of the 31 motifs that we had identified in M. smegmatis. For the other four, although the computational analysis indicates the presence of a motif, and we would predict binding of KstR, a combination of low levels of de-repression, low levels of significance, and the absence of an orthologous gene with a motif in M. tuberculosis means that we currently have no evidence that they are biologically relevant.

In some cases, we identified motifs next to de-repressed genes that had not been found previously. In total, there were four additional motifs in M. smegmatis (MSMEG_0217, MSMEG_1410, MSMEG_3658 and MSMEG_5940) and two in M. tuberculosis (Rv3501c and Rv3536c) that were not picked up in the computational search. Three of these motifs (MSMEG_1410, MSMEG_3658 and MSMEG_5940) overlap the coding sequences of adjacent genes and would therefore have been excluded from the original search, which was limited to intergenic regions. Interestingly, MSMEG_1410 is highly induced (>200-fold), yet appears to be the last gene in an operon, directly downstream of an arginine deiminase gene (MSMEG_1409). KstR therefore appears to regulate the last gene in this operon independently of the others.

Defining the kstR regulon in M. smegmatis

The genes with altered expression in the microarray analysis are a combination of those whose transcription is directly affected by KstR binding (which we call here the KstR regulon) and those affected secondarily. The genes in the KstRMsm regulon were defined using data from the motif search (Table 3), EMSA analyses (Table 3), RTq-PCR analyses (Figure 10), genome-wide expression data from the microarray analysis (considering both fold-change and p-value), comparative genomics with M. tuberculosis using ACT (Carver et al., 2005) and examination of operon organisation. In a minority of cases, we have included genes for which there are no microarray data, or for which the array data are not significant at the 0.05 level, but where the fold-change and other factors support their inclusion. The kstRMsm regulon, containing 83 genes, is listed in Table 4.

Transcriptional changes not associated with a KstR motif

We also analysed the genes that were not associated with a KstR motif, in order to identify KstR-independent transcriptional changes that are likely to be secondary effects (the KstR-independent regulon). Taking into account fold-change, p-value, genomic organisation and runs of modulated genes, we included 99 genes in this group. Most of these (87) were upregulated in the mutant (2- to 100-fold) and 11 were downregulated (5- to 16-fold). We consider 74 of them particularly convincing (all but two induced in the mutant), as they lie in putative operons: co-regulation of adjacent genes is strong evidence that the change is a genuine indirect effect of the kstR deletion rather than noise inherent in microarray experiments.
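Co-directional neighbours separated by very short intergenic gaps, such as the 3 bp between Rv3574Msm and otsB, are the usual basis for calling putative operons, and runs of modulated genes within such units serve as supporting evidence. The Python sketch below groups genes on that basis and keeps the units in which several genes changed expression. The 50 bp gap threshold, the function names, gene names and coordinates are hypothetical; real operon predictions also draw on expression and comparative data.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (1-based, inclusive)
    end: int     # rightmost coordinate (1-based, inclusive)
    strand: str  # "+" or "-"


def putative_operons(genes, max_gap=50):
    """Group neighbouring genes on the same strand separated by <= max_gap bp.

    Overlapping genes give a negative gap and are grouped as well.
    """
    groups = []
    for gene in sorted(genes, key=lambda g: g.start):
        if groups:
            prev = groups[-1][-1]
            gap = gene.start - prev.end - 1  # number of bases strictly between the genes
            if gene.strand == prev.strand and gap <= max_gap:
                groups[-1].append(gene)
                continue
        groups.append([gene])
    return [[g.name for g in group] for group in groups]


def co_regulated_runs(operons, changed, min_run=2):
    """Putative operons containing at least `min_run` genes with altered expression."""
    return [op for op in operons if sum(name in changed for name in op) >= min_run]


if __name__ == "__main__":
    # Hypothetical coordinates; g1 and g2 are separated by a 3 bp gap.
    genes = [Gene("g1", 1000, 1600, "+"), Gene("g2", 1604, 2400, "+"),
             Gene("g3", 3000, 3500, "+"), Gene("g4", 3520, 4000, "-")]
    operons = putative_operons(genes)
    print(operons)                                         # [['g1', 'g2'], ['g3'], ['g4']]
    print(co_regulated_runs(operons, {"g1", "g2", "g3"}))  # [['g1', 'g2']]
```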

Predicting the kstR regulon in M. tuberculosis

There are clear orthologues in M. tuberculosis of most of the genes in the kstRMsm regulon (Table 4). In fact, all the M. tuberculosis genes predicted to be associated with a motif in the computational search are present in Table 4. The presence of the motif, together with orthology to a de-repressed M. smegmatis gene, is robust evidence for including these genes in the M. tuberculosis kstR regulon.

Defining the KstR regulon in M. tuberculosis

In addition to predicting the KstR regulon in M. tuberculosis from the high degree of homology between M. tuberculosis and M. smegmatis genes, we conducted a microarray analysis to obtain a direct genome-wide picture of the M. tuberculosis genes controlled by KstR. Competitive hybridisations were carried out between cDNA from wild-type M. tuberculosis and from the mutant strain ΔkstR1Mtb. This analysis showed that 49 genes are regulated by KstR in M. tuberculosis, many of them significantly up- or down-regulated (Table 5).

The most striking observation from the data is that both mycobacterial genomes contain a large region (Rv3492c to Rv3574 (kstR) and MSMEG_5893 to MSMEG_6043) with a number of operons that are de-repressed and associated with the Rv3574 DNA-binding motif. This region of the genome also contains many genes that have been found by others to be induced in macrophages (Schnappinger et al., 2003), essential for survival in both macrophages and mice (Rengarajan et al., 2005; Sassetti and Rubin, 2003) and induced by lipids such as palmitic acid and cholesterol (Schnappinger et al., 2003; Van der Geize et al., 2007). This region is illustrated in Figure 11.

Functional analysis of the KstR regulon

The functions of the genes in the KstR regulon were analysed by a combination of BLAST analyses and searches of the TubercuList database (http://genolist.pasteur.fr/TubercuList/) and the literature. Strikingly, the predicted functions of most of the genes in the regulon relate to lipid metabolism or to redox reactions. For example, there are ten fad genes (fadA5, fadE26, fadE27, fadE29, fadE28, fadE34, fadD8, fadD12, fadD17 and fadD19), one ech gene (echA19), one lip gene (lipO) and at least three keto-acyl-CoA thiolases (ltp2, ltp3 and ltp4). In addition, the mce4 operon is part of this regulon; the mce operons have been suggested to be involved in the import of lipids or to be lipid-associated (Mitra et al., 2005; Rosas-Magallanes et al., 2007; Van der Geize et al., 2007). Two other genes in the regulon, hsaC and hsaD (formerly bphC and bphD, respectively), have been implicated in both cell wall synthesis (Anderton et al., 2006) and cholesterol degradation (Van der Geize et al., 2007). As noted above, many of the genes in the Rv3492c to Rv3574 region are induced in macrophages (Schnappinger et al., 2003), essential for survival in both macrophages and mice (Rengarajan et al., 2005; Sassetti and Rubin, 2003) and induced by lipids such as palmitic acid and cholesterol (Schnappinger et al., 2003; Van der Geize et al., 2007) (Table 4). The relevance of this observation, and the role of the KstR regulon in vivo, are discussed below.

Cholesterol induction of expression of the M. smegmatis gene MSMEG_6038

Since many of the genes in the KstRMsm regulon are involved in lipid metabolism, we tested whether cholesterol affects the KstR-mediated repression of MSMEG_6038, an M. smegmatis gene that was highly induced by deletion of kstR. Cultures of wild-type M. smegmatis mc²155 and of ΔkstRMsm were grown in liquid media containing either glycerol or cholesterol, and the expression of MSMEG_6038 was measured. In the wild-type strain, cholesterol in the medium caused a marked increase in MSMEG_6038 expression compared with only minimal expression in the presence of glycerol (Figure 13 and Table 6). In the ΔkstRMsm strain, MSMEG_6038 expression was high in the presence of both glycerol and cholesterol, although it was highest with cholesterol (Figure 13 and Table 6). Thus, MSMEG_6038 is induced by cholesterol, and this induction depends to a large degree on KstR. These data are consistent with the cholesterol induction of the MSMEG_6038 homologue in Rhodococcus sp. strain RHA1 demonstrated by Van der Geize et al. (2007).

Table 6. Cholesterol induction

Functional analysis of KstR-independent genes

Of the 99 genes that we predict to have altered expression without direct KstR binding, some were also predicted to be involved in lipid metabolism. Of the others, 31 are predicted to be involved in translation, while other groups included chaperones, the pentose phosphate pathway, dipeptide transport and glycerol catabolism. In addition, a group of genes (MSMEG_0065-MSMEG_0068) show strong homology to the M. tuberculosis esxAB gene cluster which is important for virulence.

Lipid analysis indicates no differences in the cell wall lipid profiles of the ΔkstR1 mutant

Many of the genes we had identified were predicted to be involved in lipid metabolism, but it was unclear whether they were catabolic or anabolic. To investigate the possibility that they might be anabolic, we sequentially extracted apolar and polar lipids from cells and analysed them by thin layer chromatography (TLC) using five solvent systems (A-E). Qualitative analysis of these data showed that the lipid profiles of the ΔkstR1 mutant (TAG, free fatty acids and mycolic acids, TDM, GPLs, PIMs and phospholipids) were essentially comparable to those of the wild type. In addition, the cell wall-bound α-, α'- and epoxy-mycolic acids were comparable between the two strains (data not shown). This supports the hypothesis that the kstR regulon is largely catabolic and does not encode enzymes for synthesising novel lipids.

DISCUSSION

Lipid metabolism (both anabolic and catabolic) plays a key role in the pathogenesis of M. tuberculosis. Mycobacteria and other prokaryotes are able to use fatty acids as a sole carbon source via β-oxidation and these pathways are thought to be particularly important for the survival of M. tuberculosis in vivo (Bishai, 2000; McKinney et al., 2000). In addition to using fatty acids as a carbon source during infection, cell wall lipids play a variety of roles in pathogenesis (Rao et al., 2006; Russell et al., 2002).

We have shown that KstR, a transcriptional regulator highly conserved within the actinomycetes, controls the expression of a number of genes involved in lipid metabolism in both M. tuberculosis and M. smegmatis. Many of these genes lie within a large cluster adjacent to kstR. This region was also the focus of a very recent study that identified the genes induced by growth of Rhodococcus sp. strain RHA1 on cholesterol (Van der Geize et al., 2007). Van der Geize et al. assigned 26 genes (see Table 4) to the cholesterol degradation pathway, mostly through bioinformatics, together with experimental verification of candidate genes. Although they named Rv3574 kstR, they provided no experimental evidence for the role of this gene. By contrast, our results show that most of these genes are controlled by kstR and have the kstR binding motif in their upstream regions. Moreover, we have demonstrated that expression of the M. smegmatis gene MSMEG_6038 is induced by cholesterol. We have also confirmed that the binding motif identified here is present at appropriate sites in the RHA1 genome (data not shown).

Despite the proposed involvement of the rhodococcal kstR regulon in cholesterol degradation, we believe that the situation is more complex in mycobacteria. First, we have demonstrated that the regulon is extremely large (83 genes in M. smegmatis and 49 genes in M. tuberculosis), and it is unlikely that so many genes are required for cholesterol utilisation. Second, palmitic acid has also been shown to induce 22 genes of the kstR regulon in M. tuberculosis, including kstR itself (Table 4 and Table 5; see below). We therefore consider that the kstR regulon is involved in the uptake and utilisation of a variety of lipids. Without wishing to be bound by any theory, it makes biological sense for the bacteria to have a mechanism that degrades a variety of lipids rather than a specific one, which would limit its utility.

We also compared the palmitate and cholesterol induction data with the kstR regulon.

Twenty-two genes in the kstR regulon were induced two-fold or more by growth in palmitate (Table 4), including kstR itself. The low induction ratios seen in these experiments, compared with ours and with the cholesterol data, could be due to technical difficulties of working with M. tuberculosis, or could indicate that different molecules derepress the regulon with different affinities. In RHA1, all 51 genes in the Rv3492c-Rv3574 region were upregulated in the presence of cholesterol, whereas we found that not all of these were induced in the M. smegmatis kstR mutant (Figure 11). This suggests that part of the cholesterol response is under the control of regulators other than KstR.

It is remarkable that 19 of the genes in the kstR regulon have been shown to be essential in vivo in mouse or macrophage models (Rengarajan et al., 2005; Rosas-Magallanes et al., 2007; Sassetti and Rubin, 2003) (Table 4). These include kstR itself, some of the mce4 operon genes and genes encoding various other enzymes. Presumably the reason for the essentiality of kstR (where a mutant expresses all genes in the regulon constitutively) differs from that for the other genes (where a mutation results in loss of function). The induction levels we saw in the kstR regulon were extremely high, so the essentiality of kstR may merely reflect the energy cost of this gene expression; alternatively, there may be times during infection when expression of a particular gene in the regulon is detrimental for another reason. The essentiality of the other genes shows how important the uptake of these lipids is to the survival of the bacterium, as indicated previously by the observation that the glyoxylate shunt is important in vivo (McKinney et al., 2000).

We identified 99 genes that were induced or repressed in the mutant but did not appear to be directly regulated by KstR. Apart from the ribosomal and chaperone genes, some of these appear to be involved in lipid metabolism, including a cholesterol dehydrogenase that was induced in RHA1, suggesting the induction of separate regulatory systems. Other induced groups of genes were involved in glycerol utilization and dipeptide transport, which would also be relevant to obtaining energy from different sources. We identified two other transcriptional regulators in the kstR regulon (MSMEG_5554 and MSMEG_5911) that may control these genes in M. smegmatis.

It is efficient for bacteria to utilise exogenous lipids as an energy source in this way, but the expression of such a large group of genes must also consume a great deal of energy. This is supported by the observation that many ribosomal and chaperone genes were also induced, suggesting that induction of the regulon places a strain on the translational apparatus of the cell; this may also explain the growth defect observed in the mutant. The induction levels in the direct kstR regulon were extremely high, probably indicating that repression of most of the genes was very tight in the wild-type strain under the growth conditions used. Our RTq-PCR data on selected genes confirmed that wild-type expression levels under these conditions are extremely low. When derepressed in the mutant strain, the genes are expressed at varying levels, but many are transcribed at rates much higher than the housekeeping gene sigA (Figure 10). In contrast, the induction ratios seen in the presence of cholesterol (Van der Geize et al., 2007) and palmitic acid (Schnappinger et al., 2003) are lower than those observed in this study, which may reflect low intracellular concentrations of these molecules, or derepression of the regulon by these molecules with different affinities.

mce operons

The mce4 operon appears to be a key part of the kstR regulon. Circumstantial data are accumulating that the mce operons (of which M. tuberculosis has four and M. smegmatis at least five) function as lipid transport systems (Mitra et al., 2005; Santangelo et al., 2002; Uchida et al., 2007; Van der Geize et al., 2007). The results presented here show that the mce4 operon is co-regulated with other genes involved in fatty acid metabolism, and support the hypothesis that the mce genes are involved in fatty acid degradation/synthesis pathways.

The mce1, mce2 and mce3 operons all lie next to transcriptional regulators, and both the mce1 and mce3 operons have been shown to be repressed by these regulators: mce1 is repressed by a FadR-type regulator, Mce1R, and mce3 by a TetR-type regulator, Mce3R (Casali et al., 2006; Santangelo et al., 2002). In contrast, there is no regulatory gene adjacent to the mce4 operon. The location of the regulators next to mce operons 1-3 would be consistent with each regulator controlling only its adjacent operon, but this has not been shown. There is evidence that the four mce operons are expressed at different times (Kumar et al., 2003), and we suggest that they each respond to different lipids in their environment. Our data show that the mce4 operon is normally repressed by KstR in aerated exponential-phase cultures. There is evidence that the mce4 operon is expressed in non-aerated stationary-phase cultures (Kumar et al., 2003; Kumar et al., 2005) and also in guinea pigs (Kumar et al., 2003).

KstR is a TetR-type regulator; in this paradigm, repression is relieved by the binding of an inducer molecule. TetR itself binds tetracycline (Ramos et al., 2005), and such ligands are often hydrophobic molecules (for example, the M. tuberculosis EthR protein binds hexadecyl octanoate (Frenois et al., 2004)). Although not TetR-like, E. coli FadR is bound by long-chain fatty acids (DiRusso et al., 1998). It is therefore tempting to assume that a similar mechanism operates with KstR. Induction of the KstR regulon by palmitate and cholesterol supports the hypothesis of a fatty acid ligand, and it has been suggested that a portion of the cholesterol molecule could function in signalling (Van der Geize et al., 2007). Additionally, the induction of the regulon upon entry into the macrophage (Schnappinger et al., 2003) and the essentiality of many of the regulon genes for in vivo survival (Rengarajan et al., 2005; Sassetti and Rubin, 2003) suggest that the ligand(s) are present inside the host. There is evidence that different FadR proteins from bacteria adapted to particular ecological niches bind different fatty acid ligands (Iram and Cronan, 2005, 2006), while QacR, the repressor of the S. aureus multidrug transporter gene qacA, binds diverse cationic lipophilic molecules (Schumacher et al., 2001). Given the expanded number of fad genes in mycobacteria, the situation is likely to be complex.

The kstR gene is highly conserved in the non-pathogenic mycobacteria and actinomycetes, suggesting that the function of the regulon is not virulence but rather niche adaptation. In M. leprae, kstR is present as a pseudogene, implying that any genes in the regulon would be constitutively expressed. In fact, all but two genes in the Rv3492c-Rv3574 region are present as pseudogenes (Cole et al., 2001). Presumably this reflects the fact that M. leprae has evolved to exist in a narrower ecological niche, and further characterisation of the KstR system may shed light on which lipids M. leprae, which has never been convincingly grown in axenic culture, is unable to utilise.

We have predicted an M. tuberculosis kstR regulon from the M. smegmatis microarray data, and the high degree of synteny and the conservation of KstR motifs give us confidence that this prediction is correct. We initially predicted the M. tuberculosis regulon to consist of 74 genes; this was supported by data demonstrating binding of the M. tuberculosis KstR protein to different examples of the motif, and by microarray data from other groups showing expression of genes from the regulon under relevant conditions. Our microarray analysis comparing wild-type and ΔkstRMtb mutant M. tuberculosis strains confirmed KstR regulation of 49 of these 74 genes (Table 5). One reason for the decrease from 74 to 49 is that some genes that are upregulated in M. smegmatis are not upregulated in M. tuberculosis. For example, in the mce4 gene cluster the motif appears to have diverged from that in M. smegmatis to the extent that KstR no longer binds. Such genes will therefore be permanently derepressed in M. tuberculosis, which seemingly has no adverse effect. The decrease may also stem partly from the microarray analysis not being sensitive enough to detect smaller changes.

One gene that is present in the M. tuberculosis kstR regulon, but not in that of M. smegmatis, is nat, encoding arylamine N-acetyltransferase. This gene lies at the end of a six-gene operon starting with Rv3570c (Anderton et al., 2006). Although the first four genes (Rv3570c-Rv3567c) have orthologues in M. smegmatis (MSMEG_6038 to MSMEG_6035), there are no orthologues of Rv3566A and nat at that locus (M. smegmatis has a nat gene elsewhere in the genome). Mutants lacking nat are defective in mycolic acid synthesis (Bhakta et al., 2004), indicating an anabolic role for some genes in the kstR regulon. While the kstR regulon has a major catabolic role, it is possible that some of its genes are anabolic, although we did not see differences in the abundance of the major cell wall lipids in the M. smegmatis ΔkstR1 mutant. This is the situation in E. coli, where the fadR gene controls the uptake and degradation of long-chain fatty acids but also regulates anabolic lipid genes (DiRusso and Black, 2004). The Nat protein can bind the anti-tubercular drug isoniazid, reducing its efficacy (Sandy et al., 2002). Induction of the kstR regulon in vivo may therefore partly affect the antibiotic resistance of the bacteria.

In conclusion, we have described a large regulon within the mycobacteria. In M. tuberculosis this regulon makes up almost 2% of the genome. Although at least the core of this regulon is highly conserved in non-pathogens, many of its genes are critical in the pathogenesis of M. tuberculosis. Investigating both the regulation of KstR and the functions of the genes in the regulon is likely to provide important new information for our understanding of this major pathogen. Identifying agents that modulate the activity of the gene controlling the regulon is equally likely to provide additional therapeutic agents against this major killer and other related bacteria.

Table 2. Primer details.


Table 3. Motif identification in M. tuberculosis and M. smegmatis.


Superscripts a-s identify homologous genes; e.g., MSMEG_0309 is a homologue of Rv0223c.

Table 4. The KstR regulon in M. smegmatis

M. smegmatis | Fold change | P-value | Mtb | Gene name | Function
↑ MSMEG_0217 †* | 197.3 | 1.5E-03 | ↑ Rv0162c | adhE1 | Alcohol dehydrogenase
↑ MSMEG_0302 | 54.5 | 2.7E-03 | ↑ Rv1426c | lipO | Esterase
↑ MSMEG_0304 | 114.4 | 5.8E-03 | ↑ Rv1427c | fadD12 | Fatty acid CoA synthetase
↑ MSMEG_0305 | 71.4 | 1.5E-03 | ↑ Rv1428c | - | CHP
↑ MSMEG_0309 * | 82.8 | 1.3E-03 | ↑ Rv0223c | - | Aldehyde dehydrogenase
↑ MSMEG_1098 * | 110.9 | 1.3E-03 | ↑ Rv0551c [b] * | fadD8 | Fatty acid CoA synthetase
↓ MSMEG_1410 †* | 223.7 | 1.3E-03 | ↓ Rv0687 [e] * | - | Short-chain type dehydrogenase/reductase
↑ MSMEG_2644 | 2.0 | 2.9E-01 | ↑ Rv2800 | - | CHP
↑ MSMEG_2645 * | 24.6 | 4.8E-03 | ↑ Rv2799 * | - | Membrane protein
↑ MSMEG_2789 | 3.0 | 1.4E-01 | Rv2669 | - | CHP
↑ MSMEG_2790 | 7.8 | 4.0E-02 | ↑ Rv2668 | - | CHP
MSMEG_3515 | 62.0 | 1.6E-03 | - | - | Short-chain type dehydrogenase/reductase

↓ MSMEG_3516 | 22.7 | 5.3E-03 | - | - | CHP, pseudogene?
↓ MSMEG_3519 | 168.0 | 1.3E-03 | ↑ Rv1894c [b,c] | - | CHP
↑ MSMEG_3658 †* | 67.1 | 1.5E-03 | ↓ | - | Fumarate reductase/succinate dehydrogenase
↓ MSMEG_3843 | 100.8 | 1.3E-03 | ↑ Rv1628c [b,c] | - | CHP
↓ MSMEG_3844 | 205.1 | 1.3E-03 | ↑ Rv1627c [b] | - | Non-specific lipid transfer protein
↑ MSMEG_5202 * | 36.4 | 3.3E-03 | ↓ Rv1132 [c] | - | Conserved membrane protein
↓ MSMEG_5228 * | 100.8 | 1.3E-03 | ↑ Rv1106c | - | Cholesterol dehydrogenase
↓ MSMEG_5286 | 10.2 | 2.2E-02 | Rv1059 | dapB | Dihydrodipicolinate reductase
↑ MSMEG_5519 | 1.8 | 3.7E-01 | - | - | Monooxygenase
↓ MSMEG_5520 | ND [a] | ND | Rv0953c | - | Oxidoreductase
↑ MSMEG_5554 | 4.4 | 8.2E-02 | - | - | Putative anti-terminator response regulator
↓ MSMEG_5555 | ND | ND | Rv0940c | - | Oxidoreductase
↓ MSMEG_5584 | 82.7 | 1.3E-01 | ↓ Rv0927c | - | Short-chain type dehydrogenase/reductase

↓ MSMEG_5586 | 250.7 | 1.3E-01 | ↓ Rv0926c | - | CHP
↑ MSMEG_5893 | 11.7 | 2.0E-02 | ↑ Rv3492c | - | CHP, MCE-associated protein
↑ MSMEG_5894 | 11.7 | 2.4E-02 | ↑ Rv3493c | - | CHP, MCE-associated protein
↑ MSMEG_5895 | ND | ND | ↑ Rv3494c | mce4F [h] | mce4 operon: lipid transfer
↑ MSMEG_5896 | 22.5 | 5.3E-03 | ↑ Rv3495c [e,g] | mce4E [h] | mce4 operon: lipid transfer
↑ MSMEG_5897 | 12.1 | 2.5E-02 | ↑ Rv3496c [g] | mce4D [h] | mce4 operon: lipid transfer
↑ MSMEG_5898 | ND | ND | ↑ Rv3497c [e,g] | mce4C [h] | mce4 operon: lipid transfer
↑ MSMEG_5899 | 20.1 | 7.0E-03 | ↑ Rv3498c [g] | mce4B [h] | mce4 operon: lipid transfer
↑ MSMEG_5900 | ND | ND | ↑ Rv3499c [e,g] | mce4A [h] | mce4 operon: lipid transfer
MSMEG_5901 | 27.1 | 4.7E-03 | ↑ Rv3500c [g] | yrbE4B/supB [h] | mce4 operon: lipid transfer
↑ MSMEG_5902 | 32.4 | 4.1E-03 | ↑ Rv3501c [e,g] †* | yrbE4A/supA [h] | mce4 operon: lipid transfer
↑ MSMEG_5903 | 74.7 | 1.5E-03 | ↑ Rv3502c [b,e,g] | hsd4A [h] | 17β-hydroxysteroid dehydrogenase
↑ MSMEG_5904 | 72.9 | 1.5E-03 | ↑ Rv3503c [c] * | fdxD | Ferredoxin

M. smeqmatis Fold change P-value Mtb Gene Name Function

Acyl CoA dehydrogenase

1 MSMEG_5906 113.4 1.3E-03 4 b e'gRv3504 fadE26h Acyl CoA dehydrogenase

4 MSMEG_5907 84.9 1.3E-03 4 b'c'9Rv3505 fadE27h Fatty acid CoA synthetase

4 MSMEG_5908 59.2 1.7E-03 4 9Rv3506 fadD17h Oxidoreductase

4 MSMEG_5909 ND ND

AraC-like transcriptional

4 MSMEG 5911 22.4 5.8E-03 regulator

Diooxygenase t MSM EG_ 5913 136.8 1.3E-03

Fatty acid CoA synthetase t MSMEG_ .5914 175.4 1.3E-03 t b c gRv3515c fadD19h

Enoyl-CoA hydratase

4 MSMEG_ .5915 251.9 1.7E-03 4 b C gRv3516 * echA19h cytochrome P450 t MSM EG_ _5918 2.8 1.8E-01 t Rv3518c cyp142 monoxygenase

CHP

T MSMEG. _5919 19.4 1.6E-02 4 gRv3519

Co-enzyme F420-dependent t MSMEG. _5920 38.5 4.2E-03 t 9Rv3520c oxidoreductase

CHP

4 MSMEG. _5921 ND ND 4 9Rv3521

3-keto-acyl-CoA thiolase

4 MSMEG 5922 95.5 1.3E-03 4 9Rv3522 Itp4n

M. smeqmatis Fold chanqe P-value Mtb Gene Name Function

3-keto-acyl-CoA thiolase

4 MSMEG_5923 122.0 1.3E-03 I e>gRv3523 Itp3h

Ketosteroid-9a-hydroxylase

I MSMEG_5925 60.3 1.6E-03 i b c sRv3526 kshAh

CHP

MSMEG_5927 61.7 1.6E-03 I b c'9Rv3527

CHP

T MSMEG_5930 3.9 1.4E-01 T Rv3529c - oxidoreductase t MSMEG_5931 8.5 2.8E-01 t Rv3530c -

t * CHP

MSMEG_5932 13.3 1.7E-01 t Rv3531c e'gRv3534 4-hydroxy-2-oxovalerate t MSMEG_5937 47.1 2.3E-03 t b o hsaFh

O) C aldolase

Acetaldehyde dehydrogenase t MSMEG_5939 55.3 2.8E-03 t c'9Rv3535c hsaGh

2-hydroxypentadoenoate t MSMEG_5940 f* 125.9 2.7E-03 T c gRv3536c hsaEh

3-ketosteroid-D1-

I MSMEG_5941 85.5 1.3E-03 I b c d'9Rv3537 f* kstDh dehydrogenase

2-enoyl acyl-CoA hydratase i MSMEG_5943 88.9 1.3E-03 i b c gRv3538 hsd4Bh

Branched-chain 3-ketoacyl-

T MSMEG_5990 ND ND T c'e'9Rv3540c Itp2 CoA thiolase c e'd'9Rv3541 CHP t MSMEG_5991 242.3 2.4E-03 t C

M. smeqmatis Fold chanqe P-value Mtb Gene Name Function

CHP t MSMEG_5992 1771.8 1.3E-03 t c eid Rv3542c -

Acyl-CoA dehydrogenase t MSMEG_5993 309.2 1.5E-03 t c d gRv3543c fadE29

Short/branched chain acyl- t MSMEG_5994 138.2 1.3E-03 t b'c'e'gRv3544 fadE28

C CoA dehydrogenase

Cytochrome P450 125 t b'c'e'9Rv3545

MSMEG_5995 170.2 1.3E-03 t cyp125 c

Acetyl CoA acetyltransferase

I MSMEG_5996 160.2 1.3E-03 I b c gRv3546 Λ fadA5

CysQ family

I MSMEG_5997 147.6 1.3E-03 - -

CHP

I MSMEG_5998 115.7 1.3E-03 I gRv3547 - en oo t MSMEG_6033 23.8 9.6E-03 - HP

-

3-HSA hydroxylase, reductase t MSMEG_6035 114.2 1.3E-03 T b'9Rv3567c hsaBh

3,4-DHSA dioxygenase t MSMEG_6036 121.3 1.3E-03 T 9Rv3568c hsaCh b'c'd'9Rv3569 4,9-DHSA hydrolase t MSMEG_6037 315.4 1.7E-03 t hsaDh c b,c,d,gRv3570 3-HSA hydroxylase, t MSMEG_6038 136.5 1.3E-03 t hsaAh C oxygenase

Ketosteroid-9a-hydroxylase, i MSMEG_6039 66.7 1.6E-03 Cl9Rv3571 ** kshBh reductase

M. smegmatis Fold chanqe P-value Mtb Gene Name Function

CHP i MSMEG_6040 30.2 4.0E-03 I RV3572 -

Acyl-CoA dehydrogenase

T MSMEG_6041 93.2 1.3E-03 T b'9Rv3573c fadE34

TetR regulator

I MSMEG_6042 -12.5 1.6E-02 I b'c'e'9Rv3574 * kstR

Trehalose phosphatase i MSMEG_6043 60.1 1.6E-03 - -

Oxidoreductase

T MSMEG_6474 73.5 1.5E-03 i RvO 139 -

CHP t MSMEG_6475 92.3 1.3E-03 I RvO138 -

a ND, no data.
b Induced in palmitic acid at least 1.5-fold (Schnappinger et al., 2003).
c Induced in macrophages (Schnappinger et al., 2003).
d Essential for survival in macrophages (Rengarajan et al., 2005; Rosas-Magallanes et al., 2007).
e Essential for survival in mice (Cox et al., 1999; Infante et al., 2005; Sassetti and Rubin, 2003).
f Motifs that were not originally predicted.
g Induced by growth on cholesterol in Rhodococcus sp. strain RHA1 (Van der Geize et al., 2007).
h Recently assigned to the cholesterol degradation pathway (Van der Geize et al., 2007). Note that some of these genes were renamed in that study.
i Arrows represent gene direction in relation to the rest of the genome. Sequential runs of genes are grouped together by horizontal lines. In the most recent annotation of the M. smegmatis genome, the gene numbering was not sequential in a number of cases, e.g. MSMEG_0302/MSMEG_0304, where the genes are adjacent to each other but are not numbered as such.
j MSMEG_5905 is an annotated ORF which would break up this operon, but it is only 31 amino acids long and its location conflicts with MSMEG_5906, so it may be a mis-annotation.

Table 5. The KstRwm, regulon

M. smegmatis | M. tuberculosis^k | Fold change | P-value | Gene Name | Function
↑ MSMEG_0217 *^f | Rv0162c | No data | No data | adhE1 | Alcohol dehydrogenase
↑ MSMEG_0302 | ↑ Rv1426c | 1.2 | 5.53E-01 | lipO | Esterase
↑ MSMEG_0304 | ↑ Rv1427c | 1.0 | 9.97E-01 | fadD12 | Fatty acid CoA synthetase
↑ MSMEG_0305 * | ↑ Rv1428c | No data | No data | - | CHP
↑ MSMEG_0309 * | ↑ Rv0223c | No data | No data | - | Aldehyde dehydrogenase
↑ MSMEG_1098 * | ↑ Rv0551c^b * | 2.15 | 2.86E-02 | fadD8 | Fatty acid CoA synthetase
↓ MSMEG_1410 *^f | ↓ Rv0687^e | 1.50 | 1.77E-01 | - | Short-chain type dehydrogenase/reductase
↑ MSMEG_2644 | ↑ Rv2800 | -1.01 | 9.73E-01 | - | CHP
↑ MSMEG_2645 * | ↑ Rv2799 | -1.11 | 7.79E-01 | - | Membrane protein
↑ MSMEG_2789 | ↑ Rv2669 | -1.18 | 7.82E-01 | - | CHP
↑ MSMEG_2790 | ↑ Rv2668 | 1.01 | 9.76E-01 | - | CHP
↑ MSMEG_3515 | - | - | - | - | Short-chain type dehydrogenase/reductase
↓ MSMEG_3516 | - | - | - | - | CHP, pseudogene?
↓ MSMEG_3519 * | ↑ Rv1894c^b,c * | 5.00 | 7.75E-03 | - | CHP
↑ MSMEG_3658 | - | - | - | - | Fumarate reductase/succinate dehydrogenase
↓ MSMEG_3843 * | ↑ Rv1628c^b,c * | 3.26 | 1.49E-03 | - | CHP
MSMEG_3844 | ↑ Rv1627c^b | 4.82 | 6.28E-04 | - | Non-specific lipid transfer protein
↑ MSMEG_5202 | Rv1132^c | -1.06 | 8.56E-01 | - | Conserved membrane protein
MSMEG_5228 * | ↑ Rv1106c | -1.08 | 8.92E-01 | - | Cholesterol dehydrogenase
↓ MSMEG_5286 * | Rv1059 | -1.15 | 6.85E-01 | dapB | Dihydrodipicolinate reductase
↑ MSMEG_5519 * | - | - | - | - | Monooxygenase
MSMEG_5520 | Rv0953c | 1.11 | 7.84E-01 | - | Oxidoreductase
↑ MSMEG_5554 * | - | - | - | - | Putative anti-terminator response regulator
MSMEG_5555 | Rv0940c * | 4.03 | 1.10E-03 | - | Oxidoreductase
↓ MSMEG_5584 | ↓ Rv0927c | 1.02 | 9.54E-01 | - | Short-chain type dehydrogenase/reductase
↓ MSMEG_5586 | ↓ Rv0926c | 2.52 | 2.03E-01 | - | CHP
↑ MSMEG_5893 | ↑ Rv3492c | -1.10 | 7.55E-01 | - | CHP MCE associated protein
↑ MSMEG_5894 | ↑ Rv3493c | No data | No data | - | CHP MCE associated protein
↑ MSMEG_5895 | ↑ Rv3494c | No data | No data | mce4F^h | mce4 operon: lipid transfer
↑ MSMEG_5896 | ↑ Rv3495c^e,g | -1.05 | 8.99E-01 | mce4E^h | mce4 operon: lipid transfer
↑ MSMEG_5897 | ↑ Rv3496c^g | 1.01 | 9.75E-01 | mce4D^h | mce4 operon: lipid transfer
↑ MSMEG_5898 | ↑ Rv3497c^e,g | -1.57 | 1.02E-01 | mce4C^h | mce4 operon: lipid transfer
↑ MSMEG_5899 | ↑ Rv3498c^g | 1.06 | 8.75E-01 | mce4B^h | mce4 operon: lipid transfer
↑ MSMEG_5900 | ↑ Rv3499c^e,g | -1.14 | 7.57E-01 | mce4A^h | mce4 operon: lipid transfer
↑ MSMEG_5901 | ↑ Rv3500c^g | -1.28 | 5.13E-01 | YrbE4B/supB^h | mce4 operon: lipid transfer
↑ MSMEG_5902 * | ↑ Rv3501c^e,g * | -1.34 | 3.96E-01 | YrbE4A/supA^h | mce4 operon: lipid transfer
↑ MSMEG_5903 | ↑ Rv3502c^b,e,g | 2.22 | 1.57E-02 | hsd4A^h | 17β-hydroxysteroid dehydrogenase
↑ MSMEG_5904^j | ↑ Rv3503c^c | 2.88 | 2.51E-03 | fdxD | Ferredoxin
↓ MSMEG_5906 | ↓ Rv3504^b,e,g | 1.89 | 3.82E-02 | fadE26^h | Acyl CoA dehydrogenase
↓ MSMEG_5907 | ↓ Rv3505^b,c,g | 1.96 | 2.01E-01 | fadE27^h | Acyl CoA dehydrogenase
↓ MSMEG_5908 | ↓ Rv3506^g | 1.37 | 2.70E-01 | fadD17^h | Fatty acid CoA synthetase
↓ MSMEG_5909 | - | - | - | - | Oxidoreductase
↓ MSMEG_5911 | - | - | - | - | AraC-like transcriptional regulator
↑ MSMEG_5913 | - | - | - | - | Dioxygenase
↑ MSMEG_5914 * | ↑ Rv3515c^b,c,g * | 10.87 | 3.92E-04 | fadD19^h | Fatty acid CoA synthetase
↓ MSMEG_5915 | ↓ Rv3516^b,c,g | 4.53 | 6.28E-04 | echA19^h | Enoyl-CoA hydratase
- | Rv3517 | 2.06 | 2.78E-02 | - | -
↑ MSMEG_5918 | ↑ Rv3518c | 4.86 | 1.07E-03 | cyp142 | Cytochrome P450 monooxygenase
↑ MSMEG_5919 * | ↓ Rv3519^g | 2.81 | 3.34E-03 | - | CHP
↑ MSMEG_5920 * | ↑ Rv3520c^g * | 1.73 | 6.39E-02 | - | Co-enzyme F420-dependent oxidoreductase
↓ MSMEG_5921 | ↓ Rv3521^g | 3.20 | 2.42E-03 | - | CHP
↓ MSMEG_5922 | ↓ Rv3522^g | 3.19 | 1.53E-03 | ltp4^h | 3-keto-acyl-CoA thiolase
↓ MSMEG_5923 | ↓ Rv3523^e,g | 2.43 | 5.17E-03 | ltp3^h | 3-keto-acyl-CoA thiolase
- | Rv3424 | 2.14 | 9.71E-03 | - | -
↓ MSMEG_5925 * | ↓ Rv3526^b,c,g * | 8.13 | 6.28E-04 | kshA^h | Ketosteroid-9α-hydroxylase
MSMEG_5927 | ↓ Rv3527^b,c,g | 5.97 | 4.47E-04 | - | CHP
↑ MSMEG_5930 | ↑ Rv3529c | 4.29 | 1.45E-03 | - | CHP
↑ MSMEG_5931 | ↑ Rv3530c | 9.71 | 3.92E-04 | - | Oxidoreductase
↑ MSMEG_5932 * | ↑ Rv3531c | 10.23 | 4.38E-04 | - | CHP
↑ MSMEG_5937 | ↑ Rv3534^b,c,e,g | 3.96 | 1.23E-03 | hsaF^h | 4-hydroxy-2-oxovalerate aldolase
↑ MSMEG_5939 | Rv3535c^c,g | 3.49 | 5.17E-03 | hsaG^h | Acetaldehyde dehydrogenase
↑ MSMEG_5940 *^f | ↑ Rv3536c^c,g *^f | 2.62 | 5.17E-03 | hsaE^h | 2-hydroxypentadienoate
↓ MSMEG_5941 | ↓ Rv3537^b,c,d,g | 3.84 | 5.17E-03 | kstD^h | 3-ketosteroid-Δ1-dehydrogenase
↓ MSMEG_5943 | ↓ Rv3538^b,c,g | 3.17 | 1.61E-03 | hsd4B^h | 2-enoyl acyl-CoA hydratase
↑ MSMEG_5990 | ↑ Rv3540c^c,e,g | 3.64 | 1.07E-03 | ltp2 | Branched-chain 3-ketoacyl-CoA thiolase
↑ MSMEG_5991 | ↑ Rv3541^c,d,e,g | 2.44 | 6.13E-03 | - | CHP
↑ MSMEG_5992 | ↑ Rv3542c^c,d,e | 3.92 | 1.23E-03 | - | CHP
↑ MSMEG_5993 | ↑ Rv3543c^c,d,g | 2.06 | 2.27E-02 | fadE29 | Acyl-CoA dehydrogenase
↑ MSMEG_5994 | ↑ Rv3544^b,c,e,g | 4.04 | 1.45E-03 | fadE28 | Short/branched chain acyl-CoA dehydrogenase
↑ MSMEG_5995 * | ↑ Rv3545^b,c,e,g * | 8.78 | 3.92E-04 | cyp125 | Cytochrome P450 125
↓ MSMEG_5996 | ↓ Rv3546^b,c,g | 4.92 | 6.09E-04 | fadA5 | Acetyl CoA acetyltransferase
MSMEG_5997 | - | - | - | - | CysQ family
↓ MSMEG_5998 | ↓ Rv3547^g | 1.36 | 3.11E-01 | - | CHP
↑ MSMEG_6033 | - | - | - | - | HP
- | Rv3566A | 4.61 | 1.04E-03 | - | -
- | Rv3566c | 3.04 | 1.62E-03 | - | -
↑ MSMEG_6035 | ↑ Rv3567c^b,g | 5.60 | 6.09E-04 | hsaB^h | 3-HSA hydroxylase, reductase
↑ MSMEG_6036 | ↑ Rv3568c^g | 4.14 | 8.00E-04 | hsaC^h | 3,4-DHSA dioxygenase
↑ MSMEG_6037 | ↑ Rv3569^b,c,d,g | 3.37 | 2.52E-03 | hsaD^h | 4,9-DHSA hydrolase
↑ MSMEG_6038 | ↑ Rv3570^b,c,d,g * | 8.15 | 4.38E-04 | hsaA^h | 3-HSA hydroxylase, oxygenase
↓ MSMEG_6039 | ↓ Rv3571^c,g | 4.28 | 8.00E-04 | kshB^h | Ketosteroid-9α-hydroxylase, reductase
↓ MSMEG_6040 | ↓ Rv3572 | 2.12 | 1.29E-02 | - | CHP
↑ MSMEG_6041 * | ↑ Rv3573c^b,g * | 1.48 | 2.44E-01 | fadE34 | Acyl-CoA dehydrogenase
↓ MSMEG_6042 | Rv3574^b,c,e,g | -1.21 | 6.72E-01 | kstR | TetR regulator
↓ MSMEG_6043 | - | - | - | - | Trehalose phosphatase
↑ MSMEG_6474 | ↓ Rv0139 | -1.06 | 8.56E-01 | - | Oxidoreductase
↑ MSMEG_6475 | ↓ Rv0138 | 1.30 | 8.89E-01 | - | CHP

a ND, no data.
b Induced in palmitic acid at least 1.5-fold (Schnappinger et al., 2003).
c Induced in macrophages (Schnappinger et al., 2003).
d Essential for survival in macrophages (Rengarajan et al., 2005; Rosas-Magallanes et al., 2007).
e Essential for survival in mice (Cox et al., 1999; Infante et al., 2005; Sassetti and Rubin, 2003).
f Motifs that were not originally predicted.
g Induced by growth on cholesterol in Rhodococcus sp. strain RHA1 (Van der Geize et al., 2007).
h Recently assigned to the cholesterol degradation pathway (Van der Geize et al., 2007). Note that some of these genes were renamed in that study.
i Arrows represent gene direction in relation to the rest of the genome. Sequential runs of genes are grouped together by horizontal lines. In the most recent annotation of the M. smegmatis genome, the gene numbering was not sequential in a number of cases, e.g. MSMEG_0302/MSMEG_0304, where the genes are adjacent to each other but are not numbered as such.
j MSMEG_5905 is an annotated ORF which would break up this operon, but it is only 31 amino acids long and its location conflicts with MSMEG_5906, so it may be a mis-annotation.
k Genes in bold are induced in M. tuberculosis and M. smegmatis; genes not in bold are induced in M. smegmatis but not in M. tuberculosis.

The 49 genes of the M. tuberculosis kstR regulon are Rv0551c, Rv0687, Rv1894c, Rv1628c, Rv1627c, Rv0940c, Rv3502c, Rv3503c, Rv3504, Rv3505, Rv3506, Rv3515c, Rv3516, Rv3517, Rv3518c, Rv3519, Rv3520c, Rv3521, Rv3522, Rv3523, Rv3424, Rv3526, Rv3527, Rv3529c, Rv3530c, Rv3531c, Rv3534, Rv3535c, Rv3536c, Rv3537, Rv3538, Rv3540c, Rv3541, Rv3542c, Rv3543c, Rv3544, Rv3545, Rv3546, Rv3547, Rv3566A, Rv3566c, Rv3567c, Rv3568c, Rv3569, Rv3570, Rv3571, Rv3572, Rv3573c and Rv3574.


References

Alderwick, L.J., Radmacher, E., Seidel, M., Gande, R., Hitchen, P.G., Morris, H.R., Dell, A., Sahm, H., Eggeling, L., and Besra, G.S. (2005) Deletion of Cg-emb in Corynebacterianeae leads to a novel truncated cell wall arabinogalactan, whereas inactivation of Cg-ubiA results in an arabinan-deficient mutant with a cell wall galactan core. J Biol Chem 280: 32362-32371.

Anderton, M.C., Bhakta, S., Besra, G.S., Jeavons, P., Eltis, L.D., and Sim, E. (2006) Characterization of the putative operon containing arylamine N-acetyltransferase (nat) in Mycobacterium bovis BCG. Mol Microbiol 59: 181-192.

Ando, M., Yoshimatsu, T., Ko, C., Converse, P.J., and Bishai, W.R. (2003) Deletion of Mycobacterium tuberculosis sigma factor E results in delayed time to death with bacterial persistence in the lungs of aerosol-infected mice. Infect Immun 71: 7170-7172.

Arruda, S., Bomfim, G., Knights, R., Huima-Byron, T., and Riley, L.W. (1993) Cloning of an M. tuberculosis DNA fragment associated with entry and survival inside cells. Science 261: 1454-1457.

Bailey, T.L., and Elkan, C. (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2: 28-36.

Bailey, T.L., and Gribskov, M. (1998) Combining evidence using p-values: application to sequence homology searches. Bioinformatics 14: 48-54.

Baulard, A.R., Betts, J.C., Engohang-Ndong, J., Quan, S., McAdam, R.A., Brennan, P.J., Locht, C., and Besra, G.S. (2000) Activation of the pro-drug ethionamide is regulated in mycobacteria. J Biol Chem 275: 28326-28331.

Bergmeyer (1974) Methods of Enzymatic Analysis, Vol. 4. Academic Press, New York, NY, pp. 2066-2072.

Bertrand, K.P., Postle, K., Wray, L.V., Jr., and Reznikoff, W.S. (1983) Overlapping divergent promoters control expression of Tn10 tetracycline resistance. Gene 23: 149-156.

Bhakta, S., Besra, G.S., Upton, A.M., Parish, T., Sholto-Douglas-Vernon, C., Gibson, K.J., Knutton, S., Gordon, S., DaSilva, R.P., Anderton, M.C., and Sim, E. (2004) Arylamine N-acetyltransferase is required for synthesis of mycolic acids and complex lipids in Mycobacterium bovis BCG and represents a novel drug target. J Exp Med 199: 1191-1199.

Bishai, W. (2000) Lipid lunch for persistent pathogen. Nature 406: 683-685.

Burguiere, A., Hitchen, P.G., Dover, L.G., Kremer, L., Ridell, M., Alexander, D.C., Liu, J., Morris, H.R., Minnikin, D.E., Dell, A., and Besra, G.S. (2005) LosA, a key glycosyltransferase involved in the biosynthesis of a novel family of glycosylated acyltrehalose lipooligosaccharides from Mycobacterium marinum. J Biol Chem 280: 42124-42133.

Calamita, H., Ko, C., Tyagi, S., Yoshimatsu, T., Morrison, N.E., and Bishai, W.R. (2005) The Mycobacterium tuberculosis SigD sigma factor controls the expression of ribosome-associated gene products in stationary phase and is required for full virulence. Cell Microbiol 7: 233-244.

Camus, J.C., Pryor, M.J., Medigue, C., and Cole, S.T. (2002) Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology 148: 2967-2973.

Carver, T.J., Rutherford, K.M., Berriman, M., Rajandream, M.A., Barrell, B.G., and Parkhill, J. (2005) ACT: the Artemis Comparison Tool. Bioinformatics 21: 3422-3423.

Casali, N., Konieczny, M., Schmidt, M.A., and Riley, L.W. (2002) Invasion activity of a Mycobacterium tuberculosis peptide presented by the Escherichia coli AIDA autotransporter. Infect Immun 70: 6846-6852.

Casali, N., White, A.M., and Riley, L.W. (2006) Regulation of the Mycobacterium tuberculosis mce1 operon. J Bacteriol 188: 441-449.

Chen, P., Ruiz, R.E., Li, Q., Silver, R.F., and Bishai, W.R. (2000) Construction and characterization of a Mycobacterium tuberculosis mutant lacking the alternate sigma factor gene, sigF. Infect Immun 68: 5575-5580.

Cole, S.T., Brosch, R., Parkhill, J., Garnier, T., Churcher, C., Harris, D., Gordon, S.V., Eiglmeier, K., Gas, S., Barry, C.E., 3rd, Tekaia, F., Badcock, K., Basham, D., Brown, D., Chillingworth, T., Connor, R., Davies, R., Devlin, K., Feltwell, T., Gentles, S., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., Krogh, A., McLean, J., Moule, S., Murphy, L., Oliver, K., Osborne, J., Quail, M.A., Rajandream, M.A., Rogers, J., Rutter, S., Seeger, K., Skelton, J., Squares, R., Squares, S., Sulston, J.E., Taylor, K., Whitehead, S., and Barrell, B.G. (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393: 537-544.

Cole, S.T., Eiglmeier, K., Parkhill, J., James, K.D., Thomson, N.R., Wheeler, P.R., Honore, N., Garnier, T., Churcher, C., Harris, D., Mungall, K., Basham, D., Brown, D., Chillingworth, T., Connor, R., Davies, R.M., Devlin, K., Duthoy, S., Feltwell, T., Fraser, A., Hamlin, N., Holroyd, S., Hornsby, T., Jagels, K., Lacroix, C., Maclean, J., Moule, S., Murphy, L., Oliver, K., Quail, M.A., Rajandream, M.A., Rutherford, K.M., Rutter, S., Seeger, K., Simon, S., Simmonds, M., Skelton, J., Squares, R., Squares, S., Stevens, K., Taylor, K., Whitehead, S., Woodward, J.R., and Barrell, B.G. (2001) Massive gene decay in the leprosy bacillus. Nature 409: 1007-1011.

Corbett, E.L., Watt, C.J., Walker, N., Maher, D., Williams, B.G., Raviglione, M.C., and Dye, C. (2003) The growing burden of tuberculosis: global trends and interactions with the HIV epidemic. Arch Intern Med 163: 1009-1021.

Crooks, G.E., Hon, G., Chandonia, J.M., and Brenner, S.E. (2004) WebLogo: a sequence logo generator. Genome Res 14: 1188-1190.

DiRusso, C.C., Tsvetnitsky, V., Hojrup, P., and Knudsen, J. (1998) Fatty acyl-CoA binding domain of the transcription factor FadR. Characterization by deletion, affinity labeling, and isothermal titration calorimetry. J Biol Chem 273: 33652-33659.

DiRusso, C.C., and Black, P.N. (2004) Bacterial long chain fatty acid transport: gateway to a fatty acid-responsive signaling system. J Biol Chem 279: 49563-49566.

Dobson, G., Minnikin, D.E., Minnikin, S.M., Partlett, J.H., Goodfellow, M., and Ridell, M.M.M. (1985) Systematic analysis of complex mycobacterial lipids.

Dover, L.G., Corsino, P.E., Daniels, I.R., Cocklin, S.L., Tatituri, V., Besra, G.S., and Futterer, K. (2004) Crystal structure of the TetR/CamR family repressor Mycobacterium tuberculosis EthR implicated in ethionamide resistance. J Mol Biol 340: 1095-1105.

Engohang-Ndong, J., Baillat, D., Aumercier, M., Bellefontaine, F., Besra, G.S., Locht, C., and Baulard, A.R. (2004) EthR, a repressor of the TetR/CamR family implicated in ethionamide resistance in mycobacteria, octamerizes cooperatively on its operator. Mol Microbiol 51: 175-188.

Flynn, J.L. (2006) Lessons from experimental Mycobacterium tuberculosis infections. Microbes Infect 8: 1179-1188.

Frenois, F., Engohang-Ndong, J., Locht, C., Baulard, A.R., and Villeret, V. (2004) Structure of EthR in a ligand bound conformation reveals therapeutic perspectives against tuberculosis. Mol Cell 16: 301-307.

Gatfield, J., and Pieters, J. (2000) Essential role for cholesterol in entry of mycobacteria into macrophages. Science 288: 1647-1650.

Greisman, H.A., and Pabo, C.O. (2004) A General Strategy for Selecting High-Affinity Zinc Finger Proteins for Diverse DNA Target Sites. Science 275: 657-661.

Grkovic, S., Brown, M.H., Roberts, N.J., Paulsen, I.T., and Skurray, R.A. (1998) QacR is a repressor protein that regulates expression of the Staphylococcus aureus multidrug efflux pump QacA. J Biol Chem 273: 18665-18673.

Grkovic, S., Brown, M.H., Schumacher, M.A., Brennan, R.G., and Skurray, R.A. (2001) The staphylococcal QacR multidrug regulator binds a correctly spaced operator as a pair of dimers. J Bacteriol 183: 7102-7109.

Guex, N., and Peitsch, M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modelling. Electrophoresis 18: 2714-2723.

Iram, S.H., and Cronan, J.E. (2005) Unexpected functional diversity among FadR fatty acid transcriptional regulatory proteins. J Biol Chem 280: 32148-32156.

Iram, S.H., and Cronan, J.E. (2006) The beta-oxidation systems of Escherichia coli and Salmonella enterica are not functionally equivalent. J Bacteriol 188: 599-608.

Kako et al. (1998) Examination of DNA-binding activity of neuronal transcription factors by electrophoretical mobility shift assay. Brain Research Protocols 2(4): 243-249.

Kendall, S.L., Rison, S.C., Movahedzadeh, F., Frita, R., and Stoker, N. G. (2004) What do microarrays really tell us about M. tuberculosis? Trends Microbiol 12: 537-544.

Krug, A., Wendisch, V.F., and Bott, M. (2005) Identification of AcnR, a TetR-type repressor of the aconitase gene acn in Corynebacterium glutamicum. J Biol Chem 280: 585-595.

Kumar, A., Bose, M., and Brahmachari, V. (2003) Analysis of expression profile of mammalian cell entry (mce) operons of Mycobacterium tuberculosis. Infect Immun 71: 6083-6087.

Kumar, A., Chandolia, A., Chaudhry, U., Brahmachari, V., and Bose, M. (2005) Comparison of mammalian cell entry operons of mycobacteria: in silico analysis and expression profiling. FEMS Immunol Med Microbiol 43: 185-195.

Malhotra, V., Sharma, D., Ramanathan, V.D., Shakila, H., Saini, D.K., Chakravorty, S., Das, T.K., Li, Q., Silver, R.F., Narayanan, P.R., and Tyagi, J.S. (2004) Disruption of response regulator gene, devR, leads to attenuation in virulence of Mycobacterium tuberculosis. FEMS Microbiol Lett 231: 237-245.

Martin, C., Williams, A., Hernandez-Pando, R., Cardona, P.J., Gormley, E., Bordat, Y., Soto, C.Y., Clark, S.O., Hatch, G.J., Aguilar, D., Ausina, V., and Gicquel, B. (2006) The live Mycobacterium tuberculosis phoP mutant strain is more attenuated than BCG and confers protective immunity against tuberculosis in mice and guinea pigs. Vaccine 24: 3408-3419.

McKinney, J.D., Honer zu Bentrup, K., Munoz-Elias, E.J., Miczak, A., Chen, B., Chan, W.T., Swenson, D., Sacchettini, J.C., Jacobs, W.R., Jr., and Russell, D.G. (2000) Persistence of Mycobacterium tuberculosis in macrophages and mice requires the glyoxylate shunt enzyme isocitrate lyase. Nature 406: 735-738.

Mitra, D., Saha, B., Das, D., Wiker, H.G., and Das, A.K. (2005) Correlating sequential homology of Mce1A, Mce2A, Mce3A and Mce4A with their possible functions in mammalian cell entry of Mycobacterium tuberculosis performing homology modeling. Tuberculosis (Edinb) 85: 337-345.

Murphy et al. (2001) Use of fluorescently labelled DNA and a scanner for electrophoretic mobility shift assays. Biotechniques 30: 504-511.

Orth, P., Schnappinger, D., Hillen, W., Saenger, W., and Hinrichs, W. (2000) Structural basis of gene regulation by the tetracycline inducible Tet repressor-operator system. Nat Struct Biol 7: 215-219.

Parish, T., and Stoker, N.G. (1998) Electroporation of mycobacteria. Methods Mol Biol 101: 129-144.

Parish, T., and Stoker, N. G. (2000) Use of a flexible cassette method to generate a double unmarked Mycobacterium tuberculosis tlyA plcABC mutant by gene replacement. Microbiology 146 ( Pt 8): 1969-1975.

Parish, T., Smith, D.A., Roberts, G., Betts, J., and Stoker, N.G. (2003) The senX3-regX3 two-component regulatory system of Mycobacterium tuberculosis is required for virulence. Microbiology 149: 1423-1435.

Perez, E., Samper, S., Bordas, Y., Guilhot, C., Gicquel, B., and Martin, C. (2001) An essential role for phoP in Mycobacterium tuberculosis virulence. Mol Microbiol 41: 179-187.

Plant, A.L., Brigham-Burke, M., Petrella, E.C., and O'Shannessy, D.J. (1995) Phospholipid/alkanethiol bilayers for cell-surface receptor studies by surface plasmon resonance. Analyt Biochem 226(2): 342-348.

Ramos, J.L., Martinez-Bueno, M., Molina-Henares, A.J., Teran, W., Watanabe, K., Zhang, X., Gallegos, M.T., Brennan, R., and Tobes, R. (2005) The TetR family of transcriptional repressors. Microbiol Mol Biol Rev 69: 326-356.

Rao, V., Gao, F., Chen, B., Jacobs, W.R., Jr., and Glickman, M.S. (2006) Trans-cyclopropanation of mycolic acids on trehalose dimycolate suppresses Mycobacterium tuberculosis-induced inflammation and virulence. J Clin Invest 116: 1660-1667.

Rengarajan, J., Bloom, B.R., and Rubin, E.J. (2005) Genome-wide requirements for Mycobacterium tuberculosis adaptation and survival in macrophages. Proc Natl Acad Sci U S A 102: 8327-8332.

Reuter, M., Kupper, D., Meisel, A., Schroeder, C., and Kruger, D.H. (1998) Cooperative binding properties of restriction endonuclease EcoRII with DNA recognition sites. J Biol Chem 273: 8294-8300.

Rickman, L., Saldanha, J.W., Hunt, D.M., Hoar, D.N., Colston, M.J., Millar, J.B., and Buxton, R.S. (2004) A two-component signal transduction system with a PAS domain-containing sensor is required for virulence of Mycobacterium tuberculosis in mice. Biochem Biophys Res Commun 314: 259-267.

Rosas-Magallanes, V., Stadthagen-Gomez, G., Rauzier, J., Barreiro, L.B., Tailleux, L., Boudou, F., Griffin, R., Nigou, J., Jackson, M., Gicquel, B., and Neyrolles, O. (2007) Signature-tagged transposon mutagenesis identifies novel Mycobacterium tuberculosis genes involved in the parasitism of human macrophages. Infect Immun 75: 504-507.

Russell, D.G., Mwandumba, H.C., and Rhoades, E.E. (2002) Mycobacterium and the coat of many lipids. J Cell Biol 158: 421-426.

Saiki, R.K., Gelfand, D.H., Stoffel, S., Scharf, S.J., Higuchi, R., Horn, G.T., Mullis, K.B., and Erlich, H.A. (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-491.

Sali, A., and Blundell, T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234: 779-815.

Sambrook, J., and Russell, D.W. (2001) Molecular Cloning: A Laboratory Manual, Third Edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.

Sandy, J., Mushtaq, A., Kawamura, A., Sinclair, J., Sim, E., and Noble, M. (2002) The structure of arylamine N-acetyltransferase from Mycobacterium smegmatis: an enzyme which inactivates the anti-tubercular drug, isoniazid. J Mol Biol 318: 1071-1083.

Santangelo, M.P., Goldstein, J., Alito, A., Gioffre, A., Caimi, K., Zabal, O., Zumarraga, M., Romano, M.I., Cataldi, A.A., and Bigi, F. (2002) Negative transcriptional regulation of the mce3 operon in Mycobacterium tuberculosis. Microbiology 148: 2997-3006.

Sassetti, C.M., and Rubin, E.J. (2003) Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci U S A 100: 12989-12994.

Schnappinger, D., Ehrt, S., Voskuil, M.I., Liu, Y., Mangan, J.A., Monahan, I.M., Dolganov, G., Efron, B., Butcher, P.D., Nathan, C., and Schoolnik, G.K. (2003) Transcriptional Adaptation of Mycobacterium tuberculosis within Macrophages: Insights into the Phagosomal Environment. J Exp Med 198: 693-704.

Schumacher, M.A., Miller, M.C., Grkovic, S., Brown, M.H., Skurray, R.A., and Brennan, R.G. (2001) Structural mechanisms of QacR induction and multidrug recognition. Science 294: 2158-2163.

Shen, Z., Peedikayil, J., Olson, G.K., Siebert, P.D., and Fang, Y. (2002) Multiple transcription factor profiling by enzyme-linked immunoassay. Biotechniques 32(5): 1168-1177.

Snapper, S.B., Melton, R.E., Mustafa, S., Kieser, T., and Jacobs, W.R., Jr. (1990) Isolation and characterization of efficient plasmid transformation mutants of Mycobacterium smegmatis. Mol Microbiol 4: 1911-1919.

Sun, R., Converse, P.J., Ko, C., Tyagi, S., Morrison, N.E., and Bishai, W.R. (2004) Mycobacterium tuberculosis ECF sigma factor sigC is required for lethality in mice and for the conditional expression of a defined gene set. Mol Microbiol 52: 25-38.

Talaat, A.M., Lyons, R., Howard, S.T., and Johnston, S.A. (2004) The temporal expression profile of Mycobacterium tuberculosis infection in mice. Proc Natl Acad Sci U S A 101: 4602-4607.

Tan, Y., Rouse, J., Zhang, A., Cariati, S., Cohen, P., and Comb, M.J. (1996) FGF and stress regulate CREB and ATF-1 via a pathway involving p38 MAP kinase and MAPKAP kinase-2. EMBO J 15(17): 4629-4642.

Thompson, J.D., Higgins, D.G., and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22: 4673-4680.

Uchida, Y., Casali, N., White, A., Morici, L., Kendall, L.V., and Riley, L.W. (2007) Accelerated immunopathological response of mice infected with Mycobacterium tuberculosis disrupted in the mce1 operon negative transcriptional regulator. Cell Microbiol.

Van der Geize, R., Yam, K., Heuser, T., Wilbrink, M.H., Hara, H., Anderton, M.C., Sim, E., Dijkhuizen, L., Davies, J.E., Mohn, W.W., and Eltis, L.D. (2007) A gene cluster encoding cholesterol catabolism in a soil actinomycete provides insight into Mycobacterium tuberculosis survival in macrophages. Proc Natl Acad Sci U S A.

Walters, S.B., Dubnau, E., Kolesnikova, I., Laval, F., Daffe, M., and Smith, I. (2006) The Mycobacterium tuberculosis PhoPR two-component system regulates genes essential for virulence and complex lipid biosynthesis. Mol Microbiol 60: 312-330.

Wernisch, L., Kendall, S.L., Soneji, S., Wietzorrek, A., Parish, T., Hinds, J., Butcher, P.D., and Stoker, N.G. (2003) Analysis of whole-genome microarray replicates using mixed models. Bioinformatics 19: 53-61.

Wolfe, S.A., Ramm, E.I., and Pabo, C.O. (2000) Combining structure-based design with phage display to create new Cys(2)His(2) zinc finger dimers. Structure 8(7): 739-750.

Zahrt, T.C., and Deretic, V. (2001) Mycobacterium tuberculosis signal transduction system required for persistent infections. Proc Natl Acad Sci U S A 98: 12706-12711.

CLAIMS

1. A method of identifying an agent which modulates at least one activity or function of the Mycobacterium tuberculosis (M. tuberculosis) protein Rv3574 or an orthologue thereof from an actinomycete, the method comprising: providing M. tuberculosis Rv3574 or an orthologue thereof from an actinomycete; providing double stranded DNA (dsDNA) comprising the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) under conditions which allow the Rv3574 or orthologue thereof to bind to the dsDNA, wherein independently:

X1 is C, G or T
X2 is A, C, G or T
X3 is A, C, G or T
X4 is A, C, G or T
X5 is A, C, G or T
X6 is G or T
X7 is A, C, G or T
X8 is A, C, G or T and
X9 is A, G or T; providing a test agent; and determining whether the test agent modulates at least one activity or function of the M. tuberculosis Rv3574 or the orthologue thereof.

2. A method according to Claim 1 wherein, independently, X1 is T, X6 is T and X9 is A.

3. A method according to Claim 1 wherein, independently,
X1 is C, G or T
X2 is A or G
X3 is A, C, G or T
X4 is A, C, G or T, but preferably A, C or G
X5 is A, C, G or T, but preferably C, G or T
X6 is G or T
X7 is A, C, G or T
X8 is C or T and
X9 is A, G or T.

4. A method according to any of Claims 1-3 wherein, independently,
X1 is T
X2 is A
X3 is G
X4 is A or G
X5 is T
X6 is T
X7 is C and
X9 is A.

5. A method according to any of Claims 1-4 wherein when X1 is T, X9 is A.

6. A method according to any of Claims 1-5 wherein when X3 is G, X7 is C.

7. A method according to any of Claims 1-6 wherein when X2 is A, X8 is T.

8. A method according to any of Claims 1-6 wherein when X2 is G, X8 is C.

9. A method according to any of Claims 1-4 wherein the dsDNA comprises a sequence selected from the group consisting of TAGAACATGTTCCA (SEQ ID No: 13), TAGAACATGTTCTA (SEQ ID NO: 14), TAGAACGTGTTCCA (SEQ ID No: 15) and TAGAACGTGTTCTA (SEQ ID NO: 16).

10. A method according to any of Claims 1-4 wherein the dsDNA comprises the sequence TAGAACX4X5GTTCTA (SEQ ID NO: 17) or a variant thereof which further contains one or two alternative nucleotides at any of the positions other than X4 and X5.

11. A method according to any of Claims 1-10 wherein the dsDNA comprises the sequence X10X1X2X3AACX4X5GTX6X7X8X9X11X12 (SEQ ID NO: 24) wherein, independently, X10 is A, C or T, X11 is A, C, T or G and

12. A method according to Claim 11 wherein, independently, X10 is a C, X11 is a G and X12 is a T.

13. A method according to Claim 12 wherein the dsDNA comprises a sequence selected from the group consisting of

CTAGAACACGTTCCAGT (SEQ ID NO: 25), CTAGAACACGTTCTAGT (SEQ ID No: 26), CTAGAACATGTTCCAGT (SEQ ID NO: 27), CTAGAACATGTTCTAGT (SEQ ID No: 28), CTAGAACGCGTTCCAGT (SEQ ID NO: 29), CTAGAACGCGTTCTAGT (SEQ ID No: 30), CTAGAACGTGTTCCAGT (SEQ ID NO: 31) and CTAGAACGTGTTCTAGT (SEQ ID No: 32).
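The eight sequences listed in Claim 13 differ only at X4 (A or G), X5 (C or T) and X8 (C or T), with every other position of SEQ ID NO: 24 fixed. A short sketch, assuming a hypothetical block-by-block encoding of the 17 bp site, regenerates the list by expanding those three positions; it is an illustration of the combinatorics, not a step of the claimed method.

```python
from itertools import product

# Each entry lists the options allowed for one block of the 17-bp site;
# single-element lists are invariant blocks. This encoding is illustrative.
CLAIM_13_BLOCKS = [
    ["CTAGAAC"],   # X10 X1 X2 X3 A A C (X10 = C, X1-X3 = TAG)
    ["A", "G"],    # X4
    ["C", "T"],    # X5
    ["GTTC"],      # G T X6 X7 (X6 = T, X7 = C)
    ["C", "T"],    # X8
    ["AGT"],       # X9 X11 X12 (X9 = A, X11 = G, X12 = T)
]


def expand(blocks: list[list[str]]) -> list[str]:
    """Return every concrete sequence obtained by choosing one option per block."""
    return ["".join(choice) for choice in product(*blocks)]


for seq in expand(CLAIM_13_BLOCKS):
    print(seq)
```

Because itertools.product varies the last block fastest, the output appears in the same order as SEQ ID NOs: 25 to 32.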

14. A method according to any of Claims 11-13 wherein the dsDNA comprises the sequence X13X10X1X2X3AACX4X5GTX6X7X8X9X11X12 (SEQ ID NO: 33) wherein

X13 is A, G or T.

15. A method according to Claim 14 wherein

16. A method according to any of Claims 1-7 and 9-15 wherein the dsDNA comprises the sequence TGCCCACTAGAACGTGTTCTAATAGTGCT (SEQ ID No: 42).

17. A method according to any of Claims 1-16, wherein the dsDNA comprises a consecutive nucleotide sequence of at least 14 base pairs from a control region of a Mycobacterium gene selected from genes 1-83 listed in Table 4.

18. A method according to any of Claims 1-7 and 9-17, wherein the dsDNA comprises the sequence

CNNNCINo^TAGAACGTGTTCTAATANNNNNNNNGNNCNNNNGTCAAGNNNNNNNN NNTNNNNNC (SEQ ID No: 43) wherein N is A, C, G or T.

19. A method according to any of Claims 1-18 wherein Rv3574 is M. tuberculosis Rv3574 (Figure 2).

20. A method according to Claim 19 wherein, independently, X2 is A or G; X3 is A, G or T; X4 is A, C, G or T, but preferably A, C or G; X5 is A, C, G or T, but preferably C or T; X7 is A, C or G; X8 is C or T; and

21. A method according to Claim 20 wherein, independently, X1 is a T; X2 is an A; X3 is a G; X4 is a G; X5 is a T; X7 is a C; X8 is a C or T; and

22. A method according to Claim 21 wherein the dsDNA comprises the sequence TAGAACGTGTTCCA (SEQ ID NO: 15) or the sequence TAGAACGTGTTCTA (SEQ ID No: 16).

23. A method according to any of Claims 1-18 wherein the orthologue of M. tuberculosis Rv3574 is a protein from a Mycobacterium, a Nocardia (Nocardioides), a Rhodococcus or a Streptomycete.

24. A method according to Claim 23 wherein the orthologue of Rv3574 is M. bovis Mb3605 protein (SEQ ID No: 4; Figure 2), M. marinum MM5069 protein (SEQ ID No: 5; Figure 2), M. ulcerans MUL4145 protein (SEQ ID No: 6; Figure 2), M. avium subsp. paratuberculosis MAP0491c protein (SEQ ID No: 7; Figure 2), M. smegmatis MSEG_6042 protein (SEQ ID No: 8; Figure 2), Nocardia farcinica nfa4470 protein (SEQ ID No: 9; Figure 2) or Rhodococcus sp. strain RHA1 kstR protein (SEQ ID No: 10; Figure 2).

25. A method according to Claim 24 wherein the orthologue of the M. tuberculosis Rv3574 protein is M. smegmatis MSEG_6042 protein (SEQ ID No: 8; Figure 2).

26. A method according to Claim 25 wherein, independently, X1 is a T; X2 is an A; X3 is a G; X4 is an A, C, G or T, but preferably A or G; X5 is an A, C, G or T, but preferably C or T; X6 is a T; X7 is a C; X8 is a C or T; and X9 is an A.

27. A method according to Claim 26 wherein the dsDNA comprises the sequence selected from the group consisting of TAGAACACGTTCCA (SEQ ID No: 20), TAGAACACGTTCTA (SEQ ID NO: 21), TAGAACATGTTCCA (SEQ ID No: 13), TAGAACATGTTCTA (SEQ ID NO: 14), TAGAACGCGTTCCA (SEQ ID No: 22), TAGAACGCGTTCTA (SEQ ID NO: 23), TAGAACGTGTTCCA (SEQ ID No: 15) and TAGAACGTGTTCTA (SEQ ID NO: 16).

28. A method according to any of Claims 1-27 wherein the dsDNA is at least 20 base pairs in length.

29. A method according to any of Claims 1-28 wherein the dsDNA is no more than 100 base pairs in length.

30. A method according to any of Claims 1-29 wherein the M. tuberculosis Rv3574 or orthologue thereof is provided bound to the dsDNA.

31. A method according to any of Claims 1-29 wherein the test agent and the dsDNA are provided before the M. tuberculosis Rv3574 or the orthologue thereof.

32. A method according to any of Claims 1-29 wherein the M. tuberculosis Rv3574 or the orthologue thereof and the dsDNA are provided before the test agent.

33. A method according to any of Claims 1-32 wherein the test agent is a lipid.

34. A method according to Claim 33 wherein the lipid is unsaturated.

35. A method according to Claim 33 wherein the lipid is saturated.

36. A method according to any of Claims 33-35 wherein the lipid is methylated.

37. A method according to any of Claims 33-36 wherein the lipid is selected from the group consisting of a fatty acid, a mycolic acid, a trehalose-6,6-dimycolate, a glycerophospholipid, a triglyceride, a phosphatidylinositol mannoside, a phospholipid, a sphingolipid or a steroid.

38. A method according to Claim 33 wherein the lipid is palmitic acid or a derivative thereof.

39. A method according to Claim 33 wherein the lipid is cholesterol or a derivative thereof.

40. A method according to any of Claims 1-39 wherein the step of determining whether the test agent modulates at least one activity or function of Rv3574 comprises determining whether, and optionally to what extent, the test agent modulates binding of the M. tuberculosis Rv3574 protein or orthologue thereof to the dsDNA.

41. A method according to Claim 40 wherein binding of the Rv3574 protein or orthologue thereof to the dsDNA is assessed using an electrophoretic mobility shift assay (EMSA), a supershift-EMSA, DNase I footprinting, an enzyme-linked immunosorbent assay (ELISA) or a surface plasmon resonance assay.

42. A method according to Claim 40 or 41 comprising determining whether the test agent enhances binding of the Rv3574 protein or the orthologue thereof to the dsDNA, wherein an agent that enhances Rv3574 binding to the dsDNA may act as a permanent repressor of a gene controlled by M. tuberculosis Rv3574 or an orthologue thereof such as those listed in Table 4 or Table 5.

43. A method according to Claim 40 or 41 comprising determining whether the test agent decreases or prevents binding of the Rv3574 protein or orthologue thereof to the dsDNA, wherein an agent that decreases or prevents Rv3574 binding to the dsDNA may act as a de-repressor of a gene controlled by M. tuberculosis Rv3574 or an orthologue thereof such as those listed in Table 4 or Table 5.

44. A method according to any of Claims 1-39 wherein the step of determining whether the test agent modulates at least one activity or function of the M. tuberculosis Rv3574 protein or an orthologue thereof comprises measuring the expression of a reporter gene operably linked to the dsDNA.

45. A method according to Claim 44 wherein the reporter gene is selected from a gene encoding chloramphenicol acetyl transferase (CAT), luciferase, β-galactosidase or Green Fluorescent Protein (GFP).

46. A method according to Claim 44 wherein the reporter gene is a gene regulated by Rv3574 selected from those listed in Table 4 or Table 5.

47. A method according to any of Claims 44-46 wherein expression of the reporter gene is determined by measuring the level of mRNA expressed from the reporter gene, measuring the concentration of a protein encoded by the reporter gene or measuring the activity or function of a protein encoded by the reporter gene.

48. A method according to any of Claims 44-47 comprising determining whether the test agent decreases the expression of the reporter gene, wherein a test agent that decreases the expression of the reporter gene may act as a permanent repressor of a gene controlled by M. tuberculosis Rv3574 or an orthologue thereof such as those listed in Table 4 or Table 5.

49. A method according to any of Claims 44-47 comprising determining whether the test agent increases the expression of the reporter gene, wherein a test agent that increases the expression of the reporter gene may act as a de-repressor of a gene controlled by M. tuberculosis Rv3574 or an orthologue thereof such as those listed in Table 4 or Table 5.
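Claims 48 and 49 attach opposite interpretations to a decrease and an increase in reporter expression. A minimal Python sketch of that decision follows, assuming each reading is compared with a no-agent control and using a hypothetical two-fold cutoff that the claims do not specify; the function name classify_agent is likewise illustrative.

```python
def classify_agent(reporter_with_agent: float, reporter_control: float,
                   fold_cutoff: float = 2.0) -> str:
    """Interpret a reporter-gene readout as in Claims 48 and 49.

    reporter_control is the reading without the test agent; fold_cutoff is a
    hypothetical threshold chosen for illustration, not a value from the claims.
    """
    if reporter_control <= 0 or reporter_with_agent < 0:
        raise ValueError("control reading must be positive and agent reading non-negative")
    ratio = reporter_with_agent / reporter_control
    if ratio <= 1 / fold_cutoff:
        return "possible permanent repressor"  # reporter expression decreased (Claim 48)
    if ratio >= fold_cutoff:
        return "possible de-repressor"         # reporter expression increased (Claim 49)
    return "no clear effect"


print(classify_agent(30.0, 100.0))   # 0.3-fold of control
print(classify_agent(450.0, 100.0))  # 4.5-fold of control
```

The symmetric cutoff treats a two-fold decrease and a two-fold increase as equally notable; in practice the threshold would depend on assay variability and replicate measurements.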

50. A method according to any of Claims 1-49 wherein the method is performed in vitro.

51. A method according to Claim 50 wherein the method is performed in a cell.

52. A method according to Claim 51 wherein the cell is selected from an actinomycete or a macrophage.

53. A method according to Claim 52 wherein the actinomycete is selected from the group consisting of a Mycobacterium, a Nocardia (Nocardioides), a Rhodococcus or a Streptomycete.

54. A method according to any of Claims 1-49 wherein the method is performed in vivo.

55. A method according to any of Claims 1-54 further comprising: selecting a test agent which modulates at least one activity or function of the M. tuberculosis Rv3574 protein or the orthologue thereof; modifying the test agent; and testing the ability of the modified agent to modulate at least one activity or function of the M. tuberculosis Rv3574 protein or the orthologue thereof.

56. A method according to any of Claims 1-55 further comprising determining whether the test agent or modified agent modulates at least one activity or function of the

M. tuberculosis Rv3574 protein or the orthologue thereof in an in vivo model of a disease or condition caused by a pathogenic actinomycete.

57. A method according to Claim 56 further comprising determining whether the test agent or modified agent modulates at least one activity or function of the M. tuberculosis Rv3574 protein or an orthologue thereof in an in vivo model of tuberculosis.

58. A method according to Claim 56 or 57 further comprising determining whether the test agent or modified agent modulates or affects disease severity, duration or progression in an in vivo model of tuberculosis.

59. A method according to any of Claims 1-58 further comprising formulating an agent which has the ability to modulate at least one activity or function of the M. tuberculosis Rv3574 protein or an orthologue thereof into a pharmaceutically acceptable composition.

60. A kit of parts comprising an isolated M. tuberculosis Rv3574 protein or orthologue thereof; and an isolated dsDNA molecule comprising the sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) wherein, independently, X1 is C, G or T; X2 is A, C, G or T; X3 is A, C, G or T; X4 is A, C, G or T; X5 is A, C, G or T; X6 is G or T; X7 is A, C, G or T; X8 is A, C, G or T; and X9 is A, G or T.

61. A kit according to Claim 60 wherein the isolated M. tuberculosis Rv3574 protein or orthologue thereof is as defined in any of Claims 19 and 23-25.

62. A kit according to Claim 60 or 61 wherein the dsDNA is as defined in any of Claims 2-18, 20-22 and 26-29.

63. A kit according to any of Claims 60-62 wherein the dsDNA is operably linked to a reporter gene.

64. A kit according to Claim 63 further comprising a substrate for detecting the reporter gene.

65. A kit according to any of Claims 60-64 wherein the Rv3574 protein or orthologue thereof is bound to the isolated dsDNA.

66. A kit according to any of Claims 60-65 wherein the isolated M. tuberculosis Rv3574 protein or orthologue thereof is bound to the dsDNA as a dimer.

67. A method of identifying a gene in an actinomycete that is regulated by M. tuberculosis Rv3574 or an orthologue thereof, the method comprising: identifying, within the control region of a gene, the Rv3574 binding sequence X1X2X3AACX4X5GTX6X7X8X9 (SEQ ID No: 2) wherein, independently, X1 is C, G or T; X2 is A, C, G or T; X3 is A, C, G or T; X4 is A, C, G or T; X5 is A, C, G or T; X6 is G or T; X7 is A, C, G or T; X8 is A, C, G or T; and

68. A method according to Claim 67 wherein the Rv3574 binding sequence is as defined in any of Claims 2-18, 20-22 and 26-29.

69. A method according to Claim 67 or 68 further comprising: determining whether the M. tuberculosis Rv3574 or the orthologue thereof affects the expression of the gene containing the Rv3574 binding sequence.

70. A method according to any of Claims 67-69 wherein the sequence is identified by searching a database in silico, by hybridisation with a polynucleotide probe comprising the Rv3574 binding sequence, or by comparison to an orthologous gene in another actinomycete.
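One way to carry out the in silico search mentioned in Claim 70, assuming a control-region sequence is already available as a plain string, is to scan both strands for SEQ ID No: 2 while allowing overlapping matches. The sketch below is illustrative only: the function names are hypothetical, and obtaining the control regions themselves is outside its scope.

```python
import re

# SEQ ID No: 2 inside a zero-width lookahead, so overlapping sites are all reported.
SITE = re.compile(r"(?=([CGT][ACGT]{2}AAC[ACGT]{2}GT[GT][ACGT]{2}[AGT]))")
SITE_LENGTH = 14
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(region: str) -> list[tuple[int, str, str]]:
    """Return (start, strand, site) for every motif hit on either strand.

    Start positions are 0-based and always given in forward-strand coordinates.
    """
    region = region.upper()
    hits = [(m.start(), "+", m.group(1)) for m in SITE.finditer(region)]
    for m in SITE.finditer(reverse_complement(region)):
        # A hit at position p on the reverse complement covers forward positions
        # len(region) - p - SITE_LENGTH through len(region) - p - 1.
        hits.append((len(region) - m.start() - SITE_LENGTH, "-", m.group(1)))
    return sorted(hits)


print(find_sites("TGCCCACTAGAACGTGTTCTAATAGTGCT"))  # SEQ ID No: 42
```

For SEQ ID No: 42 this reports TAGAACGTGTTCTA starting at position 7 on the forward strand, and its reverse complement TAGAACACGTTCTA over the same 14 bp on the reverse strand, because that site also fits SEQ ID No: 2 when read on the opposite strand.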

71. A method according to any of Claims 67-70 wherein the actinomycete is a Mycobacterium, a Nocardia (Nocardioides), a Rhodococcus or a Streptomycete.

72. A method according to Claim 71 wherein the Mycobacterium is M. tuberculosis.
