What are the steps of eukaryotic mRNA processing?

We begin our detailed study of transcription by looking at the synthesis and processing of mRNAs, the molecules that make up the transcriptome and which specify the protein content of the cell. As the central players in genome expression, mRNAs have received the greatest attention from researchers and we now have a detailed picture of how they are produced. Events in bacteria are different in many respects from those in eukaryotes and so we will deal with the two types of organism in different sections. One aspect of eukaryotic mRNA processing - intron splicing - is so important that it requires a section of its own.

Bacterial mRNAs do not undergo any significant forms of processing: the primary transcript that is synthesized by the RNA polymerase is itself the mature mRNA, and its translation usually begins before transcription is complete (Figure 10.1). This coupling of transcription and translation is important in that it allows special types of control to be applied to the regulation of bacterial mRNA synthesis, as will be described on page 277.

Because there is just one bacterial RNA polymerase (Section 9.2.1), the general mechanism of transcription is the same for all bacterial genes. The following descriptions of elongation and termination, given in the context of mRNA synthesis, therefore apply equally well to the synthesis of non-coding RNA.

The chemical basis of the template-dependent synthesis of RNA was shown in Figure 3.5. Ribonucleotides are added one after another to the growing 3′ end of the RNA transcript, the identity of each nucleotide specified by the base-pairing rules: A base-pairs with T or U; G base-pairs with C. During each nucleotide addition, the β- and γ-phosphates are removed from the incoming nucleotide (see Figure 1.6), and the hydrogen is lost from the hydroxyl group attached to the 3′-carbon of the nucleotide present at the end of the chain, the oxygen of this group becoming part of the new phosphodiester bond.
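The base-pairing rules can be illustrated with a short Python sketch. It is a minimal illustration rather than a model of the polymerase: the function name `transcribe` and the example template are assumptions introduced here, and the template strand is written 3′→5′ so that the transcript reads 5′→3′.

```python
# Pairing rules from the text, read from the template's point of view:
# a template A specifies U in the RNA, T specifies A, G specifies C, C specifies G.
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA (5'->3') copied from a DNA template strand written 3'->5'."""
    return "".join(PAIR[base] for base in template_3to5.upper())


# Hypothetical nine-nucleotide template, used only as an example.
print(transcribe("TACGGCATT"))  # AUGCCGUAA
```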

At this stage of transcription, the bacterial RNA polymerase is in its core enzyme form, comprising four proteins, two relatively small (approximately 35 kDa) α subunits, and one each of the related subunits β and β′ (both approximately 150 kDa): the σ subunit that has the key role in initiation has now left the complex (Section 9.2.3). The RNA polymerase covers about 30 bp of the template DNA, including the transcription bubble of 12–14 bp, within which the growing transcript is held to the template strand of the DNA by approximately eight RNA-DNA base pairs (Figure 10.2). The RNA polymerase has to keep a tight grip on both the DNA template and the RNA that it is making in order to prevent the transcription complex from falling apart before the end of the gene is reached. However, this grip must not be so tight as to prevent the polymerase from moving along the DNA. To understand how these apparently contradictory requirements are met, the interactions between the polymerase, the DNA template and the RNA transcript are being examined by X-ray crystallography studies (Section 9.1.3), combined with crosslinking experiments in which covalent bonds are formed between the DNA or RNA and the polymerase, these bonds enabling the amino acids that are closest to the DNA and RNA to be identified. In the current model the DNA template lies between the β and β′ subunits, within a trough on the enclosed surface of β′. The active site for RNA synthesis also lies between these two subunits, with the non-template strand of DNA held within the β subunit and the RNA transcript extruded from the complex via a channel formed partly by the β and partly by the β′ subunit (Research Briefing 10.1; Korzheva et al., 2000).

The structure of the bacterial RNA polymerase. Structural studies have provided insights into the mechanism for transcription elongation and termination in bacteria.

Exactly how termination occurs is not known. Current thinking views transcription as a stepwise nucleotide-by-nucleotide process, with the polymerase pausing at each position and making a ‘choice’ between continuing elongation by adding another ribonucleotide to the transcript, or terminating by dissociating from the template. Which choice is selected depends on which alternative is more favorable in thermodynamic terms (von Hippel, 1998). This model emphasizes that, in order for termination to occur, the polymerase has to reach a position on the template where dissociation is more favorable than continued RNA synthesis.

Bacteria appear to use two distinct strategies for transcription termination. About half the positions in Escherichia coli at which transcription terminates correspond to DNA sequences where the template strand contains an inverted palindrome followed by a run of deoxyadenosine nucleotides (Figure 10.3). These intrinsic terminators have been thought to promote dissociation of the polymerase by destabilizing the attachment of the growing transcript to the template, in two ways. First, when the inverted palindrome is transcribed, the RNA sequence folds into a stable hairpin, this RNA-RNA base pairing being favored over the DNA-RNA pairing that normally occurs within the transcription bubble. This reduces the number of contacts made between the template and transcript, weakening the overall interaction and favoring dissociation. The interaction is further weakened when the run of As in the template is transcribed, because the resulting A-U base pairs have only two hydrogen bonds each, compared with three for each G-C pair. The net result is that termination is favored over continued elongation (von Hippel, 1998). This model is easy to rationalize with the known properties of DNA-RNA hybrids, but an alternative hypothesis has been prompted by the result of crosslinking experiments, which have shown that the RNA hairpin makes contact with a flap structure on the outer surface of the RNA polymerase β subunit, adjacent to the exit point of the channel through which the RNA emerges from the complex. Although the flap structure is quite distant (some 6.5 nm) from the active site of the polymerase, a direct connection is made between the two by a segment of β-sheet within the β subunit. Movement of the flap could therefore affect the positioning of amino acids within the active site, possibly leading to breakage of the DNA-RNA base pairs and termination of transcription (Research Briefing 10.1). 
Additional evidence in support of this model comes from the demonstration that the protein called NusA, which enhances termination at intrinsic terminators, interacts with the hairpin loop and flap structure and may stabilize the contact between the two (Toulokhonov et al., 2001).
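The sequence features of an intrinsic terminator, a hairpin followed by a run of U's in the transcript, can be searched for with a simple scan. The sketch below is only a rough illustration: the function names, the fixed stem length, the loop sizes and the length of the U run are all assumptions chosen for the example, and it demands perfect Watson-Crick pairing in the stem, which real terminators do not.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that would base-pair with `rna` to form a perfect stem."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_intrinsic_terminators(rna, stem=6, min_loop=3, max_loop=8, u_run=4):
    """Return (hairpin_start, hairpin_end) index pairs for candidate terminators.

    A candidate is a perfectly paired stem of `stem` nucleotides, a loop of
    min_loop..max_loop nucleotides, and `u_run` U's directly after the stem.
    All thresholds are illustrative assumptions, not measured values.
    """
    hits = []
    for start in range(len(rna) - stem + 1):
        left_arm = rna[start:start + stem]
        for loop in range(min_loop, max_loop + 1):
            right_start = start + stem + loop
            right_end = right_start + stem
            right_arm = rna[right_start:right_end]
            tail = rna[right_end:right_end + u_run]
            if right_arm == reverse_complement(left_arm) and tail == "U" * u_run:
                hits.append((start, right_end))
    return hits


# Hypothetical transcript: stem GCCGCC / GGCGGC, loop UUCG, then six U's.
example = "CC" + "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU" + "A"
print(find_intrinsic_terminators(example))
```

A scan of this kind finds only the sequence signature. It says nothing about whether termination results from weakened DNA-RNA pairing or from contact between the hairpin and the polymerase flap.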

The second type of bacterial termination signal is Rho dependent. These signals usually retain the hairpin feature of intrinsic terminators, although the hairpin is less stable and there is no run of As in the template. Termination requires the activity of a protein called Rho, which attaches to the transcript and moves along the RNA towards the polymerase. If the polymerase continues to synthesize RNA then it keeps ahead of the pursuing Rho, but at the termination signal the polymerase stalls (see Figure 10.4). Exactly why has not been explained - presumably the hairpin loop that forms in the RNA is responsible in some way - but the result is clear: Rho is able to catch up. Rho is a helicase, which means that it actively breaks base pairs, in this case between the template and transcript, resulting in termination of transcription.

In bacteria, two mechanisms have evolved for influencing the repeated choice that the polymerase has to make between elongation and termination when copying a template. Both mechanisms are important in regulating the expression of genes contained within operons.

The first process is called antitermination. This occurs when the RNA polymerase ignores a termination signal and continues elongating its transcript until a second signal is reached (Figure 10.5). It provides a mechanism whereby one or more of the genes at the end of an operon can be switched off or on by the polymerase recognizing or not recognizing a termination signal located upstream of those genes. Antitermination is controlled by an antiterminator protein, which attaches to the DNA near the beginning of the operon and then transfers to the RNA polymerase as it moves past en route to the first termination signal. The presence of the antiterminator protein causes the enzyme to ignore the termination signal, presumably by countering the destabilizing properties of an intrinsic terminator or by preventing stalling at a Rho-dependent terminator. Although the mechanics of the process are unclear, the impact that antitermination can have on gene expression has been described in detail, especially during the infection cycle of bacteriophage λ (Box 10.1).

Antitermination during the infection cycle of bacteriophage λ. Bacteriophage λ provides the best studied example of the use of antitermination as a means of regulating gene expression (Friedman et al., 1987).

The second type of termination control is called attenuation. This system operates primarily with operons that code for enzymes involved in amino acid biosynthesis, but a few other examples are also known. The tryptophan operon of E. coli (Section 9.3.1) illustrates how it works. In this operon, two hairpin loops can form in the region between the start of the transcript and the beginning of trpE. The smaller of these loops acts as a termination signal, but the larger hairpin loop, which is closer to the start of the transcript, is more stable. The larger loop overlaps with the termination hairpin, so only one of the two hairpins can form at any one time. Which loop forms depends on the relative positioning between the RNA polymerase and a ribosome which attaches to the 5′ end of the transcript as soon as it is synthesized in order to translate the genes into protein (Figure 10.6). If the ribosome stalls so that it does not keep up with the polymerase, then the larger hairpin forms and transcription continues. However, if the ribosome keeps pace with the RNA polymerase then it disrupts the larger hairpin by attaching to the RNA that forms part of the stem of this hairpin. When this happens the termination hairpin is able to form, and transcription stops. Ribosome stalling can occur because upstream of the termination signal is a short open reading frame (ORF) coding for a 14-amino-acid peptide that includes two tryptophans. If the amount of free tryptophan is limiting, then the ribosome stalls as it attempts to synthesize this peptide, while the polymerase continues to make its transcript. Because this transcript contains copies of the genes coding for the biosynthesis of tryptophan, its continued elongation addresses the requirement that the cell has for this amino acid. 
When the amount of tryptophan in the cell reaches a satisfactory level, the attenuation system prevents further transcription of the tryptophan operon, because now the ribosome does not stall while making the short peptide, and instead keeps pace with the polymerase, allowing the termination signal to form.
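The logic of attenuation at the tryptophan operon can be summarized as a two-way decision. The following toy model simply encodes the chain of events described above; the function name and the returned labels are assumptions made for illustration, and the real system responds to tryptophan levels in a graded rather than an all-or-nothing way.

```python
def trp_attenuation(tryptophan_limiting: bool) -> dict:
    """Toy model of attenuation at the E. coli tryptophan operon."""
    # The ribosome stalls in the short ORF only when tryptophan is scarce.
    ribosome_stalls = tryptophan_limiting
    if ribosome_stalls:
        # Polymerase runs ahead of the ribosome: the larger, more stable hairpin forms.
        hairpin, outcome = "larger hairpin", "transcription continues"
    else:
        # Ribosome keeps pace and disrupts the larger hairpin.
        hairpin, outcome = "termination hairpin", "transcription stops"
    return {"ribosome_stalls": ribosome_stalls, "hairpin": hairpin, "outcome": outcome}


print(trp_attenuation(tryptophan_limiting=True))
print(trp_attenuation(tryptophan_limiting=False))
```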

The E. coli tryptophan operon is controlled not only by attenuation but also by a repressor (Section 9.3.1). Exactly how attenuation and repression work together to regulate expression of the operon is not known, but it is thought that repression provides the basic on-off switch and attenuation modulates the precise level of gene expression that occurs. Other E. coli operons, such as those for biosynthesis of histidine, leucine and threonine, are controlled solely by attenuation. Interestingly, in some bacteria, including Bacillus subtilis, the tryptophan operon is one of those that does not have a repressor system and so is regulated entirely by attenuation. In these bacteria, attenuation is mediated not by the speed at which the ribosome tracks along the mRNA, but by an RNA-binding protein called trp RNA-binding attenuation protein (TRAP) which, in the presence of tryptophan, attaches to the mRNA in the region equivalent to the short ORF of the E. coli transcript (Figure 10.7). Attachment of TRAP leads to formation of the termination signal and cessation of transcription (Antson et al., 1999).

At the most fundamental level, transcription is similar in bacteria and eukaryotes. The chemistry of RNA polymerization is identical in all types of organism, and the three eukaryotic RNA polymerases are all structurally related to the E. coli RNA polymerase, their three largest subunits being equivalent to the α, β and β′ subunits of the bacterial enzyme. The contacts between the eukaryotic polymerase II, the template DNA and the RNA transcript, as revealed by X-ray crystallography and crosslinking studies (Klug, 2001), are similar to the interactions described for bacterial transcription (see Research Briefing 10.1), and the basic principle that transcription is a step-by-step competition between elongation and termination (see page 275) also holds.

Despite this equivalence, the overall processes for mRNA synthesis in bacteria and eukaryotes are quite different. The most striking dissimilarity is the extent to which eukaryotic mRNAs are processed during transcription. In bacteria, the transcripts of protein-coding genes are not processed at all: the primary transcripts are mature mRNAs. In contrast, all eukaryotic mRNAs have a cap added to the 5′ end, most are also polyadenylated by addition of a series of adenosines to the 3′ end, many contain introns and so undergo splicing, and a few are subject to RNA editing. A function has been assigned to capping, but the reason for polyadenylation largely remains a mystery. With splicing and editing we can appreciate why the events occur - the former removes introns that block translation of the mRNA; the latter changes the coding properties of the mRNA - but we do not understand why these mechanisms have evolved. Why do genes have introns in the first place? Why edit an mRNA rather than encoding the desired sequences in the DNA?

Eukaryotic mRNAs are processed while they are being synthesized. The cap is added as soon as transcription has been initiated, splicing and editing begin while the transcript is still being made, and polyadenylation is an inherent part of the termination mechanism for RNA polymerase II. To deal with all of these events together would be confusing, with too many different things being described at once. We will therefore postpone editing until the end of the chapter, which means it can be dealt with in tandem with similar forms of chemical modification occurring during rRNA and tRNA processing, and we will consider splicing after we have studied capping, elongation and polyadenylation.

Although phosphorylation of the C-terminal domain (CTD) of the largest subunit of RNA polymerase II is the final step in initiation of transcription of mRNA-encoding genes in eukaryotes (Section 9.2.3), it is not immediately followed by the onset of elongation. A somewhat gray area exists in our understanding of the events that distinguish promoter clearance, which refers to the transition from the pre-initiation complex to a complex that has begun to synthesize RNA, and promoter escape, during which the polymerase moves away from the promoter region and becomes committed to making a transcript (Figure 10.8). The opposing effects of negative and positive elongation factors influence the ability of the polymerase to begin productive RNA synthesis, and if the negative factors predominate then transcription halts before the polymerase has moved more than 30 nucleotides from the initiation point. Promoter escape could therefore be an important control point, but how regulation is applied at this stage is not yet known (Lee and Young, 2000).

Successful promoter escape could be linked with capping, this processing event being completed before the transcript reaches 30 nucleotides in length. The first step in capping is addition of an extra guanosine to the extreme 5′ end of the RNA. Rather than occurring by normal RNA polymerization, capping involves a reaction between the 5′ triphosphate of the terminal nucleotide and the triphosphate of a GTP nucleotide. The γ-phosphate of the terminal nucleotide (the outermost phosphate) is removed, as are the β and γ phosphates of the GTP, resulting in a 5′-5′ bond (Figure 10.9). The reaction is carried out by the enzyme guanylyl transferase. The second step of the capping reaction converts the new terminal guanosine into 7-methylguanosine by attachment of a methyl group to nitrogen number 7 of the purine ring, this modification catalyzed by guanine methyltransferase. The two capping enzymes make attachments with the CTD and it is possible that they are intrinsic components of the RNA polymerase II complex during promoter clearance (Proudfoot, 2000).

The 7-methylguanosine structure is called a type 0 cap and is the commonest form in yeast. In higher eukaryotes, additional modifications occur (see Figure 10.9):

  • A second methylation replaces the hydrogen of the 2′-OH group of what is now the second nucleotide in the transcript. This results in a type 1 cap.

  • If this second nucleotide is an adenosine, then a methyl group might be added to nitrogen number 6 of the purine.

  • Another 2′-OH methylation might occur at the third nucleotide position, resulting in a type 2 cap.

All RNAs synthesized by RNA polymerase II are capped in one way or another. This means that as well as mRNAs, the snRNAs that are transcribed by this enzyme are also capped (see Table 9.3). The cap may be important for export of mRNAs and snRNAs from the nucleus (Section 10.5), but its best defined role is in translation of mRNAs, which is covered in Section 11.2.2.

As mentioned above, the fundamental aspects of transcript elongation are the same in bacteria and eukaryotes. The one major distinction concerns the length of transcript that must be synthesized. The longest bacterial genes are only a few kb in length and can be transcribed in a matter of minutes by the bacterial RNA polymerase, which has a polymerization rate of several hundred nucleotides per minute. In contrast, RNA polymerase II can take hours to transcribe a single gene, even though it can work at up to 2000 nucleotides per minute. This is because the presence of multiple introns in many eukaryotic genes (Section 10.1.3) means that considerable lengths of DNA must be copied. For example, the pre-mRNA for the human dystrophin gene is 2400 kb in length and takes about 20 hours to synthesize.
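The figure of about 20 hours for dystrophin follows directly from the numbers quoted above, as this short calculation shows. The function name is an assumption, and the bacterial example uses illustrative values (3 kb at 300 nucleotides per minute) chosen to fit the phrases 'a few kb' and 'several hundred nucleotides per minute'; they are not measurements.

```python
def transcription_minutes(length_kb: float, rate_nt_per_min: float) -> float:
    """Time to copy a gene, assuming a constant polymerization rate with no pausing."""
    return length_kb * 1000 / rate_nt_per_min


# Human dystrophin pre-mRNA: 2400 kb at the maximum rate of 2000 nucleotides per minute.
print(transcription_minutes(2400, 2000) / 60)  # 20.0 hours

# Illustrative bacterial gene (assumed values): 3 kb at 300 nucleotides per minute.
print(transcription_minutes(3, 300))  # 10.0 minutes
```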

The extreme length of eukaryotic genes places demands on the stability of the transcription complex. RNA polymerase II on its own is not able to meet these demands: when the purified enzyme is studied in vitro its polymerization rate is less than 300 nucleotides per minute because the enzyme pauses frequently on the template and sometimes stops altogether. In the nucleus, pausing and stopping are reduced because of the action of a series of elongation factors, proteins that associate with the polymerase after it has cleared the promoter and left behind the transcription factors involved in initiation (Conaway et al., 2000). Thirteen elongation factors are currently known in mammalian cells, displaying a variety of functions (Table 10.1). Their importance is shown by the effects of mutations that disrupt the activity of one or other of the factors (Conaway and Conaway, 1999). Inactivation of CSB, for example, results in Cockayne syndrome, a disease characterized by developmental defects such as mental retardation, and disruption of ELL causes acute myeloid leukemia.

A second difference between bacterial and eukaryotic elongation is that RNA polymerase II, as well as the other eukaryotic nuclear polymerases, has to negotiate the nucleosomes that are attached to the template DNA that is being transcribed. At first glance it is difficult to imagine how the polymerase can elongate its transcript through a region of DNA wound around a nucleosome (see Figure 2.5). The solution to this problem is probably provided by elongation factors that are able to modify the chromatin structure in some way. In mammals, the elongation factor FACT has been shown to interact with histones H2A and H2B, possibly influencing nucleosome positioning, and less well defined interactions have been demonstrated for other factors (Orphanides and Reinberg, 2000). Yeast possesses a factor called elongator, which has tentatively been assigned a role in chromatin modification because it contains a subunit that has histone acetyltransferase activity (Section 8.2.1; Wittschieben et al., 1999), but so far a homolog of this complex has not been identified in mammals. An intriguing question is whether the first polymerase to transcribe a particular gene is a ‘pioneer’ with a special elongation factor complement that opens up the chromatin structure, with subsequent rounds of transcription being performed by standard polymerase complexes that take advantage of the changes induced by the pioneer.

Virtually all eukaryotic mRNAs have a series of up to 250 adenosines at their 3′ ends. These As are not specified by the DNA and are added to the transcript by a template-independent RNA polymerase called poly(A) polymerase (Bard et al., 2000). This polymerase does not act at the extreme 3′ end of the transcript, but at an internal site which is cleaved to create a new 3′ end to which the poly(A) tail is added.

The basic features of polyadenylation have been understood for some time. In mammals, polyadenylation is directed by a signal sequence in the mRNA, almost invariably 5′-AAUAAA-3′. This sequence is located between 10 and 30 nucleotides upstream of the polyadenylation site, which is often immediately after the dinucleotide 5′-CA-3′ and is followed 10–20 nucleotides later by a GU-rich region. Both the poly(A) signal sequence and the GU-rich region are binding sites for multi-subunit protein complexes, which are, respectively, the cleavage and polyadenylation specificity factor (CPSF) and the cleavage stimulation factor (CstF). Poly(A) polymerase and at least two other protein factors must associate with bound CPSF and CstF in order for polyadenylation to occur (Figure 10.10). These additional factors include polyadenylate-binding protein (PADP), which helps the polymerase to add the adenosines, possibly influences the length of the poly(A) tail that is synthesized, and appears to play a role in maintenance of the tail after synthesis. In yeast, the signal sequences in the transcript are slightly different, but the protein complexes are similar to those in mammals and polyadenylation is thought to occur by more or less the same mechanism (Guo and Sherman, 1996; Manley and Takagaki, 1996).
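The arrangement of mammalian polyadenylation signals can be expressed as a simple search, sketched below. Several choices here are assumptions rather than part of the description above: the function name, measuring the 10–30 nucleotide spacing from the end of the 5′-AAUAAA-3′ signal, requiring the 5′-CA-3′ dinucleotide (which the text says is common, not universal), and treating a 10-nucleotide window with at least 80% G or U as 'GU-rich'.

```python
import re


def find_polya_sites(rna, gu_window=10, gu_fraction=0.8):
    """Return (signal_start, cleavage_index) pairs; cleavage is after rna[cleavage_index - 1]."""
    sites = []
    # The lookahead lets overlapping copies of the signal be found.
    for match in re.finditer(r"(?=AAUAAA)", rna):
        signal_end = match.start() + 6
        for gap in range(10, 31):  # signal lies 10-30 nt upstream of the site
            cut = signal_end + gap
            if rna[cut - 2:cut] != "CA":  # site immediately after 5'-CA-3'
                continue
            for offset in range(10, 21):  # GU-rich region 10-20 nt further on
                window = rna[cut + offset:cut + offset + gu_window]
                gu_count = sum(base in "GU" for base in window)
                if len(window) == gu_window and gu_count / gu_window >= gu_fraction:
                    sites.append((match.start(), cut))
                    break
    return sites


# Hypothetical 3' region built to contain all three elements.
example = "GGC" + "AAUAAA" + "C" * 13 + "CA" + "C" * 12 + "GUGUGUGUGU" + "CC"
print(find_polya_sites(example))  # [(3, 24)]
```

The sketch identifies sequence elements only. Binding of CPSF and CstF, and the contribution of the other protein factors, are not modeled.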

Polyadenylation was once looked on as a ‘posttranscriptional’ event but it is now recognized that the process is an inherent part of the mechanism for termination of transcription by RNA polymerase II. CPSF is known to interact with TFIID and is recruited into the polymerase complex during the initiation stage. By riding along the template with RNA polymerase II, CPSF is able to bind to the poly(A) signal sequence as soon as it is transcribed, initiating the polyadenylation reaction (Figure 10.11). Both CPSF and CstF form contacts with the CTD of the polymerase. It has been suggested that the nature of these contacts changes when the poly(A) signal sequence is located, and that this change alters the properties of the elongation complex so that termination becomes favored over continued RNA synthesis. As a result, transcription stops soon after the poly(A) signal sequence has been transcribed (Bentley, 1999).

Even though polyadenylation can be identified as an inherent part of the termination process, this does not explain why it is necessary to add a poly(A) tail to the transcript. A role for the poly(A) tail has been sought for several years, but no convincing evidence has been found for any of the various suggestions that have been made. These suggestions include an influence on mRNA stability, which seems unlikely as some stable transcripts have very short poly(A) tails, and a role in initiation of translation. The latter proposal is supported by research showing that poly(A) polymerase is repressed during those periods of the cell cycle when relatively little protein synthesis occurs (Colgan et al., 1996).

The existence of introns was not suspected until 1977 when DNA sequencing was first applied to eukaryotic genes and it was realized that many of these contain ‘intervening sequences’ that separate different segments of the coding DNA from one another (Figure 10.12). We now recognize seven distinct types of intron in eukaryotes, and additional forms in the archaea (Table 10.2). Two of these types - the GU-AG and AU-AC introns - are found in eukaryotic protein-coding genes and are dealt with in this section; the other types will be covered later in the chapter.

Other types of intron. There are eight different types of intron (see Table 10.2). Four types are described in the text: the nuclear pre-mRNA introns of the GU-AG and AU-AC classes, the self-splicing Group I introns, and the introns in eukaryotic pre-tRNA...

Few rules can be established for the distribution of introns in protein-coding genes, beyond the fact that introns are less common in lower eukaryotes: the 6000 genes in the yeast genome contain only 239 introns in total, whereas many individual mammalian genes contain 50 or more introns. When the same gene is compared in related species, we usually find that some of the introns are in identical positions but that each species has one or more unique introns. This implies that some introns remain in place for millions of years, retaining their positions while species diversify, whereas others appear or disappear during this same period. This leads to two competing hypotheses for the evolution of introns:

  • ‘Introns late’ is the hypothesis that introns evolved relatively recently and are gradually accumulating in eukaryotic genomes.

  • ‘Introns early’ is the alternative hypothesis, that introns are very ancient and are gradually being lost from eukaryotic genomes.

These are issues that we will return to in Section 15.3.2 when we study molecular evolution. For the time being, what is important is that a eukaryotic pre-mRNA may contain many introns, perhaps over 100, taking up a considerable length of the transcript (Table 10.3), and that these introns must be excised and the exons joined together in the correct order before the transcript can function as a mature mRNA.

With the vast bulk of pre-mRNA introns, the first two nucleotides of the intron sequence are 5′-GU-3′ and the last two 5′-AG-3′. They are therefore called ‘GU-AG’ introns and all members of this class are spliced in the same way. These conserved motifs were recognized soon after introns were discovered and it was immediately assumed that they must be important in the splicing process. As intron sequences started to accumulate in the databases it was realized that the GU-AG motifs are merely parts of longer consensus sequences that span the 5′ and 3′ splice sites. These consensus sequences vary in different types of eukaryote; in vertebrates they can be described as:

5′ splice site 5′-AG↓GUAAGU-3′

3′ splice site 5′-PyPyPyPyPyPyNCAG↓-3′

In these designations, ‘Py’ is one of the two pyrimidine nucleotides (U or C), ‘N’ is any nucleotide, and the arrow indicates the exon-intron boundary. The 5′ splice site is also known as the donor site and the 3′ splice site as the acceptor site.
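The two vertebrate consensus sequences translate directly into regular expressions, as in the sketch below. The function names are assumptions, and the patterns demand an exact match to the consensus. Real splice sites usually differ from the consensus at one or more positions, so a search of this kind would miss most genuine sites and is shown only to make the notation concrete.

```python
import re

# 5' splice site: AG | GUAAGU, with the boundary after the exon's AG.
DONOR = re.compile(r"(?=AGGUAAGU)")
# 3' splice site: six pyrimidines, any nucleotide, then CAG | downstream exon.
ACCEPTOR = re.compile(r"(?=[CU]{6}[ACGU]CAG)")


def donor_sites(rna: str) -> list:
    """Indices of the first intron nucleotide (the G of GU) at each exact match."""
    return [m.start() + 2 for m in DONOR.finditer(rna)]


def acceptor_sites(rna: str) -> list:
    """Indices of the first nucleotide of the downstream exon at each exact match."""
    return [m.start() + 10 for m in ACCEPTOR.finditer(rna)]


# Hypothetical pre-mRNA: exon CCAG, a short intron, exon GGAA.
pre_mrna = "CCAG" + "GUAAGU" + "AAAA" + "UCUCUC" + "A" + "CAG" + "GGAA"
print(donor_sites(pre_mrna))     # [4]
print(acceptor_sites(pre_mrna))  # [24]
```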

Other conserved sequences are present in some but not all eukaryotes. Introns in higher eukaryotes usually have a polypyrimidine tract, a pyrimidine-rich region located just upstream of the 3′ end of the intron sequence (Figure 10.13). This tract is less frequently seen in yeast introns, but these have an invariant 5′-UACUAAC-3′ sequence, located between 18 and 140 nucleotides upstream of the 3′ splice site, which is not present in higher eukaryotes. The polypyrimidine tract and the 5′-UACUAAC-3′ sequence are not functionally equivalent, as described in the next two sections.

The conserved sequence motifs indicate important regions of GU-AG introns, regions that we would anticipate either acting as recognition sequences for RNA-binding proteins involved in splicing, or playing some other central role in the process. Early attempts to understand splicing were hindered by technical problems (in particular difficulties in developing a cell-free splicing system with which the process could be probed in detail), but during the 1990s there was an explosion of information. This work showed that the splicing pathway can be divided into two steps (Figure 10.14):

  • Cleavage of the 5′ splice site occurs by a transesterification reaction promoted by the hydroxyl group attached to the 2′ carbon of an adenosine nucleotide located within the intron sequence. In yeast, this adenosine is the last one in the conserved 5′-UACUAAC-3′ sequence. The result of the hydroxyl attack is cleavage of the phosphodiester bond at the 5′ splice site, accompanied by formation of a new 5′-2′ phosphodiester bond linking the first nucleotide of the intron (the G of the 5′-GU-3′ motif) with the internal adenosine. This means that the intron has now been looped back on itself to create a lariat structure.

  • Cleavage of the 3′ splice site and joining of the exons result from a second transesterification reaction, this one promoted by the 3′-OH group attached to the end of the upstream exon. This group attacks the phosphodiester bond at the 3′ splice site, cleaving it and so releasing the intron as the lariat structure, which is subsequently converted back to a linear RNA and degraded. At the same time, the 3′ end of the upstream exon joins to the newly formed 5′ end of the downstream exon, completing the splicing process.
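The two steps of the splicing pathway can be followed on a sequence string, as in this sketch. It assumes that the positions of the 5′ splice site, the branch adenosine and the 3′ splice site are already known. The function, its arguments and the example pre-mRNA are hypothetical, and because a string cannot represent the 5′-2′ bond, the lariat is recorded as the linear intron sequence plus the position of the branch adenosine within it.

```python
from typing import NamedTuple


class SplicingResult(NamedTuple):
    mrna: str            # exons joined in order
    intron: str          # released intron, written as a linear sequence
    branch_offset: int   # index within the intron of the branch adenosine


def splice(pre_mrna: str, donor: int, branch: int, acceptor: int) -> SplicingResult:
    """donor = first intron nucleotide; acceptor = first downstream exon nucleotide."""
    if not donor < branch < acceptor:
        raise ValueError("branch point must lie inside the intron")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point must be an adenosine")
    # Step 1: the 2'-OH of the branch adenosine attacks the 5' splice site,
    # freeing the upstream exon and joining the intron's first G to the branch A.
    upstream_exon = pre_mrna[:donor]
    # Step 2: the 3'-OH of the upstream exon attacks the 3' splice site,
    # releasing the intron and joining the two exons.
    intron = pre_mrna[donor:acceptor]
    downstream_exon = pre_mrna[acceptor:]
    return SplicingResult(upstream_exon + downstream_exon, intron, branch - donor)


# Hypothetical pre-mRNA with a yeast-style UACUAAC box; the branch A is the last A in it.
pre_mrna = "CCAG" + "GUAAGU" + "UACUAAC" + "UCUCUC" + "A" + "CAG" + "GGAA"
result = splice(pre_mrna, donor=4, branch=15, acceptor=27)
print(result.mrna)  # CCAGGGAA
```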

In a chemical sense, intron splicing is not a great challenge for the cell. It is simply a double transesterification reaction, no more complicated than many other biochemical reactions that are dealt with by individual enzymes. Why then has such a complex machinery evolved to deal with it? The difficulty lies with the topological problems. The first of these is the substantial distance that might lie between splice sites, possibly a few tens of kb, representing 100 nm or more if the mRNA is in the form of a linear chain. A means is therefore needed of bringing the splice sites into proximity. The second topological problem concerns selection of the correct splice site. All splice sites are similar, so if a pre-mRNA contains two or more introns then there is the possibility that the wrong splice sites could be joined, resulting in exon skipping - the loss of an exon from the mature mRNA (Figure 10.15A). Equally unfortunate would be selection of a cryptic splice site, a site within an intron or exon that has sequence similarity with the consensus motifs of real splice sites (Figure 10.15B). Cryptic sites are present in most pre-mRNAs and must be ignored by the splicing apparatus.

The central components of the splicing apparatus for GU-AG introns are the snRNAs called U1, U2, U4, U5 and U6. These are short molecules (between 106 nucleotides [U6] and 185 nucleotides [U2] in vertebrates) that associate with proteins to form small nuclear ribonucleoproteins (snRNPs) (Figure 10.16). The snRNPs, together with other accessory proteins, attach to the transcript and form a series of complexes, the last one of which is the spliceosome, the structure within which the actual splicing reactions occur (Smith and Valcárcel, 2000). The process operates as follows (Figure 10.17):

  • The commitment complex initiates a splicing activity. This complex comprises U1-snRNP, which binds to the 5′ splice site, partly by RNA-RNA base-pairing, and the protein factors SF1, U2AF65 and U2AF35, which make protein-RNA contacts with the branch site, the polypyrimidine tract and the 3′ splice site, respectively.

  • The pre-spliceosome complex comprises the commitment complex plus U2-snRNP, the latter attached to the branch site. At this stage, an association between U1-snRNP and U2-snRNP brings the 5′ splice site into close proximity with the branch point.

  • The spliceosome is formed when U4/U6-snRNP (a single snRNP containing two snRNAs) and U5-snRNP attach to the pre-spliceosome complex. This results in additional interactions that bring the 3′ splice site close to the 5′ site and the branch point. All three key positions in the intron are now in proximity and the two transesterifications occur as a linked reaction, possibly catalyzed by U6-snRNP, completing the splicing process.
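The stepwise assembly listed above can be summarized as a small data structure. This is a minimal sketch that follows the simplified account given here, in which each complex is the previous one plus newly attached components. The names `ASSEMBLY_STEPS` and `assemble` are hypothetical, and the model does not attempt to represent any rearrangement or release of components.

```python
# Components added at each stage, taken from the description in the text.
ASSEMBLY_STEPS = [
    ("commitment complex", {"U1-snRNP", "SF1", "U2AF35", "U2AF65"}),
    ("pre-spliceosome", {"U2-snRNP"}),
    ("spliceosome", {"U4/U6-snRNP", "U5-snRNP"}),
]


def assemble(steps):
    """Return a dict mapping each complex name to its cumulative components."""
    bound = set()
    complexes = {}
    for name, added in steps:
        bound = bound | added  # new set, so earlier complexes are not modified
        complexes[name] = bound
    return complexes


if __name__ == "__main__":
    for name, parts in assemble(ASSEMBLY_STEPS).items():
        print(f"{name}: {', '.join(sorted(parts))}")
```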

The series of events shown in Figure 10.17 provides no clues about how the correct splice sites are selected so that exons are not lost during splicing, and cryptic sites are ignored. This aspect of splicing is still poorly understood but it has become clear that a set of splicing factors called SR proteins are important in splice-site selection. The SR proteins - so-called because their C-terminal domains contain a region rich in serine (abbreviation S) and arginine (R) - were first implicated in splicing when it was discovered that they are components of the spliceosome. They appear to have several functions, including the establishment of a connection between bound U1-snRNP and bound U2AF in the commitment complex (Valcárcel and Green, 1996). This is perhaps the clue to their role in splice-site selection, formation of the commitment complex being the critical stage of the splicing process, as this is the event that identifies which sites will be linked.

SR proteins also interact with exonic splicing enhancers (ESEs), which are purine-rich sequences located in the exon regions of a transcript (Blencowe, 2000). We are still at an early stage in our understanding of ESEs and their counterparts, the exonic splicing silencers (ESSs; Del Gatto-Konczak et al., 1999), but their importance in controlling splicing is clear from the discovery that several human diseases, including one type of muscular dystrophy, are caused by mutations in ESE sequences. The location of ESEs and ESSs indicates that assembly of the spliceosome is driven not simply by contacts within the intron but also by interactions with adjacent exons. In fact, it is possible that an individual commitment complex is not assembled within an intron as shown in Figure 10.17, but initially bridges an exon (Figure 10.18). This model is attractive not only because it provides a means by which contact between an ESE or ESS and an SR protein could influence splicing, but also because it takes account of the large disparity between the lengths of exons and introns in vertebrate genes. In the human genome, for example, the exons have an average length of 145 bp compared with 3365 bp for introns (IHGSC, 2001). Initial assembly of a commitment complex across an exon might therefore be a less difficult task than assembly across a much longer intron.

There is one final aspect of SR proteins that we should address. This is the possibility that a subset of these SR proteins, called CASPs (CTD-associated SR-like proteins) or SCAFs (SR-like CTD-associated factors), form a physical connection between the spliceosome and the CTD of the RNA polymerase II transcription complex, and hence provide a link between transcript elongation and processing. As with some of the polyadenylation proteins (Section 10.1.2), it is probable that these splicing factors ride with the polymerase as it synthesizes the transcript, and are deposited at their appropriate positions at intron splice sites as soon as these are transcribed. Electron microscopy studies have shown that transcription and splicing occur together, and the discovery of splicing factors that have an affinity for RNA polymerase provides a biochemical basis for this observation (Corden and Patturajan, 1997).

When introns were first discovered it was imagined that each gene always gives rise to the same mRNA: in other words, that there is a single splicing pathway for each primary transcript (Figure 10.19A). This assumption was found to be incorrect in the 1980s, when it was shown that the primary transcripts of some genes can follow two or more alternative splicing pathways, enabling a single transcript to be processed into related but different mRNAs and hence to direct synthesis of a range of proteins (Figure 10.19B). In some organisms alternative splicing is uncommon, only three examples being known in Saccharomyces cerevisiae, but in higher eukaryotes it is much more prevalent. This first became apparent when the draft Drosophila sequence was examined (Adams et al., 2000), and it was realized that fruit flies have fewer genes than the microscopic worm Caenorhabditis elegans (see Table 2.1), despite the obviously greater physical complexity of Drosophila, which should be reflected in a more diverse proteome. The most likely explanation for the lack of congruence between the number of genes in the Drosophila genome and the number of proteins in its proteome is that a substantial number of the genes give rise to multiple proteins via alternative splicing. At about the same time, the first human chromosome sequences were obtained and it was recognized that rather than having 80 000–100 000 genes, as suggested by the size of the human proteome, humans have only 35 000 or so genes. It is now believed that at least 35% of the genes in the human genome undergo alternative splicing (Graveley, 2001): the principle ‘one gene, one protein’, biological dogma since the 1940s, has been completely overthrown.
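How a single primary transcript can yield several mRNAs can be sketched in a few lines. The example below assumes the simplest form of alternative splicing, in which each optional exon is independently either included or skipped. The function name `isoforms` and the exon sequences are hypothetical. Real genes can also use alternative 5′ or 3′ splice sites, so the number of mRNAs from a real transcript need not be a simple power of two.

```python
from itertools import product


def isoforms(exons):
    """Enumerate mature mRNAs from exons given as (sequence, optional) pairs.

    Constitutive exons are always included; each optional exon is either
    included or skipped, independently of the others.
    """
    choices = [(seq, "") if optional else (seq,) for seq, optional in exons]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Made-up exons: the first and last are constitutive, the middle two optional.
    toy_exons = [("AUGGC", False), ("CCA", True), ("GGU", True), ("UAA", False)]
    for mrna in isoforms(toy_exons):
        print(mrna)
```

With two optional exons the toy transcript gives four mRNAs, and each additional independent optional exon would double that number under the stated assumption.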

Alternative splicing is now looked on as a crucial innovation in the genome expression pathway. Two examples will suffice to illustrate its importance. The first of these concerns sex, a fundamental aspect of the biology of any organism, and which in Drosophila is determined by an alternative splicing cascade (Chabot, 1996). The first gene in this cascade is sxl, whose transcript contains an optional exon which, when spliced to the one preceding it, results in an inactive version of protein SXL. In females the splicing pathway is such that this exon is skipped so that functional SXL is made (Figure 10.20). SXL promotes selection of a cryptic splice site in a second transcript, tra, by directing U2AF65 away from its normal 3′ splice site to a second site further downstream. The resulting female-specific TRA protein is again involved in alternative splicing, this time by interacting with SR proteins to form a multifactor complex that attaches to an ESE within an exon of a third pre-mRNA, dsx, promoting selection of a secondary, female-specific splice site in this transcript. The male and female versions of the DSX proteins are the primary determinants of Drosophila sex.

The second example of alternative splicing illustrates the multiplicity of mRNAs synthesized from some primary transcripts. The human slo gene codes for a membrane protein that regulates the movement of potassium ions into and out of cells (Graveley, 2001). The gene has 35 exons, eight of which are involved in alternative splicing events (Figure 10.21). The alternative splicing pathways involve different combinations of the eight optional exons, leading to over 500 distinct mRNAs, each specifying a membrane protein with slightly different functional properties. What are the biological consequences of this example of multiple splicing? The human slo gene is active in the inner ear and determines the auditory properties of the hair cells on the basilar membrane of the cochlea. Different hair cells respond to different sound frequencies between 20 and 20 000 Hz, their individual capabilities determined in part by the properties of their Slo proteins. Alternative splicing of slo transcripts in cochlear hair cells therefore determines the auditory range of humans.

At present we do not understand how alternative splicing is regulated and cannot describe the process that determines which of several splicing pathways is followed by a particular transcript. The players are thought to be the SR proteins in conjunction with ESEs and ESSs, but the way in which they control splice site selection is not known.

One of the more surprising events of recent years has been the discovery of a few introns in eukaryotic pre-mRNAs that do not fall into the GU-AG category, having different consensus sequences at their splice sites. These are the AU-AC introns which, to date, have been found in approximately 20 genes in organisms as diverse as humans, plants and Drosophila (Nilsen, 1996; Tarn and Steitz, 1997).

As well as the sequences at their splice sites, AU-AC introns have a conserved (though not invariant) branch site sequence with the consensus 5′-UCCUUAAC-3′, the last adenosine in this motif being the one that participates in the first transesterification reaction. This points us towards the remarkable feature of AU-AC introns: their splicing pathway is very similar to that for GU-AG introns, but involves a different set of splicing factors. Only the U5-snRNP is involved in the splicing mechanisms of both types of intron. The roles of U1-snRNP and U2-snRNP are taken by U11/U12-snRNP, a previously discovered complex that had never been assigned a function. An entirely new U4atac/U6atac-snRNP has subsequently been isolated, completing the picture.
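The branch site consensus can be used to locate the reactive adenosine in a sequence. The following is a minimal sketch that assumes an exact match to 5′-UCCUUAAC-3′. Because the motif is conserved but not invariant, a real search would have to tolerate mismatches. The function name `branch_adenosines` and the example sequence are hypothetical.

```python
BRANCH_CONSENSUS = "UCCUUAAC"
# Offset of the last adenosine within the motif (0-based).
BRANCH_A_OFFSET = BRANCH_CONSENSUS.rindex("A")


def branch_adenosines(rna: str) -> list[int]:
    """Return 0-based positions of the branch adenosine for each exact match."""
    positions = []
    start = rna.find(BRANCH_CONSENSUS)
    while start != -1:
        positions.append(start + BRANCH_A_OFFSET)
        start = rna.find(BRANCH_CONSENSUS, start + 1)
    return positions


if __name__ == "__main__":
    # Made-up intron that begins with AU and ends with AC.
    toy_intron = "AUAUCCGGGUCCUUAACUUUUAC"
    for pos in branch_adenosines(toy_intron):
        print(pos, toy_intron[pos])
```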

The splicing pathways for the ‘major’ and ‘minor’ types of intron are not identical but many of the interactions between the transcript and the snRNPs and other splicing proteins are remarkably similar. This means that AU-AC introns, rather than simply being a curiosity, are proving useful in testing models for interactions occurring during GU-AG intron splicing. The argument is that a predicted interaction between two components of the GU-AG spliceosome can be checked by seeing if the same interaction is possible with the equivalent AU-AC components. This has already been informative in helping to define a base-paired structure formed between the U2 and U6 snRNAs in the GU-AG spliceosome (Tarn and Steitz, 1996).
