Proteins

This blogpost is basically one final round-up of all the stuff I've not studied about proteins yet. This includes how they bind to DNA, as well as the different protein structures, how those structures are kept together, and then an extra bit about protein folding. 

Protein-DNA binding

When I discussed DNA structure, I didn't go into too much detail about how the double helix is packaged up inside the cell. In hindsight, it might be easy to act like the DNA just sits there as a loose double helix, without any outside interference, but really it's kept wound up by histones.

You see, the double helix is supercoiled around proteins called histones, which contain large amounts of lysine and arginine. These two amino acids are both basic, with nitrogen-containing groups on their sidechains (an amine in lysine, a guanidine group in arginine), which are protonated and so positively charged at physiological pH. It's this charge which helps keep the DNA wrapped around the histones, as I'll explain later. In fact, this is the key protein-DNA binding interaction that I've been taught.

Obviously the DNA can still be opened up when needed, such as by using helicase to unzip the double helix into two separate strands for DNA replication...which I have blogged about before.

But you may also be curious about how we got here in the first place - how are proteins built up? And what kinds of intermolecular interactions keep a protein together?

  • Primary structure

Proteins consist of amino acids, which are arranged in a certain sequence. Each amino acid is joined to the next by a peptide bond; these are also known as amide bonds, if you're a chemistry student.

This peptide bond is noticeably planar, owing to resonance: the lone pair on the nitrogen is delocalised into the carbonyl group, which gives the carbon-nitrogen bond partial double-bond character. The peptide bond does have its own dihedral angle, ω, but it's essentially fixed, so the rotation happens around the two other backbone bonds instead:

  • Rotate around the bond between the alpha carbon and the carbonyl carbon, and you'll get a dihedral angle, ψ. Do the same around the bond between the amide nitrogen and the alpha carbon, and you'll get a dihedral angle, φ. The values these angles can take are ultimately determined by steric factors - whether the atoms end up too close to each other. We can even plot all the permitted combinations of φ and ψ on what's known as a Ramachandran plot, essentially mapping out all the possible different conformations of a peptide.

Dihedral angles in a peptide bond
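
To make the idea of a dihedral angle a bit more concrete, here's a minimal Python sketch that calculates one from four atom positions. For φ the four atoms would be the previous residue's carbonyl carbon, then N, Cα and C; for ψ they would be N, Cα, C and the next residue's N. The function name `dihedral` and the coordinates in the example are my own made-up illustrations, not real atomic positions from a structure file.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) for rotation about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(x * _dot(b0, b1) for x in b1))
    w = _sub(b2, tuple(x * _dot(b2, b1) for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: the outer atoms sit on opposite sides (trans)
# or on the same side (cis) of the central bond.
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0))
print(abs(trans), abs(cis))
```

Feed it every residue's φ and ψ from a real structure and you'd have the points for a Ramachandran plot.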

This is especially important as we start to consider what a protein is like in 3D - after all, a protein isn't just a series of letters drawn out in sequences that look almost incomprehensible at first glance. Instead, proteins twist and contort, taking up various different shapes so that they can ultimately fold and carry out a certain function.

Hemoglobin structure generated using EZMol. PDB: 1GZX

At this point, I'd also like to get a bit exam-questiony and compare proteins to nucleic acids. Because whilst, yes, we have loads of different amino acids - twenty standard ones - and only five nucleobases (A, T, C, G and U), chemically speaking, these amino acids are far simpler than the nucleotides are. Nucleotides have a (deoxy)ribose sugar, a phosphate group, and a nucleobase; amino acids have just the amine, the carboxylic acid group, and a sidechain.

And yet these amino acids have a far greater range of possible interactions with each other: salt bridges between charged amino acids like aspartic acid and lysine; interactions involving hydrophobic and hydrophilic groups; even interactions between aromatic sidechains, as in tyrosine. One notable example that will come up later is that of cysteine, which has a thiol group. That in turn means a far greater range of possible structures within proteins.

For this, we need to go up a notch: 

  • Secondary structure

There are two key secondary structures to consider:

  • Alpha-helices, which are coiled sections of amino acids. Here, they have a helical axis, with side chains pointing outwards from the axis, and hydrogen bonds lying roughly along the direction of the axis. Each hydrogen bond is between the N-H of one residue and the C=O of the residue four places earlier in the sequence.
  • Beta sheets, which are flatter, sheet-like sections made up of several strands of amino acids - neighbouring strands can lie parallel or anti-parallel to each other. Side-chains lie perpendicular to the plane of the beta sheet, whereas hydrogen bonds lie within the plane. Again, these hydrogen bonds are between N-H and C=O groups, this time on neighbouring strands.

It's worth clarifying here that the N-H acts as the hydrogen bond donor, and the O in C=O is the hydrogen bond acceptor.

Alpha-helices are in red, beta-sheets are in blue. Source: https://commons.wikimedia.org/wiki/File:PDB_1q1h_EBI.jpg
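
The alpha-helix hydrogen bonding pattern is easy to lose track of, so here's a small Python sketch which lists which residues pair up in an idealised helix. The function name `helix_hbond_pairs` is just something I've made up for illustration, and it assumes a perfectly regular helix with residues numbered from 1, which real helices won't always match at their ends.

```python
def helix_hbond_pairs(n_residues, offset=4):
    """List (acceptor, donor) residue numbers for an idealised alpha-helix.

    The C=O of residue i accepts a hydrogen bond from the N-H of
    residue i + offset (offset = 4 for an alpha-helix).
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]

# A ten-residue helix: the first four N-H groups and the last four
# C=O groups have no partner within the helix.
for acceptor, donor in helix_hbond_pairs(10):
    print(f"C=O of residue {acceptor} <- N-H of residue {donor}")
```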

But we're still not done yet - at this point, we're still fairly limited with what we can actually do with our amino acids. We need to yet again go up a tier: 

  • Super-secondary structure

There are two different forms of super-secondary structure:

  • Motif - this is a short section of protein structure which commonly recurs across various different proteins. These will often have different amino acid sequences, but will keep the same topology. Two examples which are important in protein-DNA binding, where an alpha-helix can sit within the major groove of the DNA and interact with hydrogen bond donors and acceptors on the nucleobases, are:
    • helix-turn-helix
    • helix-loop-helix
  • Domain - this consists of multiple motifs that form a region in a protein with one particular function.

However, motifs (and, to a lesser extent, domains) aren't particularly stable on their own - in isolation, they can't fold correctly and perform a certain function. They need interactions with other motifs and domains to be stable. It's like an ecosystem!

Even so, if we collated all the motifs within a protein, and had them working together, the protein could still struggle to perform its task if it's left unmodified. This is less of an issue with domains, which can often fold separately and still give protein activity, but that may not apply to all domains.

So now we need to go up a level once again:

Tertiary structure

This is basically the next level up, in the same way that the double helix (secondary structure) is the next level up from the nucleotide sequence (primary structure) in DNA. The focus now lies on the 3D topology of the protein, which will involve various different modifications to the protein structure to enable the protein to fold correctly.

There are two different types of modification:

  • Post-synthetic modifications (we change the structure after forming the protein)
  • Prosthetic groups (we add additional ligands to the protein)  

Prosthetic groups

These are molecules which are bound tightly to enzymes; they might help influence whether a protein can be activated or not, or whether it could act as a catalyst. There are also co-enzymes, which are bound more loosely to proteins, but which can still act as enzyme modulators, for instance - these include vitamins, as I mentioned in my enzymes blogpost.

Post-synthetic modifications 

However, there are also several covalent modifications which are essential in helping proteins to fold correctly and carry out a certain function. A key one is a disulfide bond, where two cysteine residues come close together and form a sulphur-sulphur bond through an oxidation process which is ultimately reversible.

Other examples of post-translational modifications include:

  • phosphorylation (we add a phosphate group to a substrate), which is used in cell signalling, such as by receptor tyrosine kinases, which I will write up a blogpost on soon!
  • acetylation (we add an acetyl or ethanoyl group to a substrate).
    • This is key in DNA-protein binding; as I mentioned earlier, the DNA stays wrapped around the histones due to interactions between positively-charged lysines and negatively-charged phosphate groups in the DNA.
    • Lysine's sidechain ends in an amine group, so we can use acetylation to convert the charged amine into an uncharged amide. This will remove the electrostatic interactions between the DNA and histones, enabling the DNA to unwrap from the histones when necessary.
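
To see why the lysine amine counts as charged in the first place, here's a quick Python sketch using the Henderson-Hasselbalch equation to estimate how much of it is protonated. The sidechain pKa of 10.5 and the pH of 7.4 are approximate textbook values that I'm assuming for illustration, and `fraction_protonated` is just a name I've made up.

```python
def fraction_protonated(ph, pka):
    """Fraction of a basic group in its protonated (charged) form.

    From Henderson-Hasselbalch: [B]/[BH+] = 10 ** (pH - pKa),
    so the protonated fraction is 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))

LYSINE_SIDECHAIN_PKA = 10.5  # assumed, approximate textbook value
PHYSIOLOGICAL_PH = 7.4       # assumed

charged = fraction_protonated(PHYSIOLOGICAL_PH, LYSINE_SIDECHAIN_PKA)
print(f"{charged:.1%} of lysine sidechains carry a positive charge")
```

Under those assumptions, nearly all the lysine sidechains are positively charged, which is what lets them grip the phosphates. An amide nitrogen isn't basic in the same way, so acetylation switches that charge off.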

From here we can move onto quaternary structures, like haemoglobin, if we have more than one polypeptide chain in our structure, with various different intermolecular interactions between them. But that's somewhat beyond the scope of my uni module, so I won't stray significantly further than that.

Protein stability

Now the question is how do we keep our protein together? I've already touched lightly on the concept of intermolecular interactions on numerous occasions in this post, but I'd like to go through all the amino acids and classify them by what sorts of interactions are possible:

  • Salt bridges will form between two charged amino acid sidechains. 
    • Arginine, histidine, glutamic acid, aspartic acid, and lysine are all capable of this.
  • Sidechains with polar groups can form hydrogen bonds with each other.
    • This covers several of the amino acids: they will either have amide groups (eg. asparagine), imidazole groups (eg. histidine), or hydroxyl groups (eg. serine). There are also some aromatic sidechains with oxygens or nitrogens, like phenols (eg. tyrosine) or indoles (eg. tryptophan).
  • Sidechains with non-polar, hydrophobic groups are subject to the hydrophobic effect, where effectively the protein will arrange itself so that these groups avoid contact with water as much as possible - this can alter what structure a protein will take up. In this category, you'll find:
    • Hydrocarbon-based sidechains, like leucine and alanine;
    • Phenyl sidechains, like phenylalanine, which, yes, is just alanine with a phenyl group;
    • Proline, which does have a nitrogen so you might expect it to be able to donate hydrogen bonds, but once it's in a peptide chain that nitrogen doesn't have a hydrogen bound to it, so it can't;
    • Methionine. It has a sulphur, so you might think it could form a disulfide bond like cysteine, but methionine is a thioether rather than a thiol, so is unable to.
  • And, of course, cysteine - which can act as a hydrogen bond donor, but which also forms those disulfide bonds.
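
If it helps with revision, the list above can be turned into a little lookup table. This is only a Python sketch covering the amino acids I've actually named in this post, not all twenty, and `INTERACTIONS` and `interactions_for` are just names I've made up for illustration.

```python
# Sidechain interaction classes, using only the amino acids named in the list above.
INTERACTIONS = {
    "salt bridge": {"arginine", "histidine", "glutamic acid", "aspartic acid", "lysine"},
    "hydrogen bond": {"asparagine", "histidine", "serine", "tyrosine", "tryptophan", "cysteine"},
    "hydrophobic effect": {"leucine", "alanine", "phenylalanine", "proline", "methionine"},
    "disulfide bond": {"cysteine"},
}

def interactions_for(amino_acid):
    """Return the interaction classes an amino acid falls into, sorted by name."""
    name = amino_acid.lower()
    return sorted(kind for kind, members in INTERACTIONS.items() if name in members)

print(interactions_for("histidine"))   # ['hydrogen bond', 'salt bridge']
print(interactions_for("cysteine"))    # ['disulfide bond', 'hydrogen bond']
print(interactions_for("methionine"))  # ['hydrophobic effect']
```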

Out of all those interactions, protein folding is primarily driven by the hydrophobic effect, and the cell has measures in place to ensure that everything folds as expected. You might have chaperones, which stop hydrophobic regions from interacting with each other when they shouldn't, and protein disulfide isomerases, which catalyse the shuffling of disulfide bonds around to ensure all the bonds are in the right places.

Sometimes, there are errors, which lead to unpleasant consequences. Prion diseases, such as Creutzfeldt-Jakob disease, are caused by the prion protein misfolding, where the protein undergoes a conformational change from a mostly alpha-helical form to one rich in beta-sheets. Alzheimer's is linked to an amyloid-beta peptide misfolding and aggregating in the brain, leading to neuron death. Luckily, though, protein misfolds are particularly rare, because the body has so many checks during protein synthesis to stop a protein getting to that point. Unfortunately, errors can still slip through.


Prion protein. Much better when folded correctly. 

Source: NIAID, CC BY 2.0 <https://creativecommons.org/licenses/by/2.0>, via Wikimedia Commons
