DNA, RNA, proteins, etc.
DNA and RNA are strings of “nucleotide” molecules. Each uses four nucleotides: G, A, C, and T in DNA, or G, A, C, and U in RNA. RNA uses U wherever DNA uses T, so there are five nucleotides in total.
So DNA/RNA is a set of instructions for stringing amino acids together in a particular order. A chain of amino acids is a “protein,” and the individual amino acids are its building blocks.
Every three nucleotides (a triplet, also called a “codon”) correspond to an amino acid that the ribosome will attach to the growing protein, in the order the triplets appear, or otherwise to a STOP instruction which, naturally, tells the ribosome to stop attaching amino acids. Also, the protein-coding part of every mRNA starts with the triplet AUG (corresponding to the amino acid methionine); this is the START instruction.
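The reading rule above is simple enough to sketch in a few lines of Python. This is only a toy model, assuming the standard genetic code and a plain string of G/A/C/U letters; the names `CODON_TABLE` and `translate` are made up for illustration, and amino acids are written with their usual one-letter abbreviations (M is methionine, `*` stands for STOP).

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# One letter per amino acid; "*" marks a STOP triplet.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read triplets from the first AUG up to (not including) the first STOP."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no START instruction, so nothing gets built
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # STOP: the ribosome stops attaching amino acids
        protein.append(amino_acid)
    return "".join(protein)


# "CC" is ignored, AUG starts the protein, UAA stops it.
print(translate("CCAUGGCUGAGUAAGG"))  # MAE
```

A real ribosome is pickier about where it starts than "the first AUG it sees", so treat this as a picture of the triplet logic rather than of the biology.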
Almost every amino acid can be coded by more than one triplet; only methionine and tryptophan have a single triplet each.
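Because the genetic code is just a lookup table with 64 entries, this redundancy is easy to tabulate. Here is a minimal Python sketch, again assuming the standard genetic code; `CODON_TABLE` and `synonyms` are illustrative names, amino acids use their one-letter abbreviations, and `*` stands for STOP.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group the 64 triplets by the amino acid (or STOP) they code for.
synonyms = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    synonyms[amino_acid].append(codon)

# Print from least to most redundant.
for amino_acid, codons in sorted(synonyms.items(), key=lambda kv: len(kv[1])):
    print(amino_acid, len(codons), " ".join(codons))
```

Methionine (M) and tryptophan (W) come out with one triplet each, while leucine (L), serine (S), and arginine (R) have six; STOP has three.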
Although translation always starts at the triplet AUG, and thus methionine (Met) is the first amino acid incorporated into every new protein, it is not always the first amino acid in mature proteins. In many cases, the methionine is removed after translation by enzymes in the cell, which are themselves regulated by other factors. There are more than a hundred known post-translational modifications, the removal of methionine from the start position being just one. [1]
As an example, a short stretch of the RNA that codes for the spike protein of SARS-CoV-2 (the virus that causes COVID-19) is the sequence of triplets:
CUU GAC AAA GUU GAG GCU GAA GUG CAA AUU GAU AGG UUG AUC ACA GGC
This is a piece of it: the thing that has been causing all our problems. Without the spike protein, the virus would not be able to penetrate and infect human cells.
The BioNTech COVID-19 vaccine works by injecting you with mRNA that tells the ribosomes in your cells to produce this spike protein (without the rest of the virus body). Your immune system is then able to learn to recognize the spike protein, and thus the virus, without actually being exposed to the virus.
Interestingly, the vaccine uses a modified nucleotide, a methylated form of pseudouridine (written Ψ), in place of uridine (U) in its RNA coding. The ribosome reads the Ψ as it would a U, and attaches the appropriate amino acids to create the spike protein.
The importance of the Ψ in place of U is that it allows the mRNA to sneak past the immune system, which obviously doesn’t like foreign RNA. How it does this, I do not know.
The vaccine also uses triplets that are synonymous with the ones in the original virus (different triplets that code for the same amino acid), and apparently this isn’t random and is for good reason, but I don’t know the reason.
For example, the vaccine codes the spike protein as:
CΨG GAC CCΨ CCΨ GAG GCC GAG GΨG CAG AΨC GAC AGA CUG AΨC ACA GGC
You can check that, except for the third and fourth triplets, this encodes the same amino acids as the previous sequence above. (The third and fourth triplets, “CCΨ CCΨ”, apparently allow the spike protein to keep its form and not degrade when it is free standing without the rest of the virus attached to it. [2])
I know of one retired microbiologist who is slightly worried that these Ψ’s might do “something” in the body once the mRNA degrades into its component parts. Oh well.
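The comparison between the virus and vaccine sequences can also be done mechanically. Below is a minimal Python sketch, assuming the standard genetic code and treating Ψ exactly like U (as the ribosome does); `translate_codons`, `virus`, and `vaccine` are illustrative names, and amino acids are shown with their one-letter abbreviations.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codons(sequence: str) -> str:
    """Translate space-separated triplets, reading Ψ as if it were U."""
    return "".join(
        CODON_TABLE[codon.replace("Ψ", "U")] for codon in sequence.split()
    )


virus = "CUU GAC AAA GUU GAG GCU GAA GUG CAA AUU GAU AGG UUG AUC ACA GGC"
vaccine = "CΨG GAC CCΨ CCΨ GAG GCC GAG GΨG CAG AΨC GAC AGA CUG AΨC ACA GGC"

virus_protein = translate_codons(virus)
vaccine_protein = translate_codons(vaccine)

# 1-based positions where the two amino acid chains disagree.
changed = [
    position
    for position, (a, b) in enumerate(zip(virus_protein, vaccine_protein), start=1)
    if a != b
]

print(virus_protein)    # LDKVEAEVQIDRLITG
print(vaccine_protein)  # LDPPEAEVQIDRLITG
print(changed)          # [3, 4]
```

Only positions 3 and 4 differ: lysine (K) and valine (V) in the virus become two prolines (P) in the vaccine. Every other triplet, even where the spelling changed, codes for the same amino acid.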
A piece of the puzzle to a question I had and still have: Why is one G nucleotide on the mRNA cap sufficient to protect the cap end, while the tail needs 100-200 A’s?
Well, an mRNA can be reused many times. But each time it’s used, it loses some of the A’s on its tail. Once the A’s run out, the mRNA is no longer functional and gets discarded. The ‘poly-A’ tail is protection from degradation. So the more you have, the better (in this narrow sense).
But I still have to ask: Why doesn’t the G nucleotide on the cap “fall off” in the same way? Why not have 100-200 G’s?
[1] https://biology.stackexchange.com/questions/56939/do-all-proteins-start-with-methionine
[2] https://berthub.eu/articles/posts/reverse-engineering-source-code-of-the-biontech-pfizer-vaccine/