I recently published an article in the Journal of Proteome Research titled “Towards an upgraded honey bee (Apis mellifera L.) genome annotation using proteogenomics” along with several co-authors. It’s all about how we identified new parts of genes that hadn’t been reported before, which helps us map which regions of bee DNA get expressed and which don’t. Of course, I posted this publication on social media. Along with the congratulatory remarks, though, some people said they didn’t understand what it was all about! I must admit, even the table of contents graphic that the journal asked for (above) contains acronyms and is probably only decipherable to people in the field. I am writing this blog post to help explain the paper without the stiff tone of a journal article and without unnecessary jargon (or at least, if I do use jargon, I’ll explain what it means). After all, communicating complicated scientific topics in a way the average person can understand is why I started this blog in the first place.
If it’s been a while since high school biology class, here is a quick refresher on the basics. DNA is the blueprint of life – it’s the stuff inside our cells that makes us who we are, and that makes us different from fish and chimpanzees. We like calling it a blueprint because it contains the instructions for how to build the different parts of an organism, but really, it’s more like a blueprint written in a foreign language. If this were a construction site, the blueprint would first have to be rewritten in a way that the workers could understand – then the workers could read it and assemble the final building. This is what happens inside our cells, except the blueprint is DNA (we also call this the “genome”), and the rewritten instructions are RNA. The RNA is deciphered by molecular machines called ribosomes (the construction workers), which put together the final protein product (the building). This process is so fundamental that it’s called the central dogma of biology.
I have been working in a biochemistry lab for a few years now, and our specialty is being able to take snapshots of the proteins that are present at any given time (we call these snapshots the “proteome”). The samples we work with could be anything from cells in culture, blood, organs or even whole organisms if it’s something small (like a bee). And I don’t mean that we’re looking at a few proteins at a time – we’re talking about looking at thousands of proteins at once, with a typical experiment identifying anywhere between 2,000 and 6,000 proteins. Being able to do this is useful because if we can find out how the proteome changes in response to, for example, an infection or disease, then we can start to figure out the underlying mechanism and in some cases, suggest potential treatments.
The actual instrumentation that we use to ‘see’ the proteins is where it gets a bit more complicated. This fancy piece of equipment (called a “mass spectrometer”) looks like a big grey box, but it typically sells for a few hundred thousand dollars – sometimes more. What makes it so special are the incredibly finely tuned electronics and geometry inside, which is where the proteins fly. Fly? Yes, that’s right. Since we cannot actually see the proteins with our eyes, we have to play some tricks to be able to identify them, and the best trick that anyone has come up with so far is to break them into smaller pieces (“peptides”) that are easier to work with, give them a positive charge and – now this is what the mass spectrometer does – make them fly across some distance until they hit a detector. There are other varieties of mass spectrometers that have different configurations, but the basic idea is the same. The peptides are given a charge because, if you can recall some basic physics, this means they can be accelerated through electric fields; without this charge, we would have no way to control where they go. After all this, the mass of the peptide can be calculated based on the principle that, for the same charge, heavier peptides take longer to traverse the same distance than lighter ones. This whole process happens very fast, allowing us to identify thousands of proteins in a few hours.
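For the curious, here is a rough sketch of that "heavier takes longer" calculation in Python. It assumes an idealised time-of-flight setup: a peptide with a known charge is accelerated through a voltage and then drifts down a tube of known length. The function names, the 20,000 volt acceleration and the 1 metre tube are illustrative assumptions of mine, not the settings of any real instrument, and real machines apply many corrections that are left out here.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge, voltage, length_m):
    """Time (seconds) for a charged peptide to drift down the tube.

    Energy gained from the electric field = kinetic energy:
        charge * e * voltage = 0.5 * mass * velocity**2
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * voltage / mass_kg)
    return length_m / velocity


def mass_from_flight_time(time_s, charge, voltage, length_m):
    """Invert the relation above to recover the mass (daltons) from the time."""
    mass_kg = 2 * charge * ELEMENTARY_CHARGE * voltage * time_s**2 / length_m**2
    return mass_kg / DALTON


# Assumed, illustrative settings: 20 kV acceleration, 1 m flight tube.
VOLTAGE = 20_000.0
LENGTH = 1.0

light = flight_time(1000.0, 1, VOLTAGE, LENGTH)
heavy = flight_time(4000.0, 1, VOLTAGE, LENGTH)
print(f"1000 Da peptide: {light * 1e6:.2f} microseconds")
print(f"4000 Da peptide: {heavy * 1e6:.2f} microseconds")
```

In this simplified picture the flight time grows with the square root of the mass, so a peptide four times heavier takes twice as long to arrive. Strictly speaking, what is measured is mass relative to charge, which is why the charge appears in both functions.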
People in the lab have used this technique for years to study honey bees, with experiments ranging from seeing how the proteome changes as the bee grows from an egg to larva to pupa, to seeing what happens when a bee is infected with a virus or bacterium. These experiments were informative, but a frustrating trend began to emerge – without fail, far fewer proteins could be identified in bees than in other organisms. A typical experiment with human cells might yield 4,000 to 6,000 proteins, but with bees, a typical experiment identifies 1,000 to 2,000. The absolute most ever is 2,997, which was achieved by yours truly in our above-mentioned paper. All this was quite troubling because it means that there is probably a large fraction of the proteome that is essentially invisible. In other words, we are continually missing out on data.
In our paper, we tested several potential reasons why the visible honey bee proteome is so small (take a look at the headings in the “Results and Discussion” section if you would like to know what they are). We never did find a smoking gun, but we did manage to somewhat improve our numbers. When mass spectrometry is used to identify proteins, a caveat is that the researcher must at least know what proteins might be found. We cannot, for example, analyze the proteins from a deep-sea, vent-dwelling crab and expect to identify the same number of proteins as in a human, because we know far less about the kinds of proteins the crab might be expressing in the first place. In the grand scheme of things, honey bees also haven’t been studied all that much compared to humans, so it’s reasonable to think that there are bee genes that we don’t even know exist. And this is exactly what we found: thousands of pieces of genes, in some cases what might be whole genes, that were previously unknown. It still blows my mind what we can do with modern technology.
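To make that caveat concrete, here is a toy sketch in Python of why a protein has to be on the list of known proteins before it can be identified. Everything in it is made up for illustration: the protein names and sequences are invented, the chopping rule (cut after every K or R) is a simplified stand-in for how proteins get broken into peptides, and real software matches measured masses rather than comparing letters directly. It is not the method from our paper, just the general idea.

```python
def digest(protein):
    """Chop a protein sequence into peptides, cutting after every K or R
    (a simplified, assumed cutting rule)."""
    peptides, current = [], ""
    for residue in protein:
        current += residue
        if residue in "KR":
            peptides.append(current)
            current = ""
    if current:  # keep whatever is left at the end of the protein
        peptides.append(current)
    return peptides


def build_peptide_index(known_proteins):
    """Map every peptide we could expect to see back to its source protein(s)."""
    index = {}
    for name, sequence in known_proteins.items():
        for peptide in digest(sequence):
            index.setdefault(peptide, set()).add(name)
    return index


def identify(observed_peptides, index):
    """Split observed peptides into those we can name and those we cannot."""
    identified = {p: sorted(index[p]) for p in observed_peptides if p in index}
    unmatched = [p for p in observed_peptides if p not in index]
    return identified, unmatched


# Hypothetical "known" proteins and hypothetical observations.
known_proteins = {
    "protein_A": "MSTKLLVRAGE",
    "protein_B": "GHKPEPTIDER",
}
observed = ["LLVR", "PEPTIDER", "NEWGENEK"]

index = build_peptide_index(known_proteins)
identified, unmatched = identify(observed, index)
print("Identified:", identified)
print("Unmatched:", unmatched)
```

The last peptide is really there in the sample, but because nothing on the list could have produced it, it goes unnamed. Expanding the list of candidate sequences is what lets peptides like that finally be matched.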