Thursday, August 28, 2014

Prime XS -- Advance proteomics...and the rest of science


This is SO smart.  And we need an equivalent here in the U.S.  If you want to start it, I volunteer to head it.

Prime-XS is a program run by the EU.  It ensures 2 things:  1) That proteomics is used for scientific studies of extreme merit, and 2) That labs participating in XS are exposed to high quality, high impact biological problems.  It is a win-win.  Top notch labs get top notch collaborators and the EU pays for it!

How it works:  Researchers in the EU countries can apply for days of access to proteomics facilities that the EU has reserved for this program.  This is how it is currently divided:

Ummm.... 621 days are available in Utrecht?  That couldn't be the Heck lab, right?  Pretty sure it is!

What a win for everybody!  How many top notch proteomics facilities can you think of that have trouble finding high impact biological questions?  Tons, right?  Let's face it.  Cool problems don't always come from the same places where our best proteomics facilities are.  This fixes it.  In one fell swoop.  Top biological problems -- top proteomics capabilities -- and we all win.

The downside is that this program is going to push the power balance in impact factor toward Europe.  Meaning we need something like this over here.

You can read more about Prime-XS here.
READ MORE - Prime XS -- Advance proteomics...and the rest of science

Wednesday, August 27, 2014

Rough week around here


Last thing you want to see on your primary processing PC?  3600 sector errors.  Needless to say, it is slowing things down around here.
READ MORE - Rough week around here

Tuesday, August 26, 2014

Fill a library with what I don't know about in shotgun proteomics


Today I got a super interesting question about Percolator and spectral libraries.  While investigating, I figured I'd better go to the Percolator Google Group and look around (btw, I think this is going to be very interesting to everybody!!!)

Anywho, while searching, I came across a whole lot of names I recognize who belong to a Crux-users Google group.  What the heck is Crux?  Umm....maybe an incredibly awesome and apparently free software package for proteomics?

On further investigation, I found Scholar references back to 2008(?!?!?!).  I feel less out of touch because it appears to be buried within the 1,000 programs and features that make up the Trans Proteomic Pipeline.

You can find out more about Crux here.  Feel free to tell me about it.  Back to Percolator....
READ MORE - Fill a library with what I don't know about in shotgun proteomics

Friday, August 22, 2014

Perform imaging analysis on a Q Exactive?!?!?!

Guess what I did this week!  Wait...I guess if you read the subject line you have an idea....

I got to hang out all week with the very nice team of scientists at Protea Biosciences in the beautiful mountains of West Virginia.  Not heard of Protea?  Me neither!  But I expect that they will rapidly be something that we'll be talking about, in part, because of this thing:

Despite its appearance it is not, in fact, a refrigerator/toaster oven combo.  I got to play with a beta model that was open so I could see exactly what it was doing at all times.


Yes.  That is attached to a Q Exactive!  And that light inside is for the camera that directs the LASER.  I'm not going to lie and say I'm some laser ionization expert.  I'm not.  I've spent some time on a MALDI-Orbi XL and on Rich Helm's MALDIs, but I got a crash course in it this week, and this was the most badass one I've seen.  The source is a LAESI and you can read about it at Wikipedia here.  The team here has this source running on all sorts of samples.  I was just there to see if we could fine tune the Q Exactive to get even better data.  It was cool because we could get about 3 second laser pulses on our controls.  The trick was optimizing cycle time in the Q Exactive so that we could make the most of every single millisecond.

Definitely a different way of thinking.  But think about the Q Exactive, assume that this source can ionize virtually anything, and consider the implications.  On the QE we can run, at maximum, about 13Hz.  If we multiplex, this gives us a chance to monitor about 65 compounds per second via targeted SIM or targeted MS2 (PRM; btw, I'm considering practicality.  We can multiplex 10 compounds in the QE, but 5 is easy.  10 is trickier).  If we're just going for detection, the LAESI-QE combo can probably SIM or PRM about 150 molecules per 3 second laser pulse (we were optimizing with small molecule drug mixtures!)
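If you want to sanity check that back-of-envelope math, here's a tiny sketch (the ~13 Hz, 5-plex, and 3 second pulse numbers are from above; the ~4-plex detection-only assumption is mine, just to show where a ~150-per-pulse estimate could come from):

# Back-of-envelope throughput for targeted acquisition on a LAESI-QE setup,
# using the approximate numbers from the post above.
scan_rate_hz = 13          # ~13 SIM/MS2 scans per second at best
practical_multiplex = 5    # compounds per multiplexed scan (10 is possible, 5 is easy)
pulse_seconds = 3          # laser pulse length on our controls
detection_multiplex = 4    # my assumption for detection-only experiments

compounds_per_second = scan_rate_hz * practical_multiplex
compounds_per_pulse = scan_rate_hz * pulse_seconds * detection_multiplex

print(f"~{compounds_per_second} compounds/s with 5-plex targeted SIM/PRM")
print(f"~{compounds_per_pulse} compounds per {pulse_seconds} s laser pulse, detection only")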

What else did we do?  We worked most of yesterday optimizing native protein and top-down.  Cause the LAESI can zap native proteins right out of tissue, right off a slide, right from where you want it.  Point to the area on the microscope and ZAP (it didn't actually make a noise...) native protein MS!  We were even able to get nice top down data on our intact native protein using the LAESI by using multiplexing on the QE to fragment multiple charge states from the isotopic envelope.  Did I mention that my week was really cool?

Anyway, this source can be put onto just about anything but, honestly, what is cooler than having imaging capabilities on the world's favorite mass spec?

BTW, imaging isn't all this lab does.  They have a really exceptional team of mass spectrometrists with experts in virtually anything you can think of, from molecule elucidation through quantitative proteomics and everything in between.  (Can you tell I was impressed this week?)

TL/DR:  You probably know about this already, I didn't, but check out Protea here!
READ MORE - Perform imaging analysis on a Q Exactive?!?!?!

Thursday, August 21, 2014

Biomath calculator App


What is more annoying than doing that calculation above?  Seriously!  I have x amount of protein in ug; what is my molar concentration?  You are probably smarter than me, but I have a tendency to lose a decimal place or two.  This week I said "wow, there should be an app for that".  And there is.  Of course.

Promega has a free BioMath calculator that will do this one for you, among other things.  (Lots of DNA calculations and dilutions, but this is what I downloaded it for!)  It is available for both Android and Apple.
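For the curious, the arithmetic behind the protein question is simple enough to sketch out yourself (a minimal example, not Promega's actual code; the 10 ug / 25 kDa / 50 uL numbers are just made up for illustration):

def molar_concentration(mass_ug, mw_da, volume_ul):
    """Convert a protein amount in micrograms to molar concentration.

    mass_ug   : protein mass in micrograms
    mw_da     : molecular weight in Daltons (g/mol)
    volume_ul : sample volume in microliters
    Returns concentration in micromolar (uM).
    """
    moles = (mass_ug * 1e-6) / mw_da          # grams / (g/mol) = mol
    liters = volume_ul * 1e-6                 # uL -> L
    return (moles / liters) * 1e6             # mol/L -> umol/L (uM)

# Example: 10 ug of a 25 kDa protein in 50 uL is about 8 uM
print(round(molar_concentration(10, 25000, 50), 2))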

You can read more about it here.  A big thanks to the talented team at Protea for pointing this one out to me!
READ MORE - Biomath calculator App

Wednesday, August 20, 2014

I've seriously considered this.


A friend forwarded me this picture.  I'm not the only person that loves this field!!!

Then I got this one forwarded to me as well (thanks Da Jules):


READ MORE - I've seriously considered this.

Monday, August 18, 2014

Cool targeted peptide quan article in The Scientist


I had to go to the doctor today for routine checkup stuff and what magazine was at the top of the stack?  This month's issue of The Scientist!  That's just how Baltimore is.  There are probably almost as many scientists here as drug addicts.  My physician is probably married to a researcher at Hopkins or UMD or something.

Anyway, there is a great article this month describing how to move from discovery to targeted proteomics, as well as a description of each open source platform.  This'll come as no surprise if you've used it at all, but Skyline made the top of their list.

A couple of these platforms were new to me and it might be worth it to check out this nice little review.  Or even to forward it to collaborators.  It is concise and nicely written.  You can find it here.
READ MORE - Cool targeted peptide quan article in The Scientist

Sunday, August 17, 2014

Proteome Discoverer 2.0 teaser!


Look what I got this morning!  It is looking really really good too.  I got it to do some PC kirunchmarking today while hanging out with a sick dog.

I'll show the kirunchmarking stuff later.  I need to sort out some variables.  While I was doing it, I noticed a file was taking a whole lot longer than it did on PD 1.4 (or on earlier PD 2.0 alpha copies).

Check out what our friends in Bremen got working!!!  That one file did take longer than normal, but it was because PD was doing a bunch of other things:


If you've spent any time on the job queue on any version of Proteome Discoverer, chances are I just blew your mind a little.  PD is running multiple files at once!!!!  Now, it remains to be determined if those two matching 81% are because PD detected that I was reprocessing the same Fusion files just with different names, but the fact that it is intelligently allocating time is bound to make more than just me happy. I need to do some digging around.  I don't have all that many Fusion files and I'm running them cause they are the hardest to work with.  If we can do them fast, I'm not worried about the QE or Elite files.  Easy.  This RAW file has 16k unique peptides in it!

I'm running on a crazy fast PC (more details on that later, too!) but it knocked out 4 Fusion runs in an hour and 17 minutes.  I was experimenting with different peptide and protein FDRs and it just tore right through them.  By comparison, I just saw a big fancy dual CPU Xeon choke on HeLa files in PD 1.4 for hours.  Better hardware.  Better software.  And all of a sudden these huge datasets everyone is generating don't seem all that scary!

BTW, wait till you see how PD 2.0 handles complex experiments! Thermo is about to release the best proteomics software we've ever seen.
READ MORE - Proteome Discoverer 2.0 teaser!

Friday, August 15, 2014

Computing exact p-values to improve your shotgun proteomics



You know what I love?  When people start applying nice statistics to proteomics data.  A lot of these datasets are getting far too large for us to say "x is twice y".  But we all have a lot on our plates.  We can't just take a bunch of stats classes (believe me, I'm trying and I've already had to drop one that I paid for this summer...) in order to get caught up.  We need good, trustworthy, time tested stats built into our processing schemes.

Why not go for simple p-values?
Because, obviously, it isn't that simple, dummy!
HAHA!  But it turns out that it is!

JJ Howbert and Bill Noble think it is and they have some really good evidence.  Check out this paper (it appears to be open access) in press at MCP.

In this study, they went to the original Xcorr values assigned by Sequest and looked at the total score distribution across all the candidate peptide-spectral matches.  At this level, they were able to calculate, for each PSM, the probability of getting an Xcorr at least that good by chance, cause that's what p-values actually tell you.

When they went back and ranked their peptides by p-value, rather than Xcorr, they found they had a much more accurate measurement of PSM validity than merely saying "anything above an Xcorr of 2.0 is trustworthy" (which is what most of us have been doing all along, be honest, and we've all secretly known it was silly.  It's like saying a TMT fold regulation of 1.25 is significant.  It's just us being lazy....)
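If that sounds abstract, here is roughly what ranking by p-value instead of raw score looks like (a hedged sketch only: the paper computes exact per-spectrum p-values with dynamic programming, while this toy version just uses an empirical null distribution, e.g. decoy scores):

import numpy as np

def empirical_p_values(target_scores, null_scores):
    """Rough sketch: p-value = fraction of null (e.g. decoy) scores that meet
    or exceed each observed Xcorr.  Howbert & Noble compute these distributions
    exactly per spectrum; this is just the empirical flavor of the same ranking idea."""
    null_sorted = np.sort(null_scores)
    n = len(null_sorted)
    # count how many null scores are >= each target score
    exceed = n - np.searchsorted(null_sorted, target_scores, side="left")
    return (exceed + 1) / (n + 1)   # +1 keeps p-values away from zero

# Rank PSMs by p-value instead of by raw Xcorr (toy numbers):
xcorrs = np.array([3.1, 2.4, 1.9])
decoy_xcorrs = np.random.default_rng(0).normal(1.0, 0.5, 10000)
print(empirical_p_values(xcorrs, decoy_xcorrs))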

Awesome, right?  As proof of principle, they ran the same data set through a bunch of different search engines and, predictably, the exact p-value approach worked better than the other engines tested.

What about Percolator?!?!

This is where I don't know quite enough Greek letters..or at least when you're adding and dividing them it does a funny thing to my brain.  What I know this morning?  They were able to work this pipeline into Percolator and I fell asleep.  They come from the same place.  Of course it works with Percolator!


READ MORE - Computing exact p-values to improve your shotgun proteomics

Thursday, August 14, 2014

Increase your throughput with parallel LC


I'm still at this awesome LC bootcamp.  Yesterday the instructors threw out an idea that had never occurred to me.  If you have a dual pump system with enough valves, you can set up parallel LC.  Dionex actually sells a kit for this.  The gist of the method is that while your peptides are eluting, the second sample is loaded onto a second trapping column and washed.  When you get past the point in your gradient where you are just washing crap off your column and re-equilibrating, you switch valves and go right into the elution of the next set of peptides!  You could be really ruthless with this and shave a ton of time off of each run, or be more conservative and still shave a lot of time!

In the example we were looking at, we were able to shave 30 minutes of trapping, desalting, and equilibrating time off each sample injected.  Imagine a semi-complex sample that you run with a 140 minute peptide elution time and you are talking about close to 6 hours of run time that you are squeezing back into a day (assuming ~12 samples per day and 30 minutes saved on each).  That's 2-3 extra runs per day!
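Here's the quick arithmetic if you want to plug in your own gradient (numbers taken from the paragraph above; your exact savings obviously depend on your method):

# Quick arithmetic on the parallel-LC savings described above (approximate numbers).
elution_min = 140      # peptide elution portion of the gradient
overhead_min = 30      # trapping, desalting, re-equilibration per injection

serial_run = elution_min + overhead_min        # 170 min per sample, everything in series
parallel_run = elution_min                     # overhead hidden on the second trap column

samples_serial = 24 * 60 // serial_run         # injections per day, serial
samples_parallel = 24 * 60 // parallel_run     # injections per day, parallel
print(samples_serial, samples_parallel)        # 8 vs 10 -> same 2-3 extra runs/day ballpark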

Looking at the schematic makes this seem reasonably simple for any LC that has 2 switching valves and separate loading pumps.  Might be a great solution for any of you guys who are just getting buried under your sample queue!


READ MORE - Increase your throughput with parallel LC

Wednesday, August 13, 2014

Why are we always moving toward columns with smaller particles?


This week I am at an intensive chromatography bootcamp taught by legacy Dionex experts Nick Glogowski and Daniel Kutscher.  Of the hundreds of interesting things I've picked up so far, one of the coolest things is this explanation of smaller particle sizes and why we keep migrating to smaller and smaller particles.

This chart shows the optimal separation parameters for different particle size solid phase materials.  (Sorry about the colors, original document available here.)  For the most striking difference, look at the 10 um particles.  There is only a very narrow flow rate range where the separation is optimal.  Compare that with the 2 um and 3 um beads, where we can handle a much larger variation in flow rates and maintain a nice flat line.  While I'm not going to pretend I understand all the math that has passed by on the projector this morning (sorry, guys!), I think it is still pretty clear that consistency in our chromatography sure makes a big difference (especially if we want to load our columns faster than we elute off them!)
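For anyone who wants to poke at the math themselves, I'm fairly sure the chart is the classic van Deemter relationship, so here is a toy sketch of it (the A/B/C coefficients below are made-up illustration values, not anything from the course materials):

import numpy as np

def van_deemter_plate_height(u, dp, diff=1e-9, a=2.0, b=2.0, c=0.1):
    """Textbook-simplified van Deemter curve (my assumption for what the chart shows):
       H = A + B/u + C*u
    with A ~ particle diameter (eddy diffusion), B ~ longitudinal diffusion,
    and C ~ dp^2 (mass transfer).  u is linear velocity in m/s, dp in meters.
    The a/b/c coefficients are arbitrary values chosen only to illustrate the shape."""
    A = a * dp
    B = b * diff
    C = c * dp ** 2 / diff
    return A + B / u + C * u

u = np.linspace(1e-4, 1e-2, 200)                  # range of linear velocities
for dp_um in (2, 3, 10):
    H = van_deemter_plate_height(u, dp_um * 1e-6)
    u_opt = u[np.argmin(H)]
    print(f"{dp_um} um particles: optimum near u = {u_opt:.2e} m/s")
# Smaller particles -> flatter curve -> a much wider range of usable flow rates.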


READ MORE - Why are we always moving toward columns with smaller particles?

Tuesday, August 12, 2014

Semi-complex standard for intact or top down analysis

Boy o' boy!  Intact analysis and top down proteomics are all the rage these days!  A lot of this has to do with the Exactives.  The Q Exactive is great for both.  The QE Plus has the new Protein Mode option and the Exactive Plus EMR is probably the easiest and most sensitive instrument for intact analysis ever made.  A large percentage of my day job these days is supporting you guys with intacts and top downs.  A problem I've run into is that the standards out there kind of suck.

My friend Aimee Rineas in Dr. Lisa Jones's lab at IUPUI took a swing at fixing this problem a while back.

Our solution?  A pretty thorough analysis of the 6 (?? 7?? read on, lol!) protein Mass Prep mixture from Waters.  It is part number 186004900.  Be careful, there are several similar products and the Waters website doesn't do a very good job of distinguishing them.  This is the one I'm talking about.


Great!  So we have a standard.  Easy, right?  Not so fast.  The chart above is all the information you get on the proteins.  For Ribonuclease A, the mass is rounded to the nearest 100?  Sure, this will be okay for some TOFs, cause that's about as close as you can get in the mass accuracy department, but I'm running Orbitraps.  I want to see my mass accuracy in the parts per million, not at the level of mass accuracy I would get with a TOF or SDS-PAGE.

This is where the work comes in.  Aimee and I used a really short column, a 5 cm C4, and ran this standard out several times on the Q Exactive in her lab.  We did this first to obtain the intact masses and a few more times for top down analysis.

Our best chromatography looked like this (this is an RSLCnano; we just ran microflow with the loading pump):
Not bad for a 5 cm column, right?  

Lets look at the first peak:

13681.2.  This is our Ribonuclease A.  If we go to Uniprot and look up the sequence, we can cleave off the first 25 amino acids (this is the signal/pro-sequence that isn't actually part of the expressed protein...it's a genetics thing that we don't have to worry about (?) in shotgun proteomics (?) but do have to worry about in intact and top down work).


According to ExPASy, the theoretical mass is 13681.32.  This puts us 0.12 Da, or about 8.7 ppm, off of theoretical.  Boom!  (On a QE Plus run since, I've actually tightened up this already awesome mass accuracy!)
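(If you want to check my arithmetic, the ppm calculation is just this:)

def ppm_error(observed_mass, theoretical_mass):
    """Parts-per-million mass error."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6

# The Ribonuclease A numbers from above: ~0.12 Da low, roughly 9 ppm in magnitude
print(round(ppm_error(13681.2, 13681.32), 1))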

Okay.  I know there are some good TOFs out there.  There are probably some that could get us within somewhere close to this mass.  But if we dig a little deeper, we see the real power the Orbitrap has on this sample.  Look at this below.


Can you see this?  Sorry.  Screen capture and blogger format only get me so far.  

If we look closer at our pure protein we purchased, an inconvenient fact emerges.  What we thought was our pure protein, we can see with the QE, most certainly is not.  At the very least, we find that this protein is phosphorylated.  This problem is exacerbated when you increase the sensitivity even further by analyzing this same sample with a QE Plus (I have limited data showing this mix actually has a 7th proteoform in it that I need to further evaluate).

By the way, this protein is known to be phosphorylated in nature.  The manufacturer just wasn't aware that some of it slipped through their purification process.  We also did top down, remember, so we should be able to localize the modification.  I just haven't gotten there quite yet.  Free time is a little limited these days.
(I have used ProsightPC and localized this modification, I'll try to put it up later). 

People doing intact analysis are sometimes critical of the "noise" they find in an Orbitrap.  Further evaluation of that noise will often reveal that it is really minor impurities in the sample.  Are they biologically relevant if a TOF can't see them and only a QE Plus can?  I don't know.  Probably not.  Maybe?  Wouldn't it be better to know that they are there in any case?  

I started this side project months ago, considered actually writing up a short note on it, and figured, "what the heck," more people will probably read it if I put it here anyway.  I've also gotten to run this sample on a QE Plus, which revealed even more cool stuff.

This is incomplete, and not double-checked, but these are the masses I have so far for this standard:

Protein                        Parts list mass    Our mass
Ribonuclease A, bovine         13700              13680.9
                                                  13779.3
Cytochrome C, horse            12384              12358.2
Albumin, bovine                66430              66426.7
Myoglobin, horse               16950              16950.5
Myoglobin, horse + heme        17600
Enolase, yeast                 46671              46668.8
Phosphorylase B                97200              97129

P.S.  All of this data was processed with ProMass.  I, too, am a creature of habit.  Protein Deconvolution gives me tons more tools and better data but for a super simple deconvolution I still default to good old ProMass or MagTran.  If I had written this up and tried to submit it somewhere, you bet your peppy I'd take screenshots from my PDecon runs though!

READ MORE - Semi-complex standard for intact or top down analysis

Monday, August 11, 2014

Proteomics explores the damages of sleep deprivation


Sleep deprivation has been big news lately, ever since a study went viral that showed sleep deprivation can cause permanent damage in the brains of mice.  As someone who doesn't sleep all that much and who knows a lot of other people who I can reach just about 24/7 this has attracted my attention.

My criticisms of this study:  1) These are mice; not just that, these are horribly deformed and inbred mice that are produced to have no fear of human beings AND to be genetically identical.  2) For further evidence, Michael Jackson was reported to sleep no more than 3 hours per night and, last I checked, the King of Pop was doing just fine.

A new study takes aim at these observations using proteomics.  Mice were forced to stay awake, their sleepy little brains were extracted, the neurons were enriched on a density gradient, and proteomics looked at the differences.  Unfortunately, the results are a little underwhelming.  They found 80 or so proteins that were differentially regulated (1.5-fold) and the DAVID and IPA analysis was a little inconclusive.  The paper hints that further analysis is in the works and that we'll know a lot more when they wrap up the next paper.  However, if you are interested in looking at neurons via proteomics, this paper has a nice and concise method.




READ MORE - Proteomics explores the damages of sleep deprivation

Sunday, August 10, 2014

DeMix workflow. Identify the other peptides you accidentally isolated


This is really really cool and currently in press (open access!) at MCP and comes from work done at Roman Zubarev's lab.  Edit:  Here is the link to the abstract (left it out before).

In a DDA experiment we pick the ion we're interested in that looks like a peptide, based on the parameters we provide that say "this is a peptide and it is probably one that will fragment well with the method that I'm using right now".  Then we isolate it and, too often, a bunch of shit around it.  Typically, we try to eliminate as many of those co-interfering compounds as we possibly can.  One of the biggest improvements the Q Exactive Plus has over the Q Exactive is that we can move from isolation widths of 2.0 down to as low as 1.2 with very little loss in signal (I have heard at least one account of the QE HF being used with a 0.7 Da isolation window, but I still haven't gotten my hands on one of those magnificent boxes so I can't confirm...)

For years I've heard people kicking around this idea:  what if we identify our peptide from our MS/MS spectrum and then remove every MS/MS fragment that can possibly be linked to that peptide?  Then we're left with the fragments from the peptides we accidentally isolated.  Let's then database search those and find out what they are.

And that is exactly what DeMix does.
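In code, the core trick looks something like this (a conceptual sketch only, definitely not the authors' actual implementation; the masses and tolerance below are made up for illustration):

def strip_matched_fragments(spectrum_mz, matched_fragment_mz, tol_ppm=20.0):
    """Conceptual sketch of the DeMix idea: remove every peak that can be
    explained by the already-identified peptide's theoretical fragments,
    and keep the leftovers for a second database search."""
    leftovers = []
    for mz in spectrum_mz:
        explained = any(abs(mz - f) / f * 1e6 <= tol_ppm for f in matched_fragment_mz)
        if not explained:
            leftovers.append(mz)
    return leftovers

# Toy example: two peaks match the identified peptide, the rest go to the second pass.
spectrum = [175.119, 262.151, 375.235, 488.319, 503.310]
first_hit_fragments = [175.119, 375.235]
print(strip_matched_fragments(spectrum, first_hit_fragments))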

Let me rant a little bit about how cool the workflow here is.  They ran this stuff on a QE with 70k MS1 and 17.5k MS2 resolution and used isolation widths of 1 - 4 Da.  They converted everything over to centroid using TOPP (btw, they found better results when they used the high res conversion option for this data, so I'm using that from now on).  Next they ran their results through Morpheus using a 20 ppm window and a modified scoring algorithm.  The high scoring MS/MS fragments were used to recalibrate the MS/MS spectra (just like the Mechtler lab PSMR does) using a Pyteomics Python script.  

Interestingly, when they made their second pass runs they tightened all of their tolerances and processed the deconvoluted MS/MS fragmentation events where the previously matched fragments were ignored.  I should probably finish my coffee and then work my way through the discussion, because I would have done it the opposite way (and, when we do serial searches in PD, that is the default workflow).  I'm not knocking it, I just find it counter-intuitive.

So what.  Did it work?  Of course it did, or it wouldn't have made it into MCP!  Final stats?  The QE was knocking out about 7 MS/MS events for every MS1.  Using this approach, they IDENTIFIED 9 PSMs(!!!) out of each 7 spectra.  They didn't get 2 IDs per MS/MS event, but they got about 1.2, which is a heck of a lot better than 1!

I can not wait to try this and I've got the perfect data set to run it on sitting right in front of me.  I'll let y'all know how it goes.


READ MORE - DeMix workflow. Identify the other peptides you accidentally isolated

Saturday, August 9, 2014

World's deadliest animals!


Thought I would share this.  I love the perspective.  Malaria -- a disease that is virtually impossible to study with genomics techniques cause the involved genes mutate at a crazy rate, and one that only proteomics has a real shot at deciphering (it's protein - antibody interactions! come on!)  

Shark week?  We need to have malaria week!
READ MORE - World's deadliest animals!

Friday, August 8, 2014

Computers designed for Proteomics processing!


It was just a matter of time before someone did this, right?  I get questions about this through the blog and through my day job all the time.  The truth is it isn't easy to figure out what is going to make a good processing computer.  You'd think it would be simple, right?  Old computers will be slow, new computers will be fast, and expensive new computers will be the fastest.  Yet I've been in labs this summer that have dumped $3,000+ on Xeon desktops that are much slower than the quad core laptop I'm always bragging about.  To this day, I definitely cannot guess why one PC is going to be slower than another.

Fortunately, I don't have to worry about it anymore.  These guys at OmicsComputing have put together 2 processing PCs, a basic "Omics Workstation" for fast processing and a super processing computer called, get this, the "Proteome Destroyer".

If you've read much on my blog, you know what a dork I am for stuff like this.  So my first thought was to see if I could borrow some time on one of these and run some stuff.  And I got to.  And they aren't messing around.



Sorry the text is small here.  What is it?  My favorite HeLa high-high file run on the Proteome Discoverer 1.4 demo.  I used the full human Uniprot database, a static modification of alkylated cysteines, and a dynamic mod of oxidized methionine.  Sequest took 57 seconds and Percolator (32-bit, not the awesome beta that I have accidentally shown you guys before) took under 6 minutes.  So...a whole HeLa high-high run in 7 minutes or so.  Pretty good.  I've done better on my overclocked quad core with the Percolator beta we're testing, but this beats the heck out of virtually every other run I've seen.

Okay, who cares, high-high files are easy.  What about those big Fusion files that even my quad core suffers to process (we're talking high/low and hugely dense data files)?  20 minutes, with Percolator (misplaced the screenshot for proof, but I'll load it later.  I know it's in one of these inboxes somewhere...).  I have never ever processed a Fusion HeLa file in under 30 minutes before....

Get this, though, apparently this isn't nearly as fast as this PC can go.  When I reported back what I saw (probably faster than I've seen,  but not mind blowing) they took a closer look at the runs and the processor wasn't running anywhere near maximum.  I guess it uses a very aggressive processing boost function when it is under high load.  Proteome Discoverer running a Fusion file wasn't enough to trigger high load functioning.  The PC was like, "oh well, I'll run this but I don't need to activate all of my cores or memory or anything."

So they tweaked the software or hardware or something so that it recognizes PD as a program that should be run at full capacity and invited me to try re-running my files.  As you might guess, I'm psyched to test it out!  As you also might guess, it may be a while till I can get to it cause I've got lots of other things to do.

You can check out their simple little webstore here.

TL/DR:  This company designs computers just for genomics and proteomics processing and I'm pretty sure they are a whole lot faster than what you are processing your Proteome Discoverer data with.  And apparently, I didn't even see the full capacity of what these computers can do!  BTW, they aren't nearly as expensive as you'd think, as they run from $1800 - $4000.  Crazy, right!?!?!
READ MORE - Computers designed for Proteomics processing!

Thursday, August 7, 2014

Proteomics reveals awesome new biomarkers


Sorry, I know.  That is super gross.  But I'm going for impact in my first post since my awesome technology-free vacation (rock climbing all over Appalachia!  wooo!)

A criticism I hear of proteomics is the lack of true results in the actual clinic.  We can hotly contest that one, obviously, but that doesn't stop people from saying it.  I love it when I can have a study in my back pocket to point people to that says "proteomics did THIS."

That's why this study in this month's journal Cancer is so awesome.  Proteomics did THIS.  Validated biomarker panel to detect cancer in the esophagus.  In the clinic.  Now.

The study I'm talking about is here and came from researchers in the Allegheny Health Network in Pennsylvania in conjunction with some researchers in Buffalo.  Interestingly, this is yet another group I've run into in the last month or so that is successfully using spectral counting -- maybe it's coincidental, but it is really starting to look like this approach is making a major comeback.  Something I really want to evaluate on today's super fast instruments (at Fusion speeds, can we get both sample depth and quantitative dynamic range?  Maybe...again, thoughts for later!)

Are the approaches revolutionary?  Not really.  The samples are really cool.  The math is good (by that, I mean good use of statistics!) and ELISA validation means that you can rapidly move from a proteomics observation right into a good molecular lab in the clinic.  And even cooler that this study didn't go to MCP, but went right into Cancer!  (No offense, MCP, but as a biologist I'd much rather read a great proteomics study in a biology journal...)  Great example for the haters out there!

BTW, this study is getting lots of press.  You can read more about it here.
READ MORE - Proteomics reveals awesome new biomarkers

Thursday, July 24, 2014

Is the human proteome shrinking?


Is the human proteome (by that, I mean the number of proteins we can express) shrinking the way we've shrunk the majestic mastiff into the super majestic pug?

Lets look at the evidence:


The Scientist this month (thanks Nastratin) reported the results of 7 studies that said we can express even fewer proteins than we thought we could.  For a bit of history, the human genome project initially reported about 30k coding regions.  Subsequent studies have found that a lot of those regions contain junk (by that, I probably don't mean mysterious epigenetic controls...or do I...?)

How did these studies determine that these other regions are not coding?  By looking at in-depth proteomics studies, of course!  One way of doing this would be to say "hey, we've run 7 billion proteomics samples on tissue to this point in time and NO ONE has ever seen a peptide from this protein."  That is one way, right, but that doesn't rule out the possibility that this thing is in plasma and has a copy number of 10 proteins per mg of plasma, right?

This group took the evolutionary approach.  Genes that are highly conserved among many species produce proteins that are essential to life.  The more essential, the more species carry them, and the more conserved they tend to be.  What if we then just go to the gene sequence and compare that sequence against monkeys and dogs and a bunch of other things?  If no one has seen a peptide from this protein & this gene is not expressed by our closest relatives (or it is highly modified) & it has a structure that looks very unlikely to code for a protein, THEN we can probably safely say that isn't a sequence of DNA for making protein (it's probably for mysterious epigenetic weirdness, cause it's unlikely it's just taking up space, right?)

The conclusion is that human beings can probably express about 19,000 proteins, which probably means there are only 4 billion proteoforms....

You can read the original article (open access) here.

You can read the article in the Scientist here.




READ MORE - Is the human proteome shrinking?

Wednesday, July 23, 2014

Search GUI!


I can't hide my love of the friendly and powerful DeNovoGUI from you guys.  What if that easy interface also had a cousin with 4 integrated search engines and the display visualization of the cool PeptideShaker?

Then you'd have SearchGUI, yet another thing that has been around forever that is completely new to me.  Sure, these open source tools have weaknesses, but with their powers combined, you are looking at a very nice, free (and easy to use!) tool.

You can get SearchGUI here.
READ MORE - Search GUI!

The truth about mad scientists


I tried tracking down the original source for this:  seems like it's redditor Dualaction2.
READ MORE - The truth about mad scientists

Tuesday, July 22, 2014

Proteogenomic characterization of cancer!


I have been waiting for this stuff to start showing up in bulk!

Most biology labs now have tons of "next gen" genomics data on their cells of interest.  Increasingly, these labs are all acquiring proteomics data.  The obvious next step?  Making these data sets work together.  Yes, some stuff has trickled out here and there, including some cool strategies for database reduction.  But I think we've seen the tip of the iceberg so far.

In this week's Nature, we start to see what is coming!  In a huge study that features members of the Tabb lab and Reid Townsend and many others, the power of this approach is really explored.  Proteomic data from colon and rectal cancer was compared to data from "next gen" sequencing and tons of new information was discovered.

The punchline?  This NATURE paper required no mass spectrometer.  This data was already deposited in The Cancer Genome Atlas.  This was essentially a meta-analysis but comparing these datasets was powerful enough that it slammed into a journal this big.  Again, this is the kind of stuff that is coming....
READ MORE - Proteogenomic characterization of cancer!

Monday, July 21, 2014

More on protein quan trumping mRNA quan


More (now old...2013...where have I been...) evidence that protein quan is superior to RNA quan.  In this cool paper in Science from November, researchers looked at the expression levels of RNA and protein in:  humans, chimpanzees and rhesus (monkeys...I don't care what the correct term is).

The conclusion?  While the mRNA predicted tons of variation in expression levels, none was seen at the protein level.  And this makes sense, right?  Why would our close neighbors have completely different levels of the proteins that we have?  The conclusion in this awesome paper (like the one from Saturday) is that the real selective pressure on regulation is at the protein level, where we can exercise a much finer level of control.  Thanks to Michael Ford for tipping me off to this great study.  And kudos to these researchers for such an elegant experimental design, because it's easier for us to forget our primate cousins since we don't have to look at this every day:


READ MORE - More on protein quan trumping mRNA quan

Sunday, July 20, 2014

Heavy analysis of the human proteome drafts


I'm certainly not the only person who has jumped on the new resources provided by the human proteome drafts and checked them out.  In this brand new paper in JPR, a group out of Madrid takes a look at some of their favorite proteins in the human proteome drafts and comes back with an interesting analysis.  (Abstract here.)

I love the fact that, in this paper, they did the same experiment Alexis and I did the day the drafts came out.  We chose proteins that we knew would lead to cancer if they were over or under expressed and analyzed those.  This group took proteins from nasal tissue (olfactory receptor proteins) and looked for those in the various tissues.

At first glance, the image on the abstract looks pretty damning:


These are olfactory (smelling) receptors.  What are they doing being expressed in colon cells and platelets?!?!  (It is worth noting that the image above is from HumanProteomeMap.org, the data from the Pandey lab.)

The authors of this analysis indicate, even in the abstract, that the "experimental data from these studies should be used with caution."  And I agree.  There is inherent error in studies this big; hell, a 1% false discovery rate on 100 million observations is 1 million observations that are false, right?  But...the experimental data from every study should be used with caution.  And we all know that (by "we" I mean you proteomics experts who read this).  I am glad that this caution is stated, though, for the people outside our field who have discovered this resource through mainstream news outlets.

That being said, I have some problems with this experimental design.  There are 3 big assumptions being made here:
1) The annotations of these proteins are 100% correct
2) These proteins have 1 function
3) These proteins only function in one tissue

Number 1 is easy.  Annotations suck.  The system for annotation sucks.  The first person to identify a protein in the first tissue gets to name it, right?  So there are tons and tons of proteins named in tissues that are heavily studied.

Number 2 is relatively easy as well.  Making new proteins takes a ton of energy.  Evolutionarily (that's not a word? whatever...) there will be a lot of pressure for proteins to function in more than one way, in more than one context.  (Side note, one of my graduate committee members, Jiann-Shin Chen, proved the first dual substrate enzyme...in bacteria...in the 1970s...sorry, couldn't find the link, I'll add it later if you're interested.)  Considering the sophistication of eukaryote proteins, it is naive to think that if a protein is annotated as "Butt_itching_protein_1" it would ONLY be utilized in the itchy butt response pathway.

Number 3 is an impressive coincidence.  Like millions of Americans, I subscribe to "I Fucking Love Science" and get Elise's feed of cool articles.  From this feed I know that zebrafish embryos highly express functional olfactory response proteins and that olfactory receptors are highly active in human skin.  Heck, I've looked through more than a few high quality proteomics assays and seen "olfactory response proteins" in bunches of different tissues.  So...I think this was a poor choice for analysis.

TL;DR:  One, please interpret the results of the human proteome draft maps with caution.  They are draft maps.  Two, consider proteins in an evolutionary context before using those proteins to generate excessive criticism of datasets that a ton of work went into.

Thanks, Karl, for suggesting something to read over coffee this morning!
READ MORE - Heavy analysis of the human proteome drafts

Saturday, July 19, 2014

Peptide quan beats RNA quan?!?!?!


This is old news, I guess.  At least everyone at the bar last night knew about this except for me.  In my defense, this field publishes a ton of stuff.  As most of y'all know, my background is biology, microbiology to be precise.  While I never ever did RT-PCR, I am (was?) under the impression that nothing would give you verifiable insight into the amount of a gene product in a cell the way that technique does.

What if it isn't as precise as peptide quan?  What if it isn't even close?

Well, it seems like an accepted fact now that this is the case.  For a breakdown, take a look at this paper in Nature Reviews Genetics.

Highlights?
Protein abundances are more conserved among species than RNA abundances.  There is plenty of evidence that living systems have protein abundance levels that they are happy with and a whole lot of that regulation is at the post-translational level.

mRNA transcript abundances only partially correlate with protein abundances.  Right.  I guess it seems obvious, I think of the protein levels in a cell as this dynamic mixture, but I assumed that we could tell how much protein is present -- in a very linear way -- from the amount of RNA present...and all the evidence says that we can't.  Post translational regulation is very very important to the amount of protein that is going to be present.  Again, probably obvious, but I'll have to change my mental framework around a little.


READ MORE - Peptide quan beats RNA quan?!?!?!

Thursday, July 17, 2014

CorConneX -- tools for zero dead volumes


Something cool I learned about this week is the CorConneX.  I have to admit I don't have a concrete understanding of how this works, but it was highly recommended by someone who knows what she's doing!
You can read more about it here.  The gist, however, is that it automatically (robotically?) makes zero dead volume junctions between silica columns, traps, and lines -- as many as you want.  This would be especially useful for systems that aren't, for whatever reason, compatible with nanoViper fittings.
READ MORE - CorConneX -- tools for zero dead volumes

Wednesday, July 16, 2014

Cold spring harbor proteomics course!


This week I was lucky enough to get to help out at the Cold Spring Harbor Proteomics Course.  If you aren't familiar, it is probably the most intense bootcamp in proteomics in the world.

Want to learn proteomics?  Sign up for next year.  The amount of stuff the instructors run through is just mind-boggling.  If you can think of a proteomics experiment, chances are they at least talk about it at this course -- heck, they probably do it.  Dave Muddiman showed up last night and taught a class on imaging mass spec that ran long after dark.

My contribution was to set up a shiny new Q Exactive, train people on it, and provide instrument support.  I've heard from multiple people that it is pretty tough to find and hire skilled proteomics experts these days.  If you are in this quandary next year, you might want to consider hiring someone who is smart and motivated to work hard and sending him/her up there.
READ MORE - Cold spring harbor proteomics course!

Tuesday, July 8, 2014

SIEVE or PD for label free quan


In the context of obtaining peptide IDs, nothing can aid you the way good chromatography does.  More and more, I'm seeing that this is paramount with today's super fast instruments.

What about for assigning quantitative data?  How important is my chromatographic alignment?  SIEVE is a program that puts chromatography up front -- data in tight m/z windows within small retention time windows are chosen as "frames" and those are what you do your quan on.

Proteome Discoverer has a label free quan node, the "Precursor Ion Area Detector".  When used in conjunction with an event detector, you can pull out ions from your files within a narrow ppm range (maximum is 4 ppm! I use 2 ppm) and compare those peptides.  Retention time is never considered in this current iteration.

This is how I've always done label free quan, mostly because my previous employers did not purchase SIEVE.  I'd manually export my label free quan data and use a short script in DigDB to remove matching peptides that did not match in retention time (or pull out the PSMs with their listed retention times, subtract them, and throw out anything outside of my retention time window, then recompile the report).  A rough sketch of that filter is below.
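Something like this, if you'd rather script the filter yourself than use DigDB (the peptides, retention times, and 2 minute window below are made up for illustration):

def filter_by_retention_time(matches, max_rt_shift_min=2.0):
    """Rough equivalent of the manual workaround described above: keep only
    peptide matches whose retention times agree across the two runs.
    `matches` is a list of (peptide, rt_run1_min, rt_run2_min) tuples;
    the 2 minute window is an arbitrary example, not a recommendation."""
    return [m for m in matches if abs(m[1] - m[2]) <= max_rt_shift_min]

matches = [
    ("LVNELTEFAK", 35.2, 35.6),   # retention times agree -> keep for quan
    ("HLVDEPQNLIK", 41.0, 58.3),  # big RT shift -> probably not the same peak
]
print(filter_by_retention_time(matches))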

Yes, it is a lot of work, and essentially a work-around.  But if you can't afford another software package, you have a backup plan.  You can follow this link to a video I made on setting up PD for label free quan.



READ MORE - SIEVE or PD for label free quan

Wednesday, July 2, 2014

Sample collection. Level? Super robot!


This week I've been working up in Cape Cod.  Tough life, I know....

The proteomics facility at Woods Hole studies biomarkers throughout the ocean and at different depths.  A complication of the research here is that sample collection requires dragging long (miles long!) cables below the ocean and pulling them back up.  This process takes a long time.

Fortunately, WHOI has an amazing capacity for building submersible robots!

Enter CLIO, a robotic submersible for capturing proteomics samples (and genomics and RNA and metabolomics and other boring stuff) from the oceans of the world.

CLIO's job will be to drop down to incredible depths, filter sea water, store it and return it for proteomic analysis.

You can read more about CLIO here.

READ MORE - Sample collection. Level? Super robot!

Tuesday, July 1, 2014

Captive spray is back for mass specs!



Almost a week with no entries!?!?!  I've been busy...  This one is definitely worth a few minutes of writing.

Captive spray is something that has a lot of fans out there.  Recent big business dealings made it go away, however, and now you can only purchase it for NMRs or for instruments made by an NMR company or something.

I recently found out that a new company is now producing these sources again for mass spectrometers.  If you are interested, check out Ultra-FAST at this link.  If English is your language, there is a small button in the right corner that will translate the page.

READ MORE - Captive spray is back for mass specs!

Thursday, June 26, 2014

ProteoSuite. Friendly free tools for quan!


There is a lot of proteomics software out there.  And...some of it is great!  This leaves a window open for people to write more stuff...hopefully great stuff!

ProteoSuite is a new package.  Why would you want to use this one when there are so many accepted free programs out there?  Actually, I don't know...

Wait!  I have a reason.  The TPP is imposing.  Sorry, it is.  Is it powerful?  Absolutely.  Is it easy for someone who has never done this stuff to start running on it?  No.  It is not (again, sorry, I seriously love you guys, but if you are a biologist who just wants to re-assess an existing dataset this is not the tool you want to use).

MaxQuant isn't that much better, honestly.  Yes, I know there are tons of resources to get going.  Videos and publications and we all know how good that software is.  But you aren't going to download MaxQuant this morning and have quantitative data processing while you are at lunch.

ProteoSuite aims to bridge that gap with a friendly interface.  Check out how big the arrows are that they use!


ProteoSuite is still in development, but tutorial videos are on the site.  Example data sets are on the site.  ProteoSuite is putting accessibility and ease of use at the forefront.  And there is room in the open source quantitative proteomics world for that.

You can check out ProteoSuite here.  They are encouraging beta testing and looking for feedback.  If you decide to help them out, please post some of your feedback here somewhere.


READ MORE - ProteoSuite. Friendly free tools for quan!

Wednesday, June 25, 2014

Biotechniques article on the human proteome drafts


Yesterday's issue of BioTechniques had an interesting article on the human proteome drafts.  The best part of the article is a quote from Neil Kelleher that really puts this accomplishment in perspective.

You can check out the article here.
READ MORE - Biotechniques article on the human proteome drafts

Tuesday, June 24, 2014

Tornadoes don't like metabolomics?!?


Today's iORBI in Indianapolis was interrupted by a tornado that touched down nearby.

Fortunately, I understand it only interrupted a talk on metabolomics.  :)


READ MORE - Tornadoes don't like metabolomics?!?

Protein isotope envelope fingerprinting


A concept I was introduced to at ASMS is isotope envelope fingerprinting.  Some research on this topic revealed that the idea isn't exactly new (just another one that is new to me!).  The idea is that, under certain conditions, the number of charges that a protein accepts, as well as the masses within the "convoluted" isotope envelope, would be enough for a protein ID.

It makes sense on some level, right?  I've run the Waters IgG standard dozens (hundreds?) of times in my life, and it pretty much always looks the same.  The fans of isotope envelope fingerprinting want to exploit this and leave MS/MS for other assays (or for confirmation).  My guess is that it would be fine unless we are looking at complex things.  I can easily pull the 5 proteins in my normal intact/top down mix apart by just their envelopes, but I would love to see data on complex runs.
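If you're wondering what "fingerprint matching" could even look like, here is a toy version (my own sketch, not how Protein Goggle actually scores anything):

import numpy as np

def envelope_similarity(observed, reference):
    """Toy fingerprint comparison: both envelopes are vectors of relative
    intensity binned on the same m/z grid; cosine similarity says how alike
    the two 'fingerprints' are."""
    a, b = np.asarray(observed, float), np.asarray(reference, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy envelopes on a shared m/z grid: the unknown looks like reference 1, not 2.
observed    = [0.1, 0.4, 1.0, 0.7, 0.3]
reference_1 = [0.1, 0.5, 1.0, 0.6, 0.3]
reference_2 = [1.0, 0.6, 0.3, 0.1, 0.0]
print(envelope_similarity(observed, reference_1),
      envelope_similarity(observed, reference_2))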

Protein Goggle is a program that can be used to compare IEFs (not to be confused with IEF...sigh....).  If you're an academic researcher, you can get Protein Goggle for free to check it out.  If you're me, you probably wouldn't have time to try it out anyway!

The interface looks pretty straight-forward.


You can read more about this concept and the software you'd need for doing this type of analysis here.
READ MORE - Protein isotope envelope fingerprinting