Friday, April 16, 2010

Network Biology 2.0 part 5

Aviv Regev gave a talk, "Unbiased Reconstruction of Mammalian Regulatory Networks". This was definitely one of my favorite talks of the conference. She had previously done work that I really liked with Daphne Koller (like the module network paper in Nature Genetics). She started by saying how she wanted to apply the lessons she had learned in yeast network reconstruction to mammalian models. She wanted a primary cell type that actually reflects in vivo cell biology and a model system in which transcriptional responses play a major role in responding to the environment.

She chose to work with dendritic cells because they sense large classes of pathogens via a cohort of receptors, and because a lot is known about the receptor pathways but not as much about the transcriptional response.

Her basic workflow was to gather mRNA expression profiles over a time course, select candidate regulators, select a minimal signature of regulated genes that captured the most important behavior, perturb each candidate regulator, measure the signature genes after each perturbation, and derive a network model from the results.
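
The general shape of that workflow (my own rough sketch, not her actual code or algorithm; the data, thresholds, and selection rules here are made up purely for illustration) might look something like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: expression of 1000 genes across 10 time points (made up).
n_genes, n_timepoints = 1000, 10
expr = rng.normal(size=(n_genes, n_timepoints))

# Step 1: pick candidate regulators, e.g. the most variable genes over time.
variances = expr.var(axis=1)
regulators = np.argsort(variances)[-20:]          # top 20 most variable genes

# Step 2: pick a small "signature" of regulated genes to actually measure
# after each perturbation (here just the next most variable genes).
signature = np.argsort(variances)[-220:-20]       # ~200 signature genes

# Step 3: perturb each candidate regulator (e.g. knockdown), re-measure the
# signature, and record the effect relative to an unperturbed control.
control = expr[signature].mean(axis=1)
effects = np.zeros((len(regulators), len(signature)))
for i, reg in enumerate(regulators):
    # In reality this is a new experiment; here we just fake a measurement.
    perturbed = control + rng.normal(scale=0.5, size=len(signature))
    effects[i] = perturbed - control

# Step 4: derive a network model, e.g. draw an edge regulator -> target
# wherever the perturbation effect is large.
threshold = 1.0
edges = [(int(regulators[i]), int(signature[j]))
         for i in range(len(regulators))
         for j in range(len(signature))
         if abs(effects[i, j]) > threshold]
print(f"{len(edges)} putative regulator->target edges")
```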

It seemed to me that the main new things relative to her previous work were the selection of the optimal genes to measure and the use of a neat new expression technology suited to the ~200-gene scale. She showed real improvement and integration of data in her work.


Eric E. Schadt gave a talk on "Moving Toward an Understanding of the Molecular Networks Underlying Biological Hydrogen Production by Bacteria". Like the first PacBio talk, this one was incredibly polished and really impressive. As was pointed out in the first talk, distinct nucleotide modifications (methylation, for instance) create distinct changes in how their SMRT system reads a base. This lets them create "kinetic signatures" across a genome. He generated these kinetic signatures genome-wide across 125 strains of R. palustris, a bacterium he chose as a candidate for efficient hydrogen production.

He found that hydrogen production varied significantly from strain to strain, making a population-based systems genetics approach viable. He used the kinetic variation across the entire genome as covariates and mapped them like eQTLs, then constructed a regulatory network from this variation data.
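
As I understood it, the mapping step is conceptually like an eQTL scan with kinetic signatures playing the role of genotypes. A toy version (entirely made-up data and deliberately simplistic statistics, just to show the shape of the analysis, not his method) could be:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy data: kinetic signal at 5000 genomic positions across 125 strains,
# plus a hydrogen-production phenotype per strain (all made up).
n_strains, n_positions = 125, 5000
kinetics = rng.normal(size=(n_strains, n_positions))
hydrogen = 2.0 * kinetics[:, 42] + rng.normal(size=n_strains)  # one true signal

# eQTL-style scan: regress the phenotype on the kinetic value at each
# position and keep the positions with the strongest associations.
pvals = np.array([stats.linregress(kinetics[:, j], hydrogen).pvalue
                  for j in range(n_positions)])

# Crude Bonferroni correction for the number of positions tested.
hits = np.where(pvals < 0.05 / n_positions)[0]
print("associated positions:", hits)   # should recover position 42
```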

I found this talk (which I don't really do justice here) immensely impressive, and I think PacBio's technology will definitely be something to watch in a huge way.

Nicholas Eriksson gave a talk on web-based parallel GWAS. He is part of 23andMe and talked about how they not only genotype their customers but also build a social network for them, within which they run surveys asking their customers questions. They combine the genotypes of their roughly 20,000 responding customers with the answers to those surveys. They found a few novel SNPs associated with various traits like curly hair and, I believe, Parkinson's.
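
The underlying statistics are just standard case/control association tests run over each SNP; a minimal sketch of one such test (with fabricated genotype and survey data, not anything from 23andMe's actual pipeline) looks like:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)

# Toy data: one SNP (0/1/2 minor allele counts) and one self-reported
# binary trait (e.g. "curly hair") for 20,000 survey respondents.
n = 20_000
genotypes = rng.integers(0, 3, size=n)
trait = (rng.random(n) < 0.3 + 0.05 * genotypes).astype(int)  # weak true effect

# Standard association test: allele counts vs. trait status.
table = np.zeros((2, 2), dtype=int)
for g, t in zip(genotypes, trait):
    table[t, 0] += g          # minor alleles
    table[t, 1] += 2 - g      # major alleles
chi2, p, _, _ = chi2_contingency(table)
print(f"chi2={chi2:.1f}, p={p:.2e}")
```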

They also stated in a follow-up panel that they were indeed likely to patent the novel genes they discovered, which I personally find completely ethically reprehensible.

Network Biology 2.0 part 4

The second day of talks at Network Biology 2.0 was yesterday and there were a lot of talks I found really insightful.

Andrea Califano gave a talk, "Interrogating Cancer Interactomes to Optimize Therapy on an Individual Basis". He talked about how the initial great hope of cancer genetics was to go straight from genetics to distinguishing and curing human cancers. That paradigm has of course turned out to be completely wrong. He advocates the view that these things are all mediated by complex epigenetic, cell-dynamic, environmental, and genetic factors wrapped up in dynamic regulatory logic.

He talked briefly about ARACNE, a program his group created for reconstructing regulatory networks from gene expression data; then MINDy, a program for reverse-engineering conditional transcription factor interactions; and lastly MARINa, the MAster Regulator INference Algorithm (you know, if you keep making acronyms like that it's kind of meaningless), which looks for the most powerful transcriptional regulator nodes. He defined a master regulator as a gene that is necessary and/or sufficient to induce a specific cellular transformation or differentiation event.
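
My rough understanding of the ARACNE idea (a sketch of the general approach, not their implementation) is: estimate mutual information between every pair of genes, then prune likely indirect edges with the data processing inequality, i.e. in any fully connected triangle drop the weakest edge. A toy version on fake data:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)

def mutual_info(x, y, bins=8):
    """Crude histogram estimate of mutual information between two vectors."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

# Toy expression data: 30 genes x 200 samples, with a chain 0 -> 1 -> 2.
expr = rng.normal(size=(30, 200))
expr[1] = expr[0] + 0.3 * rng.normal(size=200)   # gene 1 follows gene 0
expr[2] = expr[1] + 0.3 * rng.normal(size=200)   # gene 2 follows gene 1

n = expr.shape[0]
mi = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    mi[i, j] = mi[j, i] = mutual_info(expr[i], expr[j])

# Data processing inequality: in each triangle, remove the weakest edge,
# on the logic that it is probably an indirect interaction.
edges = {(i, j) for i, j in combinations(range(n), 2) if mi[i, j] > 0.3}
for i, j, k in combinations(range(n), 3):
    tri = [(i, j), (i, k), (j, k)]
    if all(e in edges for e in tri):
        edges.discard(min(tri, key=lambda e: mi[e]))

print(sorted(edges))   # ideally keeps (0,1) and (1,2) but drops the indirect (0,2)
```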

He made the argument that in cancer research the temptation is to describe disease either at the level of individual epigenetic alterations, which are largely patient specific, or via the expressed phenotype, which is largely homogeneous, and proposed that the master regulator level is a better, more useful layer of abstraction between the two.

Using ARACNE and MARINa he was able to reconstruct a regulatory network and identify 6 TF regulatory modules controlling ~80% of the MGES (mesenchymal gene expression signature?) genes.

He then introduced another program, IDEA (Interactome-based Dysregulation Enrichment Analysis), which uses dysregulated interactions occurring more often than expected by chance as a ranking methodology.
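
I'm guessing at the details here, but the flavor seems to be an enrichment test: given the interactions flagged as dysregulated in a sample, ask whether a gene's neighborhood contains more of them than chance would predict. A hypergeometric test is the obvious way to phrase that (the numbers below are made up; this is my sketch, not how IDEA is actually defined):

```python
from scipy.stats import hypergeom

# Toy numbers (made up): the interactome has 100,000 interactions, 4,000 of
# which look dysregulated in this tumor sample. A particular gene's
# neighborhood contains 150 interactions, 30 of them dysregulated.
total_edges = 100_000
dysregulated_edges = 4_000
neighborhood = 150
dysregulated_in_neighborhood = 30

# P(seeing >= 30 dysregulated edges in a random 150-edge neighborhood).
p = hypergeom.sf(dysregulated_in_neighborhood - 1, total_edges,
                 dysregulated_edges, neighborhood)
print(f"enrichment p-value: {p:.2e}")
```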

He then summarized with 5 points on cancer medicine:

The current emphasis on genes harboring epigenetic alterations is inappropriate
The current approach to biomarker discovery must be re-evaluated, from GWAS to PWAS (Pathway-Wide Association)
Statistical power cannot be sacrificed for coverage
We must fundamentally rethink clinical studies because we cannot use a sample of one
The one-drug-for-one-disease paradigm needs to shift to a toolkit of target-specific drugs

He also made the point that you don't need to fully silence a gene via shRNA for it to be an effective treatment, but rather only partially; in fact, fully silencing the gene would probably be fatal.

I was really impressed by this talk; he seemed to really understand the algorithmic underpinnings of his work, and I will definitely be investigating his methodology further. His idea of pathway analysis, I believe, took some inspiration from GSEA, and rightfully so.

Wednesday, April 14, 2010

Network Biology 2.0 part 3

The next speaker was Michael Cusick, who talked about interactome networks and human diseases. He talked about basically doing yeast two-hybrid studies crossing all proteins against all proteins. My mind still boggles at the scope of projects like this, and I think it's incredible that people do them. I find it interesting that they mention how their statisticians tell them they need something like ~10x coverage to fully recover the interactome, but funding agencies seem to disagree. I think these sorts of studies need the kind of tentpole funding that the human genome project got; the amount of funding they need is still an order of magnitude less than that, and the resulting work is no less important, it just doesn't have the visceral appeal to the public that the human genome project had.
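
The reasoning behind that ~10x figure presumably looks like a saturation calculation: if a single screening pass detects any given true interaction with some probability p, then k passes recover roughly 1 - (1 - p)^k of the interactome. This model and the numbers below are my own back-of-the-envelope, not theirs:

```python
# Fraction of the interactome recovered after k screening passes, assuming
# each pass independently detects a given true interaction with probability p.
p = 0.20   # per-pass detection rate (made-up number)
for k in (1, 3, 5, 10):
    recovered = 1 - (1 - p) ** k
    print(f"{k:2d} passes: ~{recovered:.0%} of true interactions recovered")
```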

At first he only mentioned binary models, which I took issue with, but he seemed to indicate that they are moving towards a probabilistic model, which is obviously the better choice.

He also spoke at length about literature curation and literature-generated protein interaction networks, literature which I'm going to read in hopes of adapting it to genetic regulatory networks.

Lastly he talked about something called edgetics, which is studying the phenotypic perturbations caused by removing a link between two proteins, as opposed to knocking out a protein entirely. He showed that edge perturbations give rise to unique phenotypes, which isn't so surprising, but I think it's fascinating that they were able to do this. I'm definitely going to be delving into these papers.

Robert Weinberg then talked in great detail about cancer biology and morphology, which I frankly did not follow well at all; it seemed really meaningful, and his presentation and speaking were really great.

James Collins gave a talk on network biology and drug discovery. He stated that given inputs and outputs, the network inside could be recreated, which isn't exactly true: some network that gives those outputs can be created, but there are probably numerous equivalent networks that could produce the same outputs from the given inputs.
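
To make that point concrete, here is a toy example (mine, not his) of two differently wired Boolean "networks" that are indistinguishable from their input/output behavior alone:

```python
from itertools import product

# Two different Boolean "networks" from inputs (a, b, c) to output y.
# Network 1: internally uses c, but its effect cancels out downstream.
def network1(a, b, c):
    hidden = a and b and c
    return (a and b and not c) or hidden

# Network 2: ignores c entirely; y = a AND b.
def network2(a, b, c):
    return a and b

# Different internal structure, identical behavior on every possible input.
for a, b, c in product([False, True], repeat=3):
    assert network1(a, b, c) == network2(a, b, c)
print("two distinct wirings, identical input/output behavior")
```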

He went on to discuss how synthetic biology could be used to perturb a cell's regulatory state to learn the regulatory interaction network. However, he did not elucidate what advantages this technique has over, say, knockdown or knockout techniques (it may be more advantageous, I just don't know enough about it to say why).

He then went on to talk about how it seems likely that antibiotics that kill bacteria tend to do so by inducing the creation of hydroxyl radicals in their cells. This led him to talk about how sublethal levels of antibiotics don't just fail to kill some bacteria; they still induce hydroxyl radical formation, which results in increased mutagenesis of the bacteria and increased resistance across antibiotics. Scary!

Network Biology 2.0 part 2

The next talk was by Steve Turner, the Pacific Biosciences founder, who talked about, not surprisingly, Pacific Biosciences' third-generation sequencing technology, SMRT. This was, I think, my favorite talk. He had a super polished presentation, and in addition there was a lot of really awesome stuff in it.

He went through their single-molecule technology, which uses things called zero-mode waveguides to sequence. They appear to be focusing on long read length and accuracy. In sequencing the E. coli genome they had an average read length of 586 bp with a max read length of over 2,000 bp. He stated that the main cause of sequencing termination is damage by the lasers, which I thought was interesting, and said one thing they were doing was pulsing the laser on and off. Since they circularize the DNA strands in their technique, this results in the polymerase continuing to sequence the strand over and over again, only actually recording the read when the pulse is on. I don't entirely understand how this is useful, but it seemed to be.
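
My guess as to why that helps (my own reasoning, not anything he spelled out): if the polymerase goes around the same circular template many times, each pass is an independent noisy read of the same sequence, and a per-position majority vote gives a much more accurate consensus. A toy simulation of that idea:

```python
import random
from collections import Counter

random.seed(0)
bases = "ACGT"
template = "".join(random.choice(bases) for _ in range(200))

def noisy_pass(seq, error_rate=0.15):
    """One pass around the circular template with random substitution errors."""
    return "".join(b if random.random() > error_rate else random.choice(bases)
                   for b in seq)

def consensus(passes):
    """Per-position majority vote across all passes."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))

single = noisy_pass(template)
multi = consensus([noisy_pass(template) for _ in range(7)])

err = lambda read: sum(a != b for a, b in zip(read, template)) / len(template)
print(f"single-pass error: {err(single):.1%}, 7-pass consensus error: {err(multi):.1%}")
```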

The really awesome thing he described was how the methylation of a nucleotide causes a recognizable signature: at this point they are able to identify methylation of bases via their SMRT technology, and the same principle can potentially be applied to any sort of base modification, although that appears to be a work in progress. I'm not sure whether he mentioned it, or whether it's even the case, that the SMRT technology requires amplification; if it does, won't the base modifications generally be lost in that step, sort of negating that usefulness?
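
My reading of the principle (a sketch of the idea as I understood it, with entirely fabricated numbers, not PacBio's actual detection method) is that a modified base changes how long the polymerase dwells at that position, so you can flag a position as modified by comparing its kinetics against an unmodified control:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Toy idea: the time the polymerase spends incorporating each base (the
# "kinetics") is measured for a native sample and an unmodified control.
# A methylated position slows the polymerase down, shifting its kinetics.
control_times = rng.lognormal(mean=0.0, sigma=0.3, size=50)   # 50 control reads
native_times = rng.lognormal(mean=0.4, sigma=0.3, size=50)    # slowed down

# Flag the position as modified if the kinetic shift is significant.
t, p = stats.ttest_ind(np.log(native_times), np.log(control_times))
ratio = native_times.mean() / control_times.mean()
print(f"kinetic ratio: {ratio:.2f}, p-value: {p:.1e}")
```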

He additionally mentioned that they were working on a direct RNA sequencer using their SMRT technology. He mentioned it in the context of sequencing viruses, but if you could do direct mRNA sequencing with it and detect post-transcriptional modifications, I think that would be an incredibly revolutionary point in gene expression studies.

Direct RNA sequencing that recovers all post-transcriptional modifications would allow for all kinds of crazy shit, like correlating modifications across genes. At this point with RNA-seq you can already do something like this with exons, but because of the typically short reads it seems like it could be problematic.

In addition, this would dramatically increase the feature space of a transcriptional regulatory network; hopefully the continued improvement of sequencing technology would keep that tractable. But since all of this seems to be five or so years in the future, we'll just have to wait and see. Regardless, this left me really excited.

He closed with using ribosomes to do translational sequencing, which is still not a solved problem for them but undoubtedly represents an amazing opportunity. I was too busy geeking out thinking about the RNA sequencing, so I didn't pay attention to this as well as I should have.

Network Biology 2.0 part 1

The first day of the Network Biology symposium has ended and I have a few impressions about the talks so far; I'm going to break this up into multiple blog posts across speakers.

There was a much greater emphasis on the biological results than on the algorithmic machinery used to produce those results. Perhaps, this being a biological symposium, that shouldn't have come as a surprise to anyone, but I felt the emphasis may have been a bit too much on results and too little on demonstrating the accuracy of the methodology and the novelty of the methods used.

That being said, the biological implications presented here were very cool, if not always in my area of interest, and I believe some of the methodology used and the things attempted have far-reaching implications.

The first speaker was George Church, who talked in broad strokes about the current state of the art in genomics and next-gen sequencing. He made the point, which I'm sure has been made before, that DNA sequencing technology is improving at a rate of about 10x/year, compared with Moore's law at roughly 1.5x/year.
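
To put those rates in perspective (my arithmetic, just compounding his quoted numbers):

```python
# Compound growth at the quoted rates (10x/year for sequencing,
# ~1.5x/year for Moore's law) over five years.
years = 5
sequencing = 10 ** years        # 100,000x
moore = 1.5 ** years            # ~7.6x
print(f"after {years} years: sequencing {sequencing:,}x vs. compute ~{moore:.1f}x")
```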

He talked at length about the growth of various -omes that will allow for an increase in understanding of cancer, and also made a big point about how, so far, every attempt to anonymize biological data has failed, and that the answer is not to anonymize but to get the informed consent of the subjects up front. He plugged his own attempt to do this, personalgenomes.org, which looks very cool.

I'm glad to see some of the really famous people in biology get behind the idea of open science. Walled gardens of data are already hurting scientific advancement and our ability to build useful tools to analyze the data and create predictive models.

What can we learn from Google?

Since I'm stuck here waiting in the lobby, I'll talk a bit about something I've been thinking a lot about: large-scale datasets and large-scale machine learning. This post on Google's blog a few days ago has had me thinking. A large number of biological problems are classification problems; predicting whether a given genotype or gene expression profile means you'll get a given phenotype, for instance, is a hugely targeted goal in the research community. Relatedly, this is the goal of a lot of GWAS studies: find SNPs X associated with phenotype Y.
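
As a concrete example of the kind of classification problem I mean (toy data, with scikit-learn used purely as an illustration of the setup, not any particular published approach):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)

# Toy data: 500 individuals genotyped at 1,000 SNPs (0/1/2 minor allele
# counts), with a binary phenotype driven by a handful of those SNPs.
X = rng.integers(0, 3, size=(500, 1000)).astype(float)
logit = X[:, :5].sum(axis=1) - 5.0
y = (rng.random(500) < 1 / (1 + np.exp(-logit))).astype(int)

# Predict phenotype from genotype; cross-validation guards against the
# features >> samples overfitting that plagues this kind of data.
model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```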

However, most of this data is walled off. Researchers don't want to share data; it takes extra time, and it may help other people get papers out of the data that the original generators could have gotten themselves. In addition, there have been problems with data being comparable across experiments, especially with microarrays.

So I'd like to propose something that won't actually happen but would be nice if it did: a simple, easy-to-use data collection and annotation database maintained by both the authors and others. GEO has the issue that the data can only be updated by the author, which can lead to really out-of-date information. ArrayWiki attempts to solve this problem, but it only deals with a somewhat antiquated technology, microarrays, and hasn't caught on hugely in popularity. A real-time curated database would be a substantial investment, but it would allow us to build and leverage large-scale machine learning tools like the ones Google is currently developing, which I think would enable substantial scientific discovery.

Network Biology 2.0

The main impetus behind starting this blog was that I wanted to follow in the vaunted footsteps of a friend and blog a conference. Now the time is upon us, as I am sitting in the lobby of the Broad Institute waiting for the Network Biology 2.0 symposium to start. Network Biology 2.0 is a symposium sponsored by GNSbiotech and the Broad Institute. I must say, being somewhat of a country yokel in the wide world of science, I feel somewhat awed being in this building.