Archive | February, 2013

I have a headache reading about ENCODE: moving into the realm of “big science”

28 Feb


I spent the past few days reading about ENCODE, the ENCyclopedia Of DNA Elements, which is generating a lot of buzz right now. Why does reading about it give me a headache? What is ENCODE? This is a great chance to talk about this “big science” project, and to learn how communication of scientific results can become a mess…

The genome is a collection of genetic codes that gives an organism (like us) its traits and features. These traits and features come from many processes within the cell – the codes are transcribed and translated to become chains of amino acids, which are then modified to become proteins, which are then transported to where they need to be, and essentially become the building blocks for an organism. Now, after the Human Genome Project, we have an idea what the long sequence of codes looks like – 3164.7 million chemical nucleotide bases, each represented by one of the letters A, T, C, or G. This is massive! If we were to print this out letter by letter, apparently we could fill two hundred 500-page telephone directories. The ENCODE project (420 scientists, 32 labs around the world) aims to go a step further. It says on its project website:

The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.

This is an important step, because simply knowing the codes does not tell us what they really do. But how do you start building a comprehensive list when you have 3164.7 million nucleotide bases to go through? In general, the ENCODE approach is this – let’s imagine that you are shopping online at eBay, which has lots and lots (and lots!) of products. Some are useful, working products, and some are not. You want to get a clock, but checking the listings one by one would simply take too long, so instead you look for specific “features” – something with gears, a circular face with numbers 1 to 12 on it, with hour/minute hands, and so on.

This is actually a pretty smart approach. In their 2012 paper, the ENCODE team looked for 4 features: regions of transcription, transcription factor association, chromatin structure, and histone modification, because these are elements that likely matter if we are to search for something specific (like a clock) later on.

So what’s the problem? It mostly comes down to one word – “function.”

In ENCODE’s news release, they stated that

[…], researchers linked more than 80 percent of the human genome sequence to a specific biological function and mapped more than 4 million regulatory regions where proteins specifically interact with the DNA.

The news release further stated that “most of the human genome is involved in the complex molecular choreography required for converting genetic information into living cells and organisms.”

This sent a shock wave through much of the science community. From what we have learnt about DNA and the human genome so far, we know that a large proportion of the sequence is not “functional” – it doesn’t code for a protein and doesn’t seem to have a specific purpose in the cell. It is what we call “junk DNA” (a terrible term, because not having an immediate function doesn’t mean that it should be thrown out – so many scientists avoid the term). 80% is much, much higher than what most scientists expected. This discovery by ENCODE was picked up immediately by the media and billed as “an overturn of the junk DNA theory” *cringe*. A new breakthrough in the field! – or is it?

Just because it has a gear doesn't mean it is something "functional" (image by Catherinette Rings Steampunk)


You might have figured out what doesn’t seem quite right here. What ENCODE identified were “functional elements” – elements that suggest the possibility of biological function. Just like not all products with gears are actually “functional” (it could be a clock, a broken watch, some “as seen on TV” product, a bag of random mechanical parts, or a craft project glued together by your 4-year-old nephew), identifying functional elements is not the same as demonstrating actual biological functions in your cell. And having functional elements does not confirm involvement in critical cellular pathways or association with important functions in the cell.
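To make the clock analogy above a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the listings, the feature names, and the `works_as_clock` flag are hypothetical, and the numbers say nothing about the real genome. It only shows that "flagged by a feature search" and "actually works" are two different counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    name: str
    features: frozenset
    works_as_clock: bool  # ground truth that a feature search cannot see


# Hypothetical "clock-like" features to search for.
CLOCK_FEATURES = frozenset({"gears", "numbered face", "hands"})

# Invented toy listings, loosely following the examples in the post.
LISTINGS = [
    Listing("wall clock", frozenset({"gears", "numbered face", "hands"}), True),
    Listing("broken watch", frozenset({"gears", "numbered face", "hands"}), False),
    Listing("steampunk ring", frozenset({"gears"}), False),
    Listing("bag of mechanical parts", frozenset({"gears", "hands"}), False),
    Listing("paperback novel", frozenset(), False),
    Listing("cookbook", frozenset(), False),
]


def has_any_feature(listing, wanted=CLOCK_FEATURES):
    """Return True if the listing shows at least one of the wanted features."""
    return bool(listing.features & wanted)


# Step 1: the feature search flags everything that looks clock-like.
flagged = [item for item in LISTINGS if has_any_feature(item)]

# Step 2: only a separate check tells us which flagged items really work.
working = [item for item in flagged if item.works_as_clock]

print(f"flagged by features: {len(flagged)} of {len(LISTINGS)}")
print(f"actually working clocks: {len(working)} of {len(LISTINGS)}")
```

The gap between the two printed counts is the whole argument: the first number describes what the search found, while the second needs extra evidence that the search itself never collected.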

After the immediate media hype, other scientists expressed concerns (to put it mildly), but it was too late (read A Genome-Sized Media Failure by Michael White). This also led to a very recent, rather aggressive paper by Graur et al. refuting the claims made by the ENCODE project. This whole thing is now very messy (that’s why I was having a headache) 😦 I won’t elaborate much further, but if you want to know more about this – see Further Reading.

The funny thing is, ENCODE could have been more specific and could have chosen a less controversial term – like “specific biochemical activity” as suggested by PZ Myers, or perhaps “ability to bind to cellular factors.” If they had not over-reached with the claim, the focus would have remained on the amazingly huge amount of information that ENCODE provides, which can now be analyzed by scientists around the world to help us learn more about our genome and how it works.

The ENCODE experience is probably good for science (I didn’t say it is going to be a pleasant one…). We now have this enthusiasm/obsession about big science, and there is so much pressure to get the “next breakthrough” out, to create the next hype. But we should really come back to the objectives of these big science projects – for ENCODE, it is about building an informative genome database for scientists – and disseminate well-supported information to the public and media with adequate explanations.

And, scientists or not, we should remain curious yet critical about “breakthrough” discoveries in the future 🙂


Postscript 1: This reminds me of the OPERA discovery about neutrinos travelling faster than the speed of light – which was later found to be the result of equipment/calculation errors. Even though in the end this did not go so nicely, at least they stated outright that they were not sure what was going on, and invited everyone to help figure out whether this was a true discovery or an error (in fact, this sparked a lot of good public discussion about particle physics, which was awesome). I give the OPERA team kudos for that.

Postscript 2: While I am a little sympathetic about the situation ENCODE is in, I don’t have much good to say about ENCODE’s public promo video below. Neither the Human Genome Project nor ENCODE is a shortcut to drug discoveries and treatments for rare diseases. They are, however, critical steps toward understanding how our genome works. It will take a lot more effort in the future to tease out specifics – and the video seems to convey the message that ENCODE is much closer than the Human Genome Project to finding cures for diseases (it isn’t…we don’t even know where the end is…)

Further Reading

How Astronaut Chris Hadfield Brings Space Closer to Us

26 Feb

I have to admit that before I knew about Chris Hadfield, Canadian astronaut and the current Commander of the International Space Station, I didn’t care so much about what’s really going on up in space. Sure, I know about a space mission here and there, know what caused the Challenger tragedy (I grew up reading Richard Feynman biographies), and do have an idea about the Solar System and the Universe (kinda have to because I work for the Department of Physics & Astronomy!). But Chris is special – this ISS Commander makes me feel that space is reachable and relevant, and he is the awesome friend that I want to hang out with all the time (I guess I am not the only one feeling this way – see this National Post article).

First there was the awesome Twitter exchange between Chris Hadfield and William Shatner (a.k.a. Captain Kirk from Star Trek):


And then this music collaboration with Ed Robertson from the Canadian band Barenaked Ladies (by the way, Barenaked Ladies wrote the opening song for the TV show The Big Bang Theory):

He even made making a honey peanut butter tortilla sandwich so much fun to watch!

He even occasionally answered questions from space (I hope this little girl becomes an astronaut or mathematician):

And he shared amazing photos from space through his Twitter account.


As cool as all of this is, the sad thing is that the Canadian Space Agency is currently facing budget cuts and layoffs. I cannot imagine Chris Hadfield being able to do what he does now without the help and support of the Canadian Space Agency. I really hope that with Chris’ popularity, more people will pay attention to the difficult situation the Canadian Space Agency is facing right now, and encourage the Canadian Government to continue funding the Agency’s programs.

Chris’ mission on the International Space Station runs until mid-May, I believe. Meanwhile you can follow him on Twitter or Facebook, or subscribe to the Canadian Space Agency’s YouTube Channel, where Chris’ videos are posted. The Guardian has nice coverage of how Chris captures our attention through social media.

Postscript 1: I think this is a great example of how social media can be so successfully used to engage an audience and get people interested in science and astronomy!

Making Bread Is Like Running Science Experiments

22 Feb


I made bread for the first time the other day during a bread making class! I quite enjoy cooking and baking. Not so much because it is a girl thing to do – my mom didn’t actually cook much, and I spent much more time working than cooking. If anything, I think it is because nothing relieves my stress better than chopping up vegetables with a knife on a chopping board (right, try not to get into my kitchen when I cook). My favourite recipes are the ones for apple crumbles and cheesecakes. This is the first time that I ventured into bread making.

Making bread, it turns out, is very much like running science experiments. You have this bread that you would like to make and you think you know how to make it. In science, this could be a chemical reaction you want to make happen, a cellular process you want to study, some kind of interaction between organisms you need to analyze, and more. So you start out with your ingredients – as a beginner in bread making, you have flour, salt, yeast, and water. First of all, you have many factors to think about and many questions to ask yourself: How much of each ingredient do I add? How long will I need to leave the dough to rest (ferment)? What kind of flour do I use – and does that affect how much water I add? What temperature do I bake my bread at, and when do I turn the temperature up or down? It comes down to finding the winning combination by changing one factor at a time, or by changing multiple factors but carefully recording them so you can see if different factors interact with each other (when you do multifactorial analysis in science experiments, statistics is very important – I used to have to do this with my experiments and it was NOT fun).
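The two strategies in the paragraph above (changing one factor at a time versus changing several and recording every combination) can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: the factor names and levels are hypothetical, not a recipe from the class, and the code only plans which loaves to bake without doing any statistics.

```python
from itertools import product

# Hypothetical factors and levels, invented for this sketch.
factors = {
    "water_g": [300, 350],
    "rest_hours": [1, 2],
    "oven_temp_c": [200, 230],
}

# Baseline loaf: the first level of every factor.
baseline = {name: levels[0] for name, levels in factors.items()}


def one_factor_at_a_time(factors, baseline):
    """Bake the baseline, then change a single factor per extra loaf."""
    runs = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels:
            if level != baseline[name]:
                run = dict(baseline)
                run[name] = level
                runs.append(run)
    return runs


def full_factorial(factors):
    """Bake every combination of levels, so interactions can show up."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


ofat_runs = one_factor_at_a_time(factors, baseline)
factorial_runs = full_factorial(factors)

print(f"one factor at a time: {len(ofat_runs)} loaves")
print(f"full factorial: {len(factorial_runs)} loaves")
for run in factorial_runs:
    print(run)
```

The one-at-a-time plan needs fewer loaves, but it never bakes, say, more water together with a hotter oven, so it cannot tell you whether those two factors interact. The full plan covers every combination, which is where the (not so fun) statistics comes in.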

But sometimes it doesn’t happen as planned. Sometimes you fail completely, and after a few tries you know that you will need to revise your recipe, or perhaps you are going to make a different kind of bread after all. Occasionally something unexpected happens – you follow what you think are the correct instructions to make a loaf but end up making a baguette – what happened!? It could be a silly mistake, or it could be a serendipitous discovery. Either way, the end result is exciting (and you obviously will eat the baguette :P). And this is why note taking and paying attention to details become important in bread making and science experiments, because if you don’t do so carefully you might not get baguettes again!

Once you are happy with the results and are familiar with the procedures, you then start asking more questions: If I know what each ingredient does to my bread, can I put together some sort of general guide that will give me an idea of how my bread is going to turn out before I even put my dough in the oven? Or maybe it is the other way around – have I baked so many different types of bread that I can start working out how an ingredient works? Can I engineer a machine to make bread automatically – a bread maker? Or build a factory that can make 10,000 of the same bread (or of different types) at the same time? What happens if I introduce another factor – say, if I want to add pecans to my bread? Or do I start finding ways to evaluate the different types of bread I get from all over the world? And once I have my wonderful bread that I think no one else has made before, I will write the instructions down as a recipe, and state the kind of bread one can expect so that others can make the same bread. I then post the recipe (or even the mistakes I made) online so that I can share what I learned with others, just as I learned from others before.

Now, there are a few important things to consider here. Why share recipes? Part of it is personal pride (“Look, I made this!” Much like what I am doing right now I guess). The other part is so that everyone gets a chance to see what others are doing, share experiences (good or bad), create new recipes, or even make the recipes better. The bonus is that one day you might decide to open a bread shop, and seeing what others do will perhaps help you decide who you want to work with. And this ability to share recipes is very important (how do you know you have a special bread if you don’t know what others are doing?). This is why scientists share the results of their research with other scientists through scientific publications. Also, if you are not careful in planning your recipe or writing down exactly what you do, others cannot repeat what you made, and cannot move forward to study the recipe further, to build on the recipe, or to make improvements to the recipe. It also becomes difficult to tell whether your recipe is truly a new discovery or some kind of random artifact. In science, the ability to reproduce another scientist’s results by following the same method is also extremely important (“reproducibility”). One more thing to note is that while bread making deals with bread, scientists usually deal with organisms or systems that interact with each other and/or the bigger environment, in very diverse fields – thus making scientific research more complicated than making bread. Bread making resembles my experience in scientific experimentation, but the field of science is so broad that others might have different paths (this is my disclaimer, haha). Scientists also try to answer one question after another, running one experiment after another, making this a never-ending quest for scientific knowledge.

Right, these are the kinds of things that I thought about when I was in my bread-making class. (Nerd Alert! :P) Here are some photos from the class. My instructor was Florin Moldovan, who used to own the Transilvania Peasant Bread on West Broadway. You can check out his blog Baking Stories. By the way, apparently he’s a computer science guy by training…!

Why Do Scientists and Aaron Swartz Care So Much About Open Access (III. Solutions?)

19 Feb

This is part III of my 3 part series about Open Access. Read Part I. Read Part II.

With increasing digitization of research publications and improved ability to share information online, we finally have an opportunity to address the problems. With great power comes great responsibility! (wait, I heard of that somewhere before 😛 ). Here I will try to outline the two main approaches, provide basic information for the latest debate, and discuss how the Open Access (OA) movement is more than just making papers public.

The two approaches you might have heard about in the news are Green OA and Gold OA.

With Green OA, a scientist would submit a paper to an online repository that is open to the public. The paper can still be submitted to a journal for peer review and publication: according to the SHERPA-RoMEO database developed by the University of Nottingham, 68% of the 1196 journals in its database accept having preprints (pre-review) and/or postprints (post-review) submitted to online repositories. Even though some of the articles in repositories might not have been peer-reviewed, typically there is some process in place to ensure the authors are credible and the papers do not contain significant flaws. Authors do not need to pay for submissions to repositories, and articles are typically available online in a few days, compared to the months typically required for a paper to be reviewed and then published by a publisher.

An example of a Green OA repository is ArXiv (for Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics). ArXiv works on a moderator system, in which submissions are moderated by selected scientists approved by a committee. While there is no peer review, ArXiv has become a critical source of information, particularly because of how quickly scientists can exchange information online – at the time this is published, ArXiv contains 822,196 publicly accessible papers. Lately, Figshare presents itself as an option (or alternative) for those in fields not covered by ArXiv. Financially, these repositories do not generate revenues through the publication process. ArXiv sustains its operation by asking for pledges from the 200 institutions with the heaviest usage (at $1500-3000 per institution), and by receiving grants from non-profit organizations (such as the Simons Foundation). Figshare is currently backed by Digital Science, a branch of Macmillan Publishers Limited (although according to the news release Figshare will maintain its autonomy). The key to successful Green OA is likely to have a handful of central repositories that everyone submits to, because this will make searching for papers a lot easier, and that might be why ArXiv has been so successful.

With Gold OA, a scientist would submit a paper to an open access journal, and once the paper is accepted the journal will ask the scientist for an article processing charge (APC), which usually ranges from $1 to ~$3000 USD depending on the journal (UC Berkeley has a selected list of OA journals and their APCs on its Library Collections website). The article processing charge will then be used to support the logistics of making the paper open access, as well as to maintain the peer review system. This money will come from the authors’ own research grants, specific funds set up by their institutions for supporting open access publications, or occasionally personal money for smaller article processing charges (but some journals will waive the charge under specific circumstances). Here are a few examples of Gold OA journals: Public Library of Science (PLoS) (online only), BioMed Central (online only), and Sage Open (part of Sage, which publishes journals traditionally as well).

Gold OA journal examples


The Gold OA model is the one recently adopted by the UK government based on the famous (or infamous?) Finch Report (executive summary). If you are interested, Bo-Christer Björk and David Solomon published many articles analyzing APCs.

Whichever open access model is used, SPARC has put together an excellent page on the different income models that can be adopted for Open Access.

When it comes to Green vs. Gold, there are discussions and debates about which model is better for the scientific community (more specifically, which model institutions or governments should mandate their researchers to follow). Stevan Harnad is a major proponent of the green model, while Stuart Shieber supports the adoption of the gold model (although, as Stevan clarified in his comment, both of them favour Green Open Access). You can read about the cases they made: Stevan Harnad’s “The Argument Against (Premature) Gold OA Support”, Stuart Shieber’s “The argument for gold OA support”. Also check out the good coverage by Times Higher Education.

But is open access simply about making the papers public? For me personally, it is more than that. For the Open Access movement to be really successful, we all need to start changing how we think of scientific prestige. For the past few decades, much of a scientist’s worth has been evaluated through the number of papers published, and through the journals the papers are published in. Each journal, as it stands now, has a number called the impact factor associated with it. Publications in journals with lower impact factors (usually newer, smaller, less popular journals) usually carry much less weight than publications in those with higher impact factors, no matter how rigorous the science is (Postscript 1). For a scientist, this can affect anything from jobs and research funding to awards or fellowships. The reason that some commercial publishers were able to charge high subscription fees without anyone complaining until now is simply that the money was not just supporting the publication process, but also buying and selling the scientific prestige that these journals represent. Is this going to change with OA? According to an analysis of the APCs of Gold OA and Hybrid OA journals by Theo Andrew, Open Scholarship Development Officer at the University of Edinburgh, there is a positive correlation between the APC and a journal’s impact factor (the higher the impact factor, the higher the APC). While there are indeed costs associated with the publication process, what’s concerning is that there is no transparency in how the APCs are set, resulting in a huge variation in the numbers.

This move toward Open Access will take a while, because it is not just about changing the access model, but also about changing how we see science. If we continue to think that scientists can be evaluated exclusively by numbers and not by the science they do or by the people they inspire, and do not ask for any transparency in the scientific publishing systems, then the financial unfairness to scientists, as I described in Part II of my posts, will likely happen again (granted, at least we will have open access…), and the OA movement won’t really achieve as much as it sets out to achieve (Postscript 2).

Postscript 1. If you are interested in reading more about the effects of impact factors, here are a few great articles:

Postscript 2. I originally planned to include a few actions we can take to support the Open Access movement, and my personal take on the Green vs. Gold debate, but this post was getting too long. I will leave my calls to action for another post, and I don’t mind elaborating on my point of view if anyone asks specific questions in the comments section below 🙂

AAAS Meeting Is Happening Right Now! (where scientists meet to talk about, well, science)

15 Feb

Family Science Days in Vancouver last year (2012) during the AAAS annual meeting

I just want to use this opportunity to let everyone know that the American Association for the Advancement of Science (AAAS) meeting is happening in Boston right now (Feb 14-18). AAAS is the world’s largest general scientific society and its annual meeting covers topics in all fields of science – including science education, outreach, and communication. This is also the time when major scientific stories break (because everyone waits for the AAAS meeting to make announcements). The AAAS meeting was in Vancouver last year and I had so much fun being a part of it. If you are in Boston, you can check out the AAAS exhibition hall for free during the Family Science Days on Feb 16-17. The Family Science Days are great fun for the whole family, so definitely make sure you drop by if you have kids.

If, like me, you unfortunately cannot make it to the meeting in person 😦 , there are lots of things happening online as well. For example, there will be a Google+ Hangout about the Future of Physics happening in 60 minutes. You can also read about the meeting and the breaking science news from the meeting on EurekAlert! (the AAAS online news service) or on the Science Magazine “news from the meeting” page. To follow the meeting online – their handle is @AAASMeetings, hashtag #AAASmtg.
