# Shotgun DNA Mapping: The DNA Anchor

In order to unzip DNA, I need to create three pieces of DNA that I will then attach to each other through a ligation reaction. The first piece that I will discuss is the anchor DNA.

The anchor DNA is a very versatile piece of double-stranded DNA (dsDNA). From this single piece, we can choose to either unzip or stretch DNA, thanks to a special sequence near one end. I’ll get into this a little bit later. But first a couple of questions:

1. Why is it called anchor DNA? Because we use this piece of DNA to attach our entire structure to a glass surface. This is the point that anchors our DNA while we pull on it for either stretching or unzipping experiments. One of the bases is designed with a digoxigenin molecule attached to it, and that base is placed right at the start of the sequence. In our tethering experiments, we coat our glass with an antibody for digoxigenin (dig for short), cleverly named anti-dig, and the anti-dig binds the dig. You can learn a lot about antigen-antibody interactions here.
2. How can we decide between stretching and unzipping? Because of how we designed the anchor DNA, we can stretch the anchor segment by default. That means once I produce anchor DNA I can tether it and begin stretching experiments. If we want to unzip DNA, then I take the anchor DNA and cut the end off (the side opposite the dig molecule) in a digestion reaction (more on this another time). That reaction gives me a small overhang (when one side of the DNA is longer than the other). From there I can perform a series of reactions that create the DNA sequence necessary to perform unzipping experiments. Notice that the anchor end is left unchanged, and that is what enables us to perform both stretching and unzipping experiments from this one piece of DNA.

Now the third question: how do you make the anchor sequence? For this we need to know several sequences, possibly do some cloning, and run a reaction known as the polymerase chain reaction, or PCR.

I’m not going to go into the details of what PCR is and how it works (google searching will reveal a lot more useful information than what I’d be willing to put here), but what I will say is that PCR allows me to make millions/billions of copies of a sequence of DNA starting with just a few strands of the original sequence and some short pieces of DNA called primers.
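The millions/billions figure follows from simple doubling: an ideal PCR doubles every template strand each thermal cycle. A minimal sketch of that arithmetic (the starting copy number and cycle count here are illustrative, not from the post):

```python
# Ideal PCR: each thermal cycle doubles every template strand.
def pcr_copies(starting_copies, cycles):
    """Number of copies after `cycles` rounds of perfect doubling."""
    return starting_copies * 2 ** cycles

# A handful of template strands after a typical 30-cycle run:
print(pcr_copies(10, 30))  # 10 * 2**30 = 10,737,418,240 copies
```

Real reactions plateau well before ideal doubling, but the exponential growth is why a few starting strands are enough.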

Our original sequence comes from plasmids. For the anchor sequence I have two possible starting points: pRL574 is a plasmid that dates back to Koch’s graduate days, and about a year and a half ago I created a brand new plasmid called pALS. Both plasmids are viable options, but serve slightly different purposes:

• pRL574 – for this plasmid we have several different sets of primers that allow us to make anchors of different lengths: ~1.1kb and ~4.4kb. The 4.4kb sequence we use primarily for stretching experiments, while the 1.1kb sequence is used in unzipping experiments.
• pALS – this plasmid only produces one length, which is about 4kb. But this plasmid allows us to both unzip and stretch as I described above. It also has a couple of unique features. First, if I cut it in the right spot, I can ligate the plasmid to itself through a special adapter sequence (to be described later). Second, it contains a sequence that is recognized by nucleosomes, which we could use for more complicated experiments down the road.

So as you can tell, I have some options available to me. Normally I would just pick one plasmid to work with, but I want to work with both and figure out which may be the more viable option down the road. In my next post, I’ll link to and list the sequences needed to make the anchor construct, with some explanations as to what everything is.

# The Library of Congress: Science at Risk meeting

Back at the end of March, I was invited to a special meeting hosted by the Library of Congress, entitled “Science at Risk: Toward a National Strategy for Preserving Online Science.” The meeting takes place next week, on June 26 and 27, and I am excited!

I know only a few details about the event, but here is some info that was forwarded to me:

Scholarly discourse, including interaction between scientists and the public, is rapidly changing and the ephemeral nature of this discussion on the web leaves it at substantial risk of being lost. Science blogs, the work of citizen scientists, and novel online publications like video journals are becoming the primary sources for understanding science in our times. These resources are almost exclusively online and increasingly at risk. The goal of this meeting is to begin identifying content that is valuable and at risk and to articulate next steps to ensure that this content is not lost.

In the face of this challenge, the Library of Congress, with generous support from the Alfred P. Sloan Foundation, wishes to join with other organizations to develop a national strategy for collecting and preserving science and science discourse which exists only in digital form on the open web.  The Library would welcome your participation in the invitation-only workshop “Science at Risk: Toward a National Strategy for Preserving Online Science” to be held over one and a half days on June 26-27, 2012.  The event will bring together scientists and representatives from online science projects, archivists, and the historians and other scholars who will increasingly depend upon the historical record.

From what I’ve seen the guest list is full of impressive people, and for me to be included among them is a huge honor.

I’m going to be speaking on behalf of open notebook science and scientists. I’ll present various open notebooks, including my own, and current methods for successful and useful open notebook science. As the week goes on I’ll have more details about this and I’ll be notebooking my preparations. Next week I’ll do my best to document the meeting.

If you are an open notebook scientist and would like your notebook mentioned/featured then feel free to contact me on twitter (@thescienceofant), email (anthonysalvagnoatgmaildotcom), in the comments below, regular mail, on facebook, or smoke signals (but please be in the ABQ area otherwise I may not see it).

# Proof of Principle for Shotgun DNA Mapping (Redux)

You might have noticed all the recent activity with a lack of context. Well, it’s Spring Break here at UNM, and Steve and I are going into overdrive to finalize some papers that were left in preprint limbo. We have three projects of varying priority: (1) Shotgun DNA Mapping (what this article is about), (2) a paper that models kinesin motility, and (3) a paper about the Repeating Crumley experiment that will be self-published via Google Docs.

So all the stuff that has been appearing in my notebook from both Steve and me is about the Shotgun DNA Mapping (SDM) paper. Our goal is to finish the paper, adding some extra experiments that show how useful the SDM software is, by the end of the week (I’m guessing by Sunday, because that’s when my parents arrive).

Right now we are refamiliarizing ourselves with the software. The original code and resulting paper were written in September of 2008 (yikes I’ve been in grad school for a while). The code was originally written by Larry Herskowitz and back then he had talents as a programmer, but he had no talents as an organized human being. So getting familiar with his mindset and looking for important programs and files is no easy task. There may be a lot we need to do and that’s what we are trying to figure out. Hopefully tomorrow we’ll be able to move on from this step.

But what code am I talking about? You probably thought I was just an experimentalist. Well you’d be mostly correct, but I have dabbled in programming some.

The code in question is the heart of Shotgun DNA Mapping. The SDM project was a two step experiment:

1. Generate clones of random yeast genomic DNA sequences and unzip those sequences using our optical tweezers.
2. Compare the force curves from the unzipped DNA to a library of simulated force vs extension curves. The library is generated from the yeast genome.

The paper that we are working on was a proof of principle for step 2. The results from step 1 would be published once we had working tweezers and unzippable DNA. Unfortunately we couldn’t get unzippable DNA, but maybe this summer I’ll be able to try again. In this paper we discuss how the simulation software works, how we match genetic information, and we present results using some old data Steve had from grad school.

Aside: It just occurred to me that this could be a success of open science if other groups had DNA unzipping data that they shared online, but alas, the rest of the world is closed.

Let’s discuss the software briefly so there is some context as to what Steve and I may write here for the rest of the week, and so you can understand the basic premise of the paper.

To get a brief understanding of how the tweezers work and how we are able to unzip DNA check out my intro to SDM here.

Now that you understand all that stuff let’s get into the software:

1. The first step to SDM is to create a library of unzipping force vs. extension curves for the yeast genome. We chose yeast because its DNA is bundled as chromatin, which could be used for the next level of SDM, Shotgun Chromatin Mapping (SCM), and because our collaborators are expert yeast geneticists.
   1. We downloaded the yeast genome sequence from yeastgenome.org (back in 2008) and did a simulated restriction digest. By this I mean we looked for the XhoI recognition sequence (CTCGAG) in the yeast genome. From each recognition site, we created “fragments” 2000bp in length to use for the unzipping simulation.
   2. The unzipping simulation used a very simple equation to calculate the energy contained in a double-stranded DNA (dsDNA) sequence. The Hamiltonian (the term for the energy function) contained the energy of the freely-jointed chain (a model that describes a chain of paperclips, google it) and the base-pairing energy (i.e., A-T/G-C bond energies). That’s it! Remarkably, that worked exceptionally well. The reasoning is that when you are unzipping, you have two strands of single-stranded DNA held together by the still-unzipped DNA (base-paired bonds).
   3. After the energies contained in an unzipping sequence are calculated, we needed to extract force information. We did this by numerically solving an integral of x(F’)dF’, where x(F’) is the extension as a function of force in the freely-jointed chain model. I don’t expect that to mean too much right now, and I may have explained it imperfectly myself, but this will be made clearer later.
2. Once we have simulated unzipping data, we can begin to match real data to the library we just created.
   1. I don’t understand the mathematics behind the matching algorithm all that well myself (right now), but from what I understand, the matching uses a normalization that is rooted in the difference between a polyA strand and a polyG strand (polyA unzips at forces of ~9pN and polyG at ~19pN; everything else is in between).
   2. Because of the nature of the simulation, at low extensions there is a lot of unreadable data, so we use a window between bases 1200 and 1700 (we call this number the index) to get better analysis. We also chose a 500-base window size above index 1000 because of the unzipping data that we were analyzing.
   3. The algorithm generates a score based on the difference between real and simulated force profiles in the window stated above. In our system, great matches are close to 0 and bad matches are close to 1.
3. In graduate school, Steve had unzipped a plasmid known as pBR322 (purchasable from NEB). Using the software, Larry simulated the force profile for this sequence and hid the data in the library of simulated yeast genomic data. The matching algorithm managed to match the real unzipping data to the simulated data every time.
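The simulated digest in step 1 can be sketched in a few lines: scan the genome string for the XhoI site (CTCGAG) and take 2000bp starting at each recognition site as a fragment. This is a hedged illustration of the idea only, not Larry’s actual code, and the function names are my own:

```python
XHOI_SITE = "CTCGAG"   # XhoI recognition sequence
FRAG_LEN = 2000        # fragment length used for the unzipping simulation

def simulated_digest(genome, site=XHOI_SITE, frag_len=FRAG_LEN):
    """Return (position, fragment) pairs: frag_len bases beginning at
    each occurrence of the restriction site in the genome string."""
    fragments = []
    start = genome.find(site)
    while start != -1:
        frag = genome[start:start + frag_len]
        if len(frag) == frag_len:   # skip truncated fragments at the genome's end
            fragments.append((start, frag))
        start = genome.find(site, start + 1)
    return fragments

# Toy example: two recognition sites in a made-up sequence
toy = "AAAA" + "CTCGAG" + "T" * 10 + "CTCGAG" + "G" * 10
print([pos for pos, _ in simulated_digest(toy, frag_len=8)])  # [4, 20]
```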
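Step 2’s matching can be sketched in the same spirit. The real normalization lives in the software, but the basic idea is a windowed force difference scaled by the polyA–polyG force range (~9pN to ~19pN), so that 0 is a perfect match and scores near 1 are poor. The exact formula and names below are my illustrative assumptions, not the paper’s algorithm:

```python
F_POLY_A = 9.0   # approx. unzipping force of a polyA strand, in pN (weakest)
F_POLY_G = 19.0  # approx. unzipping force of a polyG strand, in pN (strongest)

def match_score(measured, simulated, lo=1200, hi=1700):
    """Mean absolute force difference over the index window [lo, hi),
    normalized by the polyA-polyG force range so scores land near [0, 1].
    0 ~ perfect match; 1 ~ as different as the force scale allows."""
    window = range(lo, hi)
    diff = sum(abs(measured[i] - simulated[i]) for i in window)
    return diff / (len(window) * (F_POLY_G - F_POLY_A))

# Identical profiles score 0; profiles a full force-range apart score 1.
profile = [12.0] * 2000
print(match_score(profile, profile))                      # 0.0
print(match_score(profile, [x + 10.0 for x in profile]))  # 1.0
```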

Like I said, I don’t expect all of this to make sense yet, but as we rewrite the paper I’ll add thoughts about the project to the notebook. On top of that, I’m sure we’ll have a lot of supplemental information to add to the paper via the notebook (so we can just cite it). So while all this is new and confusing (if it isn’t, then that’s awesome), be aware that the confusion will subside by the end of the week. We have a lot of exciting things coming in the next few days and you’ll all be the third to know about it (after Steve and I, of course).

# Introducing Shotgun DNA Mapping

Before I got immersed in the world of deuterium-less water, I was a rabid DNA-man. The project was Shotgun DNA Mapping which was a term we invented to describe a quick protocol for mapping a DNA sequence.

In theory, the technique was awesome: unzip a DNA sequence with optical tweezers and compare the data to a library of simulated data for a given genome to figure out where you are in the genome. This would lead to a bigger project called Shotgun Chromatin Mapping, which is similar except that you could map protein locations on DNA fragments using the same technique.

For this to work, you need three components:

1. Optical Tweezers – to unzip DNA and record data
2. DNA – to unzip
3. A computer simulation – to simulate DNA unzipping and match recorded data to simulated data

I dedicated a few blog posts in my other blog to discussing the basic principles of the project in case you need to get caught up to speed. But in case you don’t have time for all that, here is the whirlwind summary:

Optical tweezers are an optical system that requires a laser, a microscope objective, a condenser, some steering components, and a detector (in our case a quadrant photodiode), among other things. The laser is focused by the objective, and this focus can exert forces on tiny dielectric particles. Our particles are microspheres. (The blog posts linked explain the physics of this in great detail.)

Using some principles of biochemistry I can attach a microsphere to a DNA fragment that is specially designed to: (1) tether to a glass slide using antibody-antigen interactions, (2) contain a weak point in the DNA backbone to begin unzipping, and (3) be versatile enough to use a variety of different DNA sequences.

I can then tether the DNA to slides and place them in the path of the laser. The focus will attract the beads, and if the tethering process works properly then the beads will be attached to DNA. This is how we are able to exert forces on the DNA. Our detector is used to track the laser movement, and those signals get converted into force data. The forces recorded are on the order of pN, which is insanely small but enough to distinguish from background noise.

Once we have unzipping data we can use a computer program to compare this information to a library of simulated unzipping data for a genome. In our proof of principle study we used the yeast genome, so we simulated unzipping fragments for the entire genome and then used actual yeast genomic DNA to unzip.

Unfortunately I hit an impassable road block in the experiment. The DNA I created wouldn’t unzip. I tried everything I could think of, reworked the entire process and tried to come up with alternate methods for creating the DNA fragments. Ultimately I had to switch to the project I’m working on now…

…But that doesn’t mean that the project was a complete failure. I’m sure the protocols and techniques I employed can be useful to someone, somewhere, someday and so I’m going to highlight posts from my old notebook here as a way to kind of direct attention to the protocols that summarize my project well and were most useful for me.

In this way, one wouldn’t need to sift through mounds of information just to find one thing. And it would provide visitors here a little more information about my background and something I keep alluding to. All in the name of open science!

# Kinesin Stability in D2O

A main focus of the lab is to examine the effects of D2O on the kinesin motor protein. Andy Maloney (now Dr. Maloney) spent quite a bit of time perfecting an experiment known as the gliding motility assay which allows an experimenter to study kinesin processivity by analyzing microtubule movement. Microtubules are relatively long protein filaments that kinesin has the ability to “walk” on which it does to carry cargo to various locations of a cell.

Below is a video of the gliding motility assay. Since kinesin molecules are too small to be resolved, we actually depend on seeing the microtubules (which are in turn only visible because of fluorescent dyes). What we see below is the microtubules (the squiggles) being propelled by the kinesin.

The reason that I mention this is because Andy used these experiments to determine if D2O affected kinesin movement at all. Initial results looked promising because it appeared kinesin would push the microtubules slower than identical experiments in regular DI water. It turns out that the decrease in speed could be related to the increased viscosity of deuterium oxide.

Despite this, Dr. Koch hypothesizes (based on early reports by Gilbert Lewis, a very popular name on this blog, and others regarding life in general) that D2O can stabilize kinesin. What do I mean by this? Well, proteins in general don’t retain their shapes forever. There is some probability that a protein will denature (unfold), and this is expedited by certain cellular conditions (temperature, time, function, pH, charge, etc.). So it is believed that the inclusion of D2O can affect these conditions to ensure the “survival” of the protein. Currently, kinesin suspended in buffer has a relatively short shelf life (though I don’t know the exact lifetime). This would be very useful in the lab, where chemicals, proteins, etc. are stored for long-term use.

We have proposed a couple of experiments that may help determine if deuterium oxide does indeed affect the storage life of kinesin (and perhaps other proteins/enzymes). The first is to detect aggregation which happens when proteins unfold and stick together to form large clumps of amino acids. The second experiment would involve detecting decreasing kinesin activity over time possibly through ATP hydrolysis (ATP turning into ADP+P).

Over the next few weeks (months?) I will be exploring these avenues. I have a slight head start on experiment 1 because an REU student named Kenji Doering spent the summer in our lab and explored D2O’s possible effects on the stability of ovalbumin, a protein from egg whites. This will of course be the topic of my next post (or two), and I will be posting some rather interesting data from those experiments and some others.

# Mindmapping Proposed Projects (Updated)

Mindmeister and WordPress apparently do not work well together. I spent a good hour trying to figure out how iframes work in WordPress and it turns out they don’t so I had to install a plugin. I used Embed Iframe and it works, but it turns out that (well at least for me) Mindmeister wants to stick a big panel in front of the mindmap blocking at least 50% of the map. Booo! There is a mindmeister plugin, but I don’t know if I want to play with it right now. Maybe later, and then I’ll tell you all about it.

Back on track, if the above mindmap embed doesn’t work for you then the link is here to see it in full page glory.

Update: I found out that there was a plugin for Mindmeister and replaced the previous iframe coding with the new plugin. The process of discovery can be found here.

# Effect of Deuterium Depleted Water on Life

Hydrogen has several isotopes and one of them, deuterium, exists quite naturally in water to form $D_{2}O$. In previous experiments and several papers by Gilbert Lewis, it has been found that life is hindered in the presence of $D_{2}O$. While this may be true, my PI Steve Koch wondered if life had found a use for it because naturally occurring water has about a 17mM (millimolar) concentration of deuterium.
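The ~17mM figure follows from deuterium’s natural abundance: water is about 55.5M, each molecule carries two hydrogens, and roughly 0.0156% (~156 ppm) of hydrogen atoms are deuterium. The abundance value is a standard reference figure, not from the post; multiplying the three gives the quoted concentration:

```python
WATER_MOLARITY = 55.5    # mol/L of water molecules in pure water
H_PER_WATER = 2          # hydrogen atoms per water molecule
D_ABUNDANCE = 0.000156   # natural fraction of hydrogen atoms that are deuterium

deuterium_molar = WATER_MOLARITY * H_PER_WATER * D_ABUNDANCE
print(f"{deuterium_molar * 1000:.1f} mM")  # 17.3 mM of deuterium in natural water
```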

To put that number into perspective, when I do a typical polymerase chain reaction I add 10mM of each DNA base (less than the amount of naturally occurring deuterium) to create millions of copies of a DNA template from an amount that is 1000x less than what the reaction yields. In fact, most chemicals in most of my buffers are on the order of the amount of naturally occurring deuterium.

So you can see it isn’t a stretch to think that nature has found a use for $D_{2}O$ since it is quite abundant and life has been constantly evolving for billions of years. I want to test this hypothesis in a variety of different organisms:

1. Tobacco Seeds – to act as a foil to Lewis’ experiments in which he grew tobacco seeds in pure $D_{2}O$.
2. Mustard Seeds – from what I’m told mustard seeds are the powerhouse of the botanical genetics world much like Drosophila and S. cerevisiae are in their respective genetic fields.
3. Escherichia coli – another molecular biological powerhouse that is very easy to grow and may be easy to see results with. We just got the facilities to be able to grow E. coli and damn it I want to use them!
4. Saccharomyces cerevisiae (Yeast) – I know a guy who grows yeast for his experiments, and I’m sure it wouldn’t be a stretch to get him to do so in deuterium depleted water.

So the idea would be to try to grow these in regular water and in deuterium depleted water (no $D_{2}O$), and in the case of E. coli and yeast, perhaps in pure $D_{2}O$ because I don’t think those experiments have been carried out yet. Hopefully I will be able to conclusively state whether or not life has developed a need/use for $D_{2}O$ which would be a very interesting discovery indeed!