Category Archives: Shotgun DNA Mapping

Navigating Larry’s notebook and finding yeast genome

The link to Larry’s notebook (pertaining to SDM) is here (this is on a private site, sorry!). His notebook is terribly organized, so going through this takes some time. He does have a public notebook, but that is mostly about microtubule tracking software and kinesin motility simulations.

screenshots of larry's notebook displayed as a calendar.

For your benefit, here is a picture of his private notebook. On the right is the 2008 notebook, where the information I’m looking for resides. On the left is the 2009 notebook, which contains information about upgrades and side projects to SDM. For instance, the first entry of the year discusses some preliminary research regarding inversions for an application of SDM known as alternative splicing. The last entry of the year (because Larry went public) talks about some upgrades to the original simulation software, basically incorporating some more advanced mathematics into the unzipping energies.

From what I remember about 2008, Larry started designing the software we needed for SDM around July/August, so I’ll start there. To give you some scope, the paper was submitted to Nature Precedings in January 2009.

Ok looking through starting Aug 1, I found this (August 6, 2008). This marks the first appearance of anything related to SDM. For those with no access here:

On that page there is a link to yeastgenome.org and Larry talks about how he can’t navigate the site and has no clue what he is doing. I’m guessing things have changed considerably in 3.5 years, as I just went there and easily found a list of complete genome sequences for 28 strains of S. cerevisiae. Hopefully as I look through Larry’s notes I’ll find exactly which strain he downloaded; otherwise I’ll just have to figure something out. Now it’s time to look through his notes and find important stuff. I’ll be back.

Finding the yeast genome

Today my goal is to acquire some genetic sequences for use in the algorithm. Once we get the software up and running, implementing this stuff should be simple. The goal for Steve today is to get that software up and running.

First I need to find Larry’s private wiki notes about where he got the yeast genome from and how he used it. I’ll be creating new posts (since I don’t think updating this post will be adequate or appropriate, it’s a my brain kinda thing) regarding this task as I progress throughout the day.

And I’ll be starting with Larry’s notes, by making relevant stuff public here. Avante!

Proof of Principle for Shotgun DNA Mapping (Redux)

You might have noticed all the recent activity with a lack of context. Well, it’s Spring Break here at UNM and Steve and I are going into overdrive to finalize some papers that were left in preprint limbo. We have three projects to work on of various priority: (1) Shotgun DNA Mapping (what this article is about), (2) a paper that models Kinesin motility, and (3) a paper about the Repeating Crumley experiment that will be self-published via Google Docs.

So all the stuff that has been appearing in my notebook from both myself and Steve is all about the Shotgun DNA Mapping (SDM) paper. Our goal is to complete the paper, with some extra experiments that show how useful the SDM software is, and complete it by the end of the week (I’m guessing by Sunday because that’s when my parents arrive).

Right now we are refamiliarizing ourselves with the software. The original code and resulting paper were written in September of 2008 (yikes I’ve been in grad school for a while). The code was originally written by Larry Herskowitz and back then he had talents as a programmer, but he had no talents as an organized human being. So getting familiar with his mindset and looking for important programs and files is no easy task. There may be a lot we need to do and that’s what we are trying to figure out. Hopefully tomorrow we’ll be able to move on from this step.

But what code am I talking about? You probably thought I was just an experimentalist. Well you’d be mostly correct, but I have dabbled in programming some.

The code in question is the heart of Shotgun DNA Mapping. The SDM project was a two step experiment:

  1. Generate clones of random yeast genomic DNA sequences and unzip those sequences using our optical tweezers.
  2. Compare the force curves from the unzipped DNA to a library of simulated force vs extension curves. The library is generated from the yeast genome.

The paper that we are working on was a proof of principle for step 2. The results from step 1 would be published once we had working tweezers and unzippable DNA. Unfortunately we couldn’t get unzippable DNA, but maybe this summer I’ll be able to try again. In this paper we discuss how the simulation software works, how we match genetic information, and we present results using some old data Steve had from grad school.

Aside: It just occurred to me that this could be a success of open science if other groups had DNA unzipping data that they shared online, but alas the rest of the world is closed 🙁

Let’s discuss the software briefly so there is some context as to what Steve and I may write here for the rest of the week, and so you can understand the basic premise of the paper.

To get a brief understanding of how the tweezers work and how we are able to unzip DNA check out my intro to SDM here.

Now that you understand all that stuff let’s get into the software:

  1. The first step to SDM is to create a library of unzipping force vs extension curves for the yeast genome. We chose yeast because its DNA is bundled as chromatin which could be used for the next level of SDM, Shotgun Chromatin Mapping (SCM), and because our collaborators are expert yeast geneticists.
    1. We downloaded the yeast genome sequence from yeastgenome.org (back in 2008) and did a simulated restriction digest. By this I mean we looked for the XhoI recognition sequence (CTCGAG) in the yeast genome. From each recognition site, we created “fragments” that are 2000bp in length to use for the unzipping simulation.
    2. The unzipping simulation used a very simple equation to calculate the energy contained in a double stranded DNA sequence (dsDNA). The Hamiltonian (the physics term for the total energy) contained the energy of the freely-jointed chain (which is a model that describes a chain of paperclips, google it) and the base-pairing energy (i.e. A-T/G-C bond energies). That’s it! Remarkably, that worked exceptionally well. The reasoning is that when you are unzipping you have two sequences of single-stranded DNA held together by the DNA that is still zipped (base-paired bonds).
    3. After the energies contained in an unzipping sequence are calculated, we need to extract force information. We did this by numerically solving the integral of x(F’) dF’, where x(F’) is the extension given by the freely-jointed chain model. I don’t expect that to mean too much right now and I may have incorrectly explained it myself, but this will be made more clear later.
  2. Once we have simulated unzipping data we can begin to match data to the library we just created.
    1. I don’t understand the mathematics behind the matching algorithm all that much myself (right now), but from what I understand the matching uses a normalization that is rooted in the difference between a polyA strand and a polyG strand (polyA has forces of ~9pN and polyG has forces of ~19pN, everything else is in between).
    2. Because of the nature of the simulation, there is a lot of unreadable data at low extensions, so we use a window between base indices 1200 and 1700 (we call this number the index) to get a better analysis. We chose a 500 base window size above index 1000 because of the unzipping data that we were analyzing.
    3. The algorithm generates a score based on the difference between real and simulated force profiles in the window size stated above. In our system great matches are close to 0 and bad matches are close to 1.
  3. In graduate school, Steve had unzipped a plasmid known as pBR322 (purchasable from NEB). Using the software, Larry simulated the force profile for this sequence and hid the data in the library of yeast genomic simulated data. The matching algorithm managed to match the real unzipping data to the simulated data every time.
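
For anyone who wants something concrete to poke at, here is a rough Python sketch of the simulated restriction digest from step 1. This is not Larry's LabVIEW code. The function names are made up for illustration, and I am assuming each fragment runs downstream from the start of the XhoI site and that sites too close to the end of the sequence are skipped. The real software may have handled direction and chromosome ends differently.

```python
XHOI_SITE = "CTCGAG"    # XhoI recognition sequence (from the post)
FRAGMENT_LENGTH = 2000  # fragment length in bp (from the post)


def find_sites(sequence, site=XHOI_SITE):
    """Return the 0-based start index of every occurrence of `site`.

    CTCGAG is its own reverse complement, so scanning one strand
    finds the sites on both strands.
    """
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def simulated_digest(sequence, site=XHOI_SITE, length=FRAGMENT_LENGTH):
    """Return (position, fragment) pairs, one per recognition site.

    Assumption (hypothetical, not confirmed by the notebook): each
    fragment starts at the site and runs `length` bases downstream;
    sites with fewer than `length` bases remaining are dropped.
    """
    sequence = sequence.upper()
    fragments = []
    for pos in find_sites(sequence, site):
        fragment = sequence[pos:pos + length]
        if len(fragment) == length:
            fragments.append((pos, fragment))
    return fragments


# Toy example with a short fragment length so it is easy to read.
toy = "AAAA" + "CTCGAG" + "ACGT" * 5 + "CTCGAG" + "TT"
for position, fragment in simulated_digest(toy, length=10):
    print(position, fragment)
```

Each fragment returned here would then be fed to the unzipping simulation to build one entry in the library.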

Like I’ve said a lot, I don’t expect all of this to make sense, but as we rewrite the paper I’ll add thoughts about the project to the notebook. On top of that I’m sure we’ll have a lot of supplemental information to add to the paper via the notebook (so we can just cite it). So while all this is new and confusing (if it isn’t then that’s awesome) be aware that the confusion will subside by the end of the week. We have a lot of exciting things coming in the next few days and you’ll all be the third to know about it (after Steve and me of course).

LabVIEW DNA Unzipping

I am just getting back up to speed with the code Larry wrote in 2009.  This was before he became a bad-ass in LabVIEW, so the code is tough to read.  I think the VI I was looking at yesterday was the wrong VI.  I believe (now) the final VI Larry used for simulation was probably this one (again not yet publicly available):

  • \\Controller\users\herskowitz.larry\My Documents\Sequencing\automated unzipping simulator through yeast genome with ant seq shutting off network.vi

This was last modified on 7/29/2009 and looks like what I remember we may have used.  If only we had open notebooks back then (and Larry took good notes–zing!) it’d be much easier to Google and find the right code.

====

Found some important files here (not public yet):

  • \\controller\pub\Sequences
  • \\Controller\users\herskowitz.larry\My Documents\Analysis of Deviation
  • \\Controller\users\herskowitz.larry\My Documents\Deviation Analysis
  • \\Controller\users\herskowitz.larry\Desktop\Graph in Patent Data
  • \\Controller\users\herskowitz.larry\Desktop\Unformatted chromosome from yeastgenome.org
  • \\Controller\users\herskowitz.larry\My Documents\Downhill Simplex
  • \\Controller\users\herskowitz.larry\My Documents\July backup

Maybe hints at Larry’s nearest neighbor work: \\Controller\users\herskowitz.larry\Desktop\Sim data

 

====

Wow!  I realized that I encouraged Larry to use Visual SourceSafe, and he actually did (for some/most things?).  I found many files in our sourcesafe database–if these end up being the important VIs, I will add all of them to github in a “larry” project, I think.

Snapshot of files in Larry's VSS "Sequencing" folder

====

So, it looks like “failed simulator again.vi” in Larry’s VSS is working properly.  I compared a new simulation of pBR322 unzipping with a simulation from grad school, and they look very similar.  Not exactly the same, possibly due to parameters, not sure yet.

First try at comparing Larry sim to old sim for pBR322 internal unzipping. Data file: Test 2012 pBR_try1.dat

4:35 PM: I opened up the old “Equilibrium model.vi” (from Jantzen) and the default values for GC/AT were 5 and 1.5.  However, these values are clearly wrong, giving much too high a force.  At this point, I’m not sure how to figure out the values I used to create the 2002-ish file “pBR322 unzip 2002 calculated.dat”.

Spring Break Planning Mindmaps

Spring Break planning map:

Planning for SDM paper mindmap:

2012 Spring Break Catch-up

Today Anthony and I brainstormed and are going to try to accomplish a couple things during Spring Break that are useful (a) for boosting my tenure dossier for the Provost’s Office, (b) for boosting Anthony’s CV/dissertation for jobs/graduation purposes (May 2013), and (c) for science.  Really anything with regards to (a) is a last ditch effort, but since lack of peer-reviewed publications was by far the biggest criticism of my dossier, it can’t hurt.  Andy and I are submitting a chapter of his paper to PLoS ONE later this week (after he finishes uploading the data to FigShare).

We came up with a couple main ideas:

  1. Add to the SDM project and submit the revised preprint (from 3 or more years ago!) to PLoS ONE for peer review.  Anthony is now collecting our scattered links and will post a better summary in his post.  The main ideas are (a) post our shotgun DNA mapping software on github, after cleaning it up, and (b) implement many of the good suggestions Richard Yeh gave us on the preprint (see Anthony’s post for link to these ideas).  I believe after doing this that it would be appropriate to move Anthony to lead author and would be worth submitting to PLoS ONE for peer review.
  2. Finish the revisions for the kinesin modeling paper.

Of those two options, as is obvious from my description, we felt (1) made the most sense for strengthening Anthony’s dissertation/CV.  This is because he spent a ton of time already on the SDM project.

Also, in parallel, we are thinking about how to use my time to strengthen the deuterium-depletion project.  One of these ideas is to modify our existing image analysis software to make it automatically track the lengths of the root growth.  I am optimistic that this can be done with Larry’s microtubule tracking software.  It will eventually lead to publications, but not in this short time frame.

To make everything more useful, I will make github projects as I find software and get it working.  This will be a bit tricky, but overall probably very useful to get our lab out of limbo between Visual SourceSafe (which nobody is really using except me now) and git, which is far superior, especially for open sharing of our code.  To be even more useful, it would be good to move away from LabVIEW, but we’re way too deeply entrenched for me to try to port stuff now.

Locations of some things I found (not really useful for public since it’s on the local harddrive for now):

  • DNA unzipping simulation code that Anthony cleaned up a bit: LarryXP->C:\SDM Simulation
  • Image analysis software I wrote for Haiqing a while ago: C:\Program Files\National Instruments\LabVIEW 7.1\development_sjkoch_7.1\MT Tracking and analysis\Circles analysis
  • A version of the tracking software that maybe isn’t the latest, but compiles on LarryXP: C:\Program Files\National Instruments\LabVIEW 7.1\MT Tracking\Larry Tracking for Andy 2009_fuckINI.vi  I remember revising this for Andy after Larry left so that it would ignore some issues we have with the INI files.
  • There is also a sub-VI that I wrote that would find the lengths of the tracked microtubules during analysis (I think).  This is probably related to the subVI: “length finding sub.vi”

Screen shot of microtubule tracking program attempting to segment a root hair image

 

Finishing Shotgun DNA Mapping

This isn’t going to make one iota of sense to anyone, so bear with me and I’ll clarify later.

Here are some useful links.

Introducing Shotgun DNA Mapping

Before I got immersed in the world of deuterium-less water, I was a rabid DNA-man. The project was Shotgun DNA Mapping which was a term we invented to describe a quick protocol for mapping a DNA sequence.

In theory, the technique was awesome: unzip a DNA sequence with optical tweezers and compare the data to a library of simulated data for a given genome to figure out where you are in the genome. This would lead to a bigger project called Shotgun Chromatin Mapping, which was similar except you could map protein locations on DNA fragments using the same technique.

For this to work, you need three components:

  1. Optical Tweezers – to unzip DNA and record data
  2. DNA – to unzip
  3. A computer simulation – to simulate DNA unzipping and match recorded data to simulated data

I dedicated a few blog posts in my other blog to discussing the basic principles of the project in case you need to get caught up to speed. But in case you don’t have time for all that, here is the whirlwind summary:

Optical tweezers are an optical system that requires a laser, a microscope objective, a condenser, some steering components, and a detector (in our case a quadrant photodiode) among other things. The laser is focused by the objective and this focus can exert forces on tiny dielectric particles. Our particles are microspheres. (The blog posts linked explain the physics of this in great detail.)

Using some principles of biochemistry I can attach a microsphere to a DNA fragment that is specially designed to: (1) tether to a glass slide using antibody-antigen interactions, (2) contain a weak point in the DNA backbone to begin unzipping, and (3) be versatile enough to use a variety of different DNA sequences.

I can then tether the DNA to slides and place them in the path of the laser. The focus will attract the beads, and if the tethering process works properly then the beads will be attached to DNA. This is how we are able to exert forces on the DNA. Our detector is used to track the laser movement, and those signals get converted into force data. The forces recorded are on the order of pN, which is insanely small but enough to distinguish from background noise.

Once we have unzipping data we can use a computer program to compare this information to a library of simulated unzipping data for a genome. In our proof of principle study we used the yeast genome, so we simulated unzipping fragments for the entire genome and then used actual yeast genomic DNA to unzip.
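
To give a feel for what that comparison might look like, here is a small Python sketch. It is only an illustration, not our actual matching algorithm. I am assuming each force curve is a list of forces (in pN) indexed by unzipping base index, and that the score is the mean absolute force difference inside the 1200 to 1700 index window, divided by the ~10 pN gap between polyA (~9 pN) and polyG (~19 pN). Under that assumption a perfect match scores 0 and a polyA-vs-polyG mismatch scores 1. The function names and the toy library are made up.

```python
POLY_A_FORCE = 9.0     # pN, approximate polyA unzipping force
POLY_G_FORCE = 19.0    # pN, approximate polyG unzipping force
WINDOW = (1200, 1700)  # base-index window used for scoring


def match_score(measured, simulated, window=WINDOW):
    """Mean absolute force difference in the window, scaled to about 0..1.

    Assumption: the normalization is simply the polyG/polyA force gap.
    The real algorithm may normalize differently.
    """
    lo, hi = window
    a = measured[lo:hi]
    b = simulated[lo:hi]
    if not a or len(a) != len(b):
        raise ValueError("both curves must cover the scoring window equally")
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return mean_diff / (POLY_G_FORCE - POLY_A_FORCE)


def best_match(measured, library, window=WINDOW):
    """Return (name, score) for the library curve with the lowest score."""
    scores = {name: match_score(measured, curve, window)
              for name, curve in library.items()}
    name = min(scores, key=scores.get)
    return name, scores[name]


# Toy library of flat force curves, just to exercise the functions.
library = {
    "polyA-like": [9.0] * 2000,
    "mixed": [14.0] * 2000,
    "polyG-like": [19.0] * 2000,
}
measured = [14.2] * 2000
print(best_match(measured, library))
```

In a real run the library would hold one simulated curve per genomic fragment, and the lowest-scoring entry would be the proposed location of the unzipped DNA in the genome.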

Unfortunately I hit an impassable road block in the experiment. The DNA I created wouldn’t unzip. I tried everything I could think of, reworked the entire process and tried to come up with alternate methods for creating the DNA fragments. Ultimately I had to switch to the project I’m working on now…

…But that doesn’t mean that the project was a complete failure. I’m sure the protocols and techniques I employed can be useful to someone, somewhere, someday and so I’m going to highlight posts from my old notebook here as a way to kind of direct attention to the protocols that summarize my project well and were most useful for me.

In this way, one wouldn’t need to sift through mounds of information just to find one thing. And it would provide visitors here a little more information about my background and something I keep alluding to. All in the name of open science!

Mindmapping Proposed Projects (Updated)

Mindmeister and WordPress apparently do not work well together. I spent a good hour trying to figure out how iframes work in WordPress and it turns out they don’t so I had to install a plugin. I used Embed Iframe and it works, but it turns out that (well at least for me) Mindmeister wants to stick a big panel in front of the mindmap blocking at least 50% of the map. Booo! There is a mindmeister plugin, but I don’t know if I want to play with it right now. Maybe later, and then I’ll tell you all about it.

Back on track, if the above mindmap embed doesn’t work for you then the link is here to see it in full page glory.

Update: I found out that there was a plugin for Mindmeister and replaced the previous iframe coding with the new plugin. The process of discovery can be found here.