All posts by stevekochscience

Ouch

Well, I crashed and burned big time last week and let Anthony down.  My apologies.  Ugh.  I still think we can revise and submit the shotgun DNA mapping paper, but the timing is not as urgent.  Thus, I am more tempted to port the code to something useful for the public.  Python (with its bio library) is what I’m thinking.  Mulling this over.  In the meantime, after many attempts, I finally got the github project going:

https://github.com/stevekochscience/Shotgun-DNA-Mapping–Yeast

It took forever, I think because it’s a huge library.  But it should include most of the sequences and simulations that Larry used for the first draft of the paper.

===

I determined that the whitespace bug is indeed a problem in Larry’s code, or at least in the code I’ve found.  The VI “initizing sub.vi” definitely finds zero energies for the \n\r whitespace characters, and as far as I can tell this is not accounted for in the subsequent code.  I will test by using two different sequences and simulating with and without whitespace.  I don’t think this was a problem in practice, though, because I don’t think the sequences Larry used had whitespace (at least after he found the XhoI sites).

-> Verified that the whitespace bug exists in “failed simulator again.vi”.  To reiterate, though, this probably didn’t affect the simulations, since the XhoI site sequences didn’t have whitespace after he found the sites.

data files: Test 2012 pBR_2.dat and Test 2012 pBR_2no.dat (should be in github soon)
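A toy Python sketch of the failure mode described above, not the LabVIEW code itself. It assumes a per-character lookup in which any character other than A, C, G, or T silently contributes zero energy. The energy numbers and the names `TOY_ENERGY`, `energy_profile`, and `strip_all_whitespace` are placeholders invented for illustration, not values or names from Larry's VIs.

```python
# Placeholder per-base energies in arbitrary units.
# These are NOT the values used in the real simulation.
TOY_ENERGY = {"A": 1.0, "T": 1.0, "G": 2.0, "C": 2.0}


def energy_profile(seq):
    """Return one energy per character; unknown characters (\\r, \\n) get 0.0."""
    return [TOY_ENERGY.get(ch, 0.0) for ch in seq.upper()]


def strip_all_whitespace(seq):
    """Remove every whitespace character, including ones inside the string."""
    return "".join(seq.split())


# The same ten bases, with and without embedded Windows-style line breaks.
raw = "GATC\r\nGGCC\r\nAT"
clean = strip_all_whitespace(raw)

profile_raw = energy_profile(raw)
profile_clean = energy_profile(clean)

# The raw profile is longer and contains zero-energy entries, so every base
# after the first line break sits at a shifted index.
print(len(profile_raw), len(profile_clean))
print(profile_raw.count(0.0), "zero-energy entries from whitespace")
```

The total energy is unchanged in this toy model, but the position of each base along the profile is not, which is the part that would matter when matching against a simulated curve.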

SDM Genome text analysis, matching, still looking through Larry’s stuff

10:45 AM: Found some more stuff in \\Controller\users\herskowitz.larry\My Documents

  • This looks like stuff he used for analysis (matching) after simulation: messing around has data in it.vi
  • This has promise for being the program that finds the XhoI sequences: new formatting program.vi

Also, some locations of data files

  • \\controller\pub\,dropzone\August 28 pCP681 files for Larry

Some simulation files

  • \\controller\pub\8 and 8b Force Measurements

======

Figured out something that was confusing me: the sequences around the recognition site, both upstream and downstream, are stored in the same files in this directory: \\controller\pub\Sequences\Yeast Genome.  So I’m pretty confident this is where most of the sequences Larry simulated are.  A remaining question, which I will have to answer by looking at the software (probably “new formatting program.vi”), is whether he threw out sites that were too close to other sites.
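For reference, here is a minimal Python sketch of what "throwing out sites that are too close to other sites" could look like. It is not taken from “new formatting program.vi”. The function names and the `min_gap` threshold are hypothetical, and whether Larry applied any such filter is exactly the open question. The only fixed fact used is the XhoI recognition sequence, CTCGAG.

```python
import re

XHOI_SITE = "CTCGAG"  # XhoI recognition sequence


def find_sites(seq, site=XHOI_SITE):
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    # Strip all whitespace first so line breaks cannot hide or shift a site.
    seq = "".join(seq.split()).upper()
    # A lookahead pattern reports overlapping matches as well.
    pattern = "(?=" + re.escape(site) + ")"
    return [m.start() for m in re.finditer(pattern, seq)]


def isolated_sites(positions, min_gap):
    """Keep only sites whose nearest neighbors are at least `min_gap` bases away."""
    keep = []
    for i, pos in enumerate(positions):
        left_ok = i == 0 or pos - positions[i - 1] >= min_gap
        right_ok = i == len(positions) - 1 or positions[i + 1] - pos >= min_gap
        if left_ok and right_ok:
            keep.append(pos)
    return keep


# Made-up demo sequence: two sites 8 bases apart, then one far away.
demo = "AAAA" + XHOI_SITE + "TT" + XHOI_SITE + "A" * 30 + XHOI_SITE
sites = find_sites(demo)
print(sites)
print(isolated_sites(sites, min_gap=20))  # hypothetical threshold
```

Running the same two functions over the files in the Yeast Genome directory and comparing the counts would be one way to check whether close sites were dropped.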

-> Note: Github says as long as I keep each public repository under 1 GiB I should be OK on diskspace

====

5:45 PM Still working on github, not sure if I’ll have time to upload tonight.  Git was being super-slow on LarryXP (maybe because I was using a networked folder).  Now copying to my laptop to upload later at home.  About half done in about an hour

Also, there is an important possible bug in Larry’s software (not sure yet, but definitely a misconception that I had).  He was using a Richard sub-VI called “Trim whitespace” (see picture below) that only removes whitespace from the beginning and end of a string.  I thought it worked on the whole string.
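The same distinction exists in Python, which makes it easy to illustrate. `str.strip()` only trims the ends of a string, much like the behavior described for “Trim whitespace”, while splitting and re-joining removes whitespace everywhere. The example sequence is made up.

```python
# A made-up sequence with padding at both ends and a line break in the middle.
seq = "  GATC\r\nGGCC\r\n  "

# strip() removes whitespace only at the beginning and end of the string.
trimmed = seq.strip()

# split() with no arguments breaks on any run of whitespace, so joining the
# pieces removes whitespace from the whole string.
cleaned = "".join(seq.split())

print(repr(trimmed))  # the internal \r\n is still there
print(repr(cleaned))  # only the bases remain
```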

LabVIEW DNA Unzipping

I am just getting back up to speed with the code Larry wrote in 2009.  This was before he became a bad-ass in LabVIEW, so the code is tough to read.  I think the VI I was looking at yesterday was the wrong VI.  I believe (now) the final VI Larry used for simulation was probably this one (again not yet publicly available):

  • \\Controller\users\herskowitz.larry\My Documents\Sequencing\automated unzipping simulator through yeast genome with ant seq shutting off network.vi

This was last modified on 7/29/2009 and looks like what I remember we may have used.  If only we had open notebooks back then (and Larry took good notes–zing!) it’d be much easier to Google and find the right code.

====

Found some important files here (not public yet):

  • \\controller\pub\Sequences
  • \\Controller\users\herskowitz.larry\My Documents\Analysis of Deviation
  • \\Controller\users\herskowitz.larry\My Documents\Deviation Analysis
  • \\Controller\users\herskowitz.larry\Desktop\Graph in Patent Data
  • \\Controller\users\herskowitz.larry\Desktop\Unformatted chromosome from yeastgenome.org
  • \\Controller\users\herskowitz.larry\My Documents\Downhill Simplex
  • \\Controller\users\herskowitz.larry\My Documents\July backup

Maybe hints at Larry’s nearest neighbor work: \\Controller\users\herskowitz.larry\Desktop\Sim data

 

====

Wow!  I realized that I encouraged Larry to use Visual SourceSafe, and he actually did (for some/most things?).  I found many files in our sourcesafe database–if these end up being the important VIs, I will add all of them to github in a “larry” project, I think.

Snapshot of files in Larry's VSS "Sequencing" folder

===

So, it looks like “failed simulator again.vi” in Larry’s VSS is working properly.  I compared a new simulation of pBR322 unzipping with a simulation from grad school, and they look very similar.  Not exactly the same–possibly due to parameters, not sure yet.

First try at comparing Larry sim to old sim for pBR322 internal unzipping. Data file: Test 2012 pBR_try1.dat
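One way to put a number on "very similar, not exactly the same" is a root-mean-square difference between the two force curves. The sketch below assumes both curves are sampled at the same unzipping indices and have already been loaded into lists. The function name and the example numbers are made up, and nothing here reads the actual .dat files, whose format is not described in this post.

```python
import math


def rms_difference(curve_a, curve_b):
    """Root-mean-square difference between two equally sampled curves."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must have the same number of points")
    if not curve_a:
        raise ValueError("curves must not be empty")
    squared = sum((a - b) ** 2 for a, b in zip(curve_a, curve_b))
    return math.sqrt(squared / len(curve_a))


# Made-up force values, for illustration only.
new_sim = [15.0, 16.0, 14.0, 15.0]
old_sim = [15.0, 15.0, 15.0, 15.0]

rms = rms_difference(new_sim, old_sim)
print(rms)
```

A small RMS relative to the typical unzipping force would support the "parameters differ slightly" explanation; a large or position-dependent difference would point to something structural, such as an index shift.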

4:35 PM: I opened up the old “Equilibrium model.vi” (from Jantzen) and the default values for GC/AT were 5 and 1.5.  However, these values are clearly wrong: they give much too high a force.  At this point, I’m not sure how to figure out the values I used to create the 2002-ish file “pBR322 unzip 2002 calculated.dat”

2012 Spring Break Catch-up

Today Anthony and I brainstormed and are going to try to accomplish a couple of things during Spring Break that are useful (a) for boosting my tenure dossier for the Provost Office, (b) for boosting Anthony’s CV/dissertation for jobs/graduation purposes (May 2013), and (c) for science.  Really anything with regards to (a) is a last-ditch effort, but since lack of peer-reviewed publications was by far the biggest criticism of my dossier, it can’t hurt.  Andy and I are submitting a chapter of his paper to PLoS ONE later this week (after he finishes uploading the data to FigShare).

We came up with a couple main ideas:

  1. Add to the SDM project and submit the revised preprint (from 3 or more years ago!) to PLoS ONE for peer review.  Anthony is now collecting our scattered links and will post a better summary in his post.  The main ideas are (a) post our shotgun DNA mapping software on github, after cleaning it up, and (b) implement many of the good suggestions Richard Yeh gave us on the preprint (see Anthony’s post for link to these ideas).  I believe that after doing this it would be appropriate to move Anthony to lead author, and the paper would be worth submitting to PLoS ONE for peer review.
  2. Finish the revisions for the kinesin modeling paper

Of those two options, as is obvious from my description, we felt (1) made the most sense for strengthening Anthony’s dissertation/CV.  This is because he has already spent a ton of time on the SDM project.

Also, in parallel, we are thinking about how to use my time to strengthen the deuterium-depletion project.  One of these ideas is to modify our existing image analysis software to make it automatically track the lengths of the root growth.  I am optimistic that this can be done with Larry’s microtubule tracking software.  It will eventually lead to publications, but not in this short time frame.

To make everything more useful, I will make github projects as I find software and get it working.  This will be a bit tricky, but overall probably very useful to get our lab out of limbo between Visual SourceSafe (which nobody is really using except me now) and git, which is far superior–especially for open sharing of our code.  To be even more useful, it would be good to move away from LabVIEW, but we’re way too deeply entrenched for me to try to port stuff now.

Locations of some things I found (not really useful for the public since they’re on the local hard drive for now):

  • DNA unzipping simulation code that Anthony cleaned up a bit: LarryXP->C:\SDM Simulation
  • Image analysis software I wrote for Haiqing a while ago: C:\Program Files\National Instruments\LabVIEW 7.1\development_sjkoch_7.1\MT Tracking and analysis\Circles analysis
  • A version of the tracking software that maybe isn’t the latest, but compiles on LarryXP: C:\Program Files\National Instruments\LabVIEW 7.1\MT Tracking\Larry Tracking for Andy 2009_fuckINI.vi  I remember revising this for Andy after Larry left so that it would ignore some issues we have with the INI files.
  • There is also a sub-VI that I wrote that would find the lengths of the tracked microtubules during analysis (I think).  This is probably related to the subVI: “length finding sub.vi”

Screen shot of microtubule tracking program attempting to segment a root hair image