
Ouch

Well, I crashed and burned big time last week and let Anthony down.  My apologies.  Ugh.  I still think we can revise and submit the shotgun DNA mapping paper, but the timing is not as urgent.  Thus, I am more tempted to port the code to something useful for the public.  Python (with its bio library) is what I’m thinking.  Mulling this over.  In the meantime, after many attempts, I finally got the GitHub project going:

https://github.com/stevekochscience/Shotgun-DNA-Mapping–Yeast

It took forever, I think because it’s a huge library.  But it should include most of the sequences and simulations that Larry used for the first draft of the paper.

===

I determined that the whitespace bug is indeed a problem in Larry’s code, or at least in the code I’ve found.  The VI “initizing sub.vi” definitely finds zero energies for the \n\r whitespace characters, and as far as I can tell, this is not accounted for in the subsequent code.  I will test by using two different sequences and simulating with and without whitespace.  I don’t think this is (was) a problem in practice, though, because I don’t think the sequences Larry used had whitespace (at least not after he found the XhoI sites).

-> Verified that the whitespace bug exists in “failed simulator again.vi”.  To reiterate, though, this probably didn’t affect the simulations, since the sequences didn’t have whitespace after he found the XhoI sites.

data files: Test 2012 pBR_2.dat and Test 2012 pBR_2no.dat (should be in github soon)
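A quick way to check whether a given sequence would trigger the bug is to count the characters in it that are not bases.  This is only a sketch in Python, not a translation of Larry’s LabVIEW code: the function name `count_non_bases` and the two toy sequences are made up for illustration, and I’m assuming a plain A/C/G/T alphabet.

```python
from collections import Counter

def count_non_bases(sequence, alphabet="ACGTacgt"):
    """Count every character that is not a DNA base, keyed by character."""
    return Counter(ch for ch in sequence if ch not in alphabet)

# Toy sequences (hypothetical, not the pBR test files).
with_ws = "ACGT\r\nCTCGAG\r\nTTAA"
without_ws = "ACGTCTCGAGTTAA"

print(count_non_bases(with_ws))      # line-break characters show up here
print(count_non_bases(without_ws))   # empty Counter means a clean sequence
```

Any nonzero count means the sequence has characters that the simulator would apparently treat as zero-energy positions.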

SDM Genome text analysis, matching, still looking through Larry’s stuff

10:45 AM: Found some more stuff in \\Controller\users\herskowitz.larry\My Documents

  • This looks like stuff he used for analysis (matching) after simulation: messing around has data in it.vi
  • This has promise for being the program that finds the XhoI sequences: new formatting program.vi

Also, some locations of data files

  • \\controller\pub\,dropzone\August 28 pCP681 files for Larry

Some simulation files

  • \\controller\pub\8 and 8b Force Measurements

======

Figured out something that was confusing me: the upstream and downstream sequences around each recognition site are stored in the same files in this directory: \\controller\pub\Sequences\Yeast Genome.  So I’m pretty confident this is where most of the sequences Larry simulated are.  A remaining question (that I will have to answer by looking at the software, probably “new formatting program.vi”) is whether he threw out sites that were too close to other sites.

-> Note: GitHub says as long as I keep each public repository under 1 GiB I should be OK on disk space
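For a future Python port, the "too close" question could be handled roughly as below.  This is a minimal sketch under my own assumptions, not what “new formatting program.vi” actually does: the function names, the toy sequence, and the `min_gap` threshold are all hypothetical.  The only hard fact used is the XhoI recognition sequence, CTCGAG, which is palindromic, so searching one strand is enough.

```python
XHOI_SITE = "CTCGAG"  # XhoI recognition sequence (palindromic)

def find_sites(sequence, site=XHOI_SITE):
    """Return the 0-based start position of every occurrence of site."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def isolated_sites(positions, min_gap):
    """Keep sites whose nearest neighbors are at least min_gap bases away.

    positions must be sorted; min_gap is a hypothetical threshold.
    """
    kept = []
    for i, pos in enumerate(positions):
        left_ok = i == 0 or pos - positions[i - 1] >= min_gap
        right_ok = i == len(positions) - 1 or positions[i + 1] - pos >= min_gap
        if left_ok and right_ok:
            kept.append(pos)
    return kept

# Toy sequence: two sites close together, then one far away.
demo = "AAAA" + "CTCGAG" + "TT" + "CTCGAG" + "A" * 30 + "CTCGAG" + "GG"
sites = find_sites(demo)
print(sites)                              # [4, 12, 48]
print(isolated_sites(sites, min_gap=20))  # [48]
```

Whether Larry applied any such filter, and with what distance, still has to be read off the VI.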

====

5:45 PM Still working on GitHub, not sure if I’ll have time to upload tonight.  Git was being super-slow on LarryXP (maybe because I was using a networked folder).  Now copying to my laptop to upload later at home.  About half done in about an hour.

Also, there is an important possible bug in Larry’s software (not sure yet, but definitely a misconception that I had).  He was using a Richard sub-VI called “Trim whitespace” (see picture below) that only removes whitespace from the beginning and end of a string.  I thought it worked on the whole string.
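The same distinction exists in Python, so it is worth keeping in mind for a port.  In this sketch the function names and the toy sequence are mine; `str.strip()` behaves like the trim described above (ends only), while joining the result of `str.split()` removes whitespace everywhere.

```python
def trim_ends(text):
    """Remove whitespace only at the start and end of the string."""
    return text.strip()

def remove_all_whitespace(text):
    """Remove every whitespace character, including interior line breaks."""
    return "".join(text.split())

# Toy sequence with line breaks at the ends and in the middle.
raw = "\r\nACGT\r\nCTCGAG\r\nTTAA\r\n"

print(repr(trim_ends(raw)))              # 'ACGT\r\nCTCGAG\r\nTTAA'
print(repr(remove_all_whitespace(raw)))  # 'ACGTCTCGAGTTAA'
```

Only the second version gives a string that is safe to feed to code that assumes every character is a base.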