Category Archives: Genomic Data

The Human Genome

This took me more time than necessary because googling “human genome sequence” does not bring up exactly what you’d expect. But I found downloadable human genome sequence data, and honestly I have no clue what is contained in these files. They are all zipped and labeled .mfa.gz (the .gz is the zipped part), and I have no idea what .mfa could be. Shut yo mouth!

Anyways here is what I found:

  • List of all chromosomes and some extra stuff – there are two copies of each chromosome, which makes sense in the human body but not here. One is .fa.gz and the other is .mfa.gz. Hmmm…
  • List of everything – with lots of stuff I don’t understand.
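One way to find out what is inside these files without unzipping them by hand is to peek at them from Python. This is only a sketch, and it assumes the files are FASTA-style text (header lines starting with ">"), which the .fa extension suggests but which I have not confirmed for the .mfa ones. The function name and the example file name are made up for illustration.

```python
import gzip


def fasta_headers(path, limit=5):
    """Return up to `limit` header lines (those starting with '>')
    from a gzip-compressed text file."""
    headers = []
    # "rt" makes gzip decompress on the fly and hand back text lines.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line.rstrip("\n"))
                if len(headers) >= limit:
                    break
    return headers


# Hypothetical usage (the file name is an assumption, not a real download):
# print(fasta_headers("some_chromosome.mfa.gz"))
```

If the .fa.gz and .mfa.gz versions of a chromosome come back with different numbers of headers, that would at least hint at how the two copies differ.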

Since one chromosome is as big as or larger than the entire yeast genome, I think I’ll just “chop up” one chromosome and match some random stuff from that, if we decide to go that route.
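Here is a rough sketch of what “chopping up” a chromosome could look like. It is not the SDM software, just a way to pull random fragments out of a sequence. It assumes a FASTA-style file (optionally gzipped), and the function names, fragment count and fragment length are all hypothetical choices for illustration.

```python
import gzip
import random


def read_fasta_sequence(path):
    """Concatenate every sequence line of a FASTA file, skipping '>' headers.
    Note: if the file holds several records, they are all joined together."""
    opener = gzip.open if path.endswith(".gz") else open
    parts = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):
                parts.append(line.strip())
    return "".join(parts)


def random_fragments(sequence, count, length, seed=None):
    """Pick `count` random substrings of `length` bases.
    Returns (start_index, fragment) pairs so each fragment's true
    location is known when testing a matching step later."""
    if length > len(sequence):
        raise ValueError("fragment length exceeds sequence length")
    rng = random.Random(seed)  # seeded so a run can be repeated
    fragments = []
    for _ in range(count):
        start = rng.randrange(len(sequence) - length + 1)
        fragments.append((start, sequence[start:start + length]))
    return fragments


# Hypothetical usage (file name and sizes are assumptions):
# chromosome = read_fasta_sequence("some_chromosome.fa.gz")
# pieces = random_fragments(chromosome, count=100, length=2000, seed=0)
```

Keeping the start index next to each fragment means a match can be checked against where the piece really came from.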

DNA sequences for Shotgun DNA Mapping Algorithms

Via Dropbox:

If I can think of other cool things to download and upload to dropbox I’ll add them here.

Yeast genome found… potentially

Ok so yeastgenome.org was kinda complicated to navigate, but once I noticed a handy little link on the top of the site named “Download” then it was much easier. Here are some notes:

  • From the main page click Download. Then click Sequence. On that page is a list of genomes that aren’t S. cerevisiae (something cool for later maybe), and a list of sequence files that are of the reference strain S288C (which I have no understanding of).
  • On that page I clicked genomic_releases/ to get a list of updates for the yeast genome. I’m thinking Larry probably downloaded the most recent at the time which is from June 5, 2008, but I can also see him just picking whatever is at the top (which may have been the same file, it is only labeled “Current Release”).
  • I realize I could have just linked one of those links, but what kind of notebook would this be if I didn’t show you my entire train of thought? A shitty one, that’s what kind!
  • You will need 7-zip to open the downloaded sequence because it is saved as a “.tgz”.
  • I’m also downloading the most recent update to the genome, maybe this will be useful for the matching aspect of the simulation. The date on this file is Feb 3, 2011.
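If you don’t have 7-zip handy, a gzip-compressed tar archive like the yeast download can also be opened with Python’s standard tarfile module. This is a minimal sketch. The function name, archive name and destination folder are placeholders, not the actual file names from yeastgenome.org.

```python
import os
import tarfile


def list_and_extract(archive_path, dest_dir):
    """List the member names of a gzip-compressed tar archive,
    then extract everything into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    # "r:gz" tells tarfile to expect gzip compression.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Hypothetical usage (both paths are assumptions):
# print(list_and_extract("yeast_release.tgz", "yeast_genome"))
```

Printing the member names first is a quick way to see which sequence files came in a given release before digging through them.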

Navigating Larry’s notebook and finding yeast genome

The link to Larry’s notebook (pertaining to SDM) is here (this is on a private site, sorry!). His notebook is terribly organized, so going through this takes some time. He does have a public notebook, but that is mostly about microtubule tracking software and kinesin motility simulations.

Screenshots of Larry’s notebook displayed as a calendar.

For your benefit, here is a picture of his private notebook. On the right is the 2008 notebook, where the information I’m looking for resides. The left is from 2009, which contains information about upgrades and side projects to SDM. For instance, the first entry of the year discusses some preliminary research regarding inversions for an application of SDM to alternative splicing. The last entry of the year (because Larry went public) talks about some upgrades to the original simulation software, basically incorporating some more advanced mathematics into the unzipping energies.

From what I remember about 2008, Larry started designing the software we needed for SDM around July/August, so I’ll start there. The paper was submitted to Nature Precedings in January 2009, to give you some scope.

Ok looking through starting Aug 1, I found this (August 6, 2008). This marks the first appearance of anything related to SDM. For those with no access here:

On that page there is a link to yeastgenome.org, and Larry talks about how he can’t navigate the site and has no clue what he is doing. I’m guessing things have changed considerably in 3.5 years, as I just went there and easily found a list of complete genome sequences for 28 strains of S. cerevisiae. Hopefully as I look through Larry’s notes I’ll find exactly which strain he downloaded, otherwise I’ll just have to figure something out. Now it’s time to look through his notes and find important stuff. I’ll be back.

Finding the yeast genome

Today my goal is to acquire some genetic sequences for use in the algorithm. Once we get the software up and running, implementing this stuff should be simple. The goal for Steve today is to get that software up and running.

First I need to find Larry’s private wiki notes about where he got the yeast genome from and how he used it. I’ll be creating new posts (since I don’t think updating this post will be adequate or appropriate, it’s a my brain kinda thing) regarding this task as I progress throughout the day.

And I’ll be starting with Larry’s notes, by making relevant stuff public here. Avante!