
ONS and Intellectual Property

This post is an excerpt from my dissertation which can be found here via figshare.

Note: The contained information pertains strictly to the US legal system, and is based on information I (Anthony Salvagno) alone researched. I am in no way a lawyer and offer no legal advice, but thought it would be foolish to not share basic copyright and patent law policy for scientific consideration.

 One of the biggest arguments I hear against open research is the fear about not being able to protect your intellectual property, also known as the fear of being scooped. The biggest oversight in that argument is that IP violations occur in traditional scientific culture both accidentally and maliciously. In an open environment, however, there is a greater risk of attracting this behavior if only because scientific research is made publicly available. With that said, there is nothing about being open that is any more inviting of harmful activity than in the traditional system. In fact, because of the current US legal system, being open may be more beneficial to protecting scientific information.

With regard to the US legal system, there are two primary protections available to scientists: (1) copyright law would protect recorded scientific information, for example data and ideas, while (2) patent law would protect scientific processes, production, procedures, etc.

Despite what is commonly believed, in no way does open notebook science prevent either protection from applying to scientific intellectual property. Open notebook science can actually stake your claim on IP and provide immediate protection. For patent law, patent protection is granted for one year once a work is publicly disclosed. If a patent is not filed, the IP becomes public domain and a patent can never be filed. In the case of copyright law, copyright applies from the moment of fixation (the moment scientific information is documented). In both cases, open notebook science can be used either as a defensive tactic to protect IP, or as an offensive tactic to prevent others from profiting from scientific IP.

Copyright Law

Copyright law is essentially very simple, and has been made increasingly simple since it was originally expanded upon in the US Constitution. The most recent addendum to this statute came about in the 1976 Copyright Act, which defined rights to copyright holders (exclusive rights), how copyright is achieved, and even what does/does not constitute infringement (fair use).

 While the law is simple in principle, copyright infringement is not necessarily black and white. In some instances it is questionable as to what is even copyrightable. In others, the matter of fair use is debatable. Even when there is infringement, it can be tough to prove because there are varying degrees of copying or “borrowing.”

The bare-essential rules of copyright law can be seen in Table 1:

1. Copyright is applied immediately from the moment any work is tangibly recorded, both publicly and privately.
2. To be protected a work needs to be original (not novel) and there needs to be a minimum element of creativity (known as expression).
3. The exclusive rights provided to copyright holders are reproduction, distribution, derivation, performance, and display.
4. Copyright infringement is a federal offense!
5. Even though copyright is applied immediately, in order to file suit for infringement a copyright needs to be registered with the US Copyright Office.
6. A copyright is not violated if it has been determined that the infringer has a fair use of the material. Fair use is a broad definition and is only created as a defense in infringement suits.

Table 1: Bare-essentials of copyright law.

Rule 2 from Table 1 may reveal that copyright law doesn’t apply to most scientific intellectual property, because it is fact based and process driven. Patent law was developed for this very reason. While there are no statutes against having dual protection in the form of patents and copyrights, it is not likely to receive copyright protection if there is patent protection since the copyright lasts much longer than the patent. But that’s not to say none of science is copyrightable.

In fact, journal articles are copyrighted. It can be interpreted that there is creative expression in organizing scientific discoveries (which are fact based) and that would make them copyrightable. Journals hold the copyrights for publications and have exclusive right to copy and distribute the articles and any material contained within. And there are cases where they’ve tried to enforce it.

In that link, the author tries to distribute (via publishing in her blog) figures from a publication and receives a cease and desist letter. Unfortunately it will never be known if there was a violation because the infringement never went to trial. She made an argument for fair use, which probably has some grounds, but skirted around the issue by recreating the figures using the original data (which is NOT copyrightable), thus making her own original figures which are therefore copyrightable. There is a chance that she has no fair use argument since her reuse (even through attribution) is a clear violation of distribution rights and can be viewed as falling within the same scope of the original publication.

 In the case of publications, scientists waive their copyright upon submission and acceptance for publication and dissemination, and grant that copyright to the journal. Not all scientific output is formatted for publication, or released at all. In that case, it would greatly benefit scientists to publish their figures via an open notebook to provide copyright protection for their research (if that is in fact the goal).

With regards to the traditional science system, scientists are offered protection from the moment they record their data and create figures based on that data. They are even protected at conferences where they present their research (either via an oral or poster format). This is specifically useful in the case of scientific scooping, which isn’t as rampant as we make it out to be but is still a major fear in the community. If there is a case of potential copyright infringement, you have the right to file suit (once you apply for copyright). If you can prove there was access to your research findings and there is substantial copying you may even win your case.

If you are an open scientist, in that you publish your research findings online before peer reviewed publication, you may be in an even better position. You are granted the same rights as a traditional scientist. In the open case, however, the proof of access is much easier to demonstrate since a simple Google search can turn up your findings. The burden is then that you prove there is evidence of copying, which is hard enough as it is.

Because of all the possible interpretations of copyright application to science, I highly advocate the use of the Creative Commons licenses. The CC0 (public domain), CC-BY (use with attribution), and CC-BY-SA (use with attribution and share alike) afford the copyright owner the ability to share their research findings with the community and in turn allow the community to share, use, and reuse those findings without fear of retaliation. It is incredibly important to note that using the CC licenses (with the exception of the CC0) does NOT waive all exclusive rights as a copyright holder. They allow you to waive your rights as long as the reuser of the original work attributes, shares, etc (per terms of the license) in turn. If those stipulations are infringed, you are free to take action. In fact, there is legal precedent for such action.

The licenses provide a means for others to use information and data without worrying about moral ambiguities or legal issues, and in turn promote a culture of sharing and attribution. With the CC licenses there will be more societal pressure to do the right thing. When credibility is involved social pressure can work wonders.

For more information, please refer to the US Copyright Office website.

Patent Law

The America Invents Act was signed into law in 2011 and makes some changes to patent law. The most significant change is that patents are now granted under a first-to-file system, whereas previously they were granted under a first-to-invent system. This change took effect on March 16, 2013 as a way to conform to international policy, but also to decrease the burden on the US Patent Office of identifying the first inventor, which can be extremely complicated and arduous.

 In a first-to-file system, a patent will be granted to the first person to file a patent for a given invention. While the system is as simple as it sounds, it tends to give advantages to larger entities with the resources and efficiency to file patents for every invention conceived. It is outside the scope of this writing to argue the merits of a first-to-file or first-to-invent system, but this is mentioned because there are a couple of workarounds to the first-to-file mandate. The first is through the filing of a provisional application, and the second is through public disclosure. In both cases, there is a one-year grace period under which a patent must be filed lest it become public domain.
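
As a rough illustration of the one-year grace period described above, here is a minimal Python sketch. The function names (`grace_period_deadline`, `can_still_file`) are hypothetical, and the way the deadline is counted (the same calendar date one year later, inclusive) is my assumption for illustration only, not a statement of how the patent office actually counts days.

```python
from datetime import date


def grace_period_deadline(disclosure: date) -> date:
    """Return the date one year after a public disclosure or provisional filing.

    Assumption of this sketch: "one year" means the same calendar date
    in the following year.
    """
    try:
        return disclosure.replace(year=disclosure.year + 1)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; falling back to
        # Feb 28 is an arbitrary choice made for this sketch.
        return disclosure.replace(year=disclosure.year + 1, day=28)


def can_still_file(disclosure: date, today: date) -> bool:
    """True if today falls on or before the assumed one-year deadline."""
    return today <= grace_period_deadline(disclosure)
```

For example, under these assumptions a notebook entry posted on `date(2013, 3, 16)` gives a deadline of `date(2014, 3, 16)`, and `can_still_file` returns `False` for any later date.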

The provisional application is a low cost option that grants an inventor protection from competitive patent filings. The fee is $125 for small entity inventors, such as individuals, and $250 for large entities like corporations. The intellectual property remains a secret during the provisional period until patent. Public disclosure is a free alternative to the provisional patent, in the sense that there is nothing to file with the patent office. With this method, the details of an invention become public information, but no competitor may file a patent.

Scientifically speaking, patentable items include processes, designs, and technology of all sorts (although computer programs are hard to patent or copyright). It is usually advantageous to maintain secrecy when dealing with intellectual property, and this culture is especially prevalent in science. As such many universities and institutions have legal services that aid scientists in patent filings. In an effort to maintain confidentiality, it is highly suggested by these services to file provisional applications for all inventions.

Much like copyright, the ultimate goal of a patent is to prevent competitors from stealing and reproducing a work without the inventor benefitting. It is a little-known fact that patents become public information after filing, generally 18 months after the earliest filing date. It is entirely possible for competitors to analyze a patent and create a “non-obvious” derivation of the work that can then be patented. In this scenario the benefit of the patent application is essentially lost.

Open notebook science can be a major benefit to the new patent process. Since it does cost money to file a provisional application, ONS (or other web disclosure) would provide a free alternative to the provisional application. The only difference between the two routes is that through ONS, the patent is immediately public information, while the provisional application maintains invention secrecy. Because the patent will eventually be public domain, the incentive to innovate is delayed a bit through the provisional process.

While ONS publicly discloses a scientific creation and encourages potential modification, it does not promote/encourage stealing the idea. Scientists are still protected from patent infringement. Now, if a competitor sees the notebook entries and makes non-obvious changes to the idea, then they can be granted a new patent, if filed. That is no different from how the patent process currently operates; it simply speeds up the process.

Filing a provision for every idea ever produced and paying $125 every time is a waste of money and resources. It is highly unlikely that every idea/invention will come to fruition. It also gives the US patent office a lot of unnecessary paperwork, and could actually stifle innovation and creativity. ONS would in turn allow a researcher to disseminate their ideas and protect the best ones for the original creator. Resources could be better used to fight for the best ideas and allow others to develop the ideas that won’t necessarily get the same level of attention or ever be produced.

In this way ONS could be used as a defensive tactic to protect a scientist from losing his/her best ideas. It is also possible for open notebook science to be used as an offensive tactic. In this maneuver, the documentation of ideas born from discussions or other endeavors creates prior art (which is essentially the same as public disclosure). An invention disclosed in prior art is exempt from patent protection. So in the case of public disclosure via ONS, inventions would be blocked from filing for patent. Hypothetically, a researcher could publish any and all ideas, techniques, or technologies and prevent all competitors (and peers) from filing for patent.

In the interest of sharing research information, open notebook science may be the best protection against impediments in the scientific process. 

Notes on Intellectual Property: Copyright Law

In the quest to discover how a scientist may protect their intellectual property with regards to open access to that IP, I’ve decided to do some research. The notes contained here come from:

Intellectual Property: Patents, Trademarks, and Copyright in a Nutshell (4th edition) by Arthur Miller and Michael Davis

In the interest of time and sanity, I’m going to focus on copyright law. Generally when providing open access and CC licensing, only copyright applies since nothing contained is trademarked or patented (except in the case where patents are filed). Hopefully the information I document here is useful to those who want to follow the model I have used, and maybe it’ll be useful to scientists who pursue other avenues of scientific discovery.

Foundations of Copyright Protection

  • first it should be said that copyrights pertain to “written” works, a category that has expanded to include other works of art and computer programs, and in our case scientific data/research.
  • originally copyright law’s jurisdiction was from the moment of publication, but it was amended to the moment of fixation – that is, the moment a work becomes transcribed into a tangible form. In our case that means once data/methods are acquired and stored.
  • typically, registration of a copyrighted work is important, but “the basic doctrine of this country’s copyright law is to protect authors without requiring it.” That is especially important for science because information and conclusions are being produced all the time and it would be nearly impossible to register all of that scientific work constantly.
  • The Copyright Clause of the US Constitution: “To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”
    • Basically Congress has the power to create legislation dealing with copyrights, and has chosen to do so since 1790 and has amended the law several times since then.
    • A 1976 revision to the law was created as the Copyright Act of 1976, which applied copyright to moment of fixation, like I stated before.
  • Prior to the 1976 Act copyright fell under two distinctions (not sure if that’s the right term): (1) there was common law copyright and (2) statutory copyright
    • common law gave authors the ability to protect their work from being copied forever as long as the work was unpublished.
    • once the work was published then statutory copyright law took over. this copyright was limited (unlike common law which was perpetual). The benefit was that authors could publish their work and claim a monopoly over their work and receive compensation while being protected by the law.
    • the problem with this system was that there was a gray period when common law copyright would end and statutory copyright would begin. To complicate matters new methods of communication made it hard to classify the concept of “publication.”
  • the 1976 Act essentially eliminates the concept of common law copyright and protects the author from the moment a work is recorded in some concrete way. For research I assume that would be from the moment notes are taken, but I can see a case to say that this moment is actually when a grant for research is written. Some articles in the act:
    • Section 102 is pretty important in that it defines the moment of copyright and what a work of authorship is. Interestingly section b of the law states: “In no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in such work.” Despite the fact that copyright was specifically created to aid science the wording of that section seems contradictory. More information will be needed.
    • Section 106 gives the author exclusive rights to produce copies of the work and any person who makes copies without the author’s consent is subject to an infringement suit and can be arrested (Section 506). Yikes! Derivative works are also protected.
    • The author is protected when displaying/performing the work publicly. This seems to be applicable to open science, allowing scientists to publish their research without fear of data misuse/thievery.
    • It seems copyright applies to the publication of science (data, journal articles, etc) but patents provide protection of the actual process of discovery. So the application of the law to open science would be a mixture of the two law regimes.
    • The basis of copyright protection lies in expression and originality. Since facts and ideas aren’t copyrightable the way an idea is expressed becomes important. So for science, data probably isn’t very protectable, but the way you display that data (interpretation) probably is copyrightable. Originality here becomes important. A work doesn’t need to be new or novel, it just needs to be proven that it wasn’t copied or derived from someone else.

The Subject Matter of Copyrights

  •  The key aspect of copyright is originality. According to the author “an author can claim copyright … as long as he created it himself, even if a thousand people created it before him.” 
    • This is especially interesting in the open publication world, and to me, makes Creative Commons licensing all the more important. With access to works (via the web) copyright violations can become more of an issue. The CC license essentially allows you to keep your copyright, but provide would-be authors the chance to adapt a work without fear of infringement (and likewise, authors won’t have to fear plagiarism).
    • Because of the simple concept of originality, there has been some interpretation as to what exactly can be copyrighted:
    • Burrow-Giles Lithographic Co v. Sarony (1884) established that artistic consideration and creative effort is enough for photographs to be copyrightable.
    • But in 1903 Bleistein v. Donaldson Lithographing Co declared that a work had originality if it was “one man’s alone.” At that point artistic merit was not to be considered by the court.
    • Artistic reproductions became copyrightable after Alfred Bell & Co. v. Catalda Fine Arts, Inc. (1951) because the reproduction can be considered an original work. Essentially the reproducer is protected from someone making copies of his reproduction. (This probably only applies to reproductions of works that are in the public domain, since only the copyright holder can allow reproductions of a work.) Also it must be demonstrated that the reproducer has contributed something more than trivial to the reproduction.
    • The “sweat of the brow” doctrine gave originality to works that were not artistic in nature. For instance, aggregations of public domain information were protected if the author demonstrated some investment of original work.
    • Feist Publications v. Rural Telephone Service (1991) rejected the “sweat of the brow” doctrine on the premise that there should be “some minimal degree of creativity.”
      • Basically simple information aggregation, or fact compiling, isn’t enough for copyright. But this shouldn’t exclude scientific data from being copyrightable since the collection of the data is a creative process and the data analysis is highly nontrivial.
      • Interestingly computer databases may fall into the category of non-copyrightable works and as such sui generis protections are required. This is interesting because of the involvement of data and may become an umbrella for scientific research.
      • As a result of this trial, there remains a lot of controversy as to how much creativity is required for copyright protection.
  • To determine what categories of works can be included for copyright protection see 17 USCA 102 (linked above). But the wording of that section suggests that copyrightable material need not fall under those categories specifically. Those are provided as a guide.
    • Works of utility (functional objects) are generally not granted copyright protection because that is what patents are for. But there are exceptions in the case of works that are non-functional, or for portions of functional objects that are non-functional (ie designs). For example Mazer v. Stein (1954) allowed the copyright of lamp bases.
    • When the idea and its expression are inseparable, copyright is generally denied. This affects things like forms, systems, software, and potentially scientific data. Blueprints on the other hand are copyrightable, and until recently the buildings themselves were not. Now buildings are copyright protected, but not functional components like doors and windows. Fashion designs fall into both realms: patterns are copyrightable but the design of clothes themselves is tough to copyright.
    • The availability of patent protection makes it hard to attain copyright, even though nothing is explicitly written to prevent this. In fact there has been a case to determine that patents and copyright can both exist in the same work (In re Yardley (1974)).
  • intangible expression is not protected under copyright since there is no fixation of the expression. Choreography is an example of this. Speeches are another, but presentations with powerpoint should be copyrighted because the presentation has been “scribed.” Likewise, audio recordings of a speech are copyrighted.
  • the term “writings” (as said in the Constitution) and the more narrow “works of authorship” (as written in the 1976 act) are incredibly hard to limit in scope. The authors note that it is “difficult to identify those works that would constitute writings but that would not be original works of authorship.”
  • computer programs are copyrightable, but may be denied copyright if they “lack minimal originality… or constitute the only way of accomplishing a particular result.” The second part is essentially phrased so that the program is itself an idea and no longer the expression of an idea that can be expressed in other ways.
    • when dealing with programs it seems there are two components literal and nonliteral:
      • literal components refer to the programming code and have been copyright protected
      • nonliteral components refer to the organization and the user-interface (among others), and copyright is harder to attain for them. This is especially true when the interface is dependent on user-interaction.
  • The Berne Convention has complicated the legality of copyright. Through signature, the US recognizes the copyright of all other countries that have also signed.
    • “the copyright formalities…have lost almost all of their legal significance”
    • “notice of copyright… has virtually no legal significance.”
    • “similarly, registration has almost no legal significance” –> “the only remaining procedural effect of registration is that US authors must register before bringing suit.”

Exclusive Rights

  • see section 106 of the 1976 Act for the exclusive rights of authors. Most of these rights are upheld only publicly, but 2 (reproduction and derivative work) are subject to infringement both publicly and privately. Note that public is defined as “a performance or display to a ‘substantial number of persons’ outside of family and friends.”
  • reproduction allows the copyright owner to exclude all others from reproduction of the work
    • a copy is defined as “any material object from which, either with the naked eye or other senses, or with the aid of a machine or other device, the work can be perceived, reproduced, or communicated.”
    • phonorecords are not specifically excluded from the definition of copies, so they have been specifically added to the description of reproduction
  • derivative works (works based on the original work) are also under protection for a copyright owner
    • this is defined as “translations, arrangements, dramatizations, fictionalizations, films, recordings, abridgements, condensations, ‘or any other form in which a work may be recast, transformed, or adapted.'”
  • the right to distribute to the public “by sale or other transfer of ownership, or by rental, lease, or lending…”
    • called the first-sale doctrine
    • copyright owner has the right to prohibit others from distribution of work, until the ownership is sold/transferred. At this point, the new owner has this exclusive right.
    • designed to prevent restraints on alienation, “attempts to make an actual sale resemble something less than that… will be unsuccessful.”
    • it is possible for a third party to be held liable if there was no first sale
  • the right to perform work publicly is also provided to copyright owners, but excludes purely graphical works and I feel scientific data falls into this category.
  • the right to display a copyrighted work is also exclusive to a copyright holder.
    • owners of a copy of work are permitted to display one image of the copy and this includes digital transmission (internet, network, etc)

Infringement

  • occurs when any of the exclusive rights of the copyright owner are violated – makes sense
    • doesn’t need to be intentional
    • it can even be unconscious – an author produces work that he conceives is original but is actually unintentionally borrowed from another author
    • indirect infringement – “one who actively and knowingly encourages another to infringe”
    • contributory infringement – producing a work/device that can be used to infringe on copyrights (see A&M Records v. Napster, 2001), but note that if there are substantial non-infringing uses then contributory infringement is not applied
    • vicarious/related infringement – seems similar to indirect inf. “a person who profits from an infringing performance, AND who somehow supervises or has the right to control or supervise the performance”
  • “to prove infringement, a party must establish ownership of the copyright and impermissible copying”
    • usually determined via circumstantial evidence
      • substantial similarity – remarkable resemblance to original work
      • proof of access – opportunity for contact with original work prior to creating work
    • literal copies lower the bar for the proof of access requirement
    • similarity and access are not required proofs, but merely an evidentiary method

Fair Use

  • “a balancing process by which a complex of variables determine whether other interests should override the rights of creators” – there are 4 interests:
    1. purpose and character of the use, including commercial uses
    2. the nature of the copyrighted work
    3. the proportion of the work that was used
    4. the economic impact of the use
  • seems like a very sticky thing to prove in cases of infringement, and all cases involving fair use are ruled based on the interests listed above. Cases where indirect infringement occurs seem the most likely to use the fair use defense.
  • Purpose and Character:
    • commercial vs noncommercial
    • public vs private – private nature of use can be favorable in fair use defenses
    • educational and nonprofit (especially together) are favored for fair use, but not always grounds against infringement
  • nature of the work plays a role in determining fair use
    • ex: educational works may not fall into fair use if the original work is educational itself, because of the economic impact of the use (the works are in the same area of economic potential)
    • consent issue – would the author give consent for uncompensated use if the author can use the work for their own benefit?
    • unpublished nature of work may be within fair use, but prior cases have precedent for barring the defense
  • amount of the work used (proportion) is important in determining fair use
    • proportionality is to be measured with respect to the original (copyrighted) work, not the potentially infringing work
    • quantitative, qualitative, and reverse proportionality can all be used to determine fair use, but only the first two are specifically mentioned in law
  • economic impact is particularly important when determining fair use – this should be obvious since copyright is designed to provide an author protection to profit from their work

Ownership

  • it is important to realize the physical work and the creative property are two separate entities. A transfer of the physical work does not constitute copyright transfer. This is important when considering communications between two parties: an email or letter for instance. The information in the communique is copyrighted and protected but the actual paper/message is nothing and particularly meaningless.
  • copyright must be transferred in writing
  • multiple authorship makes copyright ownership complicated and occurs when:
    1. work consists of material made by more than one person (joint works)
    2. work is made by one and published by another (work for hire)
    3. work can be neither joint nor work for hire and is classified as collective works
    4. work based on prior author is derivative
  • in cases of coauthors, each owner has the right to use the work for their own purposes, but neither can prevent the other from doing the same.
    • neither author is allowed to destroy the value of the work

Registration

  • copyright protection is automatic – as soon as a work is fixated (written, drawn, etc) copyright is applied
  • for clarification: copyright is designed to prevent copying, as an author you don’t need to find works that are similar to one you wish to create if you are creating something independently.
  • but registration of a copyright is required if legal action is to be taken – ie if you want to sue for infringement
    • you can register a copyright after finding an infringement but before filing suit
  • notice is optional (for works authored after 1989), but when it is applicable there are 3 rules. Notice of copyright must be affixed with:
    1. copyright symbol (letter, symbol, word, or abbreviation)
    2. the date of first publication
    3. the name of the copyright owner
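
The three notice elements listed above can be assembled mechanically. Here is a minimal Python sketch; the function name `copyright_notice` is hypothetical, and the accepted markers (the © symbol, the word "Copyright", or the abbreviation "Copr.") reflect my reading of the notice rules rather than legal advice.

```python
# Markers assumed acceptable for a US copyright notice in this sketch.
ALLOWED_MARKERS = ("©", "Copyright", "Copr.")


def copyright_notice(year: int, owner: str, marker: str = "©") -> str:
    """Build a notice string: marker, year of first publication, owner name."""
    if marker not in ALLOWED_MARKERS:
        raise ValueError(f"marker must be one of {ALLOWED_MARKERS}")
    if not owner.strip():
        raise ValueError("owner name must not be empty")
    return f"{marker} {year} {owner.strip()}"
```

For instance, `copyright_notice(2014, "Anthony Salvagno")` returns the string `© 2014 Anthony Salvagno`.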

 

The Repeating Crumley-ONS Project: Next Steps

Slightly over a month ago, I came across the Winnower and began a project in open notebook science. The concept was to upload notes from my notebook to the Winnower, archive the notes, and get DOIs for each post. Then I would write 2 papers: one to summarize the experiment and the other to theorize a complete publication system that would incentivize open documentation of real-time research (open notebook science). I chose the Repeating Crumley experiment for this experiment in ONS, and you can read about the reasoning here.

Well, I’m happy to say that I’ve completed Steps 1, 2, and 3! I’ve posted every notebook entry in the RC series (there’s a physics pun there somewhere) to the Winnower and received DOIs for almost every post. A few posts didn’t translate at all on the platform. They are uploaded, but I didn’t bother with the DOI. Regardless, you can go on any of my Winnower posts and get a DOI (or click through to my notebook), or look through the RC entries and click the DOI to get to the Winnower archive of that post.

One cool side effect of this project was that a Twitter friend noticed a post that had embedded .gifs, and I think I am now credited with being the first to publish a scientific paper with embedded .gifs.

Now it’s time to write the paper based on all this research. I got the process started a couple years ago with a Google Doc about the project. I think I never followed through, because I didn’t value the traditional publication process. I think open science and peer review publication are on a course to merge and the incentives for ONS will shift, but this is a topic for another time.

Anyway, here is the previous write-up, which I’ll work on, merge with some info from my dissertation, and add some new thoughts to.

This part may take some time…

Small-ish issue with digital object identifiers

I’m no expert in this space, but I came across an issue with digital object identifiers because of my annoyingly persistent use (overuse? hahaha) of figshare. What happens if the archive tool you use for your data switches from one permanent link system to another?

Back in the early days of figshare, they used the handle system to provide a permanent link for data stored in their system. At some point they switched to using the DOI system. I have no idea when it happened and I don’t even think I noticed the change. The only thing I know now is that my older figshare datasets are full of dead links.

The point of using a permanent link, i.e. a handle or a DOI, is to maintain a connection to the source if the URL or data at that source changes. Any such change is recorded in the link’s metadata, which allows the permanent link to keep pointing to the correct location. This allows you to change the URL for a dataset on figshare, for instance, and the DOI link will point you to the updated location.
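To make the idea concrete, here is a minimal sketch of a permanent-link lookup. It is a toy model only: the `Resolver` class, the identifiers, and the example.org URLs are all made up for illustration and are not how figshare, the handle system, or the DOI system actually work internally. It shows that updating a record keeps an identifier working, while an identifier from a different system simply isn't found.

```python
# Toy model of a permanent-link system (illustrative only; this is not
# how figshare, the handle system, or the DOI system are implemented).
class Resolver:
    """Maps permanent identifiers to whatever URL is current."""

    def __init__(self):
        self._records = {}

    def register(self, identifier, url):
        self._records[identifier] = url

    def update(self, identifier, new_url):
        # Only the metadata changes; the identifier itself stays the same.
        if identifier not in self._records:
            raise KeyError(identifier)
        self._records[identifier] = new_url

    def resolve(self, identifier):
        # Returns None when the identifier is unknown to this system.
        return self._records.get(identifier)


doi_system = Resolver()
doi_system.register("10.0000/example", "https://example.org/datasets/42")

# The dataset moves: update the record, keep the identifier.
doi_system.update("10.0000/example", "https://example.org/datasets/42-renamed")
moved = doi_system.resolve("10.0000/example")

# An identifier minted in a different system was never registered here.
orphan = doi_system.resolve("hdl:0000/example")
```

In this sketch `moved` is the new URL and `orphan` is `None`, which is the toy version of a dead "permanent" link after a switch between systems.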

In my case, old projects that were linked via the handle system have all been updated with DOIs. Since the two systems are different, I have the unique situation of having broken permanent links! Obviously, this defeats the purpose of a permanent link. So it seems I have some work to do to find all the outdated figshare sets and update them, which presents a very tedious set of challenges.

Has anyone ever experienced anything like this? I’m not familiar with the internal workings of permanent link systems, but is there a way to easily move from one system to another? Does this present an issue for the future of web science where DOIs or handles are obsolete? I imagine in that world there would need to be a system wide effort to ensure everything is upgraded properly (like switching from paper to electronic records).

100% Real-time publication: an experiment in #opennotebookscience

I’ve long been an advocate of open notebook science. In my advocacy, I am always looking for new ways to encourage fellow researchers to pursue this methodology for their own research. The latest pertains to archiving and citability.

The ability to receive credit for your research has been a requirement of science culture for quite some time, and is presently essential to an academic career. The altmetrics movement has been a valuable way to track and receive academic credit for new and nontraditional publication methods. Online tools like Impactstory help to track these activities, while tools like Figshare help propagate data and track your online impact as well.

This has always been missing from open notebooks.

I’ve always advocated against the need for a singular open notebook platform for the reason that ONS needs to have the flexibility to meet the needs of the scientists who use it. I’ve also never actively pursued a tool that can provide that formal citation credit since there are APA, MLA, etc rules for citing websites and other online resources. But the success of Figshare and other software has made me rethink this approach.

If open notebooks could have an automatic way to apply either a handle or a DOI, and could be archived, I think people would pay attention. If there were a publishing platform that could freely contain all the information of an open notebook, give each notebook entry a DOI (for instance), and then host the final publication for peer review, there would be an even bigger incentive for ONS. And obviously there would be more transparency in the research process.

Where am I going with this?

Well a few days ago, I did a search for “DOI for WordPress” and came up with this, a plugin for a website called The Winnower. I had never heard of this organization so I went to the website and found a world of opportunity.

The Winnower, in case you are unfamiliar, is self-labeled as a DIY science publication platform that features a post-publication peer-review process to expedite and lower the entry barrier for publication. Once you submit your manuscript you can request a DOI for your article, which will undergo changes as you receive feedback for the publication.

The aforementioned plugin allows you to post blog entries (self-hosted WordPress blogs only for now) to the Winnower and receive DOIs, and with it the easy ability to be cited, for those entries. Integration between an open notebook and the Winnower (or a platform like it) could be a huge step forward for the ONS movement.

Imagine being able to see the entire scientific record for a study contained in the same system. Even better, imagine being able to witness the development of the study in real-time, providing feedback to the experiment, and being active in its development. When it comes time for peer-review, the process should theoretically be quick, because the work should have been vetted. If it hasn’t already been vetted, it is relatively easy to review the prior work summarized in the publication, because it is all self-contained on the publishing platform (or the open notebook where the publication is).

In the interest of open science, I will perform an experiment. I will re-publish a series of notebook entries pertaining to one experiment and will write a paper based on that experiment. All of that will be published on the Winnower, since the mechanism is in place to cross-post from this notebook to that site.

The experiment I have in mind is the Repeating Crumley experiment that was the basis for my work on deuterium depleted water. It is the perfect experiment for this trial in ONS publication because the work turned out to reveal a mistake in the original study from the 1960s, and I also propose a correction to the methods.

The key to this ONS experiment would be to understand what would be required of an open notebook or publication system to be able to provide a complete, organized, and user-friendly documentation system, or at least what is required for proper interaction between an open notebook and a publication platform. Additionally I hope to demonstrate another benefit to open notebook science in an effort to encourage others to participate in ONS.

In the spirit of open notebook science, I will document my interactions here and possibly also on the Winnower, and then write another publication on the Winnower about ONS and the peer-review system.

You can follow the documentation process through my Winnower profile.

Design tips for a killer presentation

I posted this to the Scifund blog but decided to share it with the readers of this site who may not follow Scifund. Enjoy!

Yesterday I provided some motivation for why you should make a great presentation. Now that you are amped up, you should be ready to get to work. But what if you don’t know exactly what to do to separate your presentation from the rest? Well don’t worry, I got you covered. Today I’m going to provide a few simple design tips that you can incorporate into your presentation to give it that wow factor.

The rule of thirds.

If you learn only one thing from this post, remember this rule, as it is one of the most basic and important design rules. It is also very handy for photographers and could easily be implemented in your presentation. The setup is easy: just take your artboard (your slideshow page) and divide it into 3 columns and 3 rows of equal spacing (the image here is a 1024×768 px image divided into 9 compartments).
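If you'd rather compute the grid than eyeball it, the division above can be sketched in a few lines. The function name `thirds_grid` is made up for this example, and it assumes pixel coordinates measured from the top-left corner of the slide.

```python
def thirds_grid(width, height):
    """Return the rule-of-thirds grid lines and their intersections.

    Coordinates are in pixels from the top-left corner (an assumption
    made for this sketch; use whatever units your slide tool uses).
    """
    vertical = [width / 3, 2 * width / 3]      # x positions of the 2 vertical lines
    horizontal = [height / 3, 2 * height / 3]  # y positions of the 2 horizontal lines
    # The 4 points where the lines cross.
    intersections = [(x, y) for x in vertical for y in horizontal]
    return vertical, horizontal, intersections


vertical, horizontal, intersections = thirds_grid(1024, 768)
```

For the 1024×768 slide, the horizontal lines land at y = 256 and y = 512, and the vertical lines at roughly x = 341 and x = 683.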

Now I’ve heard the rule of thirds presented in two ways, and I use both depending on the situation. The primary rule is that the subject of your image should be placed on the grid lines of your slide. If you have intersecting components, for instance a horizon line and a subject, then the intersection of your composition should be placed on an intersection point of your grid. Here is a great example of this in practice:

In this image the hawk is aligned with the right grid line, while the top of the grass (horizon) is aligned with the bottom grid. Using the rule of thirds in this way creates a new level of interest in your presentation, and leaves a lot of desirable white space to enhance the interest in your subject.

The other use of the rule of thirds is to place your entire subject into thirds of the space. This is a bit more difficult for me to explain, so I’ll go right into an example:

Here the flower occupies the entire right third of the image, and the bee occupies the middle third, leaving the final third for white space. In this photo the subject (the bee and the flower) takes up 2/3 of the image space and enhances the interest in the subject. Coincidentally the bee is centered in the image, which might give peace to those symmetry freaks. Bonus points if you noticed that the eye of the bee is aligned with one of the grid intersection points.

One way I use this in presentations is when making an outline (which I really don’t like to do). In the following example, I simplified my dissertation talk into 3 components and used an image to summarize each component:


There are lots of ways to use this rule in presentations so don’t be afraid to experiment.

Use simple colors for backgrounds.

I’ve seen this violated in business presentations far more times than I have in science, but it still is worth mentioning.

Don’t use backgrounds that have textures, patterns, gradients, or distracting graphics.

It is too distracting to the eye, and your audience won’t be paying attention to you; they’ll be too busy recovering from their seizure. All kidding aside, keeping your slide backgrounds simple will make your presentation easy on the eye. Believe it or not, my favorite background is a simple white background with black text. With great contrast comes great responsibility… or something like that.

If you want to go with better eye ergonomics, then use a black background with white text. You get the same level of contrast (maximum!) and you get an added benefit. Think about this from your audience’s perspective. They are sitting in a dark room, getting blasted in the face with bright photons bouncing off the projector. By making the background black the intensity of light reflecting from the screen is diminished and your audience is a little happier. If the lights in the room are at maximum, you may want to stick with the white background so they can actually see the slides.

If you insist on using color, then by all means do so, but stick to solid colors and use a font or image color that provides good contrast to your color. Having a basic understanding of color theory can be very helpful in this regard (See also HSV color space).

Pick quality fonts.

The choice of font will mostly go unnoticed if you go with classic choices like Times New Roman, Calibri, Arial, Myriad, etc. But if you choose to use fonts like Comic Sans (sorry Comic Sans, I had to…) your presentation will definitely be remembered, in a bad way. If you want to go with interesting fonts pick something that fits the theme of your presentation, but make sure it isn’t too distracting. Fonts may make for interesting design, but if your audience struggles to read it at a normal pace they will pay less attention to your message and spend more time trying to figure out what you wrote on screen, why you chose that font, what the funny shapes look like, and then your audience will be lost.

But even classic fonts don’t have to be boring. You can pair fonts to make headings enticing and body text readable. For instance, use Times New Roman for titles and Arial for your body. The content will still be readable, but you’ve added a new twist to the presentation. You can even reverse the scheme and go with Arial for the title and Times for the body. Here is a decent beginner’s guide to pairing fonts. And if you want to find some fun fonts to install on your computer, check out some of my favorite resources for royalty-free fonts: Da Font, Font Squirrel, and the Lost Type Co-op.

One idea per slide.

All designers advocate for keeping it simple, and some presentation designers incorporate this concept by keeping slides to 3 ideas. I like to take this two steps further by maintaining only one idea per slide. This can be especially handy for presentation styles like Ignite talks. By limiting the slide to just one idea, your audience has no choice but to focus on the one topic at hand and it will certainly make it easier to remember individual points over the remainder of the talk. If you have an image to share, show just the image and remove all the bullets, descriptions, etc (crediting a source is ok though). If you have a list, break the list into its components and put each component on one slide. Make it impactful by just writing the one idea and nothing else. The benefit here is that your audience literally has nothing else to focus on, so after they quickly read the concept they’ll be making great eye contact with you and giving you amazing positive feedback that will energize you through the rest of your talk.

Show only the most relevant information.

This rule piggy-backs slightly on the previous rule, but comes into play more when you have no choice but to feature more than one object of focus. Presenting data is a good example of this. Most data is complicated, and as the presenter it is your job to simplify it. Making it obvious what your audience should be taking away from a figure is important. Most likely you won’t be on a data set for longer than 3-5 minutes, and if your data is complicated it may take much longer than that to digest. Here is an example of some data from my research:


In this example I was merely trying to show that the higher the concentration of heavy water (D2O), the slower the growth of yeast. As a secondary point, I wanted to show the disparity between normal water (DI water, green) and 99.9% D2O (blue). Since it was important that each line be distinguishable, I chose various colors to represent each data set. To help distinguish DI water from 99.9% D2O, I made those two colors more prominent by making all the other colors more white (in this case I increased the transparency of the other lines). If I simply wanted to distinguish the two lines from each other while showing the other data I could have done something like this:


In this case I made the extra lines gray so they don’t detract from my message, which is that there is a vast difference in growth between yeast grown in DI water vs D2O. In this case it’s really easy to distinguish the two data sets I want to feature. But I didn’t want to lose the gradual difference in growth rate, so I simply applied a color gradient to the other sets. As the concentration of heavy water (D2O) increases the color changes slightly.

This is a relatively simple data set to explain, but you can use similar design logic to convey more complicated results. It just takes a little patience to make sure you are really conveying the point you wish to make, and more importantly the information you want your audience to retain.

Break some rules!

Despite all the tips I’ve shared with you, sometimes you can’t convey your message within the constraints of simple design rules. So the final rule is provided to give you the flexibility you need. But be warned, you should only break the rules if you really need to. Here is a slide that is one such example:


I used a busy background and a list of details to show the differences between hydrogen and deuterium (a stable, heavier isotope of hydrogen). I did use the rule of thirds to align the columns, but still I broke lots of rules. But there is a method to my madness.

First, the background was designed to illustrate a point. Each of the little dots is a graphic representation of a water molecule (not to scale). In this case I was trying to show the ratio of hydrogen to deuterium in nature, which is 1 deuterium atom for every 6,420 hydrogen atoms, or 3,210 molecules of water. So on this slide I placed 3,210 molecules, among which there is exactly one deuterium atom.

Second, the list was used to highlight the differences between deuterium and hydrogen, which is the one idea of the slide. I don’t expect my audience to remember all these details, I was merely trying to show differences between the two atoms that will later explain differences in the chemical properties of the two water types.
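The arithmetic behind the background (one deuterium per 6,420 hydrogen atoms, two hydrogens per water molecule) can be checked with a quick sketch. The 6,420 figure is the one quoted above, not a measured value, and the variable names are just for illustration.

```python
# Ratio quoted in the post: 1 deuterium atom per 6,420 hydrogen atoms.
HYDROGEN_ATOMS_PER_DEUTERIUM = 6420
HYDROGEN_ATOMS_PER_WATER = 2  # H2O

# Number of water molecules needed to hold that many hydrogen atoms.
molecules_on_slide = HYDROGEN_ATOMS_PER_DEUTERIUM // HYDROGEN_ATOMS_PER_WATER

# Deuterium abundance expressed as parts per million of hydrogen atoms.
deuterium_ppm = 1_000_000 / HYDROGEN_ATOMS_PER_DEUTERIUM

print(molecules_on_slide)       # 3210
print(round(deuterium_ppm, 1))  # 155.8
```

So the 3,210 dots on the slide carry 6,420 hydrogen positions in total, and exactly one of them is drawn as deuterium.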

In order to convey my message effectively and impactfully I needed to break a few rules. But I don’t abuse this rule when designing presentations, and you shouldn’t either. Combining the tips provided here will give you the most impactful science talk many of your peers will have personally witnessed. As a final example, here is my dissertation defense in its full glory. Take note of my use of each of the rules and try to understand my motivations on slides where I break the rules. If you have any questions feel free to tweet/email me or just leave a comment below.

Advertising in Open Notebooks

I wrote a post a long time ago about potentially funding science through ads in your open notebook (presuming you have one). The conversation was great and included a lot of different perspectives. The one prevailing piece of advice was that of caution. Advertising brings money; when money is involved there is investment, and people tend to feel that their investment warrants a voice. When it comes to science there is the potential to be controlled (in whatever capacity that may entail), which is something that should never be incorporated into research.

Interestingly enough, I’m bringing this conversation to light again, but this time it is through practice. If you go to the home page of this site (click the title/banner above) you will see an ad for biology.answers.com in the right sidebar, just below my social media information.

I want to make it very clear that the presence of the ad in no way influences my research or post content. The material on this site is free to read and free to use and will always be CC licensed.

I decided to place the ad mostly because it would pay me. Initially I was against the idea, but after thinking about it I decided to go through with it. They didn’t pay me an astronomical sum, but the money I received for the ad pays for the website, which allows me to keep the site running for the foreseeable future. This, I believe, is great for all of you.

Another reason I included an ad is that I was asked to list it. It’s not a Google ad that I’m trying to make money from over all time. Right now it is a one-year deal, and if it doesn’t work out then oh well. Because I was approached about advertising, and someone had spent time analyzing my site, I put some thought into the decision.

I also did it because of the experiment I linked to above. I don’t know if anyone has had advertising on their research site before, but if you can pay for any aspect of your research without government funding then why not? I may get a lot of negative feedback for allowing ads on my site, or I may increase the likelihood that people keep open notebooks, knowing they can have self-sufficient, self-hosted websites. I have no idea. We’ll see what happens in the long run regarding this issue.

If you have any comment, concern, or anything to say about me doing this then please, by all means, submit a comment. I also highly encourage you to read the post linked above, which is not my best work but raises some interesting questions. In that discussion, I never thought of specific banner ads, as most of the conversation was directed at Google banner ads, which change depending on the user and site content. With that said…

…share your mind!

 

@BreakingBio Episode 24: #OpenNotebookScience Edition

I totally forgot to post this when it came out about a week after I defended. The folks who run the hilarious video podcast Breaking Bio had me on as their special guest, with Heidi Smith reporting live on location. Check it out:

The Biophysical Effects of Heavy Water – My Defense Presentation

Defense Outline

Just over a week away now…

  1. Introduction
    1. What is D2O?
    2. The history of D2O
      1. Gilbert Lewis:
        1. purification
        2. biological effects
        3. The hypothesis
      2. Joseph Katz
        1. various experiments
    3. Uses of D2O
      1. NMR, mass spec
      2. The need for a D2O adapted organism
    4. Experiments in DDW
      1. use for space travel
      2. cure for cancer?
  2. The effects on life
    1. Tobacco Seeds
      1. The Crumley experiment and repeating the experiment
      2. Tobacco seed germination rate
      3. tobacco seed growth rate in low deuterium concentration
    2. Arabidopsis
      1. arabidopsis growth rate
      2. arabidopsis morphology
    3. E. coli
      1. growth rates
      2. adaptation and adapted growth
      3. morphology
    4. Yeast
      1. growth rates
      2. adaptation – can’t adapt
      3. morphology
        1. stall during cell division
        2. microtubule stabilization in D2O
  3. Molecular effects
    1. Stabilization of biomacromolecules
      1. DLS experiments
        1. Catalase
        2. Ovalbumin
      2. YPD longevity
    2. Investigation of HD exchange
      1. mechanism and exploitation for protein structure studies
      2. FT-IR analysis
      3. Cavity ring-down analysis
        1. low cost measurement of local atmosphere isotopic composition
    3. Effect on DNA
      1. The pursuit of shotgun DNA mapping
      2. optical tweezers
      3. methods
      4. overstretching data
  4. Future Work
    1. Arabidopsis
      1. adaptation
      2. seed growth in low deuterium
    2. Tobacco growth in low D2O
    3. Yeast morphology in taxol
    4. E. coli protein expression in D2O and protein structure analysis
    5. DNA
      1. overstretching in D2O with intercalators

Well, there is my idea of how to present my dissertation. I’m not sure if/where I should put my discussion on open notebook science. Also there are a couple things that I could see going elsewhere. I could describe the yeast and E. coli stuff in parallel instead of one after another. Also the HD exchange stuff could easily go right after the yeast, E. coli, or even the tobacco seed stuff. What to do…

Otherwise I think the story is pretty compelling: history of D2O and the unanswered question by Lewis. Investigations into D2O effects and trying to understand low D2O concentration effects, effects on macromolecules, and the understanding of large volume/long-term HD exchange.

Any feedback you may have would be GREATLY appreciated. I’ll send you a figshare t-shirt, or if you are XL, I’ll send you a hoodie (but I only have one).