IGERT programs are designed to be innovative training programs for graduate students. This training involves lab experience, workshops, career development, and coursework. The Open IGERT (officially titled “Creating the New Scientist – Training Graduate Students in Open Science and Informatics”) is built around several core training initiatives that cover each of those areas and add some new ones. In this post I’m going to discuss the center of the educational component: the Open Courses.
The Open IGERT will feature three core courses designed to teach students the principles of open science and data management. Each course’s material will be presented in a way that is applicable to multiple disciplines. The courses are built around a couple of core concepts that we feel apply broadly to data management:
- Data comes in a variety of forms: software, files, numbers, images, documents, etc. It is not enough to simply back up your data; scientists must be able to access the data at any point in time, secure it, and provide access to those who may come later.
- Data is useful to others even after it has stopped being useful to you. Collaboration will be key, and online collaboration is essentially the floodgate control of scientific discovery.
The first course is the capstone course, titled Collaborative and Open Research in Practice, which will teach students (and really whoever wants to sign up for the course) principles and tools for open and collaborative science. Students will use what they learn in this class to develop a proposal for the Open Research Challenge (to be explained in another post). The training from this course will also be directly applicable in students’ home labs, as it will be geared toward providing access to data and marketing that data so it reaches scientists who would be interested in the findings.
Course two will focus on data management and is titled Data Management and Curation. In this course students will learn the fundamentals of the data life cycle, which we outlined as acquisition, processing, analysis, and dissemination. Metadata will play a major role in the course: for relevant data to be found online, that data needs to be tagged, and metadata is the way to do it. Metadata also provides supplemental information that could be crucial to experimental repeatability. The NSF is beginning to require grant applications to have a data management plan attached, and current scientists are ill-equipped to deal with this. My collaborator Rob Olendorf currently writes data management plans for many faculty at the University of New Mexico, and it’s time future scientists were taught the important features of a successful data management plan. These plans are not only useful for grant applications; they will also be crucial as labs become more digital. Having information and protocols for data protection, archival, and security (not just against potential theft but also against hardware failures) will be very beneficial for science in the long run.
The third core course will focus on data visualization and presentation and is titled Data Analysis and Visualization. I’m a firm believer that scientific data should be easy to understand and visually appealing. Far too many scientists put little to no effort into making their data readable and publish complicated plots that take a lot of time to consume, interpret, and understand. This class will teach students various analysis techniques, outline effective methods for data presentation, and present case studies of both good and bad examples of visualized data. Either as part of the course or as a supplement to it, I will get to teach the students the ways of graphic design to enhance their data presentation prowess.
In addition to the core courses, we will feature a seminar series every semester covering topics that won’t be covered in the courses or will only be touched on. The seminar will also feature career development education: oral and poster presentation design and speaking, scientific writing for publications and for less formal outlets (open notebooks and blogs), grant writing tips, preparing a CV, etc. The seminar will be flexible so that it can provide whichever educational components are most needed at the time.
In addition to the weekly seminar, we will also teach an ethics course titled Ethical Issues of Online Collaborative Research once every other year. Because it is offered once every four semesters, the course will be available to all students in the IGERT program at some point in their funding period. The course will discuss relevant topics in open science such as data use, reuse, and licensing; social media use; conversations in public online; trolling, how to avoid it, and how to deal with it; protecting yourself and your data online; scientific communication – the responsibility of maintaining facts while providing accessibility; and daily lab ethics – working with others, using common areas and tools, publication authorship, etc.
The final educational component is an optional elective credit. We have compiled a list of courses taught in various disciplines that are applicable to our curriculum. Students may choose to supplement their education with one of these courses, which are meant to add a new wrinkle to their research experience.
The core courses of the IGERT, along with the ethics course, will form the basis of a new Informatics program at the University. IGERT students will typically be from other disciplines, and their reward for completing the courses (aside from receiving the IGERT stipend) will be a minor in Informatics or Data Management (still undecided).
The benefit of this curriculum is that students will be learning about tools, techniques, and practices that are applicable in just about every discipline. In every IGERT I’ve interacted with, the educational component is very specific, and unless you are directly involved in the research focus, the courses may not be relevant. My hope is to change this. By teaching some of the courses myself (definitely the capstone course, part of the data visualization course, and shared responsibilities for the ethics course and seminar series), I want to prevent students from replicating my IGERT educational experience.