Course Description: Environmental sensors, sequencing instruments, social media, and the Internet all contribute to fundamental changes in the nature of scientific research, suggesting data-driven research as the 4th Paradigm of Science. Digital data produced through computation is not a commodity that is consumed in a single use, but an invaluable intellectual asset that can be used repeatedly to fuel new ideas and insights. Managing research data for the long term, and ensuring continued access to it, has emerged as a major challenge. As the well-known 2003 "Atkins report" states, "absent systematic archiving and curation of intermediate research results, data gathered at great expense will be lost". In this course we examine the full lifecycle of digital data with a focus on the challenges of Big Data.
The course covers the following topics:
The course uses lectures, presentations, and discussions. If student interest and background merit, students will get hands-on experience with research tools and web services through a class project. See http://pti.iu.edu for the kinds of research tools to be explored.
Prerequisite: Moderate mastery of programming in a traditional programming language such as Java or C++, with experience in something more substantial than toy standalone programs. Interdisciplinary teams that combine complementary skill sets are a possibility, depending on class makeup.