Discovering the unusual or unexpected in scientific data is often the path to the most fundamental breakthroughs in the physical sciences. Yet as scientific data sets grow in size, our ability to understand them and to find the unexpected diminishes rapidly with current tools. Experimental scientific data is doubling every year, driven by successive generations of inexpensive sensors and exponentially faster computing. Astronomical data sets are no exception.
In the near future, instruments will commonly produce data sets of 50-100 TB or more for a single observing project. Many of our current tools and techniques for managing large data sets will not scale gracefully to meet this challenge. Next-generation facilities, such as the SKA with data rates on the order of 16 TB/s, will be game changers. Growing data sets also challenge the scientists who must extract the science from the data: traditional methods of searching frame by frame for discrepancies within the data will soon be infeasible. The exciting scientific breakthroughs to be found as astronomers manipulate and explore massive data sets will require advanced computing capabilities and infrastructure, new algorithms, and a focus on data-intensive science.
This workshop brought together members of the astronomical, computing, and software communities to foster discussion and collaboration on data-intensive scientific computing, focusing on the science goals of the Astronomy and Astrophysics Decadal Survey.