The core tools of science (data, software, and computers) are undergoing a rapid and historic evolution, changing what questions scientists ask and how they find answers. Earth science data are being transformed into new formats optimized for cloud storage that enable rapid analysis of multi-petabyte datasets. Datasets are moving from archive centers to vast cloud data storage, adjacent to massive server farms. Open source cloud-based data science platforms, accessed through a web-browser window, are enabling advanced, collaborative, interdisciplinary science to be performed wherever scientists can connect to the internet. Specialized software and hardware for machine learning and artificial intelligence (AI/ML) are being integrated into data science platforms, making them more accessible to average scientists. Increasing amounts of data and computational power in the cloud are unlocking new approaches for data-driven discovery. For the first time, it is truly feasible for scientists to bring their analysis to data in the cloud without specialized cloud computing knowledge. This shift in paradigm has the potential to lower the threshold for entry, expand the science community, and increase opportunities for collaboration while promoting scientific innovation, transparency, and reproducibility. Yet, we have all witnessed promising new tools which seem harmless and beneficial at the outset become damaging or limiting. What do we need to consider as this new way of doing science is evolving?

Ryan McGranaghan

and 4 more

Data science refers to the set of tools, technologies, and teams that alter the paradigm by which data are collected, managed and analyzed. Data science is, therefore, decidedly broader than ‘machine learning,’ and includes instead the full data lifecycle. Never has the need for effective data science innovation been greater than now when at every turn data-driven discovery is both burdened and invigorated by the growth of data volumes, varieties, veracities, and velocities. This growing scale of science requires dramatic shifts in collaborative research, requiring projects to climb the gradations of collaboration from unidisciplinary, to multi-, inter-, and transdisciplinary (Figure 1, [Hall et al., 2014; NRC, 2015]), and perhaps even to an entirely new level that defies any traditional boundary, or antidisciplinary (https://joi.ito.com/weblog/2014/10/02/antidisciplinar.html). We will discuss the cutting-edge efforts advancing collaborative research in Space Physics and Aeronomy, highlight progress, and synthesize the lessons to provide a vision for future innovation in data science for Heliophysics. We will specifically focus on three trail-blazing initiatives: 1) the NASA Frontier Development Laboratory; 2) the HelioAnalytics group at the Goddard Space Flight Center in cooperation with the NASA Jet Propulsion Laboratory’s Data Science Working Group; and 3) an International Space Sciences Institute project. References: Hall, K.L., Stipelman, B., Vogel, A.L., Huang, G., and Dathe, M. (2014). Enhancing the Ef- fectiveness of Team-based Research: A Dynamic Multi-level Systems Map of Integral Factors in Team Science. Presented at the Fifth Annual Science of Team Science Confer- ence, August, Austin, TX. NRC (National Research Council) (2015). Enhancing the Effectiveness of Team Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/19007.