Alyssa Goodman and 10 more

Alberto Pepe and 2 more

The arXiv is the most popular preprint repository in the world. Since its inception in 1991, the arXiv has allowed researchers to freely share publication-ready articles prior to formal peer review. The growth and popularity of the arXiv emerged as a result of new technologies that made document creation and dissemination easy, and of cultural practices in which collaboration and data sharing were dominant. The arXiv occupies a unique place in the history of research communication and of the Web itself; however, it has arguably changed very little since its creation. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made based on new technologies not previously available. Based on this, we argue that a modern arXiv might in fact not look at all like the arXiv of today.

Introduction

The arXiv, pronounced "archive", is the most popular preprint repository in the world. Started in 1991 by physicist Paul Ginsparg, the arXiv allows researchers to freely share publication-ready articles prior to formal peer review and publication. Today, the arXiv publishes over 10,000 articles each month from high-energy physics, computer science, quantitative biology, statistics, quantitative finance, and other fields (see Fig \ref{104668}). The early success of the arXiv stems from the introduction of new technological advances paired with a well-developed culture of collaboration and sharing. Indeed, before the arXiv even existed, physicists were already sharing recently finished manuscripts, first by physical mail and later by email. To understand the success of the arXiv, it is important to understand its history. Below we highlight a brief history of the technology, services, and cultural norms that predate the arXiv and were integral to its early and continued success.
The history of the arXiv

Prior to the arXiv, preprint distribution was handled by institutional repositories, such as the SPIRES-HEP database (Stanford Physics Information REtrieval System - High Energy Physics) at the Stanford Linear Accelerator Center (SLAC) and the Document Server at CERN. Developed in the early 1970s, SPIRES created a bibliographic standard and a centralized resource that allowed high-energy physics researchers across universities to email the database and request that a list of preprints be sent to them. Since the papers themselves could not be emailed at the time, the system relied on traditional mail. The resource was immediately successful, with requests numbering in the thousands within the first few years \cite{Elizalde_2017}. While SPIRES greatly improved the flow of information, it still took weeks for preprints to be sent and received. A new typesetting system would soon emerge and change this.

TeX, pronounced "tech", was developed by Donald Knuth in the late 1970s as a way for researchers to write and typeset articles programmatically. Soon after the introduction of TeX, Leslie Lamport created a standard set of TeX macros, called LaTeX, which made it easy for researchers to professionally typeset their documents on their own. This system made sharing papers easier and cheaper than ever before. Indeed, many, if not most, researchers at the time relied upon secretaries or typists to prepare their manuscripts, which then had to be photocopied in order to be sent via mail to a handful of other researchers. TeX allowed researchers to write their documents in a plain-text format that could be emailed, downloaded, and compiled, without the need for physical mail. Soon, physicists were emailing and downloading .tex files at great rates, hastening the process of research communication like never before.

Such a system immediately created a new problem for researchers: information overload.
Researchers were exchanging emails containing preprints at great rates, and given the size of computer hard drives at the time, email servers were running out of space \cite{Ginsparg_2011}. To address this problem, an automated email server, called arXiv, was set up in the early 1990s. The arXiv allowed researchers to automatically request preprints via email as needed. It would soon become one of the world's first web servers, and today it still serves as one of the most open and efficient forms of research communication in the world. The arXiv was a leader in introducing and utilizing new technology when it was launched; however, it has arguably changed very little since its inception, despite a wealth of new technologies now available. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made with new technologies and tools, and we propose that a modern arXiv might in fact not look at all like the arXiv of today --- a development that will likely occur with or without arXiv.

Alberto Pepe and 1 more

Why are scientific ideas disseminated via "papers"? Is a paper the best way to share and publish research results today? The format and function of research communication have not changed much in the last 400 years. Take any paper published this week, download it, and compare it to a digitized version of a paper from the 1600s. The two papers may differ in page layout, color, and typeface, but they are essentially identical in format: a collection of text and figures. Indeed, the fact that we refer to the mainstream outlet of research communication as a "paper" speaks volumes about its ties to print.

While the published format has not changed in the last 400 years, the change in published content is astronomical: a proclamation of the success of science. The discovery of the molecular structure of DNA \cite{WATSON_1953}, the discovery of penicillin \cite{Fleming1980}, and the formulation of general relativity \cite{Einstein_1916} are among the biggest and most splendid scientific discoveries of all time. They were all published in a two-dimensional paper format. Even more recently, the groundbreaking discovery of gravitational waves, which earned the 2017 Nobel Prize in Physics for the leaders of the LIGO collaboration, was published in a traditional paper format \cite{Abbott_2016}. LIGO's groundbreaking data were certainly not analyzed on a two-dimensional piece of paper.

So, how is it possible that scientists produce and write cutting-edge "21st-century research" and still publish it in a "17th-century format"? \cite{obsolete,Pepe} Obviously, the paper format, being so enduring and persistent, has served science well. But things have changed in the last three decades. The recent explosion of content digitization, growing internet speed and connectivity, and the reliance on data, code, and computational power have set us on an unprecedented and irreversible path toward changing the way we publish and disseminate research ideas.
A Gutenberg-style revolution in scholarly communication is upon us, and we believe it is being pioneered by the Open Science movement. The Open Science initiative aims to make scientific research and its dissemination accessible, reproducible, and transparent. In addition to encouraging Open Access publication of research as early as possible (the availability of preprints in subject-based repositories has moved well beyond the realm of physics), for many computational domains Open Science translates into making code and data available to everyone, and into practicing "open notebook" science. In other words: readers and reviewers must be able to understand how the authors produced the computational results, which parameters were used for the analysis, and how manipulations of these parameters affect the results. Increasingly, journals and funding agencies are mandating that researchers share their code and data when reporting computational results. However, even when data and code are provided by authors and published, they are oftentimes relegated to Supplementary Information or to entirely separate platforms, disconnected from the published "full text". Since code, data, and text are not linked at a deep level, readers and reviewers face barriers that hinder their ability to understand and retrace how the authors achieved a specific result. In addition, while data and code may be available in repositories external to the corresponding article \cite{Antoniol_2002}, it takes readers and reviewers considerable effort to verify the software and re-run analyses with, say, changed parameters.

The idea of a multimedia, multi-dimensional scholarly publication that defies the limitations of the two-dimensional paper format is not new. The publication history of the first detection of gravitational waves by the LIGO collaboration is an example of how much this is needed in scientific publishing.
The discovery was reported in a series of traditional articles \cite{Abbott_2016}\cite{Abbott_2016a} but with an associated and externally hosted supplemental Jupyter notebook \cite{losc-tutoriallosc_event_tutorialmaster}. The notebook allows readers to run and tweak the code, change parameters to alter the analysis, and, in its section dedicated to the signal processing of the gravitational waves into sound, it even allows readers to play the bloop of two black holes colliding. Yet, the notebook and the multimedia elements had to reside outside the article. Why?

Alberto Pepe and 4 more

We're in a crisis

We are in the midst of an unprecedented global crisis. Just weeks since its outbreak, the Coronavirus pandemic (COVID-19) has already affected, and will continue to affect, our daily lives around the globe for the foreseeable future. The answers and the solutions to this crisis will come from science. But the crisis affects science, too. It affects students, educators, and researchers: not just their day-to-day lives, social ties, and work routines, but also their ability to actively collaborate, convene in face-to-face meetings, attend academic conferences, teach and learn in an open university setting, pay a visit to the library, work overnight at the laboratory, and so on. But the thing is: science cannot stop. Scientific progress must go on. For each of the challenges that scientists face in this time of crisis, there is, or there will be, a solution. We believe that the solution is not to be found in a single technological tool, product, framework, institution, funding agency, or company. It is the global cyber-infrastructure of scientific collaboration, built on scientific rigor, intellectual curiosity, and cooperation, that will enable science to advance in such difficult times.

The power of scientific collaboration

As scientists, publishers, science communicators, and technologists, we believe that:

a. Science is the solution to the ongoing crisis. Now more than ever, reliance on the scientific method, rigor and clarity in scientific communication, transparency, reproducibility, and seamless sharing of all research data (including negative results) are fundamental to solving this health crisis and advancing human progress.

b. Global collaboration and cooperation, beyond and above national and economic interests, are necessary not only at the scientific level, but also at the political and societal level. We're more interconnected and interdependent today than ever.
And this interconnectedness extends to the ecosystem in which we live. A crisis of such scale requires global solidarity, bipartisan political action, civic participation, and long-term thinking.

Authorea Help and 3 more

WHAT IS LATEX?

LaTeX is a programming language that can be used for writing and typesetting documents. It is especially useful for writing mathematical notation such as equations and formulae.

HOW TO USE LATEX TO WRITE MATHEMATICAL NOTATION

There are three ways to enter "math mode" and present a mathematical expression in LaTeX:

1. _inline_ (in the middle of a text line)
2. as an _equation_, on a separate dedicated line
3. as a full-sized inline expression (_displaystyle_)

_inline_ Inline expressions occur in the middle of a sentence. To produce an inline expression, place the math expression between dollar signs ($). For example, typing $E=mc^2$ yields E = mc².

_equation_ Equations are mathematical expressions that are given their own line and are centered on the page. These are usually used for important equations that deserve to be showcased on their own line or for large equations that cannot fit inline. To produce an equation, place the mathematical expression between the symbols \[ and \]. For example, typing \[x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}\] yields the quadratic formula displayed and centered on its own line.

_displaystyle_ To get full-sized inline mathematical expressions, use \displaystyle. Typing I want this $\displaystyle \sum_{n=1}^{\infty} \frac{1}{n}$, not this $\sum_{n=1}^{\infty} \frac{1}{n}$. yields a full-sized sum in the first case and a compact inline-sized sum in the second.

SYMBOLS (IN _MATH_ MODE)

The basics

As discussed above, math mode in LaTeX happens inside dollar signs ($...$), inside the brackets \[...\], and inside equation and displaystyle environments.
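The three modes above can be seen together in a minimal compilable document. The sketch below is illustrative (the specific formulas are our own examples, not prescribed by the guide):

```latex
\documentclass{article}
\begin{document}

% inline: the math flows with the sentence
Einstein's relation $E=mc^2$ sits inside a line of text.

% equation: displayed and centered on its own line
The quadratic formula is
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]

% displaystyle: full-sized math inside a sentence
I want this $\displaystyle \sum_{n=1}^{\infty} \frac{1}{n^2}$,
not this $\sum_{n=1}^{\infty} \frac{1}{n^2}$.

\end{document}
```

Compiling this with any standard LaTeX engine (e.g., pdflatex) shows how the same sum renders at full size with \displaystyle and at a compact size without it.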
Here’s a cheatsheet showing what is possible in a math environment:

  _description_                 _command_              _output_
  ----------------------------  ---------------------  ---------------
  addition                      +                      +
  subtraction                   -                      −
  plus or minus                 \pm                    ±
  multiplication (times)        \times                 ×
  multiplication (dot)          \cdot                  ⋅
  division symbol               \div                   ÷
  division (slash)              /                      /
  simple text                   \text{text}            text
  infinity                      \infty                 ∞
  dots                          1,2,3,\ldots           1, 2, 3, …
  dots                          1+2+3+\cdots           1 + 2 + 3 + ⋯
  fraction                      \frac{a}{b}            a over b
  square root                   \sqrt{x}               √x
  nth root                      \sqrt[n]{x}            nth root of x
  exponentiation                a^b                    a superscript b
  subscript                     a_b                    a subscript b
  absolute value                |x|                    |x|
  natural log                   \ln(x)                 ln(x)
  logarithms                    \log_a b               log_a b
  exponential function          e^x=\exp(x)            eˣ = exp(x)
  deg                           \deg(f)                deg(f)
  degree                        \degree                °
  arcmin                        ^\prime                ′
  arcsec                        ^{\prime\prime}        ′′
  circle plus                   \oplus                 ⊕
  circle times                  \otimes                ⊗
  equal                         =                      =
  not equal                     \ne                    ≠
  less than                     <                      <
  less than or equal to         \le                    ≤
  greater than or equal to      \ge                    ≥
  approximately equal to        \approx                ≈
  ----------------------------  ---------------------  ---------------
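Several cheatsheet entries can be combined in a single expression. For instance, the following displayed equation (our own example, using \frac, \cdots, \approx, exponentiation, and absolute value from the table above) typesets the Taylor series of the exponential function:

```latex
\[ e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \cdots
        \approx 1 + x \quad \text{for small } |x| \]
```

Note that plain words inside math mode, like "for small" above, should be wrapped in \text{...} so that they are set in upright text rather than as a product of italic variables.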

Alberto Pepe and 1 more

Hello, and welcome to Authorea! 👋 We're happy to have you join us on this journey towards making writing and publishing smoother, data-driven, interactive, open, and simply awesome. This document is a short guide on how to get started with Authorea, and specifically on how to take advantage of some of our powerful tools. Of course, feedback and questions are not only welcome, but encouraged--just hit the comment icon to the right of this text 💬 (you can also highlight specific parts of the text to leave a comment on them). (Ha. That's your first lesson!)

The Basics

Authorea is a collaborative document editor built primarily for researchers. It allows you to write collaboratively in real time in normal text, LaTeX, and Markdown, all within the same document. In addition to making it easy to write together, each article on Authorea is a git repository, which allows you to host data, interactive figures, and code. But first, let's get started!

1. Sign up. If you're not already signed up, do so at authorea.com/signup. Tip: if you are part of an organization, sign up with your organizational email.

2. First steps. During the signup process you will be asked a few questions: your location, your title, etc. You will also be prompted to join a group. Groups are awesome! They allow you to become part of a shared document workspace. Tip: during signup, join a group or create a new one for your team. Overall, we suggest you fill out your profile information to get the best possible Authorea experience and to see if any of your friends are already on the platform. If you don't do it initially during sign up, don't worry; you can always edit your user information in your settings later on.

Once you've landed on your profile page (see below), there are a few things you should immediately do:

Add a profile picture. You've got a great face, show it to the world :) For reference, please see Pete, our chief dog officer (CDO), below.

Add personal and group information.
If you haven't added any personal information, like a bio, a group affiliation, or your location, do it! You might find that some people at your organization are already on Authorea, plus it is a great way to build your online footprint, which is always good for getting jobs.

Invite your colleagues. Click here to invite contacts from your Gmail. You'll get extra private documents in your account and you'll make Pete very happy!

Alberto Pepe and 1 more

INTRODUCTION

In the early 1600s, Galileo Galilei turned a telescope toward Jupiter. In his log book each night, he drew to-scale schematic diagrams of Jupiter and some oddly-moving points of light near it. Galileo labeled each drawing with the date. Eventually he used his observations to conclude that the Earth orbits the Sun, just as the four Galilean moons orbit Jupiter. History shows Galileo to be much more than an astronomical hero, though. His clear and careful record keeping and publication style not only let Galileo understand the Solar System; it continues to let _anyone_ understand _how_ Galileo did it. Galileo's notes directly integrated his DATA (drawings of Jupiter and its moons), key METADATA (timing of each observation, weather, telescope properties), and TEXT (descriptions of methods, analysis, and conclusions). Critically, when Galileo included the information from those notes in _Sidereus Nuncius_, this integration of text, data, and metadata was preserved, as shown in Figure 1. Galileo's work advanced the "Scientific Revolution," and his approach to observation and analysis contributed significantly to the shaping of today's "Scientific Method."

Today most research projects are considered complete when a journal article based on the analysis has been written and published. The trouble is, unlike Galileo's report in _Sidereus Nuncius_, the amount of real data and data description in modern publications is almost never sufficient to repeat, or even statistically verify, the study being presented. Worse, researchers wishing to build upon and extend work presented in the literature often have trouble recovering the data associated with an article after it has been published. More often than scientists would like to admit, they cannot even recover the data associated with their own published works. Complicating the modern situation, the words "data" and "analysis" have a wider variety of definitions today than at the time of Galileo.
Theoretical investigations can create large "data" sets through simulations (e.g., the Millennium Simulation Project). Large-scale data collection often takes place as a community-wide effort (e.g., the Human Genome Project), which leads to gigantic online "databases" (organized collections of data). Computers are so essential in simulations, and in the processing of experimental and observational data, that it is often hard to draw a dividing line between "data" and "analysis" (or "code") when discussing the care and feeding of "data." Sometimes, a copy of the code used to create or process data is so essential to the use of those data that the code should almost be thought of as part of the "metadata" description of the data. Other times, the code used in a scientific study is more separable from the data, but even then, many preservation and sharing principles apply to code just as well as they do to data.

So how do we go about caring for and feeding data? Extra work, no doubt, is associated with nurturing your data, but care up front will save time and increase insight later. Even though a growing number of researchers, especially in large collaborations, know that conducting research with sharing and reuse in mind is essential, it still requires a paradigm shift. Most people are still motivated by piling up publications and by getting to the next one as soon as possible. But the more we scientists find ourselves wishing we had access to extant but now unfindable data, the more we will realize why bad data management is bad for science. How can we improve? THIS ARTICLE OFFERS A SHORT GUIDE TO THE STEPS SCIENTISTS CAN TAKE TO ENSURE THAT THEIR DATA AND ASSOCIATED ANALYSES CONTINUE TO BE OF VALUE AND TO BE RECOGNIZED.
In just the past few years, hundreds of scholarly papers and reports have been written on questions of data sharing, data provenance, research reproducibility, licensing, attribution, privacy, and more--but our goal here is _not_ to review that literature. Instead, we present a short guide intended for researchers who want to know why it is important to "care for and feed" data, with some practical advice on how to do so. The Appendices at the close of this work offer links to the types of services referred to throughout the text. BOLDFACE LETTERING below highlights actions one can take to follow the suggested rules.

Alyssa Goodman and 10 more

ABSTRACT

The very long, thin infrared dark cloud Nessie is even longer than had been previously claimed, and an analysis of its Galactic location suggests that it lies directly in the Milky Way’s mid-plane, tracing out a highly elongated bone-like feature within the prominent Scutum-Centaurus spiral arm. Re-analysis of mid-infrared imagery from the Spitzer Space Telescope shows that this IRDC is at least 2, and possibly as many as 8, times longer than had originally been claimed by Nessie’s discoverers; its aspect ratio is therefore at least 150:1, and possibly as large as 800:1. A careful accounting for both the Sun’s offset from the Galactic plane (∼25 pc) and the Galactic center’s offset from the $(l^{II}, b^{II})=(0, 0)$ position defined by the IAU in 1959 shows that the latitude of the true Galactic mid-plane at the 3.1 kpc distance to the Scutum-Centaurus Arm is not b = 0, but instead closer to b = −0.5, which is the latitude of Nessie to within a few pc. Apparently, Nessie lies _in_ the Galactic mid-plane. An analysis of the radial velocities of low-density (CO) and high-density (${\rm NH}_3$) gas associated with the Nessie dust feature suggests that Nessie runs along the Scutum-Centaurus Arm in position-position-velocity space, which means it likely forms a dense ‘spine’ of the arm in real space as well. No galaxy-scale simulation to date has the spatial resolution to predict a Nessie-like feature, but extant simulations do suggest that highly elongated over-dense filaments should be associated with a galaxy’s spiral arms. Nessie is situated in the closest major spiral arm to the Sun toward the inner Galaxy, and it appears almost perpendicular to our line of sight, making it the easiest feature of its kind to detect from our location (a shadow of an Arm’s bone, illuminated by the Galaxy beyond).
Although the Sun’s offset from the Galactic plane (∼25 pc) is not large in comparison with the half-thickness of the plane as traced by Population I objects such as GMCs and HII regions (∼200 pc), it may be significant compared with an extremely thin layer that might be traced out by Nessie-like “bones” of the Milky Way. Future high-resolution extinction and molecular line data may therefore allow us to exploit the Sun’s position above the plane to gain a (very foreshortened) view “from above” of dense gas in the Milky Way’s disk and its structure.