Alberto Pepe


The arXiv is the most popular preprint repository in the world. Since its inception in 1991, the arXiv has allowed researchers to freely share publication-ready articles prior to formal peer review. The growth and popularity of the arXiv emerged as a result of new technologies that made document creation and dissemination easy, and of cultural practices in which collaboration and data sharing were dominant. The arXiv occupies a unique place in the history of research communication and of the Web itself; however, it has arguably changed very little since its creation. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made based on new technologies not previously available. Based on this, we argue that a modern arXiv might in fact not look at all like the arXiv of today.

Introduction

The arXiv, pronounced "archive", is the most popular preprint repository in the world. Started in 1991 by physicist Paul Ginsparg, the arXiv allows researchers to freely share publication-ready articles prior to formal peer review and publication. Today, the arXiv publishes over 10,000 articles each month from high-energy physics, computer science, quantitative biology, statistics, quantitative finance, and other fields (see Fig \ref{104668}). The early success of the arXiv stems from the introduction of new technological advances paired with a well-developed culture of collaboration and sharing. Indeed, before the arXiv even existed, physicists were already sharing recently finished manuscripts, first by mail and later by email. To understand the success of the arXiv, it is important to understand its history. Below we highlight a brief history of the technology, services, and cultural norms that predate the arXiv and were integral to its early and continued success.
The history of the arXiv

Prior to the arXiv, preprint distribution was handled by institutional repositories, such as the SPIRES-HEP database (Stanford Physics Information REtrieval System for High Energy Physics) at the Stanford Linear Accelerator Center (SLAC) and the Document Server at CERN. Developed in the early 1970s, SPIRES created a bibliographic standard and a centralized resource that allowed researchers in high energy physics across universities to email the database and request that a list of preprints be sent to them. Since the papers themselves could not be emailed at the time, the system relied on traditional mail. The resource was immediately successful, with requests numbering in the thousands within the first few years \cite{Elizalde_2017}. While SPIRES greatly improved the flow of information, it still took weeks for preprints to be sent and received. A new typesetting system would soon emerge and change this.

TeX, pronounced "tech", was developed by Donald Knuth in the late 1970s as a way for researchers to write and typeset articles programmatically. Soon after the introduction of TeX, Leslie Lamport created a standard set of TeX macros, called LaTeX, which made it easy for researchers to professionally typeset their documents on their own. This system made sharing papers easier and cheaper than ever before. Indeed, many, if not most, researchers at the time relied upon secretaries or typists to type their work, which then had to be photocopied in order to be sent via mail to a handful of other researchers. TeX allowed researchers to write their documents as plain-text source files that could be emailed, downloaded, and compiled without the need for physical mail. Soon, physicists were emailing and downloading .tex files at great rates, hastening the process of research communication like never before. Such a system immediately created a new problem for researchers: information overload.
Researchers were exchanging emails containing preprints at great rates, and given the size of computer hard drives at the time, email servers were running out of space \cite{Ginsparg_2011}. To address this problem, an automated email server, called arXiv, was set up in the early 1990s. The arXiv allowed researchers to automatically request preprints via email as needed. It would soon become one of the world's first web servers, and today it still serves as one of the most open and efficient forms of research communication in the world. The arXiv was a leader in introducing and utilizing new technology when it was launched; however, it has arguably changed very little since its inception, despite a wealth of new technologies now available. Here we look at the strengths and weaknesses of the arXiv in an effort to identify what improvements can be made based on new technologies and tools, and propose that a modern arXiv might in fact not look at all like the arXiv of today --- a development that will likely occur with or without the arXiv.

Roland Lindh


The subject paper\cite{Aquilante_2015} is the 5th in a series of papers\cite{Aquilante_2012,Roos_1990,Roos_1991,Veryazov_2004,Aquilante_2010} on the development of the MOLCAS program package. In this short back story I will try to put the MOLCAS quantum chemistry program package into a brief historical context, briefly describe its development, and finally argue the case for why papers like the subject paper are needed.

The Molcas project was started in 1989 by the theoretical chemistry group of the late Prof. Björn Roos (see Figure 1) at Lund University, Sweden. The Swedish government had struck a deal with the banks -- no tax increase if they supported research -- and as a consequence the Molcas project materialised as a collaboration between IBM and the research group in Lund. Swedish theoretical chemistry had made a serious impact on the ab initio field at the time, with contributions from researchers such as Jan Almlöf, Per E. M. Siegbahn, and Björn Roos, the former two being the first Ph.D. students of the latter at Stockholm University. During this time the three of them had developed specialized software: Jan Almlöf developed the Molecule program\cite{Taylor_2017} (computation of two-electron integrals), Per E. M. Siegbahn the MRCI code\cite{Siegbahn_1992,Roos_1977} (multi-reference configuration interaction), and Björn O. Roos the CASSCF program\cite{Roos_1980} (complete active space self-consistent field). The goal of the Molcas project was to bring these pieces of software together in a single package designed for the IBM 3090 machine. Version 1.0 was distributed to the public in late 1989. Subsequent versions were released in 1991, 1993, 1997, 2000, 2003, 2007, and 2014, covering versions 2-8. All of these were commercial releases. Today the package supports multiple options and methods, and runs on several hardware and software platforms.
In 2005 the project started the "Molcas users' workshops", with the most recent workshop, the 8th, taking place in Uppsala in November 2017. Over time, under the leadership of Björn Roos, the project has had several success stories which have been seminal to the field. Let us mention two here: the complete active space 2nd order perturbation theory model\cite{Andersson_1993} and the complete active space state interaction method\cite{Malmqvist_1989}. From the formation of the project until about 2010, the project was heavily dominated by the Lund group, especially with respect to leadership and strategic decisions, albeit with significant programming contributions from international collaborators. In 2009 Björn Roos retired from the project due to poor health\cite{Siegbahn_2011}, and the baton was passed on to the long-time Molcas co-developer Roland Lindh. In 2013 the first "Molcas developers' workshop" took place in Zürich. This has been followed by annual workshops in Alcalá (Spain), Siena (Italy), Vienna (Austria), Jerusalem (Israel) and, this year, Leuven (Belgium). During the same time the project has developed from a national Swedish project -- dominated by a single Swedish research group -- into an international project with 30-40 active developers from some 10 different universities and institutes. The author list of the subject paper is a testament to this development. In 2017 the project went open source, with the most significant part released under the Lesser GPL license; it is now distributed free of charge under the name OpenMolcas.

The subject paper was written at the request of the developers after one of our developers' workshops. People argued that a single paper covering the most recent developments was needed to make new developments and implementations known to the computational chemistry community.
Additionally, the lack of recognition and credit for software development was mentioned as one of the most important reasons for a paper like the subject paper -- in many respects a mini-review paper with no novel contributions. Hard-working software developers seldom get proper credit for their work, although that work is fundamental to the ability to perform accurate quantum chemical simulations, particularly when the development is not associated with new wave-function models. Some of us, like me, contribute significant software and methods which are completely instrumental for the calculations, but hardly ever get any credit for this contribution. Let me give an example from my own contribution: the two-electron integral code \cite{Lindh_1991} (long since also a part of MOLPRO), without which no calculations with the package would be possible. Since its publication in 1991 this paper, on the computation of two-electron integrals, has, according to Google Scholar, attracted 258 citations. Over the same period the two packages have attracted 7249 citations -- the use of the two-electron code was surely significant to the research those citations correspond to, but credit was handed to the developer less than 3.6% of the time. Had I designed the basis sets, however, I would have been assured the full 7249 citations -- we always cite the basis sets, but hardly ever how we efficiently compute the matrix elements they generate. There are several other developments and features in a quantum chemistry package which are not considered worthy of citation but are still just as essential to the calculations. This is where a paper like the subject paper comes in as an equalizer, making sure that all developers of a package get the credit and respect they deserve.
With this type of paper around we kill two birds with one stone -- we reduce the number of references to theoretical papers and at the same time make sure that all developers get the recognition they deserve and need.

Josh Nicholson


Research is really f**king important. This statement is almost self-evident from the fact that you're reading this online. From research has come the web, life-saving vaccines, pasteurization, and countless other advancements. In other words, you can look at cat gifs all day because of research, you're alive because of research, and you can safely add milk to your coffee or tea without contracting some disease because of research. But research today is being stymied by how it is communicated. Most research is locked behind expensive paywalls \cite{Bj_rk_2010}, is not communicated to the public or scientific community until months or years after the experiments are done \cite{trickydoi}, is biased in how it is reported -- only "positive" results are typically published \cite{Ahmed_2012} -- does not supply the underlying data of major studies \cite{Alsheikh_Ali_2011}, and has been found to be irreproducible at alarming rates \cite{Begley_2012}.

Why is science communication so broken?

Many would blame old profit-hungry publishers, like Elsevier, and in many respects that blame is deserved. However, here's a different hypothesis: what is holding us back from a real shift in the research communication industry is not Elsevier, it's Microsoft Word. Yes, Word, the same application that introduced us to Clippy, is the real impediment to effective communication in research.

Today, researchers are judged by their publications, both in terms of quantity and prestige. Accordingly, researchers write up their documents and send them to the most prestigious journals they think they can publish in. The journals, owned by large multinational corporations, charge researchers to publish their work and then charge institutions again to subscribe to the content.
Such subscriptions can run into many millions of dollars per year per institution \cite{Lawson_2015}, with individual access costing $30-60 per article.

The system and process for publishing and disseminating research is inimical to scientific advancement, and accordingly the Open Access and Open Science movements have made big steps towards improving how research is disseminated. Recently, Germany, Peru, and Taiwan have boycotted subscriptions to Elsevier \cite{Schiermeier_2016}, and an ongoing boycott to publish or review for certain publishers has accumulated the signatures of 16,493 researchers and counting. New developments such as Sci-Hub have helped to make research accessible, albeit illegally. While regarded as a victory by many, the Sci-Hub approach is not the solution that researchers are hoping for, as it is built on an illegal system of exchanging copyrighted content and bypassing publisher paywalls \cite{Priego}. A more interesting, technologist's view of the matter is that the real culprit for keeping science closed isn't actually the oligopoly of publishers \cite{Larivi_re_2015} -- after all, they're for-profit companies trying to run businesses, and they're entitled to do any legal thing that helps them deliver value to shareholders. We suggest that a concrete solution for true open access is already out there, and it's 100% legal.

What is the best solution to truly and legally open access to research?

The solution is publishing preprints -- the last version of a paper that belongs to the author before it is submitted to a journal for peer review. Unlike in other industries (e.g. literature, music, film), in research the copyright of the preprint version is legally held by the author, even after publication of the work in a journal. Preprints are rapidly gaining adoption in the scientific community, with a couple of preprint servers (e.g.
arXiv, which is run by Cornell University and is primarily for physics papers, and bioRxiv, which is similarly for biology papers) receiving thousands of preprints per month.

Some of the multinationals are responding with threats against authors not to publish (or post) preprints. However, they are being met with fierce opposition from the scientific community, and the tide seems to be turning. Multinationals are now under immense pressure not just from authors in the scientific community, but increasingly from the sources of public and private funding for the actual research. Some organizations are even mandating preprints as a condition of funding. But what is holding back preprints and, in general, a better way for authors to have more control over their research? We think the inability of scientists to independently produce and disseminate their work is a major impediment, and at the heart of that problem is how scientists write.

How can Microsoft Word harm scientific communication?

Whereas other industries, like the music industry, have been radically transformed and accelerated by providing creators with powerful tools like YouTube, there is no parallel in research. Researchers are reliant upon publishers to get their ideas out, and because of this they are forced into an antiquated system that has remained largely stagnant since its inception over 350 years ago. Whereas a minority of researchers in math-heavy disciplines write using typesetting formats like LaTeX, the large majority of researchers (~82%) write their documents in Microsoft Word \cite{brischoux2009don}. Word is easy to use for basic editing but is essentially incompatible with online publishing. Word was created for the personal computer: offline, single-author use. Moreover, it was not built with scientific research in mind -- as such, it lacks support for complex objects like tables, math, data, and code.
All in all, Word is extraordinarily feature-poor compared to what we can accomplish today with an online collaborative platform. Because publishers have traditionally accepted manuscripts formatted in Word, and because they consistently fail to truly innovate from a technological standpoint, millions of researchers find themselves using Word. In turn, the research they publish is non-discoverable on the web, data-less, non-actionable, not reusable and, most likely, behind a paywall.

What does the scientific communication ecosystem of the future look like?

What is needed is a web-first solution. Research articles should be available on distinct web pages, Wikipedia style. Real data should live underneath the tables and figures. Research needs to finally be machine readable (instead of just tagged with keywords) so that it may be found and processed by search engines and machines. Modern research also deserves rich media enhancement -- visualizations, videos, and other forms of rich data in the document itself. All told, researchers need to be able to disseminate their ideas in a web-first world, while playing the "journal game" as long as it exists. Our particular dream (www.authorea.com) is to construct a democratic platform for scientific research -- a vast organizational space for scientists to read and contribute cutting-edge science. There is a new class of startups doing similar things across the research cycle, and we feel there is a real and urgent demand for such solutions in research right now.
I was accepted into the cell biology program at Virginia Tech under conditional terms due to a mediocre undergraduate GPA. This was the deal: maintain good grades and I’d get to continue, or slip up and I was out. As an undergraduate, I spent a lot of time surfing and very little time cramming for tests -- what can I say? I wasn’t exactly a traditional grad student applicant.

Despite my shortcomings on paper, I was ambitious. Before grad school, I contacted a researcher from Harvard who’d proposed, through mathematical models, that we could kill cancer cells with cancer cells \cite{Deisboeck_2008}. I told him I wanted to test his proposal experimentally. When he wrote back and I brought the proposal to my potential PI, I quickly realized that incoming grad students don’t actually do this. You’re supposed to go through rotations first, then select a lab, pick a project that falls within the scope of your PI’s research, and so on. This wasn’t exactly my style.

The deeper I got into my PhD, the more I realized the game you have to play in order to be successful: publish in certain journals, publish with the best coauthors you can manage, publish as much as you can. I played the game and published as much as possible within the scope of cell biology and cancer, but also papers on the scientific communication process itself -- papers on funding, peer review at high-impact journals, and peer review at the NIH. I wrote about cancer, but I also wrote about all the problems I was seeing around me in the process itself.

I never thought about actually doing anything about these systemic problems until I read The Trouble with Medical Journals \cite{Smith_2006}. Its key tenet -- that peer review misses most major errors -- is the idea that sent me down the path of building a publishing company to take the whole publishing process and flip it in favor of openness.
Instead of filtering results and then publishing, I wanted scientists first to publish and then to filter -- to publish and then winnow, so to speak. That’s why The Winnower was born.

From Scientist to Entrepreneur

I didn’t know anything about starting a business, but I knew I needed some money to do it. I wrote up some ideas for a new publication, entered a business contest on campus, and lost. It was harder than I thought. But then I sent that proposal to some people I knew from undergrad and, through a lot of luck, managed to get $50k from a private investor.

The Winnower launched in May 2014, and over the course of two years we shifted away from publishing traditional papers to publishing so-called grey literature -- informal documents that traditional publishers ignore. We published scholarly reddit AMAs, foldscope images, responses to NIH RFIs, journal clubs, and some of the coolest essays I’ve ever read. We formalized blogs and journal clubs so that they could act as reviewers themselves. People liked what we were doing, as judged by the growth in publications and readership. Why shouldn’t reddit AMAs have DOIs and be given real scientific consideration? I gave talks around the world, raised more money, and met other academics doing similar things with their own companies. I felt lucky and privileged to be doing what I loved, despite the fact that I was making less running a company than I had as a grad student. The End.

Okay, the story doesn’t have an ending yet, because the story is still ongoing. Very recently The Winnower was acquired by Authorea, another early company working on the same problem but from a different direction. Authorea, which was also founded by former academics, is fixing how researchers write, collaborate, and share online. Together we’re working to become the place where researchers can write and publish whatever they want, collaboratively and online.
It’s an ambitious goal, but so was curing cancer. I can’t say if we’ll achieve our goals, and I know the road ahead is daunting, but I think the problems we’re working to solve are as hard as some of the most complex problems in science. What is certainly true is that we must work collaboratively to solve them. I hope this essay inspires more academics to follow their own “crazy” ideas, and I hope you’ll stand with our mission to build a more transparent system of research communication. Let’s get it right.
Up-goer Five Entry

The body is made up of lots of cells and cells have sticks that make them what they are. Normal cells have a normal number of cell sticks. Bad cells, which can kill a body, have the wrong number of cell sticks. More cell sticks in a cell can cause cell problems that can lead to even more cell sticks in a cell. When this happens more cells take over a body and kill it. Some cell sticks, when added, cause new changes to happen in a cell as well, which can lead to a problem when all cell sticks have to be given to new cells.

Peer-Reviewed Publication

Cancer cells display aneuploid karyotypes and typically mis-segregate chromosomes at high rates, a phenotype referred to as chromosomal instability (CIN). To test the effects of aneuploidy on chromosome segregation and other mitotic phenotypes, we used the colorectal cancer cell line DLD1 (2n = 46) and two variants with trisomy 7 or 13 (DLD1+7 and DLD1+13), as well as euploid and trisomy 13 amniocytes (AF and AF+13). We found that trisomic cells displayed higher rates of chromosome mis-segregation compared to their euploid counterparts. Furthermore, cells with trisomy 13 displayed a distinctive cytokinesis failure phenotype. We showed that up-regulation of SPG20 expression, brought about by trisomy 13 in DLD1+13 and AF+13 cells, is sufficient for the cytokinesis failure phenotype. Overall, our study shows that aneuploidy can induce chromosome mis-segregation. Moreover, we identified a trisomy 13-specific mitotic phenotype that is driven by up-regulation of a gene encoded on the aneuploid chromosome. From \cite{Nicholson_2015}.
At the foundation of research is data. The papers we write and the figures we make revolve around it, and it is what we spend countless hours collecting. And yet, most raw data remains absent from major studies \cite{Alsheikh_Ali_2011}. This is a problem that has received much attention in the past few weeks, with preliminary findings being released from the Cancer Reproducibility Project, a large multi-year effort to see how robust top cancer studies are \cite{2017}. Like previous studies in psychology \cite{2015} and cancer \cite{Begley_2012}, the findings from the reproducibility project -- that a large percentage of findings are irreproducible, or at least very difficult to reproduce -- raise serious questions and doubts about how we conduct and communicate our research.

Authorea was founded to reinvent the research article so that it is data-rich, interactive, transparent, and replicable. Not only did we want to make Authorea a place where researchers could collaborate more easily and communicate their results more quickly, we also wanted to make sure that the data behind a study could be easily shared. This is why each article on Authorea is a repository in itself that allows you to host data directly within your article. We enabled integrations with Jupyter notebooks and various data visualization tools not just to make documents more aesthetically pleasing, but to make it easier to analyze each other's work. A quote in The Atlantic summarizes one problem we're working to fix quite well:

"If people had deposited raw data and full protocols at the time of publication, we wouldn’t have to go back to the original authors," says Iorns. That would make it much easier for scientists to truly check each other’s work. - The Atlantic

We believe that static snapshots of research living in PDFs behind paywalls are inimical to the advancement of research, and the findings from the various efforts looking at reproducibility support this.
Authorea is first and foremost a modern collaborative editor -- we want to make it easy to write your work and utilize the power of the web -- but we're much more than this: with preprint capabilities (DOIs coming soon), direct submissions to journals, and data hosting, we are working to make research communication more robust on numerous levels. Why should the most important documents in the world be shared and disseminated so poorly? They don't have to be, and in fact we're seeing encouraging signs that the next generation of researchers will do things differently.

The following are just a few student papers on Authorea, all utilizing open data sets and analyses:

Analysis of ground-level ozone formation and its correlation with concentration of other pollutants and weather elements

Vision Zero Crash Data Analysis

We hope you'll join us and write your next paper with us. How we make research more robust as a community starts with us as individuals.