Goal: To create a set of open source tools that can be easily used by individual researchers, publishers, and scholarly institutions to create web-native, data-rich, JATS-compliant research documents.
Authorea, an organization focused specifically on research document creation and conversion, has experience with every mainstream format for academic writing (LaTeX, Word, Markdown, and rich text) and has built a system supporting many combinations of import-to-export conversion flows. The core conversion tasks of Authorea happen at automatic import or submission of a document. At the same time, the document remains accessible to editors, typesetters, and, most importantly, authors during the main writing flow. Edits, comments, and suggestions can be made easily without any knowledge of XML or the underlying document conversion process. We seek to develop and maintain the following open source components, which would allow individuals, publishers, and scholarly organizations to take advantage of and build upon the in-house technology of Authorea to write scholarly research documents that are web-native, machine-readable, discoverable, and data-rich.
An Open API
Authorea seeks funding to develop a RESTful API with OAuth 2 authentication, which will allow any organization or individual to extract content from, and deposit content into, Authorea in a variety of formats. The API will allow individual academics and organizations to use Authorea to convert documents between many input and output formats, optionally styled to thousands of vendor styles or custom form factors. For example, bioRxiv would be able to use an open-source API to run its entire submission and publication process via Authorea, in a manner indistinguishable from that of a leading academic publisher. While the API would be the engine of conversion and publication, bioRxiv could build any peer review and metrics services it wishes on top of it.
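As a rough illustration of how a client might call such an API, the sketch below builds (but does not send) an authenticated export request. Because the API is only proposed here, the base URL, the `/documents/{id}/export` path, the `format` query parameter, and the function name are all hypothetical placeholders. The only established convention used is sending an OAuth 2 access token as a Bearer token in the `Authorization` header.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Hypothetical base URL: the proposal does not define any endpoints.
API_BASE = "https://api.example.org/v1"


def build_export_request(document_id: str, target_format: str, access_token: str) -> Request:
    """Build a GET request asking for a document in a given output format.

    The path and the "format" parameter are assumptions made for illustration.
    """
    query = urlencode({"format": target_format})
    # Percent-encode the identifier so it is safe to place in a URL path.
    url = f"{API_BASE}/documents/{quote(document_id, safe='')}/export?{query}"
    # OAuth 2 access tokens are conventionally sent as Bearer tokens.
    headers = {"Authorization": f"Bearer {access_token}"}
    return Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    request = build_export_request("doc 42", "jats", "ACCESS_TOKEN")
    print(request.get_method(), request.full_url)
```

A preprint server could pass the resulting request to `urllib.request.urlopen` (or any HTTP client) once real endpoints exist; the sketch stops short of the network call because no such service is specified yet.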
A toolkit for converting Authorea documents into JATS-compliant XML
Authorea seeks funding to develop and refine the conversion of its documents into publisher-grade JATS XML. Combined with opening our ingestion system API and offering an open specification of our internal formats, this work will benefit the entire ecosystem of academic document representations. We propose to enable reliable and exhaustive conversion from LaTeX, Markdown, or Word into JATS XML, and additionally to offer automated pipelines for interacting with web vendors that interoperate via these formats.
Authorea is currently focused on making it easier for researchers to write and collaborate on documents online within the current publishing paradigm. As such, our end users have no direct need for JATS XML, and we have not devoted resources to enabling this capability. That said, JATS XML conversion is readily achievable by Authorea, and opening up that capability will have a significant payoff for preprint servers, publishers, and institutional repositories. Funding will be used to develop this process for near-exhaustive format coverage, provide a stress test suite for quality assurance, and bootstrap an API endpoint for the JATS output target.
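To make the conversion target concrete, here is a minimal sketch of serializing a simple document into a JATS-style skeleton. It is not Authorea's converter: the input dictionary shape (`title`, `sections`, `heading`, `paragraphs`) and the function name `to_jats` are assumptions made for illustration. The element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `body`, `sec`, `title`, `p`) come from the JATS tag set, but the output is a skeleton only and has not been validated against the JATS DTD.

```python
import xml.etree.ElementTree as ET


def to_jats(doc: dict) -> str:
    """Serialize a simple document dictionary as a JATS-style XML string.

    The input shape is hypothetical; a real converter would also have to
    handle authors, references, figures, math, and much more.
    """
    article = ET.Element("article", {"article-type": "research-article"})

    # Front matter: only the article title is emitted in this sketch.
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = doc["title"]

    # Body: one <sec> per section, with a <title> and its <p> paragraphs.
    body = ET.SubElement(article, "body")
    for section in doc["sections"]:
        sec = ET.SubElement(body, "sec")
        ET.SubElement(sec, "title").text = section["heading"]
        for paragraph in section["paragraphs"]:
            ET.SubElement(sec, "p").text = paragraph

    # ElementTree escapes reserved characters such as "&" on serialization.
    return ET.tostring(article, encoding="unicode")


if __name__ == "__main__":
    sample = {
        "title": "Dust & Gas in Nearby Galaxies",
        "sections": [
            {"heading": "Introduction", "paragraphs": ["First paragraph.", "Second paragraph."]},
        ],
    }
    print(to_jats(sample))
```

A stress test suite of the kind described above could start from checks like these: parse the output back, confirm the expected structure survives, and confirm that special characters are escaped rather than dropped.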
An open specification for data-driven scholarly writing
The fundamental enabling technology of Authorea comprises a set of guidelines and conventions for organizing, managing, and representing scholarly works and their editing lifecycle. We seek funding to formalize and open these best practices as a cohesive specification for scholarly writing, enabling all of: collaborative writing; powerful version control and history management; academic import/export capabilities; and continuous integration and synchronization with third-party services. As Authorea already aggregates many best-in-class techniques from the state of the art, such as the git model of version control, and uses exclusively open formats for its internal representations, we will specify scholarly writing as a logical building block of the open scholarly stack. The specification will cover file system organization, document and auxiliary representations, common workflows, and best practices.