Abstract

LANGA is a game-based platform for second language learning. LANGA is the first of its kind in that it blends a research-driven approach to teaching with advanced animation to make learning fun and engaging. Here, we present a detailed description of the architecture of LANGA and empirical evidence for its effectiveness.

Introduction

The technological revolution is opening new ways to deliver education, holding the promise of making it more accessible and of centering it around the strengths and weaknesses of each individual user. In the field of second language (L2) learning, computer- and smartphone-based applications are gaining momentum as a form of support for, or an alternative to, more conventional ways of L2 teaching such as periods of immersion or classroom-based learning. These technologies, which we refer to as L2 apps, are extremely popular; for example, as of March 2016 the L2 app Duolingo reported having 100 million users and being the most-downloaded educational app for the iOS platform. Many L2 apps feature recent technological advances such as animation and use game mechanics to make interaction fun and engaging. However, these products differ widely in terms of teaching philosophy, as reflected in how the material is delivered and how the learner interacts with the software.

Despite the growing popularity of these applications, there are still many unresolved issues concerning their theoretical foundations and, most importantly, their efficacy; these issues are shared with related domains such as educational apps, cognitive training, and rehabilitation (e.g., Hirsh-Pasek 2015, Simons 2016). First, products in the education market are not always designed and validated using scientific evidence; further, there are no accepted standards for empirical evidence of efficacy as there are for drugs or medical devices (Better reading through..., Goswami 2006). In fact, in most cases there are no empirical studies available to back up the claims made by the developers of educational apps (Hirsh-Pasek 2015). Moreover, even when research (either independent or conducted in-house by developers) is available, it may not be sufficient to fully address questions of efficacy. We are not aware of any effectiveness study published in a peer-reviewed journal.

This is not to say that available L2 apps are unsupported by empirical evidence: many report in-house research, and some make research reports available on their websites. Moreover, the design of many L2 apps suggests that they are at least partially based on principles shown by previous empirical research to be effective in L2 teaching (e.g., spaced repetition; Pimsleur 1967). For example, Vesselinov (Vesselinov 2009, Vesselinov 2012) tested the effectiveness of two popular L2 apps (RosettaStone and Duolingo) in teaching basic knowledge of Spanish. In one study (Vesselinov 2009), participants used the software for a total of 55 hours, a period equivalent to the duration of a typical semester-long beginner's class. Notably, the report did not indicate the training intensity, i.e., how many days per week and hours per day subjects were expected to use the software; this makes interpretation of the results difficult, since intensity is a strong predictor of the rate of learning and of the retention of knowledge. The primary outcome measure was the change in spoken proficiency from pre- to post-training. Although the study reported statistically significant improvements in proficiency at the group level, by-subject analysis of performance showed no change for 36% of users, suggesting high individual variability in learning outcomes when using the app.

A likely explanation for this result is that individuals' ability to learn a new language is influenced by a multitude of factors (citation needed). For example, people receiving the same training and amount of exposure to a given language might differ significantly in how fast they learn, how long they retain the material, how well they respond to a specific type of training, and so on. Cognitive neuroscience research has provided substantial support for the view that individual differences (IDs) in socioeconomic status, level of education, age, lifestyle, motivation, and engagement with the task (citations needed), as well as in cognitive skills such as working memory (WM) span, executive attention, and verbal memory (citations needed), predict the extent to which people improve as a result of training. Notably, some of these factors, such as motivation, were not controlled for in the study conducted by Vesselinov (2009). While no single study can account for all possible variables, interpretation of the scant data available is difficult, since it is not clear to what extent the changes in performance observed in a subgroup of subjects are attributable to the use of the software or simply to participants' motivation to learn a second language.

The latter hypothesis is supported by the results of the second study (Vesselinov 2012), conducted by the same researcher to test the effectiveness of Duolingo. In this study, data were collected through the online accounts of a pool of Duolingo users who gave consent to participate. Although in this case the time spent using the software between pre- and post-training was not the same across participants, it was estimated that using Duolingo for between 26 and 49 hours would be sufficient to observe statistically significant gains, and that motivation and baseline level of Spanish proficiency predicted between-subject differences in change in performance.

Collectively, these studies (Vesselinov 2009, Vesselinov 2012) are of particular relevance because they currently represent the only empirical investigations of computer-based second-language training platforms. Nonetheless, their limitations point to the importance of careful study design, in terms of choosing appropriate control groups and measuring a comprehensive set of factors, in order to adequately parse out the effect of training from other confounding variables. Furthermore, one of the ultimate goals of research in second language acquisition is to develop accurate models of how IDs in social and cognitive factors predict the optimal way for each individual to learn a second language, in terms of training strategy, intensity, type of content, and so on. This would allow us to move from the traditional "one-size-fits-all" approach to a personalized one that optimally adapts to the unique strengths and weaknesses of each individual.
Such an approach requires large-scale studies (hundreds of subjects) involving people with highly diverse backgrounds and cognitive skills, two conditions that are rarely met by traditional lab-based research. Web-based applications can be a game-changer here and represent an ideal solution to this problem, since they can reach virtually anyone with an internet connection at no cost, removing physical barriers related to transportation and speeding up the research process.

These considerations have largely inspired our approach to the design of LANGA (LANguage GAming), a language learning platform founded on five principles:
1. Use of compelling video games that make training fun and engaging.
2. Use of an advanced SRE system to provide real-time feedback about pronunciation.
3. Serving as a powerful research tool that, operating through the web, can reach a vast and heterogeneous population that cannot be accessed with traditional lab-based experiments.
4. Collecting behavioral and neurophysiological data at every stage of learning, in order to systematically test the effectiveness of current versions and inform the design of the next generation of training platforms.
5. Allowing researchers to easily manipulate the main parameters of the training tasks in order to quickly test specific hypotheses about efficacy.

LANGA is the result of a collaboration between our laboratory and a private company, Copernicus Studios Inc. We have thus combined the strengths of our lab in empirical behavioural and cognitive neuroscientific work on language processing with the company's strengths in animation and game design. Our goal is to produce a language learning software platform that is effective and enjoyable for learners, but that also houses a flexible "back end" allowing teachers and researchers to customize the learning process for groups of people and to empirically test hypotheses concerning language learning, generating a better understanding of the mechanisms leading to successful L2 acquisition. In the remainder of this paper we discuss the design principles and architecture of LANGA, and provide details of one example study used to assess the efficacy of some of the games implemented on the platform.

The architecture

In this section we describe the high-level organization of LANGA. Details about the specific components (i.e., the grammars, games, lessons, etc.) and how they relate to each other are presented in the following subsections. The tasks executed by the user are built from three main building blocks: content, games, and curricula. Content refers to the material being taught in the new language, including vocabulary, sounds, and grammatical structures, as well as the associated media (artwork and sound files). The games define, intuitively, the set of rules through which the user interacts with the content on any given trial. The curricula are the higher-order mesh that connects games and content, including the selection of what content to teach in what order, the combination and sequencing of vocabulary, pronunciation, and grammar training, and the duration and frequency of training sessions (dose). As mentioned above, one of the aims of LANGA is to allow researchers with little software engineering experience to easily test novel hypotheses. To do that, researchers must be able to quickly build and deploy new content and curricula, and to visualize analytics performed on the behavioral and neurophysiological data in order to validate their hypotheses.
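To make the relationship between content, games, and curricula concrete, the sketch below shows one possible way to represent these building blocks. It is an illustration only: the class names, fields, and default values are our assumptions, not the actual LANGA data model.

from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only; names and fields are assumptions rather than
# the actual LANGA schema.

@dataclass
class ContentItem:
    """One unit of teaching material: a word or sentence plus its media."""
    text_l2: str        # e.g., "la bruja"
    translation: str    # e.g., "the witch"
    image_file: str     # associated artwork
    audio_file: str     # recorded pronunciation

@dataclass
class Game:
    """The rules through which the user interacts with content on a trial."""
    name: str                  # e.g., "naming", "match-mismatch"
    trials_per_session: int

@dataclass
class Lesson:
    """A pairing of a game with the content it presents."""
    game: Game
    items: List[ContentItem]

@dataclass
class Curriculum:
    """Higher-order mesh connecting games and content: what to teach,
    in what order, and at what dose."""
    lessons: List[Lesson] = field(default_factory=list)
    sessions_per_week: int = 5      # training frequency (dose)
    minutes_per_session: int = 20   # training duration (dose)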
The Learning Management System (LMS) accomplishes both of these goals (authoring and analytics) through its main components: the Authoring Tool and the User Database. The Authoring Tool allows researchers to build the content (e.g., vocabulary, grammars), implement different training strategies, assemble these components into lessons and, finally, implement the cognitive tests needed to characterize subjects' background profiles. In turn, the User Database stores all the information required for research purposes, such as the scores on the cognitive tests and online measures of subjects' performance (e.g., accuracy in pronouncing specific words or sentences, session-by-session improvement, number of attempts to pronounce a given item).

Grammars

The grammar is the functional unit of training. Briefly, grammars define the set of words or sentences that are taught in any given instance of a game. Grammars are always composed of four words or sentences. Within a grammar, each word is associated with a picture, the corresponding written translation, and a sound file. Every time the user is prompted to name a word from a given grammar, the SRE takes the audio recorded through the microphone and compares it with the sound files of the words belonging to that grammar. A matching score is produced for each comparison and, if the highest score exceeds a pre-established threshold, the attempt to name the word is given positive feedback. The threshold can be set for each individual word based on how difficult it is to pronounce, allowing minimization of both false positives and false negatives. A schematic sketch of this matching procedure is given at the end of this section.

Training strategies

Currently, we have implemented and tested the effectiveness of two different strategies for teaching vocabulary. One strategy, referred to as "rote" training, is a paired-associate task typically used both in classroom settings and by other computer-based programs. In this task, each word to be learned (nouns and verbs) is represented by a picture and paired with its spoken form (see the left panel of the figure "Comparison of performance in the match-mismatch task across different stages of training"). The other strategy, "inferential" training, consists of pairing a picture depicting an actor performing an action on an object with the corresponding written and spoken Spanish sentence, which has a three-part subject-verb-object (SVO) structure (e.g., La bruja abate el basurero, "The witch knocks over the garbage can"). The learner is required to guess the meaning of the individual words in the sentence based on the picture, prior knowledge, and the SVO syntax; in a series of training items, each picture/sentence varies from the previous one in only one word (i.e., one of the nouns or the verb). This aids the learner in making inferences about word meanings and in "bootstrapping" later learning based on newly acquired knowledge about the language. Some prior research has shown that such "implicit" training for L2 grammar can lead to better long-term retention and to patterns of brain activity that more closely resemble those of native speakers (Morgan-Short 2012).
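Returning to the matching procedure described in the Grammars subsection, the following sketch illustrates the per-grammar scoring and thresholding logic. The function and variable names, the example grammar items, and the dummy similarity scorer are hypothetical; the acoustic comparison itself is performed by the SRE and is not reproduced here.

def evaluate_attempt(recording, grammar, thresholds, similarity):
    """Compare a learner's recording against the items of one grammar.

    grammar:    dict mapping each word/sentence to its reference sound file
    thresholds: dict mapping each word/sentence to its per-item threshold
    similarity: function returning a matching score for (recording, reference)
    Returns (best_item, best_score, passed).
    """
    scores = {item: similarity(recording, ref) for item, ref in grammar.items()}
    best_item = max(scores, key=scores.get)
    best_score = scores[best_item]
    # Positive feedback only if the best match exceeds that item's threshold;
    # thresholds can be tuned per word to balance false positives and negatives.
    return best_item, best_score, best_score >= thresholds[best_item]

# Usage sketch with hypothetical grammar items and a dummy scorer standing in
# for the SRE's acoustic comparison.
grammar = {
    "la bruja": "bruja.wav",
    "abate": "abate.wav",
    "el basurero": "basurero.wav",
    "la mesa": "mesa.wav",   # hypothetical fourth item
}
thresholds = {item: 0.7 for item in grammar}   # per-word thresholds
dummy_similarity = lambda recording, reference: 0.5
print(evaluate_attempt("attempt.wav", grammar, thresholds, dummy_similarity))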