Over the last decade, video-enabled mobile devices have become almost ubiquitous, while advances in markerless pose estimation allow an individual’s body position to be tracked across the frames of a video. Previous work by this and other groups has shown that pose-extracted kinematic features can be used to reliably measure motor impairment in Parkinson’s disease (PD). This presents the prospect of developing an asynchronous, scalable, video-based assessment of motor dysfunction. Crucial to this endeavour is the ability to automatically recognise the class of an action being performed, without which manual labelling is required. Representing the evolution of body joint locations as a spatio-temporal graph, we implement a deep-learning model for frame-level classification of the activities performed in part 3 of the Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS). We train and validate this system on a dataset of n=7220 video clips recorded at 5 independent sites. This approach achieves human-level performance in classifying and labelling periods of activity within monocular video clips. Our framework could support clinical workflows and patient care at scale through applications such as quality monitoring of clinical data collection, automated labelling of video streams, or a module within a remote self-assessment system.
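To make the spatio-temporal graph representation concrete, the following minimal sketch (not the authors’ released model) shows how 2-D keypoints from a markerless pose estimator could be arranged as a joint graph over time and classified frame by frame with graph and temporal convolutions. The COCO-style 17-joint skeleton, edge list, channel widths, temporal kernel size, and placeholder class count are all illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch of frame-level activity classification over a
# spatio-temporal graph of body joints. All architectural choices
# below are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

NUM_JOINTS = 17    # assumed COCO-style skeleton
NUM_CLASSES = 10   # placeholder count of MDS-UPDRS part-3 activity classes

# Illustrative skeleton edges (joint-index pairs); a real system would use
# the pose estimator's own skeleton definition.
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4), (5, 6), (5, 7), (7, 9),
         (6, 8), (8, 10), (5, 11), (6, 12), (11, 12), (11, 13),
         (13, 15), (12, 14), (14, 16)]

def normalized_adjacency(num_joints, edges):
    """Symmetrically normalised adjacency with self-loops: D^-1/2 (A+I) D^-1/2."""
    a = torch.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

class STGraphConvBlock(nn.Module):
    """One spatial graph convolution followed by a temporal convolution."""
    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        self.register_buffer("adj", adjacency)
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # per-joint feature mix
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1),
                                  padding=(4, 0))               # convolve along time only
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, time, joints)
        x = self.spatial(x)
        x = torch.einsum("nctv,vw->nctw", x, self.adj)  # aggregate over graph neighbours
        return self.relu(self.temporal(x))

class FrameClassifier(nn.Module):
    """Maps a pose sequence to per-frame activity logits."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        adj = normalized_adjacency(NUM_JOINTS, EDGES)
        self.blocks = nn.Sequential(
            STGraphConvBlock(2, 64, adj),   # input channels: (x, y) coordinates
            STGraphConvBlock(64, 128, adj),
        )
        self.head = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, x):
        # x: (batch, 2, time, joints) -> per-frame logits (batch, classes, time)
        logits = self.head(self.blocks(x))  # (batch, classes, time, joints)
        return logits.mean(dim=-1)          # pool over joints, keep the time axis

poses = torch.randn(1, 2, 300, NUM_JOINTS)  # e.g. 10 s of keypoints at 30 fps
print(FrameClassifier()(poses).shape)       # torch.Size([1, 10, 300])
```

Because the classification head is convolutional and pooling happens only over the joint axis, the model emits one logit vector per frame, which is what allows periods of activity to be delimited within a clip rather than assigning a single label to the whole video.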