Many clinical applications involve in-bed patient activity monitoring, from intensive care and neuro-critical care to semiology-based epileptic seizure diagnosis support and sleep monitoring at home. All of them require accurate recognition of in-bed movements from video streams. The major challenges for clinical application arise from the domain gap between common in-the-lab and clinical settings (e.g. viewpoint, occlusions, out-of-domain actions), the requirement that monitoring be minimally intrusive to existing clinical practice (e.g. non-contact monitoring), and the very limited amount of labeled clinical action data available. Focusing on one of the most demanding in-bed clinical scenarios, semiology-based epileptic seizure classification, this review explores the challenges of video-based clinical in-bed monitoring and reviews trends in video-based action recognition, monocular 3D MoCap, and semiology-based automated seizure classification. It also provides a guideline for taking full advantage of transfer learning for in-bed action recognition in support of quantified, evidence-based clinical diagnosis. The review suggests that an approach based on 3D MoCap and skeleton-based action recognition, relying strongly on transfer learning, could be advantageous for these clinical in-bed action recognition problems. However, such approaches still face several challenges, such as spatio-temporal stability, occlusion handling, and robustness, before the full potential of this technology can be realized in routine clinical use.