Abstract
The rapid growth of the live streaming industry has brought about the VTuber
trend, in which content creators stream using character avatars. One of the
most accessible ways to animate a character in real time is through
vision-based facial motion capture. However, previous works still suffer
from jitter, which degrades the quality of the character's movements. This
work aims to develop a smoothed facial motion capture system that works with
a Live2D VTuber model. The system
combines MediaPipe Face Mesh and OpenCV to capture facial landmarks, which
are then used to estimate head pose with the Perspective-n-Point (PnP)
algorithm. In addition, the system computes eye aspect ratio (EAR) and mouth
aspect ratio (MAR) values to detect eye and mouth movements. The motion values obtained
from this process are then filtered using a Kalman filter. Finally, the
filtered motion data is sent to the Unity engine, which drives the
Live2D VTuber model by adjusting the character’s motion parameters. The
developed system successfully captures facial motion and drives the Live2D
VTuber model with smoother motion, overcoming the jitter problem prevalent in
previous facial motion capture approaches. The improved motion capture
quality makes the system a more viable option for a wide range of
potential uses.
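
As a rough illustration of the pipeline summarized above, the sketch below combines MediaPipe Face Mesh landmarks, OpenCV's solvePnP head pose estimation, an EAR computation, and a per-parameter Kalman filter in Python. It is not the authors' implementation: the landmark indices, 3D reference points, camera intrinsics, and noise covariances are illustrative assumptions, and the final step of sending values to Unity/Live2D is omitted.

```python
# Minimal sketch (not the authors' code) of the abstract's pipeline:
# MediaPipe Face Mesh landmarks -> solvePnP head pose -> EAR -> Kalman smoothing.
# Landmark indices, 3D model points, intrinsics, and noise settings are assumptions.
import cv2
import numpy as np
import mediapipe as mp

# Assumed generic 3D reference points (nose tip, chin, eye corners, mouth corners)
# and the MediaPipe landmark indices that roughly correspond to them.
MODEL_POINTS = np.array([[0.0, 0.0, 0.0], [0.0, -63.6, -12.5],
                         [-43.3, 32.7, -26.0], [43.3, 32.7, -26.0],
                         [-28.9, -28.9, -24.1], [28.9, -28.9, -24.1]])
LANDMARK_IDS = [1, 152, 263, 33, 291, 61]
LEFT_EYE = [33, 160, 158, 133, 153, 144]   # assumed indices for one eye

def eye_aspect_ratio(p):
    # EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|); a low value indicates a closed eye.
    return (np.linalg.norm(p[1] - p[5]) + np.linalg.norm(p[2] - p[4])) / \
           (2.0 * np.linalg.norm(p[0] - p[3]))

def make_kalman():
    # 1D constant-velocity Kalman filter, one instance per motion parameter.
    kf = cv2.KalmanFilter(2, 1)
    kf.transitionMatrix = np.array([[1, 1], [0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0]], np.float32)
    kf.processNoiseCov = 1e-4 * np.eye(2, dtype=np.float32)
    kf.measurementNoiseCov = np.array([[1e-2]], np.float32)
    return kf

def smooth(kf, value):
    # Predict, then correct with the raw measurement; return the filtered estimate.
    kf.predict()
    return float(kf.correct(np.array([[value]], np.float32))[0, 0])

yaw_filter, ear_filter = make_kalman(), make_kalman()
face_mesh = mp.solutions.face_mesh.FaceMesh(refine_landmarks=True)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    camera_matrix = np.array([[w, 0, w / 2], [0, w, h / 2], [0, 0, 1]], dtype=np.float64)
    result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        continue
    lm = result.multi_face_landmarks[0].landmark
    to_px = lambda ids: np.array([[lm[i].x * w, lm[i].y * h] for i in ids])

    # Head pose via Perspective-n-Point, converted to approximate Euler angles.
    _, rvec, _ = cv2.solvePnP(MODEL_POINTS, to_px(LANDMARK_IDS),
                              camera_matrix, np.zeros(4))
    rmat, _ = cv2.Rodrigues(rvec)
    angles, *_ = cv2.RQDecomp3x3(rmat)   # roughly (pitch, yaw, roll) in degrees

    # Kalman-smoothed values that would be forwarded to Unity / Live2D parameters.
    yaw = smooth(yaw_filter, float(angles[1]))
    ear = smooth(ear_filter, eye_aspect_ratio(to_px(LEFT_EYE)))
```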