FL-VLD: A Novel Voice Liveness Detection Framework Based on Asynchronous
Federated Learning
Abstract
Voice Liveness Detection (VLD) has become an active research topic
in the Internet of Things era. Conventional VLD methods are centralized
solutions trained on abundant data that is collected from local clients
and stored on a server; however, they risk creating data
islands and leaking private data. To address this problem, we propose a novel
word-level, pop-noise-based VLD framework built on asynchronous federated
learning (FL), named FL-VLD. Structurally, FL-VLD trains each local model
on preprocessed voice data and constructs a global model by transmitting
only the learned weights, after applying differential privacy, to the FL
central server in an asynchronous manner. In addition,
the local network of the framework incorporates a residual network and
a spatial grouping enhancement module to optimize the complexity and
accuracy of the global model. By exploiting FL’s distributed
structure, FL-VLD addresses the data island problem in VLD scenarios without
compromising users’ privacy. Experimental results on the popular POCO
dataset show that our proposal outperforms both traditional
centralized methods and other federated schemes in
terms of fairness, stability, accuracy, and lightweight design. Furthermore,
FL-VLD generalizes well to attacks involving far-field replay, speech
synthesis, and voice conversion. Finally, an ablation study
attests to its efficacy.
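
To make the weight-exchange step described above concrete, the following is a minimal sketch of one plausible realization, not the FL-VLD implementation itself. It assumes that differential privacy is approximated by L2 clipping plus Gaussian noise on the weight vector, and that the server merges each arriving update with a staleness-discounted mixing rate. The names `privatize`, `AsyncServer`, `clip_norm`, `noise_std`, and `base_mix` are hypothetical and chosen only for illustration.

```python
import random


def privatize(weights, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a weight vector to an L2 norm bound and add Gaussian noise.

    Assumption: a Gaussian-mechanism-style perturbation; no formal
    privacy budget is computed here.
    """
    rng = rng or random.Random()
    norm = sum(w * w for w in weights) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [w * scale + rng.gauss(0.0, noise_std) for w in weights]


class AsyncServer:
    """Toy asynchronous aggregator: merges one client update at a time."""

    def __init__(self, init_weights, base_mix=0.5):
        self.weights = list(init_weights)
        self.version = 0          # number of updates merged so far
        self.base_mix = base_mix  # mixing rate for a fully fresh update

    def receive(self, client_weights, client_version):
        # Staleness: how many global updates happened since the client
        # last pulled the model. Older updates get a smaller mixing rate.
        staleness = max(0, self.version - client_version)
        mix = self.base_mix / (1 + staleness)
        self.weights = [
            (1 - mix) * g + mix * c
            for g, c in zip(self.weights, client_weights)
        ]
        self.version += 1
        return mix


# Usage: two clients trained from the same global version report in turn.
server = AsyncServer([0.0, 0.0])
rng = random.Random(0)
server.receive(privatize([3.0, 4.0], rng=rng), client_version=0)
server.receive(privatize([1.0, -2.0], rng=rng), client_version=0)
```

In this sketch only perturbed weights ever leave a client, and the server never waits for a full round of clients, which mirrors the asynchronous, privacy-preserving exchange described in the abstract. The staleness discount is a design choice assumed here to limit the influence of outdated local models.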