Bidirectional Long Short-Term Memory Recurrent Neural Network Model for Speech Recognition
Abstract
Speech-to-text is essential because it converts spoken words into text, making them easy to store. A basic speech recognition pipeline has several components and can be viewed in four stages: signal pre-processing, feature extraction, feature selection, and modeling. A considerable body of literature documents efforts to improve speech recognition results. However, work remains on reducing the word error rate and improving accuracy on a continuous input stream without increasing the required bandwidth. This research evaluates recurrent neural networks, long short-term memory neural networks, gated recurrent units, and bidirectional long short-term memory networks. It further tests recognition performance after introducing a bias into the long short-term memory. This research then proposes a bidirectional long short-term memory recurrent neural network model. Experimental results demonstrate that, even with a bias of one on the long short-term memory, the bidirectional long short-term memory recurrent neural network model still achieves better results, with a word error rate of 8.92%, an accuracy of 91.08%, and a mean edit distance of 0.1910 on the LibriSpeech training dataset. Future work will evaluate transformer models for reducing the word error rate and improving accuracy on a continuous input stream.
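To make the proposed architecture concrete, the following is a minimal sketch of a bidirectional long short-term memory recurrent network for frame-level speech recognition, written with Keras/TensorFlow. The feature dimension, layer sizes, and output alphabet are illustrative assumptions rather than the exact configuration used in this research; note that Keras initializes the LSTM forget-gate bias to one by default (`unit_forget_bias=True`), which corresponds to the bias-of-one setting discussed above.

```python
# Minimal sketch of a bidirectional LSTM acoustic model.
# Hyperparameters below are illustrative assumptions, not the paper's exact setup.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FEATURES = 13   # e.g. MFCC coefficients per frame (assumed)
NUM_CLASSES = 29    # e.g. 26 letters + space + apostrophe + blank (assumed)

def build_bilstm_model() -> tf.keras.Model:
    # Variable-length sequence of acoustic feature frames.
    inputs = layers.Input(shape=(None, NUM_FEATURES))
    # A bidirectional LSTM reads the utterance forward and backward, so each
    # frame's representation uses both past and future context.
    # unit_forget_bias=True (the Keras default) initializes the forget-gate
    # bias to one, matching the bias-of-one configuration noted above.
    x = layers.Bidirectional(
        layers.LSTM(128, return_sequences=True, unit_forget_bias=True))(inputs)
    x = layers.Bidirectional(
        layers.LSTM(128, return_sequences=True, unit_forget_bias=True))(x)
    # Per-frame probabilities over the output alphabet.
    outputs = layers.TimeDistributed(
        layers.Dense(NUM_CLASSES, activation="softmax"))(x)
    return models.Model(inputs, outputs)

model = build_bilstm_model()
model.summary()
```

In practice, a model of this shape would typically be trained with a CTC-style loss over character targets and evaluated with word error rate and mean edit distance, the metrics reported above.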