Bidirectional Long short-term memory and Recurrent Neural Network model for speech recognition
  • Mercy Kimani,
  • Lawrence Nderu,
  • Dalton Ndirangu,
  • Mwalili Tobias
Mercy Kimani
Machakos University

Corresponding Author:[email protected]

Lawrence Nderu
Jomo Kenyatta University of Agriculture and Technology
Dalton Ndirangu
United States International University School of Science and Technology
Mwalili Tobias
Jomo Kenyatta University of Agriculture and Technology College of Pure and Applied Sciences

Abstract

Speech-to-text is essential because it converts spoken words into text, which makes speech easy to store. A basic speech recognition system can be viewed as four stages: signal pre-processing, feature extraction, feature selection, and modeling. Much work has been documented on improving speech recognition results. However, work remains on reducing the word error rate and improving accuracy on a continuous input stream without increasing the required bandwidth. This research evaluates recurrent neural networks, long short-term memory neural networks, gated recurrent units, and bidirectional long short-term memory networks. It further tests performance after introducing a bias to the long short-term memory. The research then proposes a bidirectional long short-term memory recurrent neural network model. Experimental results demonstrate that, even with a bias of one on the long short-term memory, the bidirectional long short-term memory recurrent neural network model still achieves better results, with a word error rate of 8.92%, an accuracy of 91.08%, and a mean edit distance of 0.1910 on the LibriSpeech training dataset. Future work will evaluate transformer models for reducing the word error rate and improving accuracy on a continuous input stream.
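
For readers unfamiliar with the metrics reported above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function names `edit_distance` and `word_error_rate` are illustrative and are not taken from the paper. The reported accuracy of 91.08% is consistent with 100% minus the word error rate of 8.92%, but the abstract does not state the exact formulas or the normalization behind the mean edit distance, so this sketch should not be read as the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcripts, chosen only for illustration.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.4f}")
```

In this illustrative pair, one substitution ("sat" to "sit") and one deletion (the second "the") give two errors over six reference words. Because insertions count as errors, a word error rate computed this way can exceed 1 when the hypothesis is much longer than the reference.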