This project uses the wav2vec2 model introduced by Facebook AI in the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations". It uses the wav2vec2 implementation from Hugging Face Transformers to build an ASR system that takes a speech signal as input and outputs transcriptions asynchronously. The project also includes training notebooks so you can train your own speech recognition system.
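Under the hood, wav2vec2 is trained with a CTC head: the model emits per-frame token logits, and the transcription is obtained by collapsing repeats and dropping blanks. Below is a minimal sketch of greedy CTC decoding with a made-up five-token vocabulary and toy logits; the real vocabulary and logits come from the model's processor and forward pass, not from this example.

```python
import torch

# Toy vocabulary for illustration only; index 0 is the CTC blank/pad token.
# The real wav2vec2 vocabulary comes from its Hugging Face processor.
vocab = ["<pad>", "H", "E", "L", "O"]

def ctc_greedy_decode(logits: torch.Tensor) -> str:
    """Greedy CTC decode: pick the best token per frame, then
    collapse consecutive repeats and drop blanks."""
    ids = torch.argmax(logits, dim=-1).tolist()
    out, prev = [], None
    for i in ids:
        if i != prev and i != 0:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Toy per-frame predictions: H, E, L, <pad>, L, O.
# The blank between the two L frames keeps them from collapsing.
frames = torch.tensor([1, 2, 3, 0, 3, 4])
logits = torch.nn.functional.one_hot(frames, num_classes=5).float()
print(ctc_greedy_decode(logits))  # HELLO
```

Note how the blank token separates the two L frames: without it, greedy decoding would merge them into a single L.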
I have also written a post explaining wav2vec2 in some detail, with some further learning directions.
python -m venv env_name
env_path\Scripts\activate
pip install torch==1.8.0+cu102 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
conda create --name env_name python=3.8
conda activate env_name
conda install pytorch torchaudio cudatoolkit=11.1 -c pytorch
pip install -r requirements.txt