This project uses the wav2vec2 model introduced by Facebook AI in the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations". It uses the wav2vec2 implementation from Hugging Face Transformers to build an ASR system that takes a speech signal as input and outputs transcriptions asynchronously. The project also includes training notebooks so you can train your own speech recognition system.
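Under the hood, wav2vec2 is trained with a CTC head: the model emits per-frame token logits, and the transcription is obtained by collapsing repeats and dropping blanks. Below is a minimal sketch of greedy CTC decoding with a made-up five-token vocabulary and toy logits; the real vocabulary and logits come from the model's processor and forward pass, not from this example.

```python
import torch

# Toy vocabulary for illustration only; index 0 is the CTC blank/pad token.
# The real wav2vec2 vocabulary comes from its Hugging Face processor.
vocab = ["<pad>", "H", "E", "L", "O"]

def ctc_greedy_decode(logits: torch.Tensor) -> str:
    """Greedy CTC decode: pick the best token per frame, then
    collapse consecutive repeats and drop blanks."""
    ids = torch.argmax(logits, dim=-1).tolist()
    out, prev = [], None
    for i in ids:
        if i != prev and i != 0:
            out.append(vocab[i])
        prev = i
    return "".join(out)

# Toy per-frame predictions: H, E, L, <pad>, L, O.
# The blank between the two L frames keeps them from collapsing.
frames = torch.tensor([1, 2, 3, 0, 3, 4])
logits = torch.nn.functional.one_hot(frames, num_classes=5).float()
print(ctc_greedy_decode(logits))  # HELLO
```

Note how the blank token separates the two L frames: without it, greedy decoding would merge them into a single L.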
I have also written a post explaining wav2vec2 in some detail, with some further learning directions.
python -m venv env_name
env_path\Scripts\activate
pip install torch==1.8.0+cu102 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
conda create --name env_name python=3.8
conda activate env_name
conda install pytorch torchaudio cudatoolkit=11.1 -c pytorch
pip install -r requirements.txt