This project uses the wav2vec2 model introduced by Facebook AI in their paper *wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations*. It uses the wav2vec2 implementation from Hugging Face Transformers to build an ASR system that takes a speech signal as input and outputs transcriptions asynchronously. The project also includes training notebooks so you can train your own speech recognition system.
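As a minimal sketch of the inference flow described above, the snippet below loads a pretrained wav2vec2 CTC checkpoint from Hugging Face Transformers and decodes a waveform into text. The checkpoint name `facebook/wav2vec2-base-960h` and the silent dummy waveform are illustrative choices, not necessarily what this project uses; in practice you would load real 16 kHz mono audio (e.g. via torchaudio).

```python
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Example checkpoint; any Wav2Vec2 CTC checkpoint can be substituted.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# One second of 16 kHz audio. Placeholder: replace with a real waveform,
# e.g. loaded and resampled with torchaudio.
speech = torch.zeros(16000)

# The processor normalizes the raw waveform and returns a batched tensor.
inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits  # shape: (batch, time, vocab)

# Greedy CTC decoding: argmax per frame, then collapse repeats/blanks.
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```

Greedy decoding is the simplest option; a language-model-backed beam search can further improve accuracy.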
I have also written a post explaining wav2vec2 in some detail, with some further learning directions.
Using venv and pip:

```
python -m venv env_name
env_path\Scripts\activate
pip install torch==1.8.0+cu102 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```

Using conda:

```
conda create --name env_name python==3.8
conda activate env_name
conda install pytorch torchaudio cudatoolkit=11.1 -c pytorch
pip install -r requirements.txt
```