💁 Awesome Treasure of Transformers Models for Natural Language Processing: contains papers, videos, blogs, and official repos along with Colab notebooks. 🛫☑️
Jupyter notebooks to fine-tune Whisper models on Vietnamese using Colab, Kaggle, and/or AWS EC2.
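A minimal sketch of how such a fine-tuning setup typically starts, assuming the Hugging Face transformers toolchain; the checkpoint name and decoder-prompt approach are illustrative assumptions, not the repo's exact code.

```python
# Sketch: load Whisper for Vietnamese fine-tuning with Hugging Face transformers
# (assumed toolchain; the notebooks in the repo may do this differently).
from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_name = "openai/whisper-small"  # assumed checkpoint size
processor = WhisperProcessor.from_pretrained(model_name, language="vi", task="transcribe")
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# Force Vietnamese transcription tokens during generation.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(
    language="vietnamese", task="transcribe"
)
```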
WhisPad is a note management tool where you can write or dictate your notes using local or API AI models (supports speaker diarization). Rewrite your texts in different styles, explore them with AI, translate, summarize, and create mind maps, node graphs, and even quizzes and flashcards from each note. A powerful companion for researchers and students.
In this notebook, I implemented a script to transcribe YouTube videos (and audio files in general) using Google's speech-to-text API.
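A hedged sketch of the transcription step, using the SpeechRecognition package's Google Web Speech backend on an already-downloaded audio file; the notebook may instead use the Google Cloud Speech client, and the file path is illustrative.

```python
# Transcribe a downloaded audio file with the Google Web Speech backend
# of the SpeechRecognition package (assumed approach, not the repo's exact code).
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("downloaded_audio.wav") as source:  # illustrative path
    audio = recognizer.record(source)

text = recognizer.recognize_google(audio, language="en-US")
print(text)
```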
Self-contained notebooks for experimenting with particular concepts in Deep Learning.
High-performance Google Colab Notebook for fast & accurate audio transcription/translation using OpenAI Whisper. Accelerated on TPUs with PyTorch/XLA. Features an interactive UI for model selection, multi-language support, and long-form audio processing.
A speech emotion recognition notebook that trains a model to identify the emotion in human speech with an accuracy of roughly 60%.
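The usual pipeline for this task is MFCC features fed to a small classifier; the sketch below assumes librosa and scikit-learn, while the notebook's actual features and model may differ.

```python
# Illustrative speech-emotion pipeline: MFCC features -> small MLP classifier.
import librosa
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_mfcc(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)
    return mfcc.mean(axis=1)  # average over time -> fixed-length feature vector

# X: list of feature vectors, y: emotion labels (hypothetical training data)
# clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500).fit(X, y)
```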
Simple Jupyter Notebook including a Speech Recognition implementation with CMUSphinx
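A minimal example of offline decoding with CMUSphinx via the SpeechRecognition package (requires pocketsphinx installed); the notebook may call pocketsphinx directly instead, and the file name is illustrative.

```python
# Offline speech recognition with CMUSphinx through SpeechRecognition.
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:  # illustrative file name
    audio = r.record(source)

print(r.recognize_sphinx(audio))  # decodes locally, no API key needed
```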
Notebook implementation of Michael Nielsen's online book: Neural Networks and Deep Learning.
Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.
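A toy sketch of the classical-ML branch (SVM) of language identification; the BERT and LSTM models in the repo are separate, and character n-gram features are an assumed choice rather than the repo's documented setup.

```python
# Toy language-identification pipeline: character n-gram TF-IDF + linear SVM.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["hello world", "bonjour le monde", "hola mundo"]   # toy training data
labels = ["English", "French", "Spanish"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LinearSVC(),
)
model.fit(texts, labels)
print(model.predict(["bonjour tout le monde"]))
```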
In this notebook, we aim to recognize speech commands using classification. For this purpose, we used the SPEECHCOMMANDS dataset and the deep convolutional model M5. The code is written in Python and designed for the PyTorch platform.
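A compact sketch matching that description: torchaudio's SPEECHCOMMANDS dataset plus an M5-style 1-D convolutional network. Layer sizes follow the public PyTorch audio-classification tutorial and may differ from the notebook.

```python
# SPEECHCOMMANDS dataset + M5-style 1-D CNN (illustrative layer sizes).
import torch.nn as nn
import torchaudio

train_set = torchaudio.datasets.SPEECHCOMMANDS("./data", download=True, subset="training")

class M5(nn.Module):
    def __init__(self, n_classes=35, n_channel=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, n_channel, kernel_size=80, stride=16),
            nn.BatchNorm1d(n_channel), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(n_channel, n_channel, kernel_size=3),
            nn.BatchNorm1d(n_channel), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(n_channel, 2 * n_channel, kernel_size=3),
            nn.BatchNorm1d(2 * n_channel), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(2 * n_channel, 2 * n_channel, kernel_size=3),
            nn.BatchNorm1d(2 * n_channel), nn.ReLU(), nn.MaxPool1d(4),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(2 * n_channel, n_classes),
        )

    def forward(self, x):  # x shape: (batch, 1, samples)
        return self.net(x)
```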
This repository transforms audio files into mel-spectrogram images and was created for the "UrbanSound8k Mel Spectrogram Images" dataset on Kaggle. Key features include sound visualization and dataset creation for sound analysis; the Audio-to-Spectrogram.ipynb notebook generates the spectrograms.
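A hedged sketch of the audio-to-mel-spectrogram conversion the repo describes, assuming librosa and matplotlib; the UrbanSound8K clip path and image settings are illustrative.

```python
# Convert one audio clip to a mel-spectrogram image (illustrative settings).
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("urbansound_clip.wav")                  # illustrative file
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)                # convert power to dB

plt.figure(figsize=(6, 3))
librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel")
plt.axis("off")                                              # image-only output for the dataset
plt.savefig("urbansound_clip.png", bbox_inches="tight", pad_inches=0)
```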
This repository provides a Jupyter notebook for a Connectionist Temporal Classification (CTC) based Automatic Speech Recognition (ASR) system using TensorFlow and Keras. The primary focus is to demonstrate the implementation of a CTC ASR model and to show how to train it effectively on the "Yes No" dataset.
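A minimal sketch of the CTC training objective in Keras; the shapes and the loss wrapper below are illustrative assumptions, not the repo's exact code.

```python
# CTC loss wrapper for a Keras ASR model (illustrative shapes).
import tensorflow as tf

def ctc_loss(y_true, y_pred):
    # y_pred: (batch, time, vocab) softmax outputs; y_true: padded label indices.
    batch = tf.shape(y_true)[0]
    input_len = tf.fill([batch, 1], tf.shape(y_pred)[1])   # full time dimension per sample
    label_len = tf.fill([batch, 1], tf.shape(y_true)[1])   # padded label length per sample
    return tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_len, label_len)

# model.compile(optimizer="adam", loss=ctc_loss)  # used when compiling the ASR model
```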
✭ MAGNETRON ™ ✭: This is a Google Colab/Jupyter Notebook for developing a HEARING PROXIA (B) when working with ARTIFICIAL INTELLIGENCE 2.0 ™ (ARTIFICIAL INTELLIGENCE 2.0™ is part of MAGNETRON ™ TECHNOLOGY).
In this notebook, we recognize digits from 0 to 9 from audio recording files. The input is a speech signal and the output is a single digit.
Whisper AI is an automatic speech recognition (ASR) system. It is open source and can be accessed via GitHub or Hugging Face. This is the simplest way to run Whisper AI from GitHub in a Python Google Colab notebook.
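In practice that simplest path looks roughly like the sketch below; the model size and file name are illustrative, and the pip install line runs in a Colab cell.

```python
# In a Colab cell first run:  !pip install git+https://github.com/openai/whisper.git
import whisper

model = whisper.load_model("base")          # illustrative model size
result = model.transcribe("speech.mp3")     # illustrative audio file
print(result["text"])
```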
In this notebook, we convert an audio file of an English speaker to text using IBM Watson's Speech to Text API, then translate the English text to Spanish using the Language Translator API.
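A hedged sketch of that two-step pipeline with the ibm_watson SDK; API keys, service URLs, and file names are placeholders, and the notebook's exact calls may differ.

```python
# Speech to Text followed by English -> Spanish translation (placeholder credentials).
from ibm_watson import SpeechToTextV1, LanguageTranslatorV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

stt = SpeechToTextV1(authenticator=IAMAuthenticator("STT_APIKEY"))
stt.set_service_url("STT_URL")
with open("english_audio.wav", "rb") as audio:
    stt_result = stt.recognize(audio=audio, content_type="audio/wav").get_result()
english_text = stt_result["results"][0]["alternatives"][0]["transcript"]

translator = LanguageTranslatorV3(version="2018-05-01",
                                  authenticator=IAMAuthenticator("LT_APIKEY"))
translator.set_service_url("LT_URL")
spanish = translator.translate(text=english_text, model_id="en-es").get_result()
print(spanish["translations"][0]["translation"])
```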