Speech to Text (STT) using Whisper and LangChain

Here you will learn how to convert speech to text using Whisper, OpenAI's speech transcription model, and then work with the transcript in LangChain.

Code link: https://github.com/sushmasush/langGraph/blob/main/SpeechtoText.ipynb

Data: https://github.com/sushmasush/langGraph/blob/main/data/joyfully.wav

Load the open-source or OpenAI Whisper model to transcribe the audio

This involves passing a .wav or .mp3 file to Whisper, which generates the transcribed text.
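A minimal sketch of this step, assuming the open-source openai-whisper package and the joyfully.wav file from the data link above; the model size is an illustrative choice:

```python
import whisper

# Load a pretrained Whisper checkpoint; "base" trades some accuracy for speed.
model = whisper.load_model("base")

# Transcribe the audio file; the result is a dict whose "text" key
# holds the full transcription.
result = model.transcribe("data/joyfully.wav")
transcribed_text = result["text"]
print(transcribed_text)
```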

Split the text and store it in LangChain Document objects

The transcribed text is then split recursively into chunks using RecursiveCharacterTextSplitter: split_text returns a plain list of strings, while create_documents wraps each chunk in a LangChain Document object, as sketched below.
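A short sketch of the splitting step; the chunk_size and chunk_overlap values are illustrative assumptions, not taken from the notebook:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the transcription recursively on paragraph, sentence, and word boundaries.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# create_documents wraps each chunk in a LangChain Document object
# (use split_text instead if you only need a list of strings).
docs = text_splitter.create_documents([transcribed_text])
print(f"{len(docs)} chunks; first chunk: {docs[0].page_content[:80]}")
```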

Generate a summary, or query anything in the speech

This tutorial demonstrates create_stuff_documents_chain, which creates a chain for passing a list of Documents to a model. Given a prompt and the documents as context, you can invoke the chain through LangChain and get the corresponding result.
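A minimal sketch of this step, assuming an OpenAI chat model via langchain_openai; the model name and the summarization prompt wording are illustrative assumptions:

```python
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# The prompt must expose a {context} variable; the chain fills it with
# the concatenated contents of the Documents ("stuffing").
prompt = ChatPromptTemplate.from_template(
    "Summarize the following speech transcript:\n\n{context}"
)

chain = create_stuff_documents_chain(llm, prompt)

# Pass the Document list produced by the splitter as the context.
summary = chain.invoke({"context": docs})
print(summary)
```

Swapping the prompt (for example, asking a specific question about the speech instead of requesting a summary) turns the same chain into a query tool over the transcript.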

These results can then be processed further as the use case requires.

Conclusion

Real-world use cases include:

  • Captioning
  • Legal and courtroom transcriptions
  • Healthcare and medical transcriptions
  • Voice assistants and smart devices

🗝My portfolio: https://wp.me/PccXal-wv
🤝 Connect for a 1:1 https://lnkd.in/g6FDTxcM

