Speech to Text (STT) using Whisper and LangChain

Here you will learn how to convert speech to text using Whisper, OpenAI's speech transcription model, and then work with the transcript in LangChain.

Code link: https://github.com/sushmasush/langGraph/blob/main/SpeechtoText.ipynb

Data: https://github.com/sushmasush/langGraph/blob/main/data/joyfully.wav

Load the open-source or OpenAI Whisper model to transcribe the audio

This involves passing a .wav or .mp3 file to Whisper, which generates the transcribed text.
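A minimal sketch of this step, assuming the open-source openai-whisper package and the joyfully.wav file from the data link above; the model size is an illustrative choice:

```python
import whisper

# Load a pretrained Whisper checkpoint; "base" trades some accuracy for speed.
model = whisper.load_model("base")

# Transcribe the audio file; the result is a dict whose "text" key
# holds the full transcription.
result = model.transcribe("data/joyfully.wav")
transcribed_text = result["text"]
print(transcribed_text)
```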

Split the text and store it in LangChain Document objects

The transcribed text is then split recursively into chunks using RecursiveCharacterTextSplitter: split_text returns a plain list of strings, while create_documents wraps each chunk in a LangChain Document object, as sketched below.
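A short sketch of the splitting step; the chunk_size and chunk_overlap values are illustrative assumptions, not taken from the notebook:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the transcription recursively on paragraph, sentence, and word boundaries.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# create_documents wraps each chunk in a LangChain Document object
# (use split_text instead if you only need a list of strings).
docs = text_splitter.create_documents([transcribed_text])
print(f"{len(docs)} chunks; first chunk: {docs[0].page_content[:80]}")
```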

Generate a summary, or query anything in the speech

This tutorial demonstrates create_stuff_documents_chain, which creates a chain for passing a list of Documents to a model. Given a prompt and the documents as context, you can invoke the chain through LangChain and get the corresponding result.
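A minimal sketch of this step, assuming an OpenAI chat model via langchain_openai; the model name and the summarization prompt wording are illustrative assumptions:

```python
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# The prompt must expose a {context} variable; the chain fills it with
# the concatenated contents of the Documents ("stuffing").
prompt = ChatPromptTemplate.from_template(
    "Summarize the following speech transcript:\n\n{context}"
)

chain = create_stuff_documents_chain(llm, prompt)

# Pass the Document list produced by the splitter as the context.
summary = chain.invoke({"context": docs})
print(summary)
```

Swapping the prompt (for example, asking a specific question about the speech instead of requesting a summary) turns the same chain into a query tool over the transcript.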

These results can then be processed further as the use case requires.

Conclusion

Real-world use cases include:

  • Captioning
  • Legal and courtroom transcriptions
  • Healthcare and medical transcriptions
  • Voice assistants and smart devices

🗝My portfolio: https://wp.me/PccXal-wv
🤝 Connect for a 1:1 https://lnkd.in/g6FDTxcM

