Speech To Text step

Introduction

The "Speech to Text" step leverages OpenAI's capabilities to convert audio files into written text, utilizing the Whisper model for accurate transcription.

Configuration

  • API Token: Your OpenAI API token, which is necessary for accessing the speech-to-text service. This token must be valid and have the appropriate permissions.

  • Model: The specific model used for transcription, with "whisper-1" set as the default. OpenAI's Whisper models are designed for high accuracy in various languages and audio conditions.

  • File: The audio file to be transcribed. This file should contain clear audio of the spoken content you wish to convert into text.

  • Language: (Optional) The ISO-639-1 code of the language spoken in the audio. Specifying the language can improve the accuracy and speed of the transcription.

  • Prompt: (Optional) A text prompt that can guide the model's style or supply context. This is particularly useful for multi-part audio, where the prompt can carry context over from a previous segment. A configuration sketch follows this list.
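
As a minimal sketch, the fields above map onto a direct call to OpenAI's transcription API. This example assumes the official openai Python SDK (v1+); the file name, language code, and prompt text are placeholder values, not part of the step itself.

```python
from openai import OpenAI

# API Token: the step's OpenAI token (placeholder value).
client = OpenAI(api_key="YOUR_OPENAI_API_TOKEN")

# File: the audio to transcribe (placeholder file name).
with open("interview_part2.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",   # Model: the default transcription model
        file=audio_file,
        language="en",       # Language (optional): ISO-639-1 code of the spoken language
        prompt="Part two of the interview.",  # Prompt (optional): context from a prior segment
    )

# Text: the transcribed spoken content, available to subsequent steps.
print(transcription.text)
```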

Outputs

  • Text: The transcribed text obtained from the audio file. This output provides the spoken content in written form, ready for use in subsequent steps.
