# Speech To Text

**Introduction**

The "Speech to Text" step uses OpenAI's Whisper model to convert an audio file into written text.

**Configuration**

* **API Token**: Your OpenAI API token, which is necessary for accessing the speech-to-text service. This token must be valid and have the appropriate permissions.
* **Model**: The model used for transcription. Defaults to "whisper-1", OpenAI's general-purpose speech recognition model, which handles many languages and a range of audio conditions.
* **File**: The audio file to transcribe. OpenAI's transcription API accepts common formats such as mp3, mp4, m4a, wav, and webm, up to 25 MB per file. Clear recordings with little background noise produce the most accurate transcripts.
* **Language**: (Optional) The ISO-639-1 code of the language spoken in the audio, such as `en` or `de`. Specifying it can improve both accuracy and latency.
* **Prompt**: (Optional) Text that guides the model's style or vocabulary, or continues a previous audio segment. This is useful for keeping context across multi-part audio; the prompt should be in the same language as the audio.
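To illustrate how these fields could map onto an API call, here is a minimal sketch, assuming the step forwards them to OpenAI's public `POST /v1/audio/transcriptions` endpoint as multipart form data. `build_transcription_request` is a hypothetical helper, not part of Flexy; it uses only the Python standard library and builds the request without sending it.

```python
import urllib.request
import uuid

# Assumption: the step calls OpenAI's public transcription endpoint.
TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(api_token, file_name, audio_bytes,
                                model="whisper-1", language=None, prompt=None):
    """Hypothetical helper: map the step's fields onto a multipart request (not sent)."""
    # Simple two-letter check; this is not a lookup against the full ISO-639-1 list.
    if language is not None and (len(language) != 2 or not language.isalpha()):
        raise ValueError("language must be a two-letter ISO-639-1 code, e.g. 'en'")

    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language:
        fields["language"] = language.lower()
    if prompt:
        fields["prompt"] = prompt

    parts = []
    # Plain text form fields: model, plus the optional language and prompt.
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The audio file part carries the raw bytes.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8") + audio_bytes + b"\r\n")
    # Closing boundary ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_token}",  # the step's API Token
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Actually sending the request, for example with `urllib.request.urlopen(req)`, requires a valid API token and network access. The sketch hand-builds the multipart body only to stay dependency-free; an HTTP client library would normally handle that encoding.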

**Outputs**

* **Text**: The transcribed text obtained from the audio file. This output provides the spoken content in written form, ready for use in subsequent steps.
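The **Text** output presumably corresponds to the `text` field of the transcription API's default JSON response. A minimal sketch of extracting it, assuming the default `json` response format; `extract_text` is a hypothetical helper, not a Flexy function:

```python
import json


def extract_text(response_body: bytes) -> str:
    """Hypothetical helper: return the "text" field from a JSON transcription response."""
    payload = json.loads(response_body.decode("utf-8"))
    if "error" in payload:
        # OpenAI error bodies carry an "error" object with a "message" field.
        raise RuntimeError(payload["error"].get("message", "transcription failed"))
    return payload["text"]
```

For example, `extract_text(b'{"text": "Hello world."}')` returns `"Hello world."`.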


