Audio to Text

TOX name base_audio_to_txt

Summary

A TouchDesigner component for generating synthetic text from audio with the Google Gemini API. Example use cases here include the ability to transcribe contents from a file to text.

Controls

Refer to the Common Controls for a list of all available parameters.

Parameter Name	Parameter	Type	Description
Source File	`Sourcefile`	file	Available when Use Source File is true, this allows you to select a file from disk to use for the audio transcription model
Use Source File	`Usesourcefile`	toggle	Use a file from disk, or record audio directly in TouchDesigner
Temp File	`Tempfile`	file	(Read Only) Path to currently used temp file
Recording Timeout	`Recordingtimeout`	int	The number of seconds that can elapse before the Recording process will stop automatically. This ensures you don't accidentally fill your whole hard drive with an audio recording.
Record	`Record`	toggle	turns recoding on and off

Outputs

Output Index	Name	Type	Description
0	`out_response`	`DAT`	The text output from the Google Gemini API
1	`out_metadata`	`DAT`	Contains the metadata back from the Google Gemini API, this includes data like total token count, and prompt token count

Summary​

Controls​

Outputs​

Summary

Controls

Outputs