Audio to Text

TOX name: base_audio_to_txt

Summary

A TouchDesigner component that generates text from audio using the Google Gemini API. Example use cases include transcribing the contents of an audio file, or of audio recorded directly in TouchDesigner, to text.

Controls

Refer to the Common Controls for a list of all available parameters.

Parameter Name | Parameter | Type | Description
Source File | Sourcefile | file | Available when Use Source File is on. Selects a file from disk to send to the audio transcription model.
Use Source File | Usesourcefile | toggle | When on, use a file from disk; when off, record audio directly in TouchDesigner.
Temp File | Tempfile | file | (Read Only) Path to the temp file currently in use.
Recording Timeout | Recordingtimeout | int | The number of seconds that can elapse before recording stops automatically. This keeps an accidental long recording from filling your hard drive.
Record | Record | toggle | Turns recording on and off.
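
The controls above can also be set from Python. The following is a minimal sketch, not the component's own code: the parameter names (Usesourcefile, Sourcefile, Recordingtimeout, Record) come from the table, while the helper names `use_source_file` and `start_recording` and the operator path `op('base_audio_to_txt')` are assumptions made for illustration. A stand-in object is used so the sketch runs outside TouchDesigner.

```python
from types import SimpleNamespace


def use_source_file(comp, audio_path):
    """Point the component at an existing audio file on disk."""
    comp.par.Usesourcefile = True      # makes the Source File parameter available
    comp.par.Sourcefile = audio_path


def start_recording(comp, timeout_seconds):
    """Switch to live recording, with the timeout as a safety net."""
    if int(timeout_seconds) < 1:
        raise ValueError("timeout_seconds must be at least 1")
    comp.par.Usesourcefile = False
    comp.par.Recordingtimeout = int(timeout_seconds)  # seconds before auto-stop
    comp.par.Record = True


# Stand-in object so the sketch runs outside TouchDesigner.
# Inside TouchDesigner you would pass the real component instead,
# e.g. op('base_audio_to_txt') (the operator path is an assumption).
demo = SimpleNamespace(par=SimpleNamespace())
use_source_file(demo, "interview.wav")
start_recording(demo, timeout_seconds=30)
```

Setting the timeout before switching Record on means a forgotten recording stops on its own rather than growing until the disk is full.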

Outputs

Output Index | Name | Type | Description
0 | out_response | DAT | The text output from the Google Gemini API.
1 | out_metadata | DAT | The metadata returned by the Google Gemini API, including the total token count and the prompt token count.
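
As an illustration of reading the token counts from the metadata output, here is a hedged sketch. It assumes that out_metadata holds JSON text using the Gemini API's usage field names (`promptTokenCount`, `totalTokenCount`), optionally nested under `usageMetadata`; the actual layout of the DAT is not documented here, so check it before relying on this. The helper name `summarize_metadata` and the sample values are hypothetical.

```python
import json


def summarize_metadata(metadata_text):
    """Extract token counts from metadata text (assumed to be JSON)."""
    data = json.loads(metadata_text)
    # Accept either a bare usage object or one nested under "usageMetadata".
    usage = data.get("usageMetadata", data)
    return {
        "prompt_tokens": usage.get("promptTokenCount"),
        "total_tokens": usage.get("totalTokenCount"),
    }


# Inside TouchDesigner the text would come from the output DAT, for example
# op('base_audio_to_txt').op('out_metadata').text (path is an assumption).
sample = '{"usageMetadata": {"promptTokenCount": 120, "totalTokenCount": 185}}'
summary = summarize_metadata(sample)
print(summary)
```

Missing fields come back as `None` rather than raising, which makes the helper tolerant of responses that omit a count.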