
Text to Speech

TOX name: base_txt_to_speech

Summary

A TouchDesigner component for generating synthetic speech with the Google Gemini API.

Controls

Refer to the Common Controls for a list of all available parameters.

Prompts

When writing prompts for Text to Speech, you can include inline instructions about tone and delivery by enclosing them in square brackets. For example, consider the following:

[menacingly] Have a great day!
[cheerfully] Have a great day!

Although the spoken text is identical, these two prompts produce different deliveries. To change the intonation partway through a prompt, add another bracketed instruction before the next passage:

[menacingly] Have a great day! [cheerfully] Just kidding, keep your chin up
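If you build prompts from a script (for example, from a Text DAT or another data source), the bracket convention above can be generated rather than typed by hand. The sketch below is a hypothetical helper, not part of the component: `build_prompt` simply joins `(instruction, text)` pairs into one string, wrapping each instruction in square brackets. The resulting string is what you would enter as the component's prompt; how you route it there is left to your own network.

```python
def build_prompt(segments):
    """Join (instruction, text) pairs into a single Text to Speech prompt.

    Hypothetical helper: each delivery instruction is wrapped in square
    brackets and placed before the text it should affect. Pass None or ""
    as the instruction to leave a segment without a delivery cue.
    """
    parts = []
    for instruction, text in segments:
        text = text.strip()
        if instruction:
            parts.append(f"[{instruction.strip()}] {text}")
        else:
            parts.append(text)
    return " ".join(parts)


prompt = build_prompt([
    ("menacingly", "Have a great day!"),
    ("cheerfully", "Just kidding, keep your chin up"),
])
print(prompt)
# [menacingly] Have a great day! [cheerfully] Just kidding, keep your chin up
```

Keeping instructions and text as separate fields makes it easy to swap deliveries (for example, "cheerfully" for "menacingly") without editing the spoken text.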

| Parameter Name | Parameter | Type | Description |
| --- | --- | --- | --- |
| Voice | Voice | menu | Selects the voice used to generate speech from the available voice options. |
| Export Audio File | Exportaudiofile | pulse | Exports the audio from the component. Pulsing this parameter opens a dialog asking where to save the file. |
| File | File | file | (Read Only) Path of the source file. |
| Reload | Reloadpulse | pulse | Instantly reloads the file from disk. |
| Play | Play | toggle | Plays the audio when set to 1 and stops it when set to 0. |
| Speed | Speed | float | Playback speed multiplier; only works when Play Mode is Sequential. A value of 1 is the default playback speed, 2 is double speed, 0.5 is half speed, and so on. This node cannot play audio backwards, so negative values will not work well. |
| Cue | Cue | toggle | Jumps to the Cue Point when set to 1. Only available when Play Mode is Sequential. |
| Pulse Cue | Cuepulse | pulse | Instantly jumps to the Cue Point. |
| Repeat | Repeat | menu | Repeats the audio stream when the end is reached. |
| Volume | Volume | float | Sets the level at which the file is read in. A setting of 1 is full signal; 0 is muted. |
| Fade In/Out | Fade | toggle | |
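Because Speed is a multiplier, the time a clip takes to play back is its original length divided by the Speed value. The sketch below is a hypothetical helper illustrating that arithmetic; it assumes Play Mode is Sequential (the only mode where Speed applies) and that Speed is held constant for the whole clip. Non-positive values are rejected, since the node cannot play audio backwards.

```python
def playback_duration(clip_seconds: float, speed: float) -> float:
    """Estimate wall-clock playback time for a clip at a constant Speed.

    Hypothetical helper. Assumes Play Mode is Sequential and that Speed
    does not change during playback.
    """
    if speed <= 0:
        # Negative speeds (backwards playback) are not supported,
        # and a speed of 0 would never reach the end of the clip.
        raise ValueError("Speed must be a positive multiplier")
    return clip_seconds / speed


# A 6-second clip at half, normal, and double speed.
for speed in (0.5, 1.0, 2.0):
    print(speed, playback_duration(6.0, speed))
# 0.5 12.0
# 1.0 6.0
# 2.0 3.0
```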

Outputs

| Output Index | Name | Type | Description |
| --- | --- | --- | --- |
| 0 | out_response | TOP | The video output from the Google Gemini API |
| 1 | out_response_audio | CHOP | The audio output from the Google Gemini API |