Specifying embedded XML tags

You can embed XML tags in a prompt to change how the text-to-speech (TTS) engine renders it. Depending on your speech engine, you might use SSML (Speech Synthesis Markup Language) or an engine-specific markup, such as Apple’s embedded speech commands or Microsoft’s SAPI TTS markup.

Examples of Microsoft Server Speech using SSML Markup

Audio

Play a pre-recorded audio file as part of a prompt. The text inside the element is spoken only if the file cannot be found. “<audio src="c:\the_name_of_your_file.wav"> This is what I will say if the file is not found. </audio>”

Break

Add a pause between words. “Five hundred milliseconds of silence <break time="500ms" /> just occurred.”
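Attribute values in these tags must be delimited by straight quotes; curly quotes pasted from a word processor or web page make the markup malformed. The following is a minimal sketch, not part of any speech engine API, of how a prompt could be checked before it is sent to the engine. The helper name `is_well_formed` and the throwaway `<root>` wrapper are assumptions made for illustration, and the check only covers XML well-formedness, not whether the engine supports a given tag.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the prompt parses as XML once wrapped in a root element."""
    try:
        # A prompt is usually a fragment (text mixed with tags), so wrap it in
        # a throwaway root element (hypothetical name) to form a full document.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False


good = 'Five hundred milliseconds of silence <break time="500ms" /> just occurred.'
bad = 'Five hundred milliseconds of silence <break time=”500ms” /> just occurred.'

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False: curly quotes are not valid attribute delimiters
```

The same check also flags an element that was opened but never closed.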

Emphasis

Add emphasis to a word in a sentence. “The word <emphasis level="strong"> boo </emphasis> is emphasized!”

Note: Microsoft TTS engines do not currently support this element. See the MSDN article, under the paragraph “Specify speech output characteristics.”

Say-As Element

The say-as element tells the engine how to pronounce its content: date formats, cardinal and ordinal numbers, characters, times, and telephone numbers. See Say-As Element for more details. Here are a few examples:

To have each letter spelled out individually: “<say-as interpret-as="characters"> TEST </say-as>”

To have text spoken as an ordinal number (e.g., 3rd, 4th): “Select the <say-as interpret-as="ordinal">3rd</say-as> option.”

To specify a date format: “Today is <say-as interpret-as="date" format="mdy"> 4-24-2017 </say-as>”
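When the text inside a say-as element comes from application data, characters such as & and < need to be escaped or the markup becomes invalid. The sketch below shows one way to build these elements in Python; the function name `say_as` and its parameters are hypothetical and chosen only for this illustration.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str, fmt: Optional[str] = None) -> str:
    """Wrap text in a say-as element, escaping XML special characters."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    attrs = f"interpret-as={quoteattr(interpret_as)}"
    if fmt is not None:
        attrs += f" format={quoteattr(fmt)}"
    # escape() converts &, < and > in the spoken text to XML entities.
    return f"<say-as {attrs}>{escape(text)}</say-as>"


print(say_as("TEST", "characters"))
print("Select the " + say_as("3rd", "ordinal") + " option.")
print("Today is " + say_as("4-24-2017", "date", fmt="mdy"))
```

The resulting strings can be used anywhere the hand-written examples above would be used.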

Phoneme Element

Using the <phoneme> element, you can specify a phonetic pronunciation for a word or phrase. “His name is Mike <phoneme alphabet="x-microsoft-ups" ph="JH AU"> Zhou </phoneme>”

Prosody

Prosody specifies the pitch, contour, range, rate, duration, and volume for speaking the contained text. See Prosody Element for more details. Here are a few examples:

“This is normal volume. <prosody volume="40"> This is a whisper. </prosody>”

“This is normal rate and volume. <prosody rate="-20%" volume="40"> This is slow and quiet. </prosody>”

Voice

To choose the voice, either wrap the text in Voice markup or use the PlayTTS overload that takes a voice name:

Voice Markup: “<voice name="Microsoft Server Speech Text to Speech Voice (en-US, Helen)"> This is the text that the application will speak. </voice>”

PlayTTS Command: m_VoiceResource.PlayTTS("This is the text that the application will speak.", "Microsoft Server Speech Text to Speech Voice (en-US, Helen)");
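If the voice name is chosen at run time, the Voice markup can be generated rather than typed by hand. The following Python sketch uses the standard library's ElementTree to build the element so that quoting and escaping are handled automatically. The helper name `with_voice` is an assumption for illustration; it only produces the markup string and does not call the speech engine.

```python
import xml.etree.ElementTree as ET


def with_voice(text: str, voice_name: str) -> str:
    """Return text wrapped in a voice element; ElementTree handles escaping."""
    voice = ET.Element("voice", {"name": voice_name})
    voice.text = text
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(voice, encoding="unicode")


prompt = with_voice(
    "This is the text that the application will speak.",
    "Microsoft Server Speech Text to Speech Voice (en-US, Helen)",
)
print(prompt)
```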

Examples of SAPI 5.3 Markup

Volume Control: <volume level="nn"/> where nn is a value from 0 to 100.

Absolute Rate Of Speech: <rate absspeed="nn"/> where nn is a value from -10 to 10.

Relative Rate Of Speech: <rate speed="nn"/> where nn is a value from -10 to 10.

Absolute Pitch: <pitch absmiddle="nn"/> where nn is a value from -10 to 10.

Relative Pitch: <pitch middle="nn"/> where nn is a value from -10 to 10.
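Because the rate and pitch tags share the same -10 to 10 range, out-of-range values can be caught before the markup is generated. The sketch below is one possible approach in Python; the helper names `sapi_rate` and `sapi_pitch` are hypothetical, and only the tag and attribute names come from the examples above.

```python
def _in_range(name: str, value: int) -> int:
    """Check that value is an integer within the -10..10 range."""
    if not isinstance(value, int) or not -10 <= value <= 10:
        raise ValueError(f"{name} must be an integer from -10 to 10, got {value!r}")
    return value


def sapi_rate(value: int, absolute: bool = False) -> str:
    """Build a rate tag; absolute=True uses absspeed, otherwise speed."""
    attr = "absspeed" if absolute else "speed"
    return f'<rate {attr}="{_in_range(attr, value)}"/>'


def sapi_pitch(value: int, absolute: bool = False) -> str:
    """Build a pitch tag; absolute=True uses absmiddle, otherwise middle."""
    attr = "absmiddle" if absolute else "middle"
    return f'<pitch {attr}="{_in_range(attr, value)}"/>'


print(sapi_rate(5, absolute=True))  # <rate absspeed="5"/>
print(sapi_pitch(-3))               # <pitch middle="-3"/>
```

Calling `sapi_rate(11)` raises a `ValueError` instead of producing a tag with an out-of-range value.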

Emphasis: “The word <emph> boo </emph> is emphasized!”

Spelling Control: “The word <spell> Spell </spell> is spelled out.”

Silence Control: “Five hundred milliseconds of silence <silence msec="500"/> just occurred.”

Pronunciation: “<pron sym="h eh 1 l ow & w er 1 l d"/>”

Bookmarks: “<bookmark mark="bookmark_one"/>Simple Text”

Parts Of Speech: “<partofsp part="noun"> A </partofsp> is the first letter of the alphabet.”

Context: “<context id="date_mdy"> 03/04/01 </context> should be March fourth, two thousand one.”

Voice: “<voice required="Age=Teen">A teen should speak this sentence - if a female, non-child teen is present, she will be selected over a male teen, for example.</voice>”

Language: “<voice required="Language=409">A U.S. English voice should speak this.</voice>”
