Speech Recognition Sample Solution

This tutorial is a walk-through explanation of the Speech Recognition sample solution that can be downloaded from your demo dashboard on the Voice Elements Customer Portal.

(If you haven’t already, sign up for a demo account and get 100 minutes of free call time for the next 30 days with the Voice Elements servers.)

Speech Recognition Tutorial

The Speech Recognition solution is designed to demonstrate how easily Voice Elements can:

- Play Text To Speech
- Detect Digits
- Use Simple Grammar (e.g. Yes/No)
- Use Complex Grammar (e.g. Multiple Choice)
- Understand Continuous Speech (e.g. getting a ZIP code, phone number or a credit card number)
- Play TTS using Different Voices (see comments in the code sections)

Download the Sample Solution

If you are not brought directly to the Dashboard screen, click on Dashboard in the top navigation links.

Voice Elements Demo Dashboard

Sample Solutions Available in Three Download Formats

We offer three download formats for running the Sample Solutions: Windows Executable Program, Windows Source Code Download, or .NET Core Cross Platform Code. You can download any of these options and run it as often as you like.

Demo Dashboard Sample Solutions noting the three download formats available.

Choosing to Download the Windows Executable Program

You’ll enjoy this user-friendly option if you are not a programmer, if Visual Studio is not installed on the device or machine you are currently using for the demo, or if you just want to jump straight to seeing Voice Elements in action.

Select the Windows Program button ‘Download exe’. The ZIP file will be downloaded to your machine’s Downloads folder. You might also find it in the tray at the bottom of your browser window. Unzip the folder and extract the files to your chosen location. Look for the application named VoiceApp and run it.

Choosing to Download the Windows Source Code

If you are a programmer and have Visual Studio loaded on the device or machine you are using, you may enjoy downloading the Windows Source Code and seeing it in action from that perspective. Not only will you see how easy it is to program the code, you will also see how simple it is to create your telephony application.

Select the Windows Source Code button ‘Download sln’. Unzip the folder and extract the files to your chosen location. Run the Microsoft Visual Studio Solution.

Choosing to Download the .NET Core Cross Platform Solution

This download format is designed for Linux or Windows. You will need a .NET Core compatible compiler to run the solution.

Run Anyway and Allow Access

Your device might recognize the file as an unrecognized app with an unknown publisher and ask if you are sure you want to run it. Select the option that will confirm that you want to run the application anyway.

Your firewall might prompt you to confirm access to the app. Select the option that allows access.

Once you have removed the obstacles, the application will run.

Speech Recognition Client Application

Screenshot of Speech Recognition Sample Solution

Demo Notes:

This is a very simple IVR demonstrating how Voice Elements recognizes DTMF (touch tones) or your speech responses. As the intent is a simple demo, it has not been programmed to respond in a user-friendly way with a variety of choices.

If you dig deeper into the code below, you’ll begin to see how this simple IVR structure can be expanded to handle more complex responses.

Ready to try more Sample Solutions?

You can click on More Samples within the app, or go back to your browser and log in to the Voice Elements Customer Portal. Each sample solution has its own tutorial to guide you through running it.

You must close the current sample before running another one.

If you want to run the Sample Solution again, you might consider moving the folder out of downloads to your desktop or a location where you want to store all the Sample Solutions.

We hope you try all our Sample Solutions to really see how comprehensive and robust Voice Elements is.

Understanding the Source Code

For more detailed information about the Voice Elements Classes and Methods, explore our Class Library documentation at Voice Elements Developer Help. We’ve linked a few classes and methods in the sections below to encourage you to take advantage of this treasure-trove of knowledge from our developers.

IvrApplication

The core class of this project is IvrApplication. Much of the code in this class sets up the application as a Windows service, so you can ignore most of it for now.

MainCode()

The most important method here is MainCode(). When the application is run, it starts a new thread which runs MainCode(). This method connects to the Voice Elements servers in the cloud, then loops indefinitely, checking for new tasks to run and for inbound call events.

Log.Write()

Note that Log.Write() is used frequently to log call progress and help with debugging. It is recommended that you continue to do this as you program your own Voice Elements applications.

The first thing MainCode() does is connect to the Voice Elements servers. This is done by constructing a new TelephonyServer object passing in server IP, username, and password as parameters. These values have already been generated for your account but you can change them in your Settings.settings file.

MainCode() also sets the CacheMode on the TelephonyServer object. In ClientSession mode, the server streams files to and from your client machine and caches them; the cached files are flushed after you disconnect. In Server mode, the files reside on the server and are located there by their full path names. Note that Server mode can only be used on your own dedicated Voice Elements server.

After connecting to the server and setting its cache mode, MainCode() subscribes to the new call event. This sets the method that is called when an incoming call is received. In this example, TelephonyServer_NewCall() is the method called on new incoming call events.

RegisterDNIS()

RegisterDNIS() is then called on the TelephonyServer to tell the server which phone numbers the application will handle. This method can be called with no parameters to instruct Voice Elements to handle calls to all phone numbers on your account. Otherwise, you can specify the numbers to handle as parameters.

try
{
    Log.Write("Connecting to: {0}", Properties.Settings.Default.PhoneServer);

    m_telephonyServer = new TelephonyServer("gtcp://" + Properties.Settings.Default.PhoneServer, Properties.Settings.Default.UserName, Properties.Settings.Default.Password);

    // CHANGE YOUR CACHE MODE HERE
    m_telephonyServer.CacheMode = VoiceElements.Interface.CacheMode.ClientSession;

    // SUBSCRIBE to the new call event.
    m_telephonyServer.NewCall += new VoiceElements.Client.NewCall(TelephonyServer_NewCall);
    m_telephonyServer.RegisterDNIS();

    // Subscribe to the connection events to allow you to reconnect if something happens to the internet connection.
    // If you are running your own VE server, this is less likely to happen except when you restart your VE server.
    m_telephonyServer.ConnectionLost += new ConnectionLost(TelephonyServer_ConnectionLost);
    m_telephonyServer.ConnectionRestored += new ConnectionRestored(TelephonyServer_ConnectionRestored);
}

Inbound Call

The InboundCall class is designed to handle most of the logic for an inbound call. When an inbound call event is generated, the TelephonyServer_NewCall() method in IvrApplication is called.

RunScript()

An object of the InboundCall class is constructed with the TelephonyServer and ChannelResource objects as parameters. The RunScript() method is then called on the new InboundCall object. The RunScript() method contains the logic for handling this inbound call, and in this project, all of the logic for speech recognition.

static void TelephonyServer_NewCall(object sender, VoiceElements.Client.NewCallEventArgs e)
{
    try
    {
        Log.Write("NewCall Arrival! DNIS: {0}  ANI: {1}  Caller ID Name: {2}", e.ChannelResource.Dnis,
        e.ChannelResource.Ani, e.ChannelResource.CallerIdName);

        // Handle The New Call Here
        InboundCall inboundCall = new InboundCall(m_telephonyServer, e.ChannelResource);
        inboundCall.RunScript();
    }
    catch (Exception ex)
    {
        Log.WriteException(ex, "IvrApplication::NewCall");
        e.ChannelResource.Disconnect();
        e.ChannelResource.Dispose();
    }
}

Microsoft Compatible Grammar Files

Voice Elements uses Microsoft compatible grammar files for its speech recognition functionality. The first step in programming speech recognition is to set the SpeechRecognitionEnabled property of the VoiceResource to true. The SpeechRecognitionGrammarFile property must also be set; this is a string containing the file path to the XML grammar file. The SpeechRecognitionMode property must be set to MultiplePlays, so that once speech is detected, all subsequent play commands are bypassed until speech recognition is stopped. The SpeechRecognitionPermitBargeIn property can be set to true so that a play command in progress stops when the user begins to speak. The MaximumTime property can be set to limit how many seconds the application waits for a response.
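
The contents of the sample's grammar files are not reproduced in this tutorial. As a rough sketch only, assuming the files follow the W3C SRGS XML format that Microsoft speech engines accept, a one-to-four grammar might look something like the following. The rule name menuChoice and the exact word list are illustrative assumptions, not the contents of the shipped OneToFour.xml.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical grammar: accepts one of the spoken words "one" through "four". -->
<grammar version="1.0" xml:lang="en-US" root="menuChoice"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- The root rule lists every phrase the recognizer is allowed to match. -->
  <rule id="menuChoice" scope="public">
    <one-of>
      <item>one</item>
      <item>two</item>
      <item>three</item>
      <item>four</item>
    </one-of>
  </rule>
</grammar>
```

A small grammar like this keeps the recognizer focused on a handful of phrases, which is what the tutorial means by a simple grammar.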

GetResponse()

The GetResponse() method is then called to wait for the caller's response, whether spoken or entered on the keypad. Once it returns, the SpeechRecognitionEnabled property is set back to false.

If speech was received, the program checks whether the SpeechRecognitionScore property is high enough. This score ranges from 0 to 1000. Generally, anything above 700 could be considered a strong positive detection. If the speech was recognized, the program plays back what it detected, then runs speech recognition again to confirm it.
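
The threshold check described above can be kept in one place so the cutoff is easy to tune. This is a minimal sketch, assuming the score is an int in the 0 to 1000 range and using the 700 cutoff from this tutorial. RecognitionPolicy and IsConfident are hypothetical names and are not part of the Voice Elements class library.

```csharp
using System;

// Hypothetical helper; not part of the Voice Elements class library.
public static class RecognitionPolicy
{
    // Cutoff taken from this tutorial: scores of 700 or more count as strong matches.
    public const int StrongMatchThreshold = 700;

    // Returns true when a score in the 0-1000 range meets the threshold.
    public static bool IsConfident(int score)
    {
        if (score < 0 || score > 1000)
        {
            throw new ArgumentOutOfRangeException(
                nameof(score), "Score is expected to be between 0 and 1000.");
        }

        return score >= StrongMatchThreshold;
    }
}
```

In the sample, the condition could then read RecognitionPolicy.IsConfident(m_voiceResource.SpeechRecognitionScore), assuming that property is an integer.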

try
{
    // Answer the call
    Log.WriteWithId(m_channelResource.DeviceName, "Answering...");
    m_channelResource.Answer();

    while (true)
    {

        // If you want to try a complex response (i.e. a phone number), test this method
        // string phoneNumber = GetPhoneNumber();


        // Enable the speech recognition functionality
        m_voiceResource.SpeechRecognitionEnabled = true;

        // Select the grammar file to use for this recognition
        // For further information on creating your own grammar file, go to http://support.voiceelements.com/index.php?title=How_do_I_create_Microsoft_Compatible_Grammar_Files%3F
        m_voiceResource.SpeechRecognitionGrammarFile = @"..\..\AudioFiles\OneToFour.xml";

        // Then set to multiple plays, when speech is detected it will bypass all 
        // subsequent play commands until speech recognition is stopped
        m_voiceResource.SpeechRecognitionMode = VoiceElements.Interface.SpeechRecognitionMode.MultiplePlays;

        // Enable Barge-In - This allows the talker to stop the play by speaking
        m_voiceResource.SpeechRecognitionPermitBargeIn = true;

        // Wait up to 5 seconds for a response
        m_voiceResource.MaximumTime = 5;
        // Only allow 1 digit to be entered (if they use the keypad instead of speech)
        m_voiceResource.MaximumDigits = 1;

        Log.Write("Playing menu options...");
        // Play a menu option, allowing users to press or say their response
        m_voiceResource.PlayTTS("Press or say 1,2,3 or 4");

        // If you want, you can specify a different voice to use instead of the default voice
        // m_voiceResource.PlayTTS("Press or say 1,2,3 or 4", "Microsoft Server Speech Text to Speech Voice (en-US, ZiraPro)");

        // If the user did not speak during the message or enter a digit, we will
        // wait for a response
        m_voiceResource.GetResponse();

        // Once complete waiting for a response, turn off voice recognition 
        m_voiceResource.SpeechRecognitionEnabled = false;


        // If we received speech, process it now
        if (m_voiceResource.TerminationCodeFlag(TerminationCode.Speech))
        {
            // Log what happened
            Log.Write("Captured Speech: {0} Score: {1}", m_voiceResource.SpeechRecognitionReturnedWord,
            m_voiceResource.SpeechRecognitionScore);

            // Scores range between 0 and 1000. Generally, anything above 700 could be 
            // considered a strong positive detection
            if (m_voiceResource.SpeechRecognitionScore >= 700) 
            {
                // Save this off because it will be overridden by the confirmation
                string response = m_voiceResource.SpeechRecognitionReturnedWord;

                // Turn on voice recognition
                m_voiceResource.SpeechRecognitionEnabled = true;

                // Use yes or no grammar file
                m_voiceResource.SpeechRecognitionGrammarFile = @"..\..\AudioFiles\YesNoComplex.xml";

                Log.Write("Playing confirmation...");
                // Check to see if the response was correct
                m_voiceResource.PlayTTS("You said: " + response + ". Is this correct? Say yes or no.");

                // If the user did not speak during the message, we will wait for a response
                m_voiceResource.GetResponse();

                // Turn off voice recognition
                m_voiceResource.SpeechRecognitionEnabled = false;

                // Log what happened
                Log.Write("Captured Speech: {0} Score: {1}", m_voiceResource.SpeechRecognitionReturnedWord, 
                m_voiceResource.SpeechRecognitionScore);

                if (m_voiceResource.SpeechRecognitionReturnedWord == "yes")
                {
                    // Handle the response

                    break;
                }
                else
                    continue; // Replay the main menu if the user didn't say yes
            }
            else
                m_voiceResource.PlayTTS("I'm sorry, I did not understand your response");
        }
        // If a digit was entered, process that now
        else if (m_voiceResource.TerminationCodeFlag(TerminationCode.Digit) || m_voiceResource.TerminationCodeFlag(TerminationCode.MaximumDTMF))
        { 
            m_voiceResource.PlayTTS("You pressed " + m_voiceResource.DigitBuffer);

            // TODO - Handle the response

            switch (m_voiceResource.DigitBuffer)
            {
                default:
                    break;
            }

            break;
        }

    }

    // Log often
    Log.Write("Playing 'Goodbye'");

    // Play the goodbye prompt.
    m_voiceResource.PlayTTS("Goodbye");
}
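
The listing above leaves both response handlers empty: the speech branch has a "Handle the response" comment and the digit branch has an empty switch. One possible way to fill them in is to map either kind of input to the same menu choice first. The sketch below assumes the grammar returns the words "one" through "four" (or the digits themselves) and that DigitBuffer holds a single digit; both are assumptions, and MenuChoice is a hypothetical helper rather than a Voice Elements class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the Voice Elements class library.
public static class MenuChoice
{
    // Accepts either a keypad digit or the spoken word, ignoring case.
    private static readonly Dictionary<string, int> Choices =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "1", 1 }, { "one", 1 },
            { "2", 2 }, { "two", 2 },
            { "3", 3 }, { "three", 3 },
            { "4", 4 }, { "four", 4 }
        };

    // Returns true and sets choice (1-4) when the input is a known digit or word.
    public static bool TryParse(string input, out int choice)
    {
        choice = 0;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }

        return Choices.TryGetValue(input.Trim(), out choice);
    }
}
```

With a helper like this, the speech branch could pass the saved response string and the digit branch could pass m_voiceResource.DigitBuffer, so that a single switch on the resulting number handles both.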

GetPhoneNumber()

GetPhoneNumber() is an example method for using voice recognition with continuous speech. The main difference for continuous speech is that MaximumDigits and MaximumTime are not set.

private string GetPhoneNumber()
{
    // This shows how to get a continuous speech, like a 10 digit phone number
    // Turn on voice recognition
    m_voiceResource.SpeechRecognitionEnabled = true;

    // Use Phone Number grammar file (NOTE - there are many other types of speech in the file that you can switch to)
    m_voiceResource.SpeechRecognitionGrammarFile = @"..\..\AudioFiles\Phone_Number.gram";

    // Then set to multiple plays, when speech is detected it will bypass all 
    // subsequent play commands until speech recognition is stopped
    m_voiceResource.SpeechRecognitionMode = VoiceElements.Interface.SpeechRecognitionMode.MultiplePlays;

    // Enable Barge-In - This allows the talker to stop the play by speaking
    m_voiceResource.SpeechRecognitionPermitBargeIn = true;

    Log.Write("Asking for phone number...");
    m_voiceResource.PlayTTS("Say your 10 digit phone number.");

    Log.Write("Getting response...");
    // If the user did not speak during the message, we will wait for a response
    m_voiceResource.GetResponse();

    // Turn off voice recognition
    m_voiceResource.SpeechRecognitionEnabled = false;

    // Log what happened
    Log.Write("Captured Speech: {0}  Score: {1}", m_voiceResource.SpeechRecognitionReturnedWord,
                                                          m_voiceResource.SpeechRecognitionScore);

    if (m_voiceResource.SpeechRecognitionScore > 700)
    {
        // Handle the response

        m_voiceResource.PlayTTS("You said " + m_voiceResource.SpeechRecognitionReturnedWord);

        return m_voiceResource.SpeechRecognitionReturnedWord;
    }
    else
    {
        m_voiceResource.PlayTTS("I'm sorry, I did not understand your response");
        return string.Empty;
    }
}
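
GetPhoneNumber() returns the recognized text as-is. This tutorial does not document what that string looks like for the phone number grammar, so the caller may want to validate it before use. The sketch below assumes the result might contain spaces or punctuation between the digits, and checks for exactly ten digits to match the prompt. PhoneNumberParser and TryNormalize are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical helper; not part of the Voice Elements class library.
public static class PhoneNumberParser
{
    // Returns true and the bare digits when the text contains exactly ten digits.
    public static bool TryNormalize(string recognized, out string digits)
    {
        digits = string.Empty;
        if (string.IsNullOrWhiteSpace(recognized))
        {
            return false;
        }

        // Keep only the characters '0' through '9'.
        string onlyDigits = new string(
            recognized.Where(c => c >= '0' && c <= '9').ToArray());

        if (onlyDigits.Length != 10)
        {
            return false;
        }

        digits = onlyDigits;
        return true;
    }
}
```

A caller could pass the return value of GetPhoneNumber() to TryNormalize and ask for the number again when it returns false, which also covers the empty string returned on a low score.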

For a deeper dive into coding Voice Elements and our Class Library, explore:

Start Coding Voice Elements.
Voice Elements Developer Site