Author’s note: This article supports our legacy products. At Inventive Labs, after two decades of providing telephony tools, we never stop supporting those who rely on our products to run their businesses. We no longer recommend Dialogic. This article is for our customers who are using our CTI32 Legacy Toolkit with Dialogic equipment. For more information about these legacy products and where we are today, read Ditch Dialogic & Convert to Voice Elements.
The CTIPlayAndRecognizeWord function allows you to play a file and recognize a single word using a speech recognition engine.
You must have a CSP-enabled port in order to use this function.
Signature
.NET
public static TermCode PlayAndRecognizeWord(string GrammarFile, int chdev, Source source, VoiceFormat voice_format, string FileName, int Fhandle, string prompt, string digmask, int ClearDigitBuffer, int EnableBargeIn, int Timeout, string error_file, ref string RecognizedWord, ref int Score)
public static TermCode PlayAndRecognizeWord(string GrammarFile, int chdev, Source source, VoiceFormat voice_format, byte[] buffer, int Fhandle, string prompt, string digmask, int ClearDigitBuffer, int EnableBargeIn, int Timeout, string error_file, ref string RecognizedWord, ref int Score)
C++
long far pascal CTIPlayAndRecognizeWord(char *GrammarFile, int chdev, int source, int voice_format, char *FileName, long Fhandle, char *prompt_to_play, char *digmask, BOOL ClearDigitBuffer, int EnableBargeIn, int Timeout, char *error_file, char *RecognizedWord, int *Score);
Parameters/Return Values
* GrammarFile – The name of the recognition engine grammar file for this prompt.
* chdev – The voice resource device
* source – This parameter can be one of the following:
# CTI_FILE (0, defined in cti.h), which plays a file from disk from beginning to end
# CTI_VAP (1, defined in cti.h), which plays one or more prompts from an indexed voice file (.VAP)
# CTI_MEMORY_VAP (2, defined in cti.h), which plays from an indexed voice file that has been loaded into memory
# CTI_MEMORY (3, defined in cti.h), which plays a file from a memory location for a specified length
# CTI_VAP_INDEX_IN_MEMORY (4, defined in cti.h), which plays prompts from disk using a VAP index that the application has loaded into memory
* voice_format – This defines the voice file type and sampling rate. Available options:
# Kbps64 (2) – PCM 8 kHz VOX format, 8000 b/s
# Wave8 (5) – 8 kHz Wave, 8000 bytes/sec
# Vox8Mulaw
* FileName – If the source is CTI_FILE, this is the name of the file to play, including the full path. If the source is CTI_MEMORY or CTI_MEMORY_VAP, this is a memory pointer to the base offset of the memory to be used. For CTI_VAP, this parameter is not used – the VAP file is expected to be open and a file handle is passed in the next parameter. If the source is CTI_VAP_INDEX_IN_MEMORY, this parameter contains the memory pointer to the start of the VAP index that has been loaded into memory.
* buffer – If the source is CTI_MEMORY or CTI_MEMORY_VAP, this is a memory pointer to the base offset of the memory to be used. For CTI_VAP, this parameter is not used – the VAP file is expected to be open and a file handle is passed in the next parameter. If the source is CTI_VAP_INDEX_IN_MEMORY, this parameter contains the memory pointer to the start of the VAP index that has been loaded into memory.
* Fhandle – For CTI_VAP or CTI_VAP_INDEX_IN_MEMORY, this is the file handle to the open index file to use. For CTI_MEMORY, this contains the length of the file in memory to play. For CTI_FILE and CTI_MEMORY_VAP, this parameter is not used.
* prompt – This is a string list of prompts to be played out of a VAP file or memory VAP. It is not used as a prompt list if CTI_FILE or CTI_MEMORY was specified. It can contain one or more prompts that you wish spoken together. Each prompt should be separated by a comma. For example, if you wanted to say three phrases together, and the phrases were prompts 9, 2, and 56 respectively, then you would pass “9,2,56” to this parameter.
New: If CTI_FILE is specified, you can pass the byte offset into the file at which to start playing the recording. This way you can start playing in the middle of a file. On return, the parameter contains how many bytes were played (only if the length of the string passed in was greater than zero). If you need to know how many bytes were played, but you want to start playing from the beginning, send in a buffer that is 10 bytes big or so, and pass in “0”.
* digmask – This is a pointer to a string of digits that will terminate playing the file if pressed. Valid digits are 1-9 and the # and * keys. For example, if you wish to terminate on the 1 key OR the # key, send the string “1#”. If you pass the string “ANY”, playing is terminated upon any key press. Pass the empty string “” if you do NOT want the file to be interrupted.
* ClearDigitBuffer – This is a TRUE or FALSE value. If true, any digits sitting in the board’s digit buffer are cleared before getting the new digits. Usually this is set to TRUE.
* EnableBargeIn – One of the following values:
0 – Disable the Dialogic CSP BargeIn – start streaming to the voice recognition engine as soon as the play starts. This option will require more resources.
1 – Enable the Dialogic CSP BargeIn – do not start streaming to the voice recognition engine until Dialogic CSP detects a user utterance.
2 – Enable the Dialogic CSP BargeIn, but do not allow a barge-in until the whole file has played to completion. Then start streaming.
* Timeout – How many seconds to wait for a valid response.
* error_file – This is a full path to a file which speaks of a system error and a technical support number to call. This file must be a separate file and not part of a VAP file.
* RecognizedWord – Returns the word from the grammar file that the user spoke.
* Score – Returns a score from 1 to 1000 indicating how close the match is. The higher the score, the more likely it is that the user actually said the word returned.
This function returns the terminating event. See the [[TermCode]] enum for a list of terminating events (e.g. TM_EOD if it returned because the file played to the end).
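The parameters above can be put together roughly as follows. This is only a sketch: StubPlayAndRecognizeWord is a stand-in that mirrors the documented C++ signature so the example compiles without cti.h, and it is not the toolkit function. The helper names (JoinPrompts, IsConfident, AskWithStub), the BargeInMode names, the file names, the 256-byte result buffer and the return value 0 are all assumptions made for illustration. Only the values CTI_VAP = 1 and Kbps64 = 2 come from the lists above.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Values taken from the parameter list above (defined in cti.h in the toolkit).
const int CTI_VAP = 1;  // source: play prompts from an indexed voice file
const int Kbps64 = 2;   // voice_format: PCM 8 kHz VOX

// Hypothetical names for the documented EnableBargeIn values 0, 1 and 2.
enum BargeInMode {
    kStreamFromStart = 0,   // stream to the engine as soon as play starts
    kBargeIn = 1,           // stream once CSP detects a user utterance
    kBargeInAfterPlay = 2   // no barge-in until the whole file has played
};

// STAND-IN for CTIPlayAndRecognizeWord so this sketch compiles without cti.h.
// It plays nothing and always "hears" the word "yes" with a score of 850.
// The return value 0 is a placeholder, not a real TermCode value.
long StubPlayAndRecognizeWord(const char * /*GrammarFile*/, int /*chdev*/,
                              int /*source*/, int /*voice_format*/,
                              const char * /*FileName*/, long /*Fhandle*/,
                              char * /*prompt_to_play*/, const char * /*digmask*/,
                              int /*ClearDigitBuffer*/, int /*EnableBargeIn*/,
                              int /*Timeout*/, const char * /*error_file*/,
                              char *RecognizedWord, int *Score) {
    std::strcpy(RecognizedWord, "yes");
    *Score = 850;
    return 0;
}

// Builds the comma-separated prompt list, e.g. {9, 2, 56} -> "9,2,56".
std::string JoinPrompts(const std::vector<int> &prompts) {
    std::ostringstream out;
    for (std::size_t i = 0; i < prompts.size(); ++i) {
        if (i > 0) out << ',';
        out << prompts[i];
    }
    return out.str();
}

// The acceptance threshold is an application choice, not part of the toolkit.
bool IsConfident(int score, int threshold) { return score >= threshold; }

struct RecognitionResult {
    long termCode;
    std::string word;
    int score;
};

// Plays prompts 9, 2 and 56 from an open VAP file and listens for one word.
// The grammar file, error file, channel device and VAP handle are placeholders.
RecognitionResult AskWithStub(int chdev, long vapHandle) {
    std::string prompts = JoinPrompts({9, 2, 56});
    char word[256] = "";  // result buffer; the size is an assumption
    int score = 0;
    long term = StubPlayAndRecognizeWord(
        "yesno.grammar", chdev, CTI_VAP, Kbps64,
        "",            // FileName is not used for CTI_VAP
        vapHandle,     // handle of the open VAP file
        &prompts[0],   // "9,2,56"
        "",            // digmask: no digit interrupts the prompt
        1,             // ClearDigitBuffer = TRUE
        kBargeIn, 10,  // barge-in mode, 10 second timeout
        "error.vox", word, &score);
    assert(score >= 1 && score <= 1000);  // documented score range
    return RecognitionResult{term, word, score};
}
```

In a real application the stub would be replaced by the toolkit call, and the returned TermCode would be checked before RecognizedWord and Score are trusted.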
Notes/Related Information
The Voice Recognition Engine must be properly configured for this function to succeed.
Version Information
Introduced prior to version 4.5 of CTI32.
Customer Related Questions
Q: Why the restriction to a handful of voice formats?
A: The restriction is due to the dependency on the voice recognition engine. During speech recognition, the CSP resource provides echo cancellation. In order for it to be able to do this, the stream of data being played must match the stream of data being received. Since voice recognition engines typically require an 8 kHz data stream, this limits the available formats for playing.
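One side effect of the fixed 8 kHz formats is that the byte offset accepted by the prompt parameter (when the source is CTI_FILE) maps directly to a time position. The sketch below assumes a headerless file at 8000 bytes per second, the rate documented for Wave8, and assumes the bytes-played count comes back as decimal text. A Wave file header, if present, is not accounted for. The helper names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Byte rate documented above for Wave8; assumed here for the other
// 8 kHz formats as well.
const long kBytesPerSecond = 8000;

// Converts a start position in whole seconds to the byte-offset string
// that would be passed in the prompt parameter for CTI_FILE.
std::string OffsetStringForSeconds(long seconds) {
    return std::to_string(seconds * kBytesPerSecond);
}

// Converts a bytes-played count (as decimal text) back to whole seconds,
// rounding down.
long SecondsPlayed(const std::string &bytesPlayed) {
    return std::stol(bytesPlayed) / kBytesPerSecond;
}
```

For example, starting three seconds into such a file would mean passing OffsetStringForSeconds(3) as the prompt string.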