I recently worked on a telephony project that required me to learn to code with Twilio. For those not familiar, Twilio provides a service that lets developers host telecommunications programs on their own web servers while using telephony resources hosted in Twilio’s cloud to make and receive calls, send and receive SMS messages, and perform other related functions. Call handling is expressed in TwiML, an XML-based markup language developed by Twilio. Twilio also provides SDKs that generate TwiML from several popular web-development languages, including Python, Ruby, PHP, Node.js, and C#.
In the past, my tool of choice for quickly and easily creating telephony applications has been Inventive Labs’ Voice Elements. Its ready integration with Microsoft Visual Studio, easy-to-understand API, and complete feature set make it an easy choice for projects of all types and sizes. However, there has been a lot of buzz in the industry about Twilio’s services over the last few years, so I thought this would be a great opportunity to compare the two approaches.
Getting to Know Twilio
Twilio provides an application to help developers get to know its product offerings: TwilioQuest. Working through its modules brings developers up to speed on Twilio’s basics, with examples that use both native TwiML and some of the SDKs available for other languages. It also introduces the other features of the service, including “Functions” (serverless code hosted on Twilio’s servers), “TwiML Bins” (static TwiML documents hosted on Twilio’s servers), and the hosted call center and conferencing services. All in all, TwilioQuest provides a good introduction to what Twilio offers.
The Experience
My coding background has primarily been in C, C++, C#, and PHP. For my testing with Twilio, I chose to work with PHP, largely due to its simple integration with the nginx web server. PHP is also very C-like in syntax and structure, which feels more like home to someone who cut their teeth on C and C++.
Coding a telephony application with Twilio isn’t necessarily difficult; the API is in fact fairly straightforward. However, the resulting program can often seem counterintuitive, with parts that should be logically grouped needing to be separated. Let’s use a simple IVR flow as an example, where a prompt is played to the caller, a DTMF digit response is collected, and the response is evaluated for further processing.
Using the Twilio PHP SDK, it might look something like this:
Twilio Code – Simple IVR Call Flow Example
inbound_call.php
[csharp]
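<?php
require __DIR__ . '/vendor/autoload.php'; // assumes the Twilio SDK was installed with Composer
use Twilio\TwiML\VoiceResponse;
$response = new VoiceResponse();
// Collect one DTMF digit, then have Twilio request handle_response.php via GET.
$gather = $response->gather(['numDigits' => 1, 'action' => '/handle_response.php', 'method' => 'GET']);
$gather->say('Press 1 for sales, press 2 for support.');
echo $response;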
[/csharp]
handle_response.php
[csharp]
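<?php
require __DIR__ . '/vendor/autoload.php'; // assumes the Twilio SDK was installed with Composer
use Twilio\TwiML\VoiceResponse;

// Twilio passes the collected keypress in the Digits request parameter.
$digits = $_REQUEST['Digits'] ?? '';
$response = new VoiceResponse();
$response->say('You entered: ' . $digits, ['voice' => 'man']);
switch ($digits)
{
    case '1':
        $response->say('Transferring to sales.');
        break;
    case '2':
        $response->say('Transferring to support.');
        break;
}
echo $response;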
[/csharp]
Let’s Review Twilio’s Code
When a call destined for one of your purchased DIDs arrives at Twilio’s servers, the webhook configured for that number is invoked. In this case, that script is inbound_call.php. This PHP script handles the initial call request. It instantiates a VoiceResponse object which will ultimately be returned as the reply to the client; remember, this is running on a web server. It then populates the response by calling the gather() method. The arguments to the gather() method are the number of digits to collect, the script which should be called with the DTMF digit reply, and the HTTP method the client should use to call that script. Then, the say() method is called to prompt the caller for their input. Finally, the response is returned to the client using the PHP echo statement. It is at this point that the Twilio server handling the call prompts the caller and collects their input.
After playing the prompt and collecting the DTMF input, Twilio sends a second request to the web server, this time for handle_response.php, the “action” we specified in our call to gather() previously. This script checks the ‘Digits’ parameter passed in the HTTP GET request and returns a response that plays a message based on the digit entered.
Deferred Execution
The first thing that is a bit unintuitive is that the scripts themselves aren’t actually doing anything immediately. Each of the method calls in the scripts only adds to an HTTP response, which is sent back to the Twilio server that made the request. This can make troubleshooting more challenging, as it becomes difficult to track down exactly which of the method calls failed without examining the logs on Twilio’s servers.
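To make the deferred model described above concrete, here is a minimal sketch that builds the same kind of reply by hand instead of through the SDK. The helper names xml_escape() and build_gather_twiml() are hypothetical and exist only for this illustration; the element and attribute names mirror the gather() and say() calls shown earlier. The point is that the script’s only output is an XML document, so writing that document to a local log is one possible way to see what Twilio was asked to do.

```php
<?php
// Hypothetical helper (not part of the Twilio SDK): escapes text for XML output.
function xml_escape(string $text): string
{
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: returns the TwiML for "prompt, then gather one digit".
function build_gather_twiml(string $prompt, string $action): string
{
    return '<?xml version="1.0" encoding="UTF-8"?>'
        . '<Response>'
        . '<Gather numDigits="1" action="' . xml_escape($action) . '" method="GET">'
        . '<Say>' . xml_escape($prompt) . '</Say>'
        . '</Gather>'
        . '</Response>';
}

$twiml = build_gather_twiml('Press 1 for sales, press 2 for support.', '/handle_response.php');
error_log($twiml);                // keep a local copy of what is sent to Twilio
header('Content-Type: text/xml');
echo $twiml;                      // nothing is spoken until Twilio processes this reply
```

Nothing in this script plays audio or waits for a keypress. Those steps happen on Twilio’s side after the XML has been returned, which is why a failure there is not visible to the script.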
Multiple Scripts for Simple Interactions
Another unintuitive part is that two scripts were needed to handle a simple “prompt user and gather input” operation. The first script is called to tell the Twilio servers how to handle the call initially. The second script is then called to tell the servers how to handle the response. More complex interactions result in applications which become littered with scripts, each responsible for handling a small bit of application logic.
Scripts Invoked Recursively
There is a way to do this same operation in a single script: by leaving out the ‘action’ parameter in the call to gather(), Twilio will request the same script again once the digits are collected. This leaves it to the programmer to determine whether the script is handling the initial request or has been called again with the caller’s input. While this can cut down on the number of discrete scripts required for a project, the resulting logic is even less intuitive than the two-script approach. Projects written in this way can quickly become littered with conditional logic just to determine whether each script has been called recursively.
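The single-script variant described above might be sketched as follows. This is an illustration under stated assumptions rather than a recommended structure: ivr_twiml() is a hypothetical function, the TwiML is written out by hand to keep the example free of dependencies, and the “Invalid option” branch is an addition that the two-script example does not have.

```php
<?php
// Hypothetical single-script handler: returns TwiML for either the first
// request (no Digits parameter yet) or the follow-up request (Digits present).
function ivr_twiml(array $params): string
{
    if (!isset($params['Digits'])) {
        // First pass: no action attribute, so Twilio requests this same URL
        // again once the caller has pressed a key.
        $body = '<Gather numDigits="1">'
            . '<Say>Press 1 for sales, press 2 for support.</Say>'
            . '</Gather>';
    } elseif ($params['Digits'] === '1') {
        $body = '<Say>Transferring to sales.</Say>';
    } elseif ($params['Digits'] === '2') {
        $body = '<Say>Transferring to support.</Say>';
    } else {
        $body = '<Say>Invalid option, goodbye.</Say>';
    }

    return '<?xml version="1.0" encoding="UTF-8"?><Response>' . $body . '</Response>';
}

header('Content-Type: text/xml');
echo ivr_twiml($_REQUEST);
```

Even in this small case, the first branch exists only to work out which stage of the call the script is in, which is the kind of bookkeeping that grows with every additional prompt.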
The Voice Elements Approach
Now, let’s look at the same process in Voice Elements (C#):
Voice Elements Code – Simple IVR Call Flow Example
[csharp]
// Runs inside the new-call event handler; args is supplied by Voice Elements.
VoiceElements.Client.ChannelResource chan = args.ChannelResource;
try
{
    Console.WriteLine("Received new call from: " + chan.Ani);
    chan.Answer();
    chan.VoiceResource.PlayTTS("Hello " + chan.CallerIdName + ". Please press 1 for sales or 2 for support.");
    // Collect the caller's DTMF input; the digits end up in DigitBuffer.
    chan.VoiceResource.GetDigits(1, 5, "");
    string input = chan.VoiceResource.DigitBuffer;
    switch (input)
    {
        case "1":
            chan.VoiceResource.PlayTTS("Transferring to Sales.");
            // Logic to perform transfer
            break;
        case "2":
            chan.VoiceResource.PlayTTS("Transferring to Support.");
            // Logic to perform transfer
            break;
        default:
            chan.VoiceResource.PlayTTS("Invalid option, goodbye.");
            break;
    }
}
catch (Exception ex)
{
    // Added so the excerpt compiles: a try block needs a catch or finally clause.
    Console.WriteLine("Call handling failed: " + ex.Message);
}
[/csharp]
Let’s Review Voice Elements’ Code
The general flow is similar. We first store our ChannelResource (this represents the telephony channel handling the call) in a local variable for easy reference. Next, the call is answered by calling the Answer() method. The caller’s options are played using text-to-speech via the PlayTTS() method, and their input is gathered using GetDigits(). Finally, the input is used to determine which of the IVR paths to take.
Intuitive Top-Down Approach
This top-down approach is very intuitive. The call logic flows linearly from step to step; there are no jumps to scripts to handle what should be the next logical step in call processing. Everything required to understand the call flow is represented in this single section of source code. As the program logic becomes more complex, more steps (method calls) are simply added in a linear fashion. This results in call flows which are easier to visualize, and which more closely resemble our conceptual model of how calls flow from beginning to end.
Easy Access to Logs for Debugging
Also, each method call results in some action being immediately performed on the server. If a call to a method fails, or an exception is thrown, the programmer knows immediately and can more easily debug it. There is also no need to log into a remote server to check its logs; the relevant logging is stored locally on the customer’s server to aid in debugging.
Summary
So what’s the bottom line?
Trading Code Simplicity for Project Complexity
The project for which I needed to learn Twilio was not overly complex; the call logic was simple and didn’t require many interactions with the callers. Indeed, because most interactions need to be separated into their own discrete scripts, each script is relatively simple. Even so, the number of scripts required to handle just the telephony logic in the application was disproportionate to the work actually being done. It’s just not as satisfying or intuitive to work on an application where related logic is necessarily split across multiple source files. My experience is that this increases the overall complexity of the project, thereby decreasing its maintainability. Who wants to come back to a project in a year to troubleshoot it or add new features, only to spend valuable time tracing call flows across multiple source files?
Server Maintenance Requirements
There is also the need to build and maintain a web server to host the application. This is not something to be taken lightly. A secure and up-to-date web server is not something you set up once and forget. It must be maintained, and security updates must be applied regularly lest it become vulnerable to known threats.
Voice Elements: A Flexible, Portable API
With Voice Elements, on the other hand, developers have access to a proven, featureful, and intuitive API. And since it is built to support both the Microsoft .NET Framework and .NET Core, applications can be deployed on Microsoft Windows, Linux, and even macOS. Basically, they can be deployed on any platform which supports .NET Core.
Voice Elements: Free Application Hosting
Inventive Labs also recently added free application hosting for customers on its cloud servers. Customers who build their applications using the .NET Core framework can upload them free of charge. This is very well suited to customers who don’t want or need to maintain their own infrastructure for telephony applications.
Voice Elements: The Easy Choice
Customers who want more end-to-end control of the entire solution can deploy their own locally hosted Voice Elements telephony servers as well. While most customers will opt to use the Voice Elements cloud servers, on-premises servers allow complete control of all aspects of the solution.
Philip Weeks is a Telecommunications Engineer with over 15 years of industry experience with ScanSource Communications as a Technical Support Analyst, Application Engineer, and Technical Team Lead. He is now an independent consultant working with customers to help them maintain, optimize, and update their telecommunications infrastructure, with a focus on Voice over IP technologies.