30-Second Summary:

  • VoIP depends on a series of protocols that make its features possible.
  • VoIP enables enterprise networks to transmit telephone conversations over the same infrastructure as data communications.
  • A VoIP endpoint can be a physical IP phone, a softphone installed on a laptop, or a VoIP app installed on a smartphone.
  • Enterprises are changing the way they do business, relying on a distributed workforce rather than brick-and-mortar operations.
  • VoIP services employ an arsenal of protocols to transmit voice over data networks.
  • In this article, we'll shed some light on these protocols and explain how they enable VoIP to offer its state-of-the-art, groundbreaking services.

Introduction

If you've heard of Voice over IP (VoIP), you've probably already come across the innovations and advanced services delivered by this cutting-edge technology.  It has been especially useful for communication between businesses and their customers, particularly over the past few years.  To operate, VoIP depends on a series of protocols that make its features possible.  These protocols often sound mysterious, and it is not always clear what they do.

In this article, we'll shed some light on these protocols and explain how they enable VoIP to offer its state-of-the-art, groundbreaking services.

What is VoIP and how does it work?

To understand the protocols that support VoIP, we must first understand what VoIP is.  VoIP refers to a collection of technologies that allow voice services to use an existing packet-based network infrastructure.  In other words, it enables enterprise networks to transmit telephone conversations over the same infrastructure as data communications.  This integration of voice services into a data network is called network convergence.

Components of a VoIP conversation

Let's imagine a VoIP conversation taking place between two VoIP endpoints.  A VoIP endpoint can be a physical IP phone, a softphone installed on a laptop, or a VoIP app installed on a smartphone, such as those provided by services like Freshcaller or OomaOffice.

Such communication has two components, described below:

  • Signaling – This involves the exchange of control information between the VoIP endpoints. Signaling is responsible for establishing the connection, maintaining it during the call, and tearing it down afterward.  It interprets dial strings, makes the devices ring, and delivers the various tones you hear, such as ringback and busy tones.  It also enables features such as call waiting, call hold, and call transfer, to name a few.  Signaling also ensures that both devices use the same codec to encode the voice.
  • Transmission of voice – This component involves transmitting the voice itself. The voice is digitized, segmented into individual packets, and sent over the IP network.  Once the packets reach their destination, they are reassembled.  They must arrive in a timely and consistent manner so that they can be reassembled into an intelligible reproduction of the original voice.
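To make the packetize-and-reassemble idea concrete, here is a minimal Python sketch. It assumes 8 kHz audio encoded at one byte per sample (as with the G.711 codec) and 20 ms of audio per packet; the `VoicePacket` type and its `seq` field are hypothetical stand-ins for the sequence numbers a real transport protocol would supply. Real endpoints also buffer packets to smooth out uneven arrival, which this toy example skips.

```python
from dataclasses import dataclass

# Assumptions for illustration only (not taken from any specific product):
SAMPLE_RATE_HZ = 8000   # narrowband audio, as used by the G.711 codec
BYTES_PER_SAMPLE = 1    # G.711 encodes each sample in one byte
FRAME_MS = 20           # milliseconds of audio carried in each packet


@dataclass
class VoicePacket:
    seq: int        # hypothetical sequence number: position in the original stream
    payload: bytes  # encoded audio for one frame


def packetize(audio: bytes) -> list[VoicePacket]:
    """Split a digitized voice stream into fixed-duration packets."""
    frame_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000  # 160 bytes
    return [
        VoicePacket(seq=i, payload=audio[offset:offset + frame_bytes])
        for i, offset in enumerate(range(0, len(audio), frame_bytes))
    ]


def reassemble(packets: list[VoicePacket]) -> bytes:
    """Rebuild the stream at the receiver, even if packets arrived out of order."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))


one_second = bytes(range(256)) * 31 + bytes(64)  # 8000 bytes of stand-in "audio"
packets = packetize(one_second)
print(len(packets), "packets of", len(packets[0].payload), "bytes each")
arrived = list(reversed(packets))                # simulate out-of-order delivery
print(reassemble(arrived) == one_second)         # True
```

With these assumptions, each packet holds 160 bytes of audio, so one second of speech becomes 50 packets.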

How protocols are involved in VoIP

Each component of the communication requires one or more protocols.  A protocol is a set of precisely defined rules that end devices follow in order to communicate successfully.  Protocols allow each end device to correctly interpret the information it receives and to properly structure the information it sends.  When both ends of the conversation use the same set of rules, communication succeeds.

The Internet Protocol (IP) routes both signaling and voice packets from one VoIP device to another.  However, IP only delivers packets to their intended destination; it does not define how to interpret their payload, such as the encoded voice itself.  That's where additional, VoIP-specific protocols come in.

Protocols used for VoIP

Both the signaling and voice transmission components require protocols.  Because the two components have different requirements, each uses its own distinct protocols, described below.

Signaling protocols

Keep in mind that signaling protocols don't carry the voice packets.  They handle only the communication between VoIP end devices needed to establish, maintain, and tear down conversations.  Signaling protocols are also used to communicate with centralized control servers, or IP PBXs (either cloud-based or physical), which determine the capabilities and permissions of each VoIP endpoint.

The most common protocols used for signaling include:

Session Initiation Protocol (SIP)

This is the most widely used signaling protocol.  Most VoIP equipment vendors and service providers, such as GoToConnect, use SIP to deliver their VoIP services.  More details about SIP and how it works can be found in the article "What is the SIP Protocol and Why is it so Great".
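For a feel of what SIP signaling looks like on the wire, the Python sketch below assembles a minimal INVITE request, the message a caller sends to start a call. SIP is a text-based protocol with headers similar to HTTP. The addresses, the UDP transport on port 5060, and the `build_invite` helper are illustrative assumptions; the sketch only builds the text and does not send it or handle responses, retransmissions, or authentication, all of which a real SIP user agent must do.

```python
import uuid


def build_invite(caller: str, callee: str, local_host: str, body: str = "") -> str:
    """Assemble a minimal SIP INVITE request (hypothetical helper, RFC 3261 text format)."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]    # RFC 3261 branch IDs start with "z9hG4bK"
    from_tag = uuid.uuid4().hex[:8]               # identifies the caller's side of the call
    call_id = f"{uuid.uuid4().hex}@{local_host}"  # unique identifier for this call
    user = caller.split("@")[0]
    lines = [
        f"INVITE sip:{callee} SIP/2.0",           # request line: method, target, version
        f"Via: SIP/2.0/UDP {local_host}:5060;branch={branch}",
        "Max-Forwards: 70",
        f"To: <sip:{callee}>",
        f"From: <sip:{caller}>;tag={from_tag}",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_host}:5060>",  # where the callee can reach the caller
    ]
    if body:
        lines.append("Content-Type: application/sdp")  # assumption: any body is an SDP description
    lines.append(f"Content-Length: {len(body.encode())}")
    # Like HTTP, SIP uses CRLF line endings and a blank line before the body
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_invite("alice@example.com", "bob@example.com", "192.0.2.10"))
```

In a real call, the optional `body` would carry the session description discussed under SDP.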

Session Description Protocol (SDP)

But SIP doesn't work alone.  It relies on another protocol, the Session Description Protocol.  While SIP exchanges signaling information, SDP describes the multimedia session between the endpoints.  Specifically, it allows endpoints to negotiate various parameters of the voice conversation, such as the media type, format, and associated properties.  Like SIP, SDP does not carry voice packets; instead, it is carried as the payload (message body) of the SIP messages themselves.
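The Python sketch below shows roughly what an SDP description inside a SIP message might contain, plus a greatly simplified version of how the answering endpoint could pick a codec from it. It assumes a single audio stream offering the two G.711 variants (static RTP payload type 0 for PCMU and 8 for PCMA); the helper names, address, and port are hypothetical, and real offer/answer negotiation (RFC 3264) involves the answerer sending back its own SDP.

```python
from typing import Optional


def build_sdp_offer(host: str, rtp_port: int, session_id: int) -> str:
    """Describe one audio stream that can use G.711 mu-law (PCMU) or A-law (PCMA)."""
    lines = [
        "v=0",                                           # SDP version
        f"o=- {session_id} {session_id} IN IP4 {host}",  # origin: user, session ID, version, address
        "s=-",                                           # session name (not used here)
        f"c=IN IP4 {host}",                              # address where media should be sent
        "t=0 0",                                         # session is not time-bounded
        f"m=audio {rtp_port} RTP/AVP 0 8",               # audio over RTP, payload types in preference order
        "a=rtpmap:0 PCMU/8000",                          # payload type 0 = G.711 mu-law, 8 kHz clock
        "a=rtpmap:8 PCMA/8000",                          # payload type 8 = G.711 A-law, 8 kHz clock
    ]
    return "\r\n".join(lines) + "\r\n"


def choose_codec(offer: str, supported: set[int]) -> Optional[int]:
    """Simplified answer: pick the first offered payload type we also support."""
    for line in offer.splitlines():
        if line.startswith("m=audio "):
            # m=<media> <port> <protocol> <payload types...>
            for pt in line.split()[3:]:
                if int(pt) in supported:
                    return int(pt)
    return None


offer = build_sdp_offer("192.0.2.10", 49170, 2890844526)
print(offer)
print(choose_codec(offer, supported={8, 18}))  # 8 (PCMA)
```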

H.323

H.323 is another signaling option.  It is a Recommendation from the International Telecommunication Union (ITU) that defines how to deliver audiovisual communication sessions over packet networks.  Although not as prevalent as SIP, it is still widely used, especially for sessions that include video, such as those on teleconferencing equipment and services.

Voice packet transmission protocols

Regardless of whether SIP or H.323 handles signaling, the voice packets themselves are carried by a set of specially designed protocols, described below.

Real-time Transport Protocol (RTP)

If SIP and H.323 don't carry voice packets, then what does?  That job belongs to the Real-time Transport Protocol.  For those familiar with the OSI model, RTP sits on top of the Transport Layer (it usually runs over UDP) and is specially designed to carry audio and video over IP networks.  When SIP is the signaling protocol, SIP first establishes the connection between the two end devices and SDP determines the parameters of the voice session; then the RTP stream of voice packets begins, using the negotiated parameters.  RTP works the same way with H.323 signaling.
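Each RTP packet starts with a small fixed header, defined in RFC 3550, that lets the receiver detect loss and restore order (sequence number), play audio back with correct timing (timestamp), and tell streams apart (SSRC). The Python sketch below packs and unpacks that 12-byte header. It assumes no optional CSRC list or header extension, and the 20 ms, 8 kHz G.711 framing (so the timestamp advances by 160 per packet) is an illustrative choice; the function names are hypothetical. In practice each packet would be sent as a UDP datagram to the address and port negotiated through SDP.

```python
import struct

RTP_VERSION = 2
HEADER = struct.Struct("!BBHII")  # network byte order: 1+1+2+4+4 = 12 bytes


def build_rtp_packet(payload: bytes, seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Prepend a minimal RTP header (no padding, extension, or CSRCs) to a payload."""
    byte0 = RTP_VERSION << 6                             # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # M bit + 7-bit payload type
    header = HEADER.pack(byte0, byte1, seq & 0xFFFF,     # 16-bit sequence number wraps around
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_packet(packet: bytes) -> dict:
    """Split a packet built as above back into header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[HEADER.size:],
    }


# Three consecutive 20 ms G.711 frames: seq +1 and timestamp +160 per packet
frames = [bytes(160)] * 3
stream = [build_rtp_packet(f, seq=i, timestamp=i * 160, ssrc=0x1234ABCD)
          for i, f in enumerate(frames)]
for pkt in stream:
    fields = parse_rtp_packet(pkt)
    print(fields["seq"], fields["timestamp"], len(pkt))
```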

RTP Control Protocol (RTCP)

RTP has a companion protocol, the RTP Control Protocol.  While RTP carries the voice packets, RTCP monitors transmission statistics and quality of service (QoS) in real time.  It also helps synchronize multiple media streams, such as the audio and video of the same session.  The QoS statistics it reports include packet loss, packet counts, jitter, and round-trip delay.  Endpoints can use this information to adjust the transmission dynamically and improve its quality.
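To illustrate two of the statistics RTCP reports, the Python sketch below computes packet loss and interarrival jitter in the spirit of RFC 3550, in simplified form. The jitter estimate compares how far apart two packets arrived with how far apart they were sent, then smooths that difference with a gain of 1/16. Assumptions: arrival times have already been converted to RTP timestamp units, sequence numbers do not wrap around, and loss is computed over the whole list rather than per reporting interval; the function names are hypothetical.

```python
def fraction_lost(received_seqs: list[int]) -> float:
    """Share of packets missing between the lowest and highest sequence number seen."""
    expected = max(received_seqs) - min(received_seqs) + 1  # assumes no wraparound
    received = len(set(received_seqs))                      # ignore duplicates
    return (expected - received) / expected


def interarrival_jitter(send_ts: list[int], arrival_ts: list[int]) -> float:
    """Running jitter estimate in RTP timestamp units (RFC 3550, section 6.4.1)."""
    jitter = 0.0
    for i in range(1, len(send_ts)):
        # D: change in packet spacing between sender and receiver
        d = (arrival_ts[i] - arrival_ts[i - 1]) - (send_ts[i] - send_ts[i - 1])
        jitter += (abs(d) - jitter) / 16    # exponential smoothing, gain 1/16
    return jitter


# Packets sent every 160 timestamp units (20 ms at 8 kHz); packet 3 is lost
# and packet 4 arrives 80 units (10 ms) late.
seqs = [1, 2, 4, 5]
sent = [0, 160, 480, 640]
arrived = [1000, 1160, 1560, 1640]
print(fraction_lost(seqs))                 # 0.2
print(interarrival_jitter(sent, arrived))  # 9.6875
```

Because of the 1/16 gain, a single late packet nudges the estimate only slightly, while sustained variation pushes it up steadily.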

Proprietary protocols

The protocols mentioned so far are open protocols that vendors and service providers can use freely, which makes it easy for services and devices from different manufacturers to interoperate.  However, some VoIP service providers have chosen to design their own protocols.  Examples include:

  • The Skinny Client Control Protocol (SCCP), used by Cisco IP phones and the Cisco CallManager call-control server.
  • The Skype protocol, used by Skype applications. Microsoft has since deprecated it and replaced it with MSNP24, which is also proprietary.
  • The Inter-Asterisk eXchange (IAX) protocol, used by Asterisk to carry VoIP sessions between VoIP servers. Although it is not strictly proprietary, it is used primarily by Asterisk servers.

Conclusion

Enterprises are changing the way they do business, relying on a distributed workforce rather than brick-and-mortar operations.  As a result, the deployment and use of VoIP services are on the rise.  These services employ an arsenal of VoIP protocols to transmit voice over data networks.  These protocols are the hidden workhorses that make these innovative applications a reality, improving how efficiently businesses can operate and communicate today.