Introduction

Employing VoIP services over an IP network can sometimes introduce unforeseen and non-intuitive behavior.  This can result in users experiencing strange-sounding phenomena that can interfere with the quality of the voice heard.  One of the most common and troublesome phenomena is called jitter and can be difficult to fix.

Understanding what jitter is and why it takes place is a good starting point for resolving such problems on your voice network.  In this article, I’ll explain in plain language what it is, why it takes place, and the methods used to minimize it on a network.

What is Jitter?

Jitter in general is defined as a deviation from the true periodicity of a presumably periodic signal.  In other, simpler terms, if you tap your pencil on the desk at a frequency of exactly once a second, then there is no jitter in that tapping.  If some taps occur too early or too late compared to the rest of the taps, then the tapping is not truly periodic but is displaying a level of jitter.  The greater the deviation from the expected time of each tap, the greater the jitter.

Jitter applied to VoIP

Jitter is an important concept for VoIP networks.  Because voice is continuous in nature, voice packets that are sent on a network must arrive at their destination (your VoIP endpoint) at a consistent and continuous rate, so they can be successfully reassembled and reproduced as a sound for you to hear.  

Jitter in the context of VoIP occurs when voice packets arrive at the destination device at inconsistent intervals.  IP networks, by their very nature, introduce some level of jitter in the arrival of voice packets, and of any IP packets for that matter.  This is due to network congestion, varying latency, and other phenomena along the path.  The result can be poor voice quality: a distorted-sounding voice, or even gaps of silence.

Why does Jitter occur?

Voice packets are sent from a VoIP endpoint at a constant rate, which depends upon the configuration.  For example, when using the G.711 voice codec, a voice packet is commonly sent every 20 milliseconds (ms), with zero jitter at the outset.  Each packet is then individually routed through the network to reach its destination.  This means that each packet will experience a slightly different journey.  Continuously changing traffic patterns can cause one packet to be delayed due to congestion.  Dynamic routing may cause packets to take different paths to reach their destination.  A network failure may cause redundant links to kick in, a process that may cause several tens of voice packets to be lost completely.

The result is that packets arrive at the destination not with the original perfect periodicity of arrival at every 20 ms, but with some level of jitter, simply due to the very nature of an IP packet network. 
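
To make this concrete, here is a minimal Python sketch that sends packets every 20 ms and adds a random delay to each one.  The function name, the 40 ms base delay, and the 15 ms of variable delay are assumptions chosen purely for illustration; they do not describe any real network.

```python
import random

PACKET_INTERVAL_MS = 20  # sending interval from the G.711 example above


def simulate_arrivals(num_packets, base_delay_ms=40.0, max_extra_delay_ms=15.0, seed=1):
    """Return (send_times, arrival_times) in ms for a hypothetical network path.

    Every packet gets the same base delay plus a random extra delay,
    which stands in for congestion and changing routes.
    """
    rng = random.Random(seed)
    send_times = [i * PACKET_INTERVAL_MS for i in range(num_packets)]
    arrival_times = [
        t + base_delay_ms + rng.uniform(0.0, max_extra_delay_ms) for t in send_times
    ]
    return send_times, arrival_times


send_times, arrival_times = simulate_arrivals(6)

# Gaps between consecutive arrivals: these would all be 20 ms without jitter.
gaps = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
print([round(gap, 1) for gap in gaps])
```

The packets leave exactly 20 ms apart, but the printed gaps between arrivals vary, and that variation is the jitter.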

What does Jitter look like?

The following diagram illustrates and compares voice communication between an IP phone and a laptop with VoIP software, with and without jitter:

Theoretically, without jitter, each voice packet arrives with the interval between arrivals remaining constant at a value of X.  In a more realistic scenario, when jitter exists, this interval between packet arrivals is variable, where X ≠ Y ≠ Z.

How is Jitter measured?

Jitter is also known by its more descriptive title of packet delay variation.  From this title, one can understand that the level of jitter can be quantified using units of time and is defined as the difference between the expected arrival of a packet and its actual arrival.  Jitter can be evaluated for each individual packet, but a single packet’s value says little about the quality of a call as a whole.  More useful measurements include average jitter and maximum jitter over the course of a voice conversation.
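
As a rough sketch of that definition, the Python function below compares each packet’s actual arrival time with its expected arrival time and reports the average and the maximum.  It assumes the first packet arrived on time and uses it as the reference point, and the sample arrival times are made up for illustration.  Real RTP implementations commonly report a smoothed estimate (the interarrival jitter described in RFC 3550) rather than this simple average.

```python
def jitter_stats(arrival_times_ms, interval_ms=20.0):
    """Return (average_jitter, maximum_jitter) in ms.

    Assumption: the first packet is treated as on time, so packet i is
    expected at first_arrival + i * interval_ms.
    """
    if not arrival_times_ms:
        raise ValueError("need at least one arrival time")
    first = arrival_times_ms[0]
    deviations = [
        abs(actual - (first + i * interval_ms))
        for i, actual in enumerate(arrival_times_ms)
    ]
    return sum(deviations) / len(deviations), max(deviations)


# Hypothetical arrival times (ms) for five packets sent 20 ms apart.
arrivals = [0.0, 22.0, 38.0, 65.0, 80.0]
average_jitter, maximum_jitter = jitter_stats(arrivals)
print(average_jitter, maximum_jitter)  # 1.8 5.0
```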

What does Jitter sound like?

Jitter in the arrival of voice packets will result in a voice conversation where users hear voice drop-outs and clipped words.  According to industry best practices, an average jitter of about 30 ms is the maximum acceptable value a voice conversation should experience.  Average jitter of between 30 and 100 ms makes a voice conversation still intelligible, but noticeable gaps can be heard.  Anything above 100 ms renders a conversation unintelligible.
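
These thresholds are easy to turn into a quick check.  The sketch below simply encodes the 30 ms and 100 ms figures quoted above; the function name and the wording of the labels are my own.

```python
def classify_average_jitter(average_jitter_ms):
    """Map an average jitter value (ms) to the quality bands quoted above."""
    if average_jitter_ms < 0:
        raise ValueError("jitter cannot be negative")
    if average_jitter_ms <= 30:
        return "acceptable"
    if average_jitter_ms <= 100:
        return "intelligible, with noticeable gaps"
    return "unintelligible"


for value in (12, 45, 130):
    print(value, "ms:", classify_average_jitter(value))
```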

Dealing with Jitter

Jitter cannot be eliminated completely from a network, due to the very nature of the technology.  Fortunately, however, jitter can be dealt with and minimized in various ways.

Sound network design

This is especially important for businesses that maintain an internal data network over which VoIP communication takes place.  This can occur in situations where VoIP desk phones are used with a local SIP server, or even where cloud-based VoIP services such as Freshcaller or Aircall are leveraged locally on laptops, desktops, or smartphones.  Ensuring the network is provisioned appropriately for the expected traffic volume of all network applications, including voice services, is vital to mitigating jitter.

Quality of Service (QoS)

QoS is a series of methodologies that classify the various types of traffic on a network and give priority to certain types over others.  No matter how sound your network design is, there will always be situations where network traffic will surpass the infrastructure’s ability to support it and network congestion will occur.  QoS essentially ensures that priority traffic, such as voice packets, is sent ahead of other traffic, so that it is delayed as little as possible even in the event of severe network congestion.  This minimizes jitter for VoIP packets.
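
The idea of prioritization can be illustrated with a toy scheduler.  The Python sketch below is not how any particular router or switch implements QoS; the class name, the two traffic classes, and the packet labels are all hypothetical.  It only shows the principle that queued voice packets are sent before queued data packets.

```python
import heapq
from itertools import count

# Hypothetical traffic classes: a lower number means a higher priority.
VOICE, DATA = 0, 1


class PriorityScheduler:
    """Toy strict-priority queue: voice always leaves before data."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that keeps arrival order within a class

    def enqueue(self, traffic_class, packet):
        heapq.heappush(self._heap, (traffic_class, next(self._order), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


scheduler = PriorityScheduler()
for traffic_class, packet in [
    (DATA, "data-1"),
    (DATA, "data-2"),
    (VOICE, "voice-1"),
    (DATA, "data-3"),
    (VOICE, "voice-2"),
]:
    scheduler.enqueue(traffic_class, packet)

sent = [scheduler.dequeue() for _ in range(len(scheduler))]
print(sent)  # ['voice-1', 'voice-2', 'data-1', 'data-2', 'data-3']
```

Even though the voice packets were queued behind data packets, they are sent first, so congestion delays the data rather than the voice.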

Jitter buffers

There are always cases where sound network design and QoS are unable to eliminate jitter completely.  This is especially true if you are using your VoIP services over a third-party network, such as over your mobile network’s data connection, a guest Wi-Fi network at a hotel or airport, or even on your home Internet connection.  In such cases, the end devices themselves must employ what is known as a jitter buffer.

What is a jitter buffer?

A jitter buffer is a small amount of memory in a VoIP endpoint that, as its name suggests, buffers incoming packets, storing them for a short time (typically tens of milliseconds) before sending them to the voice processor to be reassembled into the original sound.  This temporary storage introduces a slight delay that gives packets that may be late the opportunity to “catch up” so that the sound served to the receiver’s ear remains as continuous as possible.

Where is the jitter buffer?

A jitter buffer may be found within the hardware of an IP phone or programmed within the VoIP software on a laptop or desktop, or on a VoIP app on a smartphone.  The following diagram depicts a jitter buffer found within the VoIP software installed on a laptop:

Jitter Buffer

The red voice packets arrive with some level of jitter, but the jitter buffer stores them with a slight delay and serves them to the voice processor at a consistent rate, thus completely eliminating the jitter with which the packets arrived.
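
The behavior described above can be sketched in a few lines of Python.  This is a simplified fixed-delay buffer under stated assumptions: the first packet is treated as on time, packets are numbered in sending order, and the arrival times are invented for the example.  Real endpoints are more sophisticated.

```python
def playout_schedule(arrival_times_ms, buffer_delay_ms, interval_ms=20.0):
    """Return a list of (sequence, playout_time_ms, arrived_in_time) tuples.

    Packet i is handed to the voice processor at
    first_arrival + buffer_delay_ms + i * interval_ms, that is, at a
    perfectly regular rate.  A packet that arrives after its playout
    time is too late to be used.
    """
    if not arrival_times_ms:
        return []
    first = arrival_times_ms[0]
    schedule = []
    for seq, arrival in enumerate(arrival_times_ms):
        playout = first + buffer_delay_ms + seq * interval_ms
        schedule.append((seq, playout, arrival <= playout))
    return schedule


# Hypothetical jittery arrivals (ms) for packets sent 20 ms apart.
arrivals = [0.0, 22.0, 38.0, 65.0, 80.0]

schedule = playout_schedule(arrivals, buffer_delay_ms=10.0)
playout_times = [playout for _, playout, _ in schedule]
print(playout_times)                        # [10.0, 30.0, 50.0, 70.0, 90.0]
print(all(ok for _, _, ok in schedule))     # True
```

The arrivals are irregular, but the playout times are exactly 20 ms apart.  With a buffer delay of only 2 ms, the fourth packet (which arrived 5 ms late) would miss its playout time.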

Are there drawbacks?

But you can’t get something for nothing.  The drawback here is that the jitter buffer will introduce a delay in the reassembly and regeneration of the voice for the receiver to hear.  If the delay is too great, it can become bothersome for the speakers.  A jitter buffer is able to compensate for jitter that is less than or equal to the delay it introduces.  In other words, the greater the delay, the greater the level of jitter that can be compensated for, but the more uncomfortable it becomes for the callers.

For example, in order to eliminate a jitter of about 100 ms, the delay introduced by the jitter buffer must be at least 100 ms.  But delay in voice conversations begins to become bothersome when it exceeds 150 ms, so you cannot simply increase the size of the jitter buffer indefinitely.  There’s a trade-off you must take into account.
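
Here is a small Python sketch of that trade-off.  It assumes the 150 ms figure is a budget for the total delay, so any delay already introduced elsewhere (the hypothetical other_delay_ms parameter) reduces what is left for the jitter buffer.  The function name and the example numbers are illustrative only.

```python
MAX_COMFORTABLE_DELAY_MS = 150.0  # point at which delay becomes bothersome (from the text)


def choose_buffer_delay(max_jitter_ms, other_delay_ms=0.0):
    """Return (buffer_delay_ms, covers_all_jitter).

    The buffer needs a delay at least equal to the jitter it must absorb,
    but it cannot use more than what remains of the delay budget.
    """
    budget = max(MAX_COMFORTABLE_DELAY_MS - other_delay_ms, 0.0)
    buffer_delay = min(max_jitter_ms, budget)
    return buffer_delay, buffer_delay >= max_jitter_ms


print(choose_buffer_delay(100.0))                       # (100.0, True)
print(choose_buffer_delay(100.0, other_delay_ms=80.0))  # (70.0, False)
```

In the second call, 80 ms of the budget is already spent, so the buffer can only absorb 70 ms of jitter and some late packets will still be heard as gaps.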

Most VoIP endpoints, whether IP phones or software, allow the size of the jitter buffer to be configured.  A typical buffer size is around 30 ms, since jitter on a well-designed network should not exceed that value.

Conclusion

Jitter is a phenomenon that is as strange as its name sounds.  It is not something we see in conventional telephony, and even within the VoIP framework it is not something that is readily or intuitively understood.  Comprehending concepts such as jitter helps those responsible for maintaining the technology to troubleshoot more effectively and resolve problems pertaining to poor voice quality.  Jitter, with the appropriate steps, can be identified and minimized, resulting in a higher quality user experience.