Voice over IP is composed of a whole series of technologies that interconnect and interact in very complex ways. For this reason, it can be difficult to comprehend the various elements of VoIP and how they all fit together.

One of the most misunderstood components of VoIP is the codec.  What is it and what does it do?  Where does it reside within the whole scheme of things and why is it important?  In this article, I will endeavor to simplify and clarify this concept and to enlighten you with an understanding of the role and significance of codecs within a voice network.

What is a codec?

Etymologically, the term codec is a portmanteau (a linguistic blend of words) of “coder-decoder”.  So a voice codec, in its simplest description, is something that encodes and decodes voice.  English lessons aside, that may be true, but what does it mean practically?  Let’s get into some more details.

Why do we need codecs?

Voice, like all sounds we hear, is inherently analog in nature.  VoIP as a technology requires this analog voice to be stored in a digital format and separated into a series of packets to be transmitted over a network.  We need some mechanism to convert analog voice into digital form for VoIP to function.  

This is what a codec does.  It encodes or digitizes voice using specific standardized parameters, allowing that voice to be packetized and transmitted over an IP network.  The same codec is then used to decode and reassemble the digitized voice and to reproduce it as a sound that can be heard and understood by human ears.

Where does a codec reside?

A codec is not a device, but a set of rules used for encoding and decoding voice.  These rules determine the method by which the voice will be read, converted, and stored.  It is the role of a Digital Signal Processor (DSP), a dedicated hardware microprocessor, to actually perform the encoding and decoding based on the particular rules of the chosen codec.  DSPs are found in dedicated VoIP devices such as IP phones, voice gateways, and IP PBXs.  Where the VoIP endpoint is software running on a smartphone, desktop, or laptop, the device's CPU simply plays the role of the DSP.

Each device that encodes and decodes voice is configured to do so using a particular codec—in other words, to use a specific set of rules for the converting of voice to the necessary format for transmission using VoIP.

Are there different types of codecs?

In actual fact, there are many different types of codecs and each one has specific uses and parameters.  The choice of codec for each voice communication has profound effects on many aspects of that communication including both user experience and network efficiency.  What codec you select to use will affect the quality of the voice heard, as well as the nature of the voice packets placed on the network.

What are the characteristics that define a codec?

As we said before, a voice codec is essentially a set of rules used to digitize and encode voice so that it can be carried over an IP network.  Note that this is distinct from SIP, the signaling protocol used to set up and tear down calls; the codec governs only how the voice itself is represented.  The primary parameters that a codec defines are sampling rate, bit depth, and compression.

Sampling rate

The primary difference between an analog and a digital signal is the fact that an analog signal is continuous, while a digital signal is discrete.  When taking something like sound, which is analog and thus continuous, and digitizing it, you must record “samples” of that sound at discrete intervals of time. 

Refer to the following diagram: 

[Figure: Sampling rate]

The grey line represents a sound wave as it progresses over time; it is continuous.  The black dots mark the points in time at which samples are taken.  They are discrete, but are taken at a consistent rate over time.  Each sample is then stored as a digital number that represents the amplitude, or intensity, of the sound wave at that point in time (indicated by the red lines), resulting in a digital representation of that sound.

How often a sound is sampled when digitizing it is called the sampling rate.  The more often the sound waveform is sampled, the more accurate the digital representation of that sound becomes.  The following diagram shows the same sound wave sampled at twice the rate.  

[Figure: Twice the rate]

This results in a truer digital representation of the original sound but also increases the amount of data needed to represent that sound.  You have twice as many data points, and thus twice as much data to store and transmit.  The sampling rate is measured in Hertz (Hz), a unit of frequency: 1 Hz means one cycle, or in this case one sample, per second.

Telephone-quality voice typically uses 8000 Hz as the sampling rate, which means that a sample of the voice is taken 8000 times a second.  This results in the familiar timbre that voice over the telephone possesses.  In comparison, CD-quality sound is sampled at 44100 Hz.

A voice codec, therefore, defines the sampling rate that is used to digitize voice.
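To make the effect of the sampling rate concrete, here is a minimal Python sketch.  It is only an illustration: the function name sample_tone, the 440 Hz test tone, and the 10 ms duration are my own assumptions, not part of any codec standard.  It samples the same tone at 8000 Hz and at twice that rate and counts the resulting data points.

```python
import math

def sample_tone(freq_hz, sampling_rate_hz, duration_s):
    """Return amplitude samples of a sine tone taken at a fixed sampling rate."""
    n_samples = round(sampling_rate_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * n / sampling_rate_hz)
            for n in range(n_samples)]

# 10 ms of a hypothetical 440 Hz tone, at telephone rate and at twice that rate
telephone = sample_tone(440, 8000, 0.010)
doubled = sample_tone(440, 16000, 0.010)

print(len(telephone))  # 80 samples
print(len(doubled))    # 160 samples: twice the data for the same 10 ms
```

Every second sample of the faster version lands on the same instant as a sample of the slower one; the extra samples in between are what make the representation truer, and also larger.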

Bit depth

When digitizing voice, each sample of sound must be represented as a series of bits.  The more bits used to represent each sample, the more accurate the representation.  Take a look at the following depiction of a sound:

[Figure: Original sound]

The waveform is continuous, as expected.  Now take a look at this same waveform in its digital representation, digitized several times using a different bit depth each time.

[Figure: 1 bit - 16 bit]

The more bits used to represent each sample, the more accurately each sample is represented digitally.  The number of bits used to represent each sample is called bit depth.  A 4-bit bit depth will provide 2^4 = 16 values for each sample while a 16-bit bit depth will provide 2^16 = 65536 values per sample, vastly increasing the accuracy of approximation of each individual sample.

Typically, telephone-quality voice uses a bit depth of 8 bits, allowing up to 2^8 = 256 values of representation for each sample.  For comparison, CD-quality audio uses a bit depth of 16 bits.  The higher the bit depth, the more accurately each sample is represented and stored.  Like sampling rate, however, the higher the bit depth, the more data is needed to represent that same sound.

So a voice codec also determines the bit depth that is used to digitize voice.
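The following Python sketch shows how bit depth limits the accuracy of a single sample.  It assumes a simple uniform quantizer over the range -1.0 to 1.0; the function name quantize and the example amplitude of 0.3 are hypothetical, and real codecs may space their levels differently.

```python
def quantize(sample, bit_depth):
    """Map a sample in [-1.0, 1.0] to the nearest of 2**bit_depth evenly spaced levels."""
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)             # spacing between adjacent levels
    index = round((sample + 1.0) / step)  # which level is closest
    return index * step - 1.0

sample = 0.3  # arbitrary example amplitude
errors = {}
for bits in (4, 8, 16):
    approx = quantize(sample, bits)
    errors[bits] = abs(sample - approx)
    print(bits, 2 ** bits, errors[bits])  # bit depth, number of levels, rounding error
```

Each additional bit doubles the number of available levels, so the worst-case rounding error (half the spacing between levels) roughly halves with every bit added.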

Compression

In order to achieve a better quality of voice, both the sampling rate and bit depth should be increased.  But there is a tradeoff: this also increases the amount of information needed to represent the voice.  To counteract this, a codec also applies compression algorithms that maintain a higher level of quality while decreasing the actual size of the digitized voice.  While compression does decrease the sound quality somewhat, it can achieve a better quality-to-size ratio, increasing the efficiency of voice transmission.
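As one illustration of the idea, the sketch below applies the continuous mu-law companding curve, the principle behind the mu-law variant of G.711.  Treat it as a sketch under assumptions: the function names are my own, and the actual standard uses a piecewise-linear approximation of this curve rather than the formula directly.  Other codecs use entirely different and far more elaborate compression techniques.

```python
import math

MU = 255  # constant used by the mu-law variant of G.711

def mu_law_compress(x):
    """Compress a sample in [-1.0, 1.0] using the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert mu_law_compress, recovering the original amplitude."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

quiet, loud = 0.01, 0.5  # arbitrary example amplitudes
print(mu_law_compress(quiet))  # a quiet sample is pushed well up the scale
print(mu_law_compress(loud))   # a loud sample moves comparatively little
print(mu_law_expand(mu_law_compress(quiet)))  # approximately 0.01 again
```

Because quiet sounds are spread over more of the output range before the sample is stored in a small number of bits, they are represented more precisely than a plain uniform scale of the same bit depth would allow.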

Bringing it all together

Even if you haven’t fully grasped the physics and mathematics involved, the codec you use for your communication simply defines the following:

  • Bit depth – determines the accuracy of the representation of sound for each sample
  • Sampling rate – determines the frequency with which samples are taken of a soundwave
  • Compression – decreases the size of the digital form of the sound using compression algorithms so it can be transmitted more efficiently
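Before any compression is applied, the first two parameters fix the size of the data stream: the raw bitrate is simply the sampling rate multiplied by the bit depth.  A short Python sketch of that arithmetic follows; the function name is hypothetical, and the figures are for a single audio channel.

```python
def raw_bitrate_kbps(sampling_rate_hz, bit_depth):
    """Uncompressed bitrate of one audio channel, in kilobits per second."""
    return sampling_rate_hz * bit_depth / 1000

print(raw_bitrate_kbps(8000, 8))    # 64.0  (telephone quality)
print(raw_bitrate_kbps(44100, 16))  # 705.6 (one channel of CD quality)
```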

Codecs are primarily configured within the VoIP end device or inside the control panel of the VoIP provider you use, such as Ooma or Aircall.  The codec you choose will depend upon the network bandwidth available to your VoIP endpoints, as well as the voice quality you want your users to experience.

Examples of commonly used codecs

Codecs are defined by standards bodies such as the International Telecommunication Union (ITU) and the Internet Engineering Task Force (IETF) so that products of multiple vendors can interact and communicate.  Some of the most common codecs used in today’s VoIP networks are included in the following table:

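| Codec | Sampling rate | Bit depth | Compression rate | Bitrate on the network | Standardized by |
|-------|---------------|-----------|------------------|------------------------|-----------------|
| G.711 | 8 kHz | 8-bit | Average | 64 kb/s | ITU |
| G.729 | 8 kHz | 16-bit | Very high | 8 kb/s | ITU |
| G.722 | 16 kHz | 14-bit | High | 48-64 kb/s | ITU |
| G.726 | 8 kHz | 13-bit | High | 16-40 kb/s | ITU |
| iLBC | 8 kHz | 16-bit | Very high | 13.33-15.2 kb/s | IETF |
| Opus | 8-48 kHz | 8-bit to 32-bit | Very high | 5-32 kb/s | IETF |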

From this table, you will see that there are codecs that can provide close to CD-quality sound over the telephone, something that was unheard of on more traditional PSTN technology.
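To get a rough feel for what the "Compression rate" column means, you can compare each codec's uncompressed input rate (sampling rate times bit depth) with its bitrate on the network.  The Python sketch below does this for two rows of the table; the function name is hypothetical, and the ratio is only an approximation of how much work the compression stage is doing.

```python
def compression_ratio(sampling_rate_hz, bit_depth, network_kbps):
    """Ratio of the uncompressed bitrate to the bitrate placed on the network."""
    raw_kbps = sampling_rate_hz * bit_depth / 1000
    return raw_kbps / network_kbps

# Values taken from the table above
g729 = compression_ratio(8000, 16, 8)    # 128 kb/s raw down to 8 kb/s
g722 = compression_ratio(16000, 14, 64)  # 224 kb/s raw down to 64 kb/s

print(g729)  # 16.0
print(g722)  # 3.5
```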

Conclusion

Voice codecs are all about delivering the best voice quality for the least amount of bandwidth.  By making an informed decision about which codec to use, you are ensuring that your users enjoy a high quality of service while guaranteeing the most efficient usage of network bandwidth.