How Does VoIP Work?
This section provides an introductory overview of Voice over Internet Protocol (VoIP), a technology that enables the transmission of voice communications via IP networks.
VoIP works on the principle of audio sampling, where a computer records a sound (such as a human voice) at a high rate (typically at least 8,000 times per second) and converts these audio samples into digital data. Unlike traditional recording, where these samples are stored locally, VoIP sends these samples over an IP network to be played back on a different device.
The process of making VoIP function efficiently involves several key steps. Initially, the computer compresses the recorded sound samples to minimize the space they require, focusing particularly on voice frequencies. This compression and decompression process is handled by a tool known as a CODEC (compressor/decompressor). Numerous CODECs are available, and VoIP uses those optimized for voice compression, significantly reducing the bandwidth required compared to uncompressed audio.
After compression, the samples are grouped into larger units and inserted into data packets ready for transmission over the IP network, a process known as packetization. A typical IP packet can contain 10 or more milliseconds of audio, with 20 or 30 milliseconds being the most common.
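The packet sizes above can be checked with a little arithmetic. The sketch below assumes 8-bit samples at the 8,000 samples-per-second rate mentioned earlier (uncompressed telephony audio); a compressing CODEC would produce smaller payloads. The function names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000   # samples per second, as described in the text
BITS_PER_SAMPLE = 8     # assumption: 8-bit samples, no further compression


def payload_bytes(packet_ms: int) -> int:
    """Bytes of audio carried by one packet holding packet_ms milliseconds."""
    samples = SAMPLE_RATE_HZ * packet_ms // 1000
    return samples * BITS_PER_SAMPLE // 8


def packets_per_second(packet_ms: int) -> float:
    """How many packets per second one direction of a call generates."""
    return 1000 / packet_ms


def audio_bitrate_kbps() -> float:
    """Raw audio bit rate before packet headers are added."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000


if __name__ == "__main__":
    print("raw audio:", audio_bitrate_kbps(), "kbit/s")
    for ms in (10, 20, 30):
        print(ms, "ms packet:", payload_bytes(ms), "bytes,",
              packets_per_second(ms), "packets/s")
```

Shorter packets add less delay but create more packets per second, each with its own header overhead, which is one reason 20 or 30 milliseconds is a common compromise.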
A comparison could be made to sending postcards through traditional mail. Each postcard (packet) carries a limited amount of information. Sending a lengthy message would require multiple postcards (packets), and to ensure they can be assembled correctly at the destination, they are organized using a sequence number or similar mechanism.
Packets are sometimes delayed, just as postcards sent through the post office can be. This is particularly problematic for VoIP systems, because a voice packet that arrives late carries audio that is too old to play. Such late packets are simply discarded, just as if they had never been received. This is acceptable to a certain degree, as long as the missing packets do not audibly distort the sound. Too much delay will degrade the sound quality.
IP Devices generally measure the packet delay and expect the delay to remain relatively constant, though delay can increase and decrease during the course of a conversation. Variation in delay is called jitter. Delay, itself, just means it takes longer for the recorded voice spoken by the first person to be heard by the user on the far end. In general, good networks have an end-to-end delay of less than 100ms, though delay up to 400ms is considered acceptable (especially when using satellite systems). Jitter can result in choppy voice or temporary glitches, so VoIP devices implement jitter buffer algorithms to compensate for jitter. Essentially, this means that a certain number of packets are queued before play-out and the queue length may be increased or decreased over time to reduce the number of discarded, late-arriving packets or to reduce "mouth to ear" delay. Such "adaptive jitter buffer" schemes are also used by a wide variety of devices that deal with variable delay.
Jitter in Packet Voice Networks
Jitter is defined as a variation in the delay of received packets. At the sending side, packets are sent in a continuous stream with the packets spaced evenly apart. Due to network congestion, improper queuing, or configuration errors, this steady stream can become lumpy, or the delay between each packet can vary instead of remaining constant.
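As a rough illustration of how jitter can be measured, the sketch below compares the spacing of packets at the receiver with their spacing at the sender and smooths the difference with the running estimate described in the RTP specification (RFC 3550). The function name and the sample timings are made up for this example.

```python
def interarrival_jitter(send_ms, recv_ms):
    """Running jitter estimate in milliseconds.

    send_ms and recv_ms are lists of send and receive times for the
    same packets, in order. For each consecutive pair, the difference
    between receive spacing and send spacing is folded into the
    estimate with a gain of 1/16, as in RFC 3550.
    """
    jitter = 0.0
    for i in range(1, len(send_ms)):
        d = (recv_ms[i] - recv_ms[i - 1]) - (send_ms[i] - send_ms[i - 1])
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    sent = [0, 20, 40, 60]            # evenly spaced, 20 ms apart
    steady = [50, 70, 90, 110]        # constant 50 ms delay
    lumpy = [50, 70, 100, 110]        # third packet held up by 10 ms
    print(interarrival_jitter(sent, steady))
    print(interarrival_jitter(sent, lumpy))
```

A constant delay, however large, produces zero jitter; only the variation between packets counts.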
This diagram illustrates how a steady stream of packets is handled.
When an IP device receives a Real-time Transport Protocol (RTP) audio stream for Voice over IP (VoIP), it must compensate for the jitter that is encountered. The mechanism that handles this function is the playout delay buffer. The playout delay buffer must buffer these packets and then play them out in a steady stream to the digital signal processors (DSPs) to be converted back to an analog audio stream. The playout delay buffer is also sometimes referred to as the de-jitter buffer.
This diagram illustrates how jitter is handled.
If the jitter is so large that it causes packets to be received out of the range of this buffer, the out-of-range packets are discarded and dropouts are heard in the audio. For losses as small as one packet, the DSP interpolates what it thinks the audio should be and no problem is audible. When jitter exceeds what the DSP can do to make up for the missing packets, audio problems are heard.
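The effect of the buffer size can be seen in a small simulation. This is a simplified sketch with a fixed (not adaptive) buffer, assuming packets are numbered from 0 and carry 20 ms of audio each; the function name and the arrival times are invented for illustration.

```python
def simulate_playout(arrival_ms, packet_ms=20, buffer_ms=40):
    """Decide which packets can be played and which arrive too late.

    arrival_ms[n] is the arrival time of packet n. Playback of packet 0
    starts buffer_ms after it arrives; each later packet is due
    packet_ms after the previous one. A packet that arrives after its
    playout deadline is discarded.
    """
    first_playout = arrival_ms[0] + buffer_ms
    played, discarded = [], []
    for seq, arrived in enumerate(arrival_ms):
        deadline = first_playout + seq * packet_ms
        if arrived <= deadline:
            played.append(seq)
        else:
            discarded.append(seq)
    return played, discarded


if __name__ == "__main__":
    arrivals = [100, 121, 139, 230, 181]   # packet 3 is badly delayed
    print(simulate_playout(arrivals, buffer_ms=40))
    print(simulate_playout(arrivals, buffer_ms=80))
```

With a 40 ms buffer the delayed packet misses its deadline and is dropped; with an 80 ms buffer it is played, at the cost of 40 ms more delay on the whole call.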
This diagram illustrates how excessive jitter is handled.
Video works in much the same way as voice. Video information received through a camera is broken into small pieces, compressed with a CODEC, placed into small packets, and transmitted over the IP network. This is one reason why VoIP is promising as a new technology: adding video or other media is relatively simple. Of course, there are certain issues that must be considered that are unique to video (e.g., frame refresh and much higher bandwidth requirements), but the basic principles of VoIP equally apply to video telephony.
Of course there is much more to VoIP than just sending the audio/video packets over the Internet. There must also be an agreed protocol for how computers find each other and how information is exchanged in order to allow packets to ultimately flow between the communicating devices. There must also be an agreed format (called payload format) for the contents of the media packets.
VoIP is implemented in a variety of hardware devices, including IP phones, analog terminal adapters (ATAs), and gateways. In short, a large number of devices can enable VoIP communication, some of which allow one to use traditional telephone devices to interface with the IP networks.
In a well-performing network, VoIP calls should be as clear as or clearer than any other type of audio transmission. VoIP calls are pure digitized sound. Each audio packet contains the audio exactly as it is spoken into the microphone.
High definition voice contains a wider range of frequencies than typical voice transmissions and will deliver surprisingly good audio that contains a richer sound than most toll quality calls.
VoIP Protocols
The success of VoIP communication hinges on employing an appropriate set of protocols. Here, we'll discuss IPitomy SIP Trunking, the preferred protocol for all IPitomy devices currently in circulation as well as many other third-party industry offerings.
The Real-time Transport Protocol (RTP) is a standard used globally by nearly every device for transmitting audio and video packets between computers. RTP, defined in open standards documents, handles issues such as packet order and uses mechanisms such as the RTP Control Protocol (RTCP) to report on delay and jitter.
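To make the packet-order mechanism concrete, the sketch below decodes the fixed 12-byte header that begins every RTP packet (layout per RFC 3550). The class and function names are illustrative, and the sample packet is constructed by hand rather than captured from a real call.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int        # always 2 for current RTP
    marker: bool        # often marks the start of a talk spurt
    payload_type: int   # identifies the CODEC, e.g. 0 = G.711 u-law
    sequence: int       # increases by 1 per packet, used for ordering
    timestamp: int      # sampling instant of the first audio sample
    ssrc: int           # identifies the stream source


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed part of an RTP header from raw packet bytes."""
    if len(packet) < 12:
        raise ValueError("an RTP header is at least 12 bytes long")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        marker=bool(b1 >> 7),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


if __name__ == "__main__":
    # Hand-built example: version 2, payload type 0, sequence 1234.
    sample = struct.pack("!BBHII", 0x80, 0, 1234, 160000, 0xDEADBEEF)
    print(parse_rtp_header(sample + b"\x00" * 160))
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to schedule playout.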
Before media can flow between two devices, protocols are utilized to locate the remote device and negotiate the media flow methods. These crucial protocols are known as call-signaling protocols, with the Session Initiation Protocol (SIP) being the most widely used.
Advantages of IPitomy SIP Trunks
SIP, a text-based protocol, is relatively straightforward for machines to understand. Its easy readability facilitates troubleshooting by allowing the inspection of packet contents without needing to decompile the software entirely.
SIP's operation mirrors that of email, making it familiar and intuitive. Its addressing methods are particularly similar.
SIP calls offer undiluted digital voice quality, free of distortion, delay, or echo in its native environment.
Instant Scalability: Given IPitomy's IP PBX platform, capacity can be easily adjusted via a simple process in the admin GUI.
Interoperability with a wide range of devices provides users with choices far beyond what any single vendor can offer.
International dialing is supported.
User location is inconsequential; remote users can be deployed globally without losing direct connectivity. An adequate internet connection is the only requirement.
Understanding a SIP Call
A SIP call comprises a signaling component and a voice component. The signaling and the actual voice transmission take different paths on each call. Setup and teardown signaling for all calls operates over port 5060. Voice transmissions are conducted within the range of ports 10,000 to 20,000. These are virtual ports integral to TCP/IP communication.
Preconditions for a SIP Call
Before a SIP call can occur, SIP endpoints capable of finding and being found by other SIP endpoints are needed. For this training guide, we'll restrict our SIP examples to endpoints that register to an IPitomy IP PBX. While peer-to-peer SIP calls are possible, most are facilitated via an IP PBX or soft switch for a PSTN-like ease of dialing. The advantage of an IP PBX is that it simplifies calling another endpoint, allows call information to be stored for reporting, and manages call routing to local and remote extensions. Users dial phone numbers and extension numbers as they would with a legacy PBX system. Although analog PSTN lines and T1/PRI cards are no longer recommended due to their End-Of-Life status, an IPitomy IP PBX still supports them. SIP endpoints must register with the PBX to be included in the PBX database. Once registered, they can dial phone numbers and receive calls from other endpoints. These endpoints can be phones on the Local Area Network (LAN) or at any location on the internet.
Starting a SIP Call
To start a call, the endpoint sends an INVITE request to the server, asking whether the other endpoint is available. This occurs on the signaling port (port 5060). If the other endpoint is ready, it answers with a 200 OK response, and the initiating phone confirms with an ACK. The session description carried in this exchange tells each phone which ports to use for the RTP (voice) session. The RTP session is then opened using the communicated ports.
Call termination is initiated by one of the endpoints sending a "bye" message, causing the call to hang up. This is a simplified explanation of a SIP call's lifecycle. When the call occurs on the LAN, it bypasses the router. The PBX instructs the endpoints on the ports to connect the RTP (voice) stream.
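Because SIP is text-based, the port information exchanged during call setup can be read directly from the message. In standard SIP the ports travel in a Session Description Protocol (SDP) body, a term this guide does not otherwise use. The sketch below pulls the audio port out of such a body; the function name, addresses, and sample text are hypothetical and not taken from an IPitomy system.

```python
def rtp_port_from_sdp(sdp: str) -> int:
    """Return the port announced on the first 'm=audio' line of an SDP body."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # Format: m=audio <port> <transport> <payload types...>
            return int(line.split()[1])
    raise ValueError("no audio media line found")


# Hypothetical session description, as it might appear in an INVITE.
EXAMPLE_SDP = """v=0
o=phone101 2890844526 2890844526 IN IP4 192.168.1.101
s=-
c=IN IP4 192.168.1.101
t=0 0
m=audio 10000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
"""

if __name__ == "__main__":
    print("send voice packets to port", rtp_port_from_sdp(EXAMPLE_SDP))
```

The `c=` line carries the IP address and the `m=` line the port, which is why a phone behind a router that advertises its internal address can end up with audio flowing in only one direction.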
For calls to remote phones, the PBX understands that the phone is beyond the firewall. Router port configuration becomes necessary for signaling and RTP traffic at this stage.
Signaling Port 5060
Proper port configuration enables port 5060 to be forwarded in the router to the PBX system's LAN IP address, allowing the PBX to send signals to remote phones and receive requests from them.
RTP Ports 10,000 – 20,000 Port Range Forwarding
Once call setup occurs via signaling on Port 5060, RTP is set up using a range of ports forwarded to the PBX LAN IP address. The Port Range Forwarding feature in the Router is used to forward the range of Ports 10,000 – 20,000 for this purpose.
Each call requires two ports for RTP - one for sending and one for receiving. This is organized by the initiating phone. The router ports are open from inside the firewall. The remote phone receives the information about which ports to use from the SIP packets.
Local Phone Diagram
Remote Phone Diagram
Network Address Translation – NAT
TCP/IP is the protocol suite for sending data on the Internet. It relies on unique IP addresses to get the proper data to the proper computer or device on the network. There are several different types and classes of IP address.
If you are reading this, you are most likely connected to the Internet and there's a very good chance that you are using Network Address Translation (NAT) right now!
The Internet has grown larger than anyone ever imagined it could be. Although its exact size is unknown, a total of 5 billion people around the world use the internet today, equivalent to 63 percent of the world's total population. The number of internet users continues to grow too, with the latest data indicating that the world's connected population grew by almost 200 million in the 12 months to April 2022.
So what does the size of the Internet have to do with NAT? Everything! For a computer to communicate with other computers and Web servers on the Internet, it must have an IP address. An IP address (IP stands for Internet Protocol) is a unique 32-bit number that identifies the location of your computer on a network. Basically it works just like your street address: a way to find out exactly where you are and deliver information to you.
When IP addressing first came out, everyone thought that there were plenty of addresses to cover any need. Theoretically, you could have 4,294,967,296 unique addresses (2^32). The actual number of available addresses is smaller (somewhere between 3.2 and 3.3 billion) because of the way that the addresses are separated into Classes and the need to set aside some of the addresses for multicasting, testing or other specific uses.
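The 32-bit nature of an IPv4 address is easy to verify with Python's standard `ipaddress` module. This is only an illustration of the arithmetic; the example address is arbitrary.

```python
import ipaddress

# Every IPv4 address is one 32-bit number, so the theoretical total is 2^32.
TOTAL_IPV4_ADDRESSES = 2 ** 32


def address_as_number(dotted: str) -> int:
    """Convert dotted-decimal notation to the underlying 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


if __name__ == "__main__":
    print(TOTAL_IPV4_ADDRESSES)
    print(address_as_number("0.0.0.0"))
    print(address_as_number("192.168.1.10"))
    print(address_as_number("255.255.255.255"))
```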
With the explosion of the Internet and the increase in home networks and business networks, the number of available IP addresses is simply not enough. The obvious solution is to redesign the address format to allow for more possible addresses. This is being developed (IPv6) but will take several years to fully implement because it requires modification of the entire infrastructure of the Internet.
NAT Diagram – One Public IP Address is used by many Devices/Users
Under the current IP addressing scenario (IPv4) there are a finite number of IP addresses available on the Internet. There are not enough IP addresses available for each device to have its own unique IP address. To solve this problem, all routers have the ability to send data to devices through a Network Address Translation (NAT) process. This process allows a group of devices (like PCs and phones) to all share one Internet IP address. This process has stretched out the usefulness of the current IP address scheme until the next numbering scheme (IPv6) is fully deployed.
NAT works by the router passing data to devices because it is aware of the address of the specific devices on the local area network. The information you download to your PC comes directly to your PC because you have a unique internal IP address and a unique MAC ID.
When a device outside the local area network wants to communicate from the Internet with a device on the LAN, it needs a path to guide it to the specific device (like the PBX). In the case of a remote IP phone, when the remote phone wants to make a call, it needs to send some packets to the PBX. In order to do that, the router needs to be instructed on where to send the IP phone packets. When port 5060 is forwarded to the PBX on the LAN, all traffic that comes in on port 5060 gets directed to the PBX.
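Conceptually, port forwarding is a lookup table in the router. The sketch below models the two rules this guide calls for (port 5060 and the 10,000 to 20,000 range). The PBX address is a hypothetical example, and UDP is an assumption, since the guide does not state the transport; real routers are configured through their own admin interface, not with code like this.

```python
PBX_LAN_IP = "192.168.1.10"   # hypothetical LAN address of the PBX


def build_forwarding_rules(pbx_ip: str):
    """Rules a router would need for remote phones to reach the PBX."""
    return [
        {"name": "SIP signaling", "proto": "udp",
         "ports": range(5060, 5061), "forward_to": pbx_ip},
        {"name": "RTP voice", "proto": "udp",
         "ports": range(10000, 20001), "forward_to": pbx_ip},
    ]


def route_inbound(rules, proto: str, port: int):
    """Return the LAN address an inbound packet is forwarded to, or None."""
    for rule in rules:
        if rule["proto"] == proto and port in rule["ports"]:
            return rule["forward_to"]
    return None   # no rule matches: the router drops the packet


if __name__ == "__main__":
    rules = build_forwarding_rules(PBX_LAN_IP)
    for port in (5060, 10000, 20000, 20001, 80):
        print(port, "->", route_inbound(rules, "udp", port))
```

A packet arriving on a port with no matching rule never reaches the PBX, which is how a missing or partial RTP range leads to calls that connect but carry no audio in one direction.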
Once the call is set up, the RTP traffic is directed to ports for sending and receiving. These ports are determined through instructions in the call setup SIP packets. If the port forwarding is not configured properly, the remote phone will not function properly. The symptom known as “one-way audio” is almost always caused by improper configuration of the RTP ports in the router. Some routers support Application Layer Gateway (ALG) functionality. While this usually appears to be designed for SIP, it most often interferes with packet delivery and must be turned off.
It is easy to see how the RTP stream can be disrupted if the voice packets cannot reach the proper destination. Sometimes this is caused by the router configuration. Sometimes it can be the inability of the router to properly perform NAT functions. Some routers are simply not capable of NAT and therefore will not work with remote IP phones.
It is essential to be in a position to have port forwarding enabled for remote access for maintenance, remote phones and branch office connectivity. If a third party is in control of the router, it is in everyone’s best interest to have these ports forwarded and the ALG turned off and confirmed before the IP PBX is installed. Failure to have these ports forwarded will result in implementation delays and must be a consideration when proposing a price for the end customer.
IP Telephony over TCP/IP using the SIP protocol produces pure digitized sound. There are no functions inside the PBX to add sounds like “static”, echo, hiss or hum. All of these sounds, if present, are produced in the analog world or are the result of packet loss. In order to troubleshoot issues on a TCP/IP packetized network, it is necessary to look for the solutions in the most likely places.
If a customer complains of static, it is most often packet loss in an IP network or an analog entry point like a handset. To identify the source of the problem, first check the analog connections e.g. handset, handset cable etc. Try a known good handset and cord. If that doesn’t solve the problem, run a test for packet loss.
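One way to reason about a packet loss test is to count gaps in the sequence numbers of the packets that actually arrived. The sketch below is a simplified illustration with an invented function name; it ignores sequence-number wraparound and is not a substitute for the diagnostic tools on the PBX or phone.

```python
def packet_loss_percent(received_seqs):
    """Estimate loss from the sequence numbers of received packets.

    Assumes the first and last packets of the stream were received and
    that sequence numbers do not wrap around during the sample.
    """
    if not received_seqs:
        return 0.0
    unique = set(received_seqs)            # ignore duplicated packets
    expected = max(unique) - min(unique) + 1
    lost = expected - len(unique)
    return 100.0 * lost / expected


if __name__ == "__main__":
    # Packets 4 and 7 never arrived.
    print(packet_loss_percent([1, 2, 3, 5, 6, 8, 9, 10]), "% lost")
```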
IP Phones are intelligent devices and are not dependent on a circuit. It is easy to simply unplug the phone and plug it into another Ethernet connection. If that fixes the problem, plug a known good phone into the Ethernet connection of the phone that had issues. If a known good phone is plugged in to the Ethernet connection and exhibits the same problem, check the cables and connections for problems. Make sure the Ethernet cables are not draped over fluorescent lights or other devices that can induce distortion into the packet delivery process.
Implementing Quality of Service (QOS) is Critical in Your VoIP Installation
Implementing QOS has huge benefits for your VoIP application. Don’t underestimate the importance of setting this up properly. Proper configuration can save customers from a difficult experience as well as keep your support costs down.
What Does QOS Do?
QOS sets the priority for data packets on your LAN. The LAN has packets from a diverse set of applications all traveling through a limited amount of bandwidth. Voice occupies a very small portion of the bandwidth. Since the voice packets are delivered in a time sensitive manner, it is important that they do not get interrupted or delayed. If they do, the audio quality on the call can deteriorate to a noticeable degree.
Voice Packets vs. Regular Data Packets
Voice packets are distinguished from other data packets by a designation in the voice packet header. This allows the data switch to know how to prioritize the individual packets to avoid delaying voice packets. Networks always try to deliver data on a best-effort basis. If there is bandwidth available, the data switch will try to pass all of the packets through as soon as it gets them, using all of the available bandwidth. If this happens, the voice packets can be momentarily blocked by all of the other data. Even though this may only take a few seconds, it is enough of a delay to cause the phone call to experience audio interruptions, as packets are delivered too late to be used. By prioritizing the voice packets, you ensure that the voice will not be interrupted. Since the voice is a very small percentage of total bandwidth, there is no noticeable effect on all of the other data packets.
For example, suppose 10 people on the LAN try to download a 20 MB file at the same time. On a normal 100BASE-T network, that could completely block all data traffic for a brief time. When the voice packets always take priority over the data packets, the voice is delivered without delay because the downloads make room for the voice packets, with little or no perceptible delay to the downloads.
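The prioritization described above can be pictured as a queue in which voice always goes first. The following sketch models strict priority queuing with Python's `heapq`; it is a toy model of the idea, not the scheduling algorithm of any particular switch, and the packet labels are invented.

```python
import heapq

VOICE, DATA = 0, 1   # lower number = higher priority


def transmit_order(arrivals):
    """Return packet labels in the order a strict-priority queue sends them.

    arrivals is a list of (kind, label) tuples in arrival order, all
    waiting in the queue at once. Packets of equal priority keep their
    arrival order.
    """
    heap = []
    for position, (kind, label) in enumerate(arrivals):
        heapq.heappush(heap, (kind, position, label))
    sent = []
    while heap:
        sent.append(heapq.heappop(heap)[2])
    return sent


if __name__ == "__main__":
    waiting = [(DATA, "download-1"), (DATA, "download-2"), (VOICE, "voice-1"),
               (DATA, "download-3"), (VOICE, "voice-2")]
    print(transmit_order(waiting))
```

The voice packets jump ahead of the queued download traffic, while the downloads are delayed only by the time it takes to send a few small voice packets.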
How Do I Set up QOS?
QOS is set up in the data switch. The IPitomy server will have settings that it uses to identify the data packets. These settings are set to CS3 by default. The data switch will need to be configured to give the highest possible priority to these data packets. Switches use a variety of QOS labels so you will have to determine the scheme (specification) of the switch to be used. Since IPitomy uses the DSCP Class label, just match that label in the switch to the switch’s highest priority (this may be a digit or “Highest” as in the Netgear FS728TP). It’s important to know that no other devices on the network are utilizing that Class ID. If there are, change them or the IPitomy PBX under PBX Setup/SIP/Advanced. The Class ID used for voice traffic must not be used by other, non-voice data devices.
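When matching the label in a switch, it can help to know the numeric forms of CS3, since some switches ask for a DSCP number and packet captures show the whole header byte. The sketch below works through that conversion using the standard DSCP layout; the function names are illustrative, and `mark_socket` is a hypothetical example of how an application might mark its own packets, with behavior that depends on the operating system.

```python
import socket


def class_selector_to_dscp(cs: int) -> int:
    """Class Selector n occupies the top 3 of the 6 DSCP bits: CSn = n * 8."""
    if not 0 <= cs <= 7:
        raise ValueError("class selector must be between 0 and 7")
    return cs << 3


def dscp_to_tos_byte(dscp: int) -> int:
    """DSCP is the upper 6 bits of the 8-bit TOS/Traffic Class byte."""
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the OS to mark outgoing packets (platform-dependent behavior)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos_byte(dscp))


if __name__ == "__main__":
    dscp = class_selector_to_dscp(3)
    print("CS3 as DSCP value:", dscp)
    print("CS3 as header byte:", hex(dscp_to_tos_byte(dscp)))
```

Whichever form the switch uses, the value configured there has to correspond to the same class the PBX applies to its packets.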
Note: QOS can only be set on the LAN [in the data switch(es)]; it is not relevant on the WAN (Internet), since that traffic is routed by “hops” over which you have no control. The exception is private WANs such as MPLS, where the network provider may be able to configure QOS point-to-point.
VOIP (RTP) works best when QOS is set on the LAN. As a rule, always implement QOS.