What are Thresholds for Good and Poor Network Packet Loss, Jitter and Round Trip Time for Unified Communications?
With Skype for Business and Microsoft Teams, we know that having a “good” network is important for user experience, but what defines good?
At Modality Systems we have a Diagnostics product that reports on Skype for Business network performance and we primarily look at average packet loss, jitter and round-trip time.
- Packet Loss: This is often defined as a percentage of packets that are lost in a given window of time. Packet loss directly affects audio quality, from small, individual lost packets having almost no impact to back-to-back burst losses that cause complete audio cut-out.
- Inter-packet arrival jitter, or simply jitter: This is the average change in delay between successive packets. Most modern VoIP software, including Skype for Business, can adapt to some levels of jitter through buffering. It’s only when the jitter exceeds the buffering that a participant will notice the effects of jitter.
- Latency: This is the time it takes to get an IP packet from point A to point B on the network. This network propagation delay is essentially tied to the physical distance between the two points and the speed of light, including additional overhead taken by the various routers in between. Latency is measured as one-way or Round-trip Time (RTT).
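To make the three definitions concrete, here is a minimal Python sketch. The function names and sample values are my own illustration, not anything from Skype for Business or our Diagnostics product. The jitter function follows the plain "average change in delay" wording above; real RTP stacks typically use a smoothed running estimate instead. The RTT helper simply adds two one-way delays.

```python
from statistics import mean
from typing import Sequence


def packet_loss_percent(sent: int, received: int) -> float:
    """Percentage of packets lost in a measurement window."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def mean_jitter_ms(delays_ms: Sequence[float]) -> float:
    """Average absolute change in delay between successive packets."""
    if len(delays_ms) < 2:
        return 0.0  # jitter needs at least two packets to compare
    return mean(abs(b - a) for a, b in zip(delays_ms, delays_ms[1:]))


def round_trip_ms(outbound_ms: float, return_ms: float) -> float:
    """Round-trip time as the sum of the two one-way delays."""
    return outbound_ms + return_ms


if __name__ == "__main__":
    print(packet_loss_percent(sent=1000, received=990))
    print(mean_jitter_ms([20, 25, 22, 30]))
    print(round_trip_ms(50, 50))
```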
Codecs differ in how well they cope with imperfect networks: modern codecs like RTAudio and Silk deal better with network issues than older codecs like G711. In an SfB environment you’ll be using different codecs in different scenarios. You can argue over exactly what level of the above metrics is “good” or “bad” depending on the codec and the tolerance of the user. Ultimately, there are no definitively correct or incorrect thresholds, as long as they are used to find issues and improve network performance. Getting hung up on exactly what is a “good” vs “poor” threshold is less important than finding and correcting issues.
What does Microsoft Define as Poor?
In QoE (the SfB Server Quality of Experience session performance database), going over one or more of the following limits gets your session marked as “ClassifiedPoorCall”: over 500ms RTT, over 10% average packet loss or over 30ms jitter. Reference Jen’s great post here.
Personally, I think these are quite high, but there is little doubt a call is poor if you are hitting 10% packet loss or over 500ms round trip.
What is Good?
More recent guidance from Microsoft for SfB Online performance, from the client to the Microsoft network edge, recommends the following for optimal Skype for Business media quality.
It is interesting that in this guidance the RTT must be below 100ms for “optimal” performance, packet loss under 1% for any 15s interval (so effectively under 1% average) and jitter under 30ms:
|Metric|Target|
|Latency (one way)|< 50ms|
|Latency (RTT or Round-trip Time)|< 100ms|
|Burst packet loss|< 10% during any 200ms interval|
|Packet loss|< 1% during any 15s interval|
|Packet inter-arrival jitter|< 30ms during any 15s interval|
|Packet reorder|< 0.05% out-of-order packets|
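The "during any 15s interval" wording is stricter than a whole-call average, and a short sketch shows why. This is only an illustration under my own assumptions: the per-second `(packets_sent, packets_lost)` samples, the sliding one-second step and the function names are hypothetical, not how Microsoft or SfB actually samples a call.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[int, int]  # (packets_sent, packets_lost) for one second


def window_loss_percentages(samples: Sequence[Sample], window: int = 15) -> List[float]:
    """Packet loss percentage for every sliding window of `window` seconds."""
    if len(samples) <= window:
        spans = [samples]  # short call: treat it as a single window
    else:
        spans = [samples[i:i + window] for i in range(len(samples) - window + 1)]
    results = []
    for span in spans:
        sent = sum(s for s, _ in span)
        lost = sum(l for _, l in span)
        results.append(100.0 * lost / sent if sent else 0.0)
    return results


def meets_loss_target(samples: Sequence[Sample], window: int = 15,
                      limit_pct: float = 1.0) -> bool:
    """True only if every window stays under the limit (the < 1% row above)."""
    return all(p < limit_pct for p in window_loss_percentages(samples, window))


if __name__ == "__main__":
    # 60 seconds at 50 packets per second, with one burst of 10 lost packets.
    call = [(50, 0)] * 60
    call[30] = (50, 10)
    overall = 100.0 * sum(l for _, l in call) / sum(s for s, _ in call)
    print(f"overall loss: {overall:.2f}%")
    print(f"worst 15s window: {max(window_loss_percentages(call)):.2f}%")
    print("meets target:", meets_loss_target(call))
```

In this made-up call the overall loss is well under 1%, yet the 15s windows containing the burst go over the limit, so the call would not meet the "optimal" target.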
The Lync Server Networking Guide from Microsoft (Lync_Server_Networking_Guide_v2.3.docx, a great detailed document) recommends the following thresholds:
- Packet Loss: On any managed wired network link, a packet loss threshold of 1% is a good value to use to find infrastructure issues.
- Jitter: On a managed wired link, you should investigate jitter above 3ms. Thresholds formed around jitter values to determine whether audio is good or poor can be very misleading. This is because most modern VoIP software can adapt to high levels of jitter through buffering.
- RTT: Much of the existing documentation about latency thresholds describes the 150ms threshold that the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T) defines as acceptable for VoIP.
For our reporting purposes, we use thresholds of < 1% packet loss, < 20ms jitter and < 300ms RTT as our “good”. The RTT is set at 300ms because the ITU-T’s 150ms above is one way, not RTT, and SfB reports RTT.
So there is a range between “good” and “bad/poor”, running from OK but not perfect through to not ideal but maybe passable. At Modality, we refer to that as “Impacted”. Some customers consider impacted unacceptable on their corporate network. Others, with more variable networks or less network investment, might consider some impacted calls, while not ideal, a reality.
In Microsoft’s reports “impacted” is equivalent to yellow highlighted metrics. I have never been able to get an exact range from Microsoft on what triggers “yellow” on SSRS QoE reports or Call Analytics. Essentially anything between “good” and Microsoft’s “Poor” is “impacted”.
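Putting the numbers in this post together, the three bands can be sketched as a small classifier. The "good" limits are the reporting thresholds quoted above and the "poor" limits are the ClassifiedPoorCall values. The class and function names, and the choice to treat a value exactly on a boundary as "impacted", are my own assumptions for illustration, not the actual Diagnostics or QoE logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    packet_loss_pct: float
    jitter_ms: float
    rtt_ms: float


# "Good" requires every metric to be below these values.
GOOD = Thresholds(packet_loss_pct=1.0, jitter_ms=20.0, rtt_ms=300.0)
# "Poor" is triggered when any one metric goes above these values.
POOR = Thresholds(packet_loss_pct=10.0, jitter_ms=30.0, rtt_ms=500.0)


def classify(packet_loss_pct: float, jitter_ms: float, rtt_ms: float) -> str:
    """Return 'good', 'impacted' or 'poor' for one session's average metrics."""
    if (packet_loss_pct > POOR.packet_loss_pct
            or jitter_ms > POOR.jitter_ms
            or rtt_ms > POOR.rtt_ms):
        return "poor"
    if (packet_loss_pct < GOOD.packet_loss_pct
            and jitter_ms < GOOD.jitter_ms
            and rtt_ms < GOOD.rtt_ms):
        return "good"
    return "impacted"  # everything between the two sets of limits


if __name__ == "__main__":
    print(classify(0.5, 10, 120))   # all metrics under the "good" limits
    print(classify(3.0, 10, 120))   # packet loss between 1% and 10%
    print(classify(0.5, 10, 600))   # RTT over 500ms
```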
What about Mean Opinion Score (MOS)?
Mean Opinion Score (MOS) is an industry standard to measure voice quality.
It is a score out of 5, but certain codecs can only reach certain levels, so you can’t consider it a pure network performance score where 5 is good and 4 is worse. For example, G711 can score up to 4.30 and RTAudio Wideband 4.10, but Siren only 3.72 and RTAudio Narrow Band 2.95. So you can only measure MOS relative to the codec used.
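One simple way to picture "relative to the codec" is to divide a measured MOS by the maximum that codec can reach. The ceilings below are the figures quoted above. The ratio itself is just my illustration of the idea, not a method Microsoft or Modality defines.

```python
# Maximum achievable MOS per codec, as quoted in this post.
CODEC_MAX_MOS = {
    "G711": 4.30,
    "RTAudio Wideband": 4.10,
    "Siren": 3.72,
    "RTAudio Narrow Band": 2.95,
}


def mos_relative_to_codec(codec: str, mos: float) -> float:
    """Fraction of the codec's best possible MOS that this call achieved."""
    return mos / CODEC_MAX_MOS[codec]


if __name__ == "__main__":
    # The same raw score means different things on different codecs.
    for codec in ("G711", "Siren"):
        print(codec, round(mos_relative_to_codec(codec, 3.72), 3))
```

A raw 3.72 is the best Siren can do, but the same number on G711 is noticeably short of that codec's ceiling.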
Microsoft doesn’t really recommend the use of MOS as a primary measure of Quality:
“Real MOS measurement relies on individuals to provide their opinion of quality regarding audio clips of standardized lengths. Over the years, computer algorithms and databases have been developed to try to estimate MOS programmatically, based on payload analysis or network metrics analysis. These models are generally very accurate if the test audio samples are also of fixed lengths. Typically, these samples are around eight seconds long. Individuals can generally reach a consistent consensus in evaluating audio quality for short audio samples.
However, if the algorithms are used to calculate MOS for entire calls, the metric starts to deviate from real-world opinions of quality. For example, users experiencing audio distortions in calls might also consider the convenience or novelty of being able to place a call from their mobile application and disregard any actual issues. On the other hand, if the distortions interrupted an important conversation, even for a brief moment, individuals might not be so inclined to dismiss them. In large metrics database systems such as QoE, aggregating MOS using statistical functions such as AVERAGE() or MIN() can distort the view even further.” From The Lync Server Networking Guide from Microsoft