Degradation of Speech Quality

massimiliano.montevecchi@xxxxxxxxxxxxxxxx (Massimiliano Montevecchi) · Wed, 24 Sep 2008 09:18:22 +0200

> Hi all,
>
> I am evaluating the quality of the speech during a VoIP call between two
> PJSUA applications running on two different Windows XP hosts.
>
> For this purpose I drove PJSUA with a script, so I was able to perform
> about 1500 calls automatically.
>
> During a call each peer entity plays a reference speech sample that is
> recorded on the other side of the call. Then a speech quality evaluation is
> performed using a PESQ tool.
>
> The two machines hosting the pjsua application are directly connected through
> an Ethernet switch in order to minimize network impairment.
>
>
>
> The codec used during the call is G.711 and the expected PESQ score for
> such codec is 4.40.
>
> The results of my tests show that the speech quality is unstable. That is,
> the PESQ score is often the expected one, but sometimes (in about 10% of all
> measurements) the score is significantly lower (3.50).
>
> Has anyone performed this type of test, or had experience with this kind of
> speech quality problem?
>
>
>
>
Thanks for doing the tests and sharing the results. We also have PESQ tests
as part of the automated unit test framework (Python based, in the
pjsip-apps/src/test-pjsua directory), and we have seen reports of
intermittent audio degradation too.

Our suspicion now lies with the jitter buffer. During the initial call
establishment, perhaps due to high activity in the signaling thread or the
difference in call establishment time between caller and callee, some RTP
packets are queued in the socket buffer, and once the media is started
these RTP packets are put into the jbuf in a burst.

Often this burst exceeds the jbuf maximum size (it was 340 ms), so it is
discarded. At other times, some frames are also discarded by the jbuf
when it tries to optimize the latency. These discard operations cause
click noise in the playback, which degrades the PESQ score.
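
To make the suspected mechanism concrete, here is a minimal sketch of a
bounded frame queue receiving a startup burst. It is a toy model only and
does not reproduce the actual pjmedia jbuf algorithm. The 20 ms frame
length, the ToyJitterBuffer class and the simulate_startup_burst() helper
are assumptions made for illustration; only the 340 ms limit comes from the
text above. Which frames a real jitter buffer drops is not modelled here;
the toy simply rejects frames that arrive once it is full.

```python
from collections import deque

FRAME_MS = 20        # assumed packet time; not stated in this thread
MAX_BUFFER_MS = 340  # jbuf maximum size mentioned above


class ToyJitterBuffer:
    """Toy bounded FIFO of frames; NOT the real pjmedia jbuf logic."""

    def __init__(self, max_ms=MAX_BUFFER_MS, frame_ms=FRAME_MS):
        self.capacity = max_ms // frame_ms  # number of frames that fit
        self.frames = deque()
        self.discarded = 0

    def put(self, seq):
        """Store a frame, or count it as discarded when the buffer is full."""
        if len(self.frames) >= self.capacity:
            self.discarded += 1
            return False
        self.frames.append(seq)
        return True

    def get(self):
        """Return the oldest frame, or None when the buffer is empty."""
        return self.frames.popleft() if self.frames else None


def simulate_startup_burst(queued_ms):
    """Push every frame that piled up for queued_ms into the buffer at once."""
    jb = ToyJitterBuffer()
    for seq in range(queued_ms // FRAME_MS):
        jb.put(seq)
    return jb


if __name__ == "__main__":
    jb = simulate_startup_burst(500)
    print("stored:", len(jb.frames), "discarded:", jb.discarded)
```

Under these assumptions, any backlog longer than the buffer limit leaves a
gap in the played-out audio, which is the kind of discontinuity that would
be heard as a click.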

That's probably the cause of your results.

We have a ticket for this (http://trac.pjsip.org/repos/ticket/638), and
Nanang has a pending commit for it, so hopefully the situation will
improve once that is in. It would be good if you could retest with the new
changes then, to get a second opinion on this.

How long did you set the call duration to? I suspect we will get a better PESQ
score if you run the call for a longer duration, once the media has stabilized
after the initial setup activity.

Cheers
 Benny

[Montevecchi] Hi, Benny.
Thank you for your answer. I can run PESQ tests on each pjsip version
you release. As I told you, I have an automated test
environment based on Windows XP hosts (CPU: Intel Pentium M 1.8 GHz).

A single call in my tests is about 1 minute long, and each party plays the
same speech sample 4 times. A PESQ evaluation is performed for each played
sample. I noticed (which confirms your suspicion) that the worst PESQ
score is often the first one. But sometimes a PESQ score in the middle is
also bad.
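
One way to check this pattern across many calls is to group the scores by
the position of the sample within the call. The sketch below assumes the
scores are already available as one list per call, in play order. The
summarize_by_position() helper and the 4.0 cut-off for a "degraded" score
are hypothetical and chosen only for illustration; they are not part of
pjsip or of any PESQ tool.

```python
from statistics import mean

DEGRADED_BELOW = 4.0  # hypothetical cut-off, for illustration only


def summarize_by_position(calls, threshold=DEGRADED_BELOW):
    """Summarize PESQ scores per sample position within a call.

    calls: list of per-call score lists, one score per played sample,
           in the order the samples were played.
    Returns one dict per position with the mean score and the fraction
    of calls whose score at that position fell below the threshold.
    """
    positions = max(len(scores) for scores in calls)
    summary = []
    for pos in range(positions):
        # Skip calls that have fewer samples than this position.
        scores = [c[pos] for c in calls if len(c) > pos]
        degraded = sum(1 for s in scores if s < threshold)
        summary.append({
            "position": pos + 1,
            "mean": mean(scores),
            "degraded_fraction": degraded / len(scores),
        })
    return summary
```

If the first position shows a clearly higher degraded fraction than the
others, that would point at the call setup phase; degraded scores spread
evenly over all positions would suggest a cause that is not tied to the
start of the call.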

If it would be helpful, I can provide you with plenty of traces.

Best Regards.
Massimiliano