> Hi all,
>
> I am evaluating the speech quality of a VoIP call between two PJSUA
> applications running on two different Windows XP hosts.
>
> For this purpose I drove PJSUA with a script, so I was able to perform
> about 1500 calls automatically.
>
> During a call, each peer plays a reference speech sample that is recorded
> on the other side of the call. A speech quality evaluation is then
> performed using a PESQ tool.
>
> The two machines hosting the pjsua application are directly connected
> through an Ethernet switch in order to minimize network impairment.
>
> The codec used during the call is G.711, and the expected PESQ score for
> this codec is 4.40.
>
> The results of my tests show that the speech quality is unstable. That is,
> the PESQ score is often the expected one, but sometimes (in about 10% of
> all measurements) the score is significantly lower (3.50).
>
> Has anyone performed this type of test, or had experience with this type
> of speech quality problem?

Thanks for doing the tests and sharing the results. We also have PESQ tests as part of the automated unit test framework (Python based, in the pjsip-apps/src/test-pjsua directory), and got reports of intermittent audio degradation too.

Our suspicion now lies with the jitter buffer (jbuf). During the initial call establishment, perhaps because of high activity in the signaling thread or a difference in call establishment time between caller and callee, some RTP packets are queued in the socket buffer. Once the media is started, these RTP packets are stored in the jbuf in a burst. Often this burst exceeds the jbuf maximum size (it was 340 ms), so packets in the burst are discarded. At other times, the jbuf also discards some frames when it tries to optimize the latency. These discard operations cause click noise in the playback, which degrades the PESQ score. That's probably the cause of your results.
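
To make the burst scenario concrete, here is a minimal Python sketch. It is not pjmedia's jitter buffer: it assumes 20 ms frames (a common G.711 packetization time, not stated in the thread), a fixed capacity derived from the 340 ms maximum mentioned above, and a simple policy in which frames arriving at a full buffer are dropped. The function name `simulate_burst` and its parameters are hypothetical.

```python
from collections import deque

FRAME_MS = 20        # assumed packetization time; not given in the thread
JBUF_MAX_MS = 340    # maximum jitter buffer size quoted in the thread


def simulate_burst(setup_delay_ms, frame_ms=FRAME_MS, jbuf_max_ms=JBUF_MAX_MS):
    """Return (stored, discarded) frame counts for a burst at media start.

    Simplified model (an assumption, not the real jbuf logic): every frame
    sent during setup_delay_ms waits in the socket buffer and is pushed
    into the jitter buffer in one go when the media starts.
    """
    capacity = jbuf_max_ms // frame_ms    # frames the buffer can hold
    queued = setup_delay_ms // frame_ms   # frames waiting in the socket buffer
    jbuf = deque()
    discarded = 0
    for seq in range(queued):
        if len(jbuf) >= capacity:
            discarded += 1                # buffer full: this frame is dropped
        else:
            jbuf.append(seq)
    return len(jbuf), discarded


if __name__ == "__main__":
    for delay in (200, 340, 500):
        stored, dropped = simulate_burst(delay)
        print(f"{delay} ms head start: stored={stored}, discarded={dropped}")
```

Under these assumptions, a 500 ms head start queues 25 frames, of which 17 fit in the buffer and 8 (160 ms of audio) are dropped, while a head start of 340 ms or less causes no discard.
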
We have a ticket for this (http://trac.pjsip.org/repos/ticket/638), and Nanang has a pending commit for it, so hopefully the situation will improve once that is in. It would be good if you could retest with the new changes then, to get a second opinion.

How long did you set the call duration to? I suspect we would get a better PESQ score if you ran the call for a longer duration, once the media has stabilized after the initial setup activity.

Cheers
Benny

[Montevecchi]

Hi Benny,

Thank you for your answer. I can run PESQ tests on each pjsip version you release. As I told you, I have an automatic test environment based on Windows XP hosts (CPU: Intel Pentium M, 1.8 GHz).

A single call in my tests is about 1 minute long, and each party plays the same speech sample 4 times. A PESQ evaluation is performed for each played sample. I noticed that the worst PESQ score is often the first one, which confirms your opinion. But sometimes a PESQ score in the middle of the call is also bad.

If it would be helpful, I can provide you with a lot of traces.

Best Regards.
Massimiliano