I have finished the first stage of my work on resampler quality evaluation.

The scripts are here:
https://gitorious.org/psy-eval/psy-eval/

The results are here:
https://imgur.com/a/jtIEj

Note: they are valid only for 44100 -> 48000 Hz resampling. But that's the common case.

TL;DR summary: it makes sense to change the default resampler quality from the current "speex-float-1" value to "speex-float-3" or even "speex-float-5" on capable machines, otherwise the distortion is sometimes noticeable. Also, speex-float-{3,5} are similar to what proprietary OSes offer.

The work is based on the question: does a human listener notice the distortion introduced by a resampler? To answer that, I used a psychoacoustical model publicly available at the following URL:
http://www.mp3-tech.org/programmer/docs/6_Heusdens.pdf

The paper was chosen because it is short, because the model is simple, is newer than the PEAQ monster, does not need special treatment of noise vs tones and provides one number as the answer, and because I have already used it in dcaenc.

From that paper, Eq. (5) is the equation that we need. We put the power of signal and distortion at each frequency in, and get a single number out. If this number is less than 1, the distortion is not audible. If it is greater than 1, then the distortion is audible. As that number turns out to be a ratio of powers, it can also be converted to dB with the usual 10 * log10(D(m,s)) formula.

The paper takes the following factors into account:

* absolute threshold of hearing,
* perceptual masking of nearby frequencies by a tone,
* temporal masking.

I have removed the temporal masking from the model by omitting L? from Eq. (5), because it is not relevant in the resampler-evaluation case, as users can play arbitrarily long tones.

So, given the formula, we need something to feed it as input. The idea is:

* Generate a test wav file (with wavegen.py).
* Play it through the resampler.
* Capture the output as a wav file.
* Analyze the result (with resampler_plots.py).

To capture the resampler output, two techniques were used.

For PulseAudio resamplers, we can create a null sink, play a wav file with paplay and record the result with parecord through its monitor. Unfortunately, parecord inserts some garbage at the beginning.

For resamplers built into third-party operating systems, a patched QEMU was used. The patch deliberately cripples the emulated HD Audio card, so that it accepts only 48 kHz, forcing the guest to resample. The resampled output was captured using QEMU_AUDIO_DRV=wav. Some other environment variables have to be set so that QEMU itself does not resample and to reduce the chance of dropouts in the recording.

Patch:

--- qemu/hw/audio/hda-codec.c 2014-07-06 18:46:20.764429441 +0600
+++ qemu/hw/audio/hda-codec.c 2014-08-20 21:58:32.661701409 +0600
@@ -114,7 +114,7 @@
 #define QEMU_HDA_ID_VENDOR 0x1af4
 #define QEMU_HDA_PCM_FORMATS (AC_SUPPCM_BITS_16 | \
- 0x1fc /* 16 -> 96 kHz */)
+ 0x040 /* 48 kHz only */)
 #define QEMU_HDA_AMP_NONE (0)
 #define QEMU_HDA_AMP_STEPS 0x4a

The test signal is a TPDF-dithered 16-bit sine wave with a linearly changing frequency. This way, we know the frequency of the signal given only a timestamp. The scripts can detect the frequency/time slope automatically and extrapolate it into the area where the resampler (rightfully or not) suppresses the signal.

So, for each portion of the resampled wave, we know the signal frequency. Ideally, this frequency component should have the same amplitude as the input if it is below half of the new sample rate, and zero amplitude otherwise. Also, there should be no other frequency components. So, the conclusion is quite obvious: treat the reproduced part of that component as the signal, and all others (plus the missing part of the main component) as distortion.

Under that definition, the plots that say "Limited bandwidth counts as distortion" below them were made.
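The test signal described above can be sketched as follows. This is only an illustration and not the contents of wavegen.py: the function names, the default amplitude, the seed and the sample rate argument are my assumptions. It shows the two properties that matter: the frequency at any timestamp follows from the linear slope, and the dither has a triangular distribution (the difference of two uniform random numbers) spanning +-1 LSB.

```python
import math
import random

def sweep_frequency(t, f_start, f_end, duration):
    """Instantaneous frequency (Hz) of a linear sweep at time t (seconds)."""
    return f_start + (f_end - f_start) * t / duration

def tpdf_dithered_sweep(f_start, f_end, duration, rate=44100,
                        amplitude=32766.0, seed=0):
    """Return a list of 16-bit integer samples of a linear sine sweep.

    All parameter names and defaults are hypothetical. The amplitude is
    in LSB units and leaves one LSB of headroom for the dither.
    """
    rng = random.Random(seed)
    slope = (f_end - f_start) / duration  # Hz per second
    samples = []
    for n in range(round(duration * rate)):
        t = n / rate
        # Phase is the integral of 2*pi*f(t) for f(t) = f_start + slope*t.
        phase = 2.0 * math.pi * (f_start * t + 0.5 * slope * t * t)
        x = amplitude * math.sin(phase)
        # TPDF dither: difference of two uniforms, triangular on (-1, 1) LSB.
        dither = rng.random() - rng.random()
        q = math.floor(x + dither + 0.5)  # round to nearest integer
        samples.append(max(-32768, min(32767, q)))  # clamp to 16 bits
    return samples
```

For example, `tpdf_dithered_sweep(20.0, 22000.0, 60.0)` would give one minute of a 44100 Hz sweep, and `sweep_frequency(30.0, 20.0, 22000.0, 60.0)` gives the frequency at its midpoint.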
These plots display the audibility of all distortions, as defined above, as a function of the input sine wave frequency, for a selection of resamplers. The sine wave is assumed to be at full amplitude, which corresponds (as is a common convention in psychoacoustical models) to 92 dB SPL. Note: do not listen at this volume. It is harmful. But it is also the worst case for the psychoacoustical model.

Also, the audibility of the distortions inherent in a TPDF-dithered 16-bit input is shown as "quantization noise" on the same plots. As you can see, 16-bit input and TPDF dithering do not result in audible distortions.

Unfortunately, there is a bug in the win81 plots: Windows Media Player by default attenuates the file by 6 dB, and my scripts compensate for that, but in doing so they also amplify the quantization noise. I am too lazy to fix this today. Please shift the whole win81-wmp curve down by 6 dB, and you'll hopefully get an approximately correct result.

As you can see, some resamplers allegedly create audible distortions for high-frequency inputs. That's expected: to offer good attenuation of unrepresentable frequencies (those above either the old or the new Nyquist frequency), they need to somewhat attenuate representable ones. This attenuation is counted as distortion, and it can indeed be noticed if one is offered a direct comparison of resamplers that put the cut-off frequency in different places. All that is needed is a high-frequency sine wave that is attenuated, although ideally it shouldn't be. Obviously, nobody listens to such sine waves, so this is an artifact of the method.

This artifact is somewhat ignorable for 44100 -> 48000 Hz conversion, as it doesn't prevent one from creating a resampler that never introduces audible distortions (example: speex-float-5). However, it is expected to become a problem if one considers the VoIP use case, with lower sample rates and lower transition frequencies.
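For reading the plots, here is a minimal sketch of the dB conversion mentioned earlier and of the manual 6 dB correction suggested for the win81-wmp curve. The function names are hypothetical and are not taken from resampler_plots.py. The only facts used are the 10 * log10(D) formula, the audibility threshold at D = 1 (i.e. 0 dB), and the "shift down by 6 dB" workaround, which is approximate by my own admission above.

```python
import math

def detectability_to_db(d):
    """Convert the model output D (a ratio of powers) to dB."""
    return 10.0 * math.log10(d)

def is_audible(d):
    """The distortion is audible when D exceeds 1, i.e. 0 dB."""
    return d > 1.0

def shift_curve_db(curve_db, offset_db):
    """Shift a whole curve (a list of dB values) by a constant offset."""
    return [value + offset_db for value in curve_db]

# Hypothetical example values, not measurements:
example_curve_db = [detectability_to_db(d) for d in (0.5, 1.0, 4.0)]
# Approximate correction suggested for the win81-wmp curve.
corrected_db = shift_curve_db(example_curve_db, -6.0)
```

A resampler counts as indistinguishable from a perfect one on these plots when its curve never goes above 0 dB.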
As an attempt to work around the problem, I have also plotted audibility of the distortion vs input signal frequency without treating this attenuation of the main tone as a distortion. Look for "Limited bandwidth does not count as distortion" below the plot.

As you can see, under the old problematic definition, the following resamplers are indistinguishable from a perfect one (i.e. audibility of distortions never goes above 0 dB): speex-float-5, soxr-mq, src-sinc-medium-quality, and their better variants from the corresponding families.

Under the new definition of distortion, the following resamplers also become perfect: soxr-lq, src-sinc-fastest, macosx, wine. And maybe win81-wmp if I remeasure it.

It's quite sad that the current default in PulseAudio was influenced by the needs of low-power embedded devices at the measurable expense of the sound quality on the typical desktop. Now, with plots, figures and knowledge in hand, we can fix it.

I'll leave other metrics, different sample rates, and evaluation of distortions introduced into typical music and speech for my talk at the audio mini conference.

P.S. The following resamplers are not on the plots:

src-zero-order-hold: exactly the same as trivial.
speex-float-4: very very similar to speex-float-3. Not perfect.
speex-float-2: worse than speex-float-1. Please ignore them.

-- 
Alexander E. Patrakov