[tl;dr: this is still a synthetic test, but unlike the previous one, you have to perform it yourself if you want to see any results]

Previously, I posted some quality-evaluation results for resamplers that can be used by PulseAudio, and compared them to the resamplers used in Windows 8.1 and Mac OS X:

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-August/021362.html

The conclusion of that work was that we need to use speex-float-5 to match the metric of "never introducing audible distortions" (that other operating systems meet by default) when resampling from 44.1 to 48 kHz. However, David Henningsson argued that this "never" included a lot of unrealistic worst-case conditions, i.e. that the quality achieved in proprietary OSes is actually overkill.

The worst-case conditions include:

1. Unbearably loud (92 dB SPL) sound from speakers or headphones. People don't listen at such levels. At lower levels, the distortions also have lower sound pressure, and may become unnoticeable.

2. Absolutely quiet room (except for this sound and resampler distortions). In a noisy room, noise can mask ("outvoice") the distortion.

3. Perfect speakers or headphones that don't distort sounds at all by themselves. Maybe headphone distortions can mask resampler distortions?

4. Sine wave (and not music or speech) as a test sound to be distorted by a resampler. Maybe other frequency components can mask resampler distortions?

This email deals with the first two objections. I plan to take (4) into account later (and won't consider the result worth its salt until I do that), but I can't take (3) and (4) into account simultaneously due to the lack of required theoretical knowledge. Taking only (3) into account is meaningless, because a trivial solution exists.
Namely, if a resampler's distortion of a pure tone with a frequency above 10 kHz is audible on ideal speakers (even in a noisy room), then it is also audible in the same room on arbitrary crappy speakers that reproduce the _distortion_ with the correct amplitude and don't amplify the signal too much. Indeed, crappy speakers (unlike resamplers), when fed with a sine wave, only produce harmonics as distortions. In the case of a sine wave with a frequency greater than 10 kHz, all such harmonic distortions are ultrasound (the lowest harmonic is already above 20 kHz), which is inaudible and cannot mask resampler-introduced distortions.

As there are no two rooms with the same noise [see http://stevetarzia.com/localization.php], and because people don't agree on the proper sound pressure level from the playback equipment, the proposal is for you, the reader, to get the results for your listening equipment and your room, using my scripts.

Spoiler: if you listen to music at such a volume that full scale corresponds to 60 dB SPL, and your room noise is 35 dBA, you may find that speex-float-0 is adequate. On my Sony VAIO Z23A4R laptop and its built-in speakers, in my room, speex-float-1 does produce audible distortions on full-scale high-frequency sine waves (where only the distortion is audible, and not the original signal), and I have verified it with a direct test.

    git clone git://gitorious.org/psy-eval/psy-eval.git

You will also need python2.7, numpy, scipy and matplotlib.

You also need a recording of your room noise, as a 16-bit uncompressed wav file, made with a high-quality condenser microphone and a sound card with known sensitivity (so that you can get the sound pressure in physical units from the samples). Alternatively, you can use an uncalibrated recording paired with a noise meter reading on its "A" setting (so that the result is in dBA). High quality is needed so that the scripts see the actual room noise and not microphone/soundcard self-noise, especially at high frequencies.
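To illustrate what "sound pressure in physical units from the samples" means for a calibrated recording, here is a minimal sketch. It is not part of psy-eval; the function name and the example numbers are made up for illustration, and it assumes that the calibration is expressed as the dB SPL of a full-scale sine wave. The result is unweighted (no "A" weighting is applied).

```python
import math

def rms_to_db_spl(samples, full_scale_db_spl, max_value=32768.0):
    """Unweighted level of 16-bit samples, in dB SPL.

    full_scale_db_spl: dB SPL of a full-scale sine wave (assumed known
    from the microphone and sound card sensitivity).
    """
    rms = math.sqrt(sum(s * s for s in samples) / float(len(samples)))
    if rms == 0.0:
        return float("-inf")  # digital silence
    # A full-scale sine has RMS = max_value / sqrt(2).
    full_scale_rms = max_value / math.sqrt(2.0)
    return full_scale_db_spl + 20.0 * math.log10(rms / full_scale_rms)

# Hypothetical example: a sine at one tenth of full scale,
# recorded on equipment where full scale means 84 dB SPL.
tone = [3276.8 * math.sin(2.0 * math.pi * k / 100.0) for k in range(1000)]
level = rms_to_db_spl(tone, 84.0)  # 20 dB below full scale
```

With an uncalibrated recording this conversion is not possible, which is why a noise meter reading is needed in that case.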
If you don't have such a recording, please use the room noise recording provided by David Henningsson (see the end of this email).

So, here is a procedure to determine mathematically whether a resampler produces distortions audible in your room on sine wave signals at your typical listening volume.

I realize that the steps starting from 2 can be short-circuited by playing back test.wav (with both the default and alternate rate set to a non-matching value in /etc/pulse/daemon.conf) and listening for additional weak tones of obviously-wrong frequency. Please treat that shortcut as model validation. We still need a model so that we can judge new resamplers for you without ever needing your ears or playback equipment again.

1. Generate a linear-frequency-sweep signal:

[for the 44.1 -> 48 kHz case, you can find pre-generated resampled files via the link at the end of this email and skip directly to step 3 or even 5]

    ./wavegen.py --rate 44100 --length 1048576 --amplitude 0.9 --format s16 --padding 131072 test.wav

--rate: the sample rate you want to resample from.

--length: the length of the useful portion of the file, in samples. Half of the FFT size squared (i.e. 524288 for an FFT size of 1024) is the bare minimum, and even that may produce unreliable results, especially when downsampling. The other script autodetects the rate at which the frequency changes, so the end result should be the same if you produce a longer file.

--amplitude: the amplitude of the wave, with 1.0 being the full scale. Keep it slightly lower, as some resamplers overamplify certain frequencies a little. The other script autodetects the amplitude, so the result should be the same.

--format: s16 or float. As the quantization noise is inaudible at 16 bits, this doesn't really matter.

--padding: adds some silence before and after the useful portion of the wav file. The analysis script automatically cuts it out, provided that there are no clicks before the leading silence in the recording.
test.wav: the script will save the signal there.

2. Resample the test signal.

A slow but easy way to do this involves a null sink and its monitor source. First, set the needed resample method in /etc/pulse/daemon.conf and restart PulseAudio. Then, load the null sink with the rate you want to resample to, play the test signal through it and record the result using its monitor.

    pacmd load-module module-null-sink rate=48000
    parec -d null.monitor --fix-rate --rate=48000 --file-format=wav resampled.wav &
    paplay -d null test.wav ; killall parec

3. Get a recording of your room noise, as a 16-bit uncompressed wav file.

5-10 seconds are enough. Stereo recordings are OK; in this case the script will only use the left channel. The analysis script is smart enough to ignore short bursts of unwanted sound (e.g. clock ticks). As already said, you will need either the dB SPL number corresponding to the full scale of the recording, or a dBA reading of the noise meter. If you don't have a noise meter, assume 35 dBA.

4. Get a measurement of the sound pressure level corresponding to the full scale at your preferred volume.

There are two ways to do this: with a hardware noise meter or with a calibrated microphone. In both cases, you will need a 1 kHz test file. Here is how to make one:

    ./wavegen.py --rate 44100 --length 1000000 --amplitude 1.0 --constant-freq 1000 1000Hz.wav

Play this file back, and either take the noise meter dBA reading (which at 1000 Hz is the same as dB SPL), or record the sound using a microphone and sound card with a known sensitivity. In the second case, make sure that the sine wave occupies at least 90% of the recording duration (i.e. that there is not too much leading or trailing silence), and use the software noise meter:

    ./noise.py --noise-full-scale 84 --sine recorded-1000Hz.wav

where --noise-full-scale is the dB SPL value corresponding to the full-scale recorded signal.
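As a rough illustration of where such a full-scale number can come from, here is a sketch of the usual data-sheet arithmetic. It is not part of psy-eval; the function, its parameters and the example figures are hypothetical. It assumes that the microphone sensitivity is given in dBV per pascal, that the sound card's full-scale input level is given in dBV for a full-scale sine, and that any preamplifier gain is known.

```python
import math

# 1 Pa RMS relative to the 20 uPa reference pressure: about 93.98 dB SPL.
DB_SPL_AT_1_PA = 20.0 * math.log10(1.0 / 20e-6)

def estimate_full_scale_db_spl(mic_sens_dbv_per_pa,
                               adc_full_scale_dbv,
                               preamp_gain_db=0.0):
    """dB SPL that drives the recording to a full-scale sine.

    All three inputs are assumed to come from equipment data sheets.
    """
    # Voltage the microphone must deliver, relative to its output at 1 Pa.
    needed_db_re_1_pa = (adc_full_scale_dbv - preamp_gain_db
                         - mic_sens_dbv_per_pa)
    return DB_SPL_AT_1_PA + needed_db_re_1_pa

# Hypothetical equipment: -40 dBV/Pa microphone, 0 dBV full-scale input,
# 30 dB of preamplifier gain.
example = estimate_full_scale_db_spl(-40.0, 0.0, 30.0)  # about 104 dB SPL
```

The number obtained this way is only as good as the data-sheet figures; a check against a hardware noise meter is safer when one is available.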
You can get it if you know the sensitivity of your microphone and the sound card. --sine turns off the median-vs-mean adjustment logic that is invalid for stationary pure tones.

Note: noise.py intentionally does not implement the standard peak-decay function, because that would interfere with ignoring clock ticks. So the results are valid for stationary noise or a stationary signal only.

5. Make some plots. Here is how:

    ./resampler_plots.py --rate-from 44100 --skip 32768 --save newplot --fftsize 1024 --noise-file noise.wav --noise-full-scale 84 resampled.wav

--rate-from: the sample rate of the original file (test.wav).

--skip: skip this many samples from the beginning. This is needed with some versions of PulseAudio because they add a click at the beginning of a recording.

--save: the name prefix for the plots. In the example, you'll get newplot_*.png for various values of "*".

--fftsize: the FFT size. Meaningful values are between 1024 and 8192, inclusive. Big FFTs need longer test files; the dependency is quadratic.

--noise-file: a file with the recording of your room noise. If you have an absolutely quiet room, don't specify this parameter.

--noise-full-scale: if you recorded room noise with a calibrated microphone and sound card, then you know the dB SPL value corresponding to a full-scale sine wave. Put it here.

--noise-dba: if you have a noise meter instead, put its reading (with the "A" setting) here. If you have neither a calibrated microphone nor a noise meter, put 35 here.

This will produce some plots. On all plots, dB means relative to the "standard" full scale used in the earlier versions of psy-eval, i.e. 0 dB on any plot means 92 dB SPL.

newplot_response.png: a spectrogram showing the response of the resampler to sine waves of the full amplitude. The X axis is the input frequency. The amplitude of each output frequency component is then described by the color at the height corresponding to the output frequency.
Ideally, there should be only one frequency, equal to that of the input (see the bright diagonal line), but actually there are distortions.

newplot_envelope.png: shows the amplitude of the output signal vs frequency if the input signal contains only this frequency at the full scale.

newplot_response_plus_noise.png: a spectrogram showing the response of the resampler to sine waves at the target listening volume, plus room noise. Handy to visualize what's hidden by noise and what isn't.

newplot_distortion.png, newplot_distortion_eq.png: the same spectrogram as newplot_response.png, with some areas blacked out, so that only distortions remain. Without the _eq, attenuating the main tone counts as a distortion. With _eq, attenuating the main tone does not count as a distortion.

newplot_audibility.png, newplot_audibility_eq.png: these plots show whether a human can detect the resampler distortion in the presence of the main signal of the specified frequency (X axis) at the preferred volume and the room noise. If the result is higher than 0 dB, a human will notice the distortion given the chance to compare the (possibly-equalized) correct and the distorted sounds. If it is lower, then the distortion is not noticeable. In both cases, the absolute value plotted tells by how much the distortion needs to be changed in order to become just-noticeable.

For those who just want to see the plots for speex at various listening volumes but don't want to waste time with the null sink, here is an archive with the results of 44.1 -> 48 kHz resampling and a noise file (recorded by David Henningsson) that can be scaled to an arbitrary dBA reading:

https://yadi.sk/d/RzV7JGAxbfUve (a zip archive with flac files and a README)

Note: the provided resampling results are usable with FFT sizes from 1024 up to 4096. The provided noise file was recorded on equipment with known sensitivity, the full scale is known to be 84 dB SPL, and the noise level is thus 35 dBA.
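For reference, scaling a noise recording from its measured 35 dBA to some other dBA reading amounts to a plain gain change, because "A" weighting is a linear filter. The sketch below shows only that arithmetic; the function name is made up, and it is an assumption (not checked against the psy-eval source) that the scripts handle the rescaling in an equivalent way.

```python
def noise_scale_factor(target_dba, reference_dba=35.0):
    """Linear amplitude factor that moves a noise recording measured at
    reference_dba to target_dba (same spectrum, different loudness)."""
    gain_db = target_dba - reference_dba
    # Amplitude ratios use 20 dB per factor of 10.
    return 10.0 ** (gain_db / 20.0)

# Hypothetical rooms: one 20 dB noisier, one 6 dB quieter than the recording.
noisier = noise_scale_factor(55.0)  # 10 times the amplitude
quieter = noise_scale_factor(29.0)  # roughly half the amplitude
```

Note that such scaling keeps the shape of the noise spectrum, so it only approximates a room that is actually louder or quieter.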
To pretend that you have more or less than 35 dBA of noise in your room, use the --noise-dba option.

--
Alexander E. Patrakov