On scaling the HRIR in module-virtual-surround-sink

patrakov@xxxxxxxxx (Alexander E. Patrakov) · Sun, 09 Mar 2014 00:57:29 +0600

Hello.

[Do not blindly apply patches from this e-mail! They mutually exclusive, 
and I don't have a firm opinion which one of them is correct.]

Today I tried to improve the existing module-virtual-surround-sink (but 
the same issue also affects the IIR-based rewrite that is still sitting 
on my laptop). The problem is: the current normalization code does not 
do what it is designed to do. The module clips on some testcases. Let me 
copy-paste the problematic code for easy discussion.

     /* normalize hrir to avoid clipping */
     hrir_max = 0;
     for (i = 0; i < u->hrir_samples; i++) {
         hrir_sum = 0;
         for (j = 0; j < u->hrir_channels; j++)
             hrir_sum += fabs(u->hrir_data[i * u->hrir_channels + j]);

         if (hrir_sum > hrir_max)
             hrir_max = hrir_sum;
     }
     if (hrir_max > 1) {
         for (i = 0; i < u->hrir_samples; i++) {
             for (j = 0; j < u->hrir_channels; j++)
                 u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
         }
     }

And here is where clipping (i.e. sum_left or sum_right becoming > 1.0) 
can happen:

         sum_right = 0;
         sum_left = 0;

         /* fold the input buffer with the impulse response */
         for (j = 0; j < u->hrir_samples; j++) {
             for (k = 0; k < u->channels; k++) {
                 current_sample = 
u->input_buffer[((u->input_buffer_offset + j) % u->hrir_samples) * 
u->channels + k];

                 sum_left += current_sample * u->hrir_data[j * 
u->hrir_channels + u->mapping_left[k]];
                 sum_right += current_sample * u->hrir_data[j * 
u->hrir_channels + u->mapping_right[k]];
             }
         }

So, basically, here is the "normalization" logic: for each HRIR sample, 
take the sum of all absolute values of HRIR channels at that time. Find 
the maximum of these sums. Divide that factor out, with an extra "fudge 
factor" of 1.2. Without the fudge factor, the corresponding explanation 
in terms of the second code fragment is: if a single-sample click is 
played simultaneously through all 5.1 channels, the output of the filter 
(sum_left or sum_right) should just reach 1.0 at its maximum. With 
hrir-kemar.wav from http://stuff.salscheider-online.de/hrir_kemar.tar.gz 
(mentioned in the original submission), this logic (including the fudge 
factor) divides the HRIR by 1.862.

What I don't understand is why "simultaneous click in all input 
channels" is considered at all.

If the task is to avoid clipping even in the worst possible case, then 
let's construct this worst possible case. Here are the instructions. 
Take hrir-kemar.wav. Time-reverse it. Saturate all positive samples to 
1.0 and all negative samples to -1.0. With this specially-constructed 
sound, sum_left would at one point in time receive the sum of all 
absolute values of all samples (in all channels) in the HRIR, so, that 
is the value to divide out, and not what is written in the first code 
fragment. I.e., this would be the patch (sorry if mangled, it is not 
meant to be applied anyway):

--- a/src/modules/module-virtual-surround-sink.c
+++ b/src/modules/module-virtual-surround-sink.c
@@ -533,7 +533,7 @@ int pa__init(pa_module*m) {

      const char *hrir_file;
      unsigned i, j, found_channel_left, found_channel_right;
-    float hrir_sum, hrir_max;
+    float hrir_sum;
      float *hrir_data;

      pa_sample_spec hrir_ss;
@@ -759,20 +759,21 @@ int pa__init(pa_module*m) {
      }

      /* normalize hrir to avoid clipping */
-    hrir_max = 0;
+    hrir_sum = 0.0;
      for (i = 0; i < u->hrir_samples; i++) {
-        hrir_sum = 0;
          for (j = 0; j < u->hrir_channels; j++)
              hrir_sum += fabs(u->hrir_data[i * u->hrir_channels + j]);
+    }
+

-        if (hrir_sum > hrir_max)
-            hrir_max = hrir_sum;
+    if (hrir_sum < 0.01) {
+        pa_log("hrir file is too quiet!");
+        goto fail;
      }
-    if (hrir_max > 1) {
-        for (i = 0; i < u->hrir_samples; i++) {
-            for (j = 0; j < u->hrir_channels; j++)
-                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
-        }
+
+    for (i = 0; i < u->hrir_samples; i++) {
+        for (j = 0; j < u->hrir_channels; j++)
+            u->hrir_data[i * u->hrir_channels + j] /= hrir_sum;
      }

      /* create mapping between hrir and input */

(Here I also removed the "don't normalize quiet HRIRs" logic, as I don't 
understand its purpose and it had an obvious bug of not including the 
fudge factor.)

Result: after this patch, the HRIR is divided by 24.339 and the module 
indeed survives the worst possible testcase just without clipping. But 
then, it becomes too quiet for normal use. That's why I don't want you 
to apply the above patch.

The original code, even though it clips heavily in the worst possible 
testcase, indeed produces no audible clipping on most (but not all!) 
music DVDs that I tried, and I think that the correct problem to solve 
here is indeed "produce no audible clipping on typical content". So 
there is some grain of truth in the current code.

Still, I want to change the fudge factor, because, as of now, the module 
does not survive the following testcases (i.e. clips, with this clipping 
being very easy to detect by ear):

  * A full-amplitude sine sweep from 100 Hz to 10 kHz, same samples in 
all channels.
  * The 5.1 soundtrack from the "Lichtmond 2: Universe of Light" music 
BluRay.

The fudge factor required to play the whole Lichtmond 2 soundtrack 
without clipping is 2.4, corresponding to dividing the HRIR by 3.724. So:

--- a/src/modules/module-virtual-surround-sink.c
+++ b/src/modules/module-virtual-surround-sink.c
@@ -768,11 +768,15 @@ int pa__init(pa_module*m) {
          if (hrir_sum > hrir_max)
              hrir_max = hrir_sum;
      }
-    if (hrir_max > 1) {
-        for (i = 0; i < u->hrir_samples; i++) {
-            for (j = 0; j < u->hrir_channels; j++)
-                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
-        }
+
+    if (hrir_max < 0.01) {
+        pa_log("hrir file is too quiet!");
+        goto fail;
+    }
+
+    for (i = 0; i < u->hrir_samples; i++) {
+        for (j = 0; j < u->hrir_channels; j++)
+            u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 2.4;
      }

      /* create mapping between hrir and input */

I don't like the above (working and almost minimal) patch, as it just 
replaces one magic number with another, and I still don't understand why 
the calculated hrir_max value is relevant. I.e. the same "correct" 
scaling factor, 3.724, can be also approximately produced this way, from 
the energy contained in the impulse response:

--- a/src/modules/module-virtual-surround-sink.c
+++ b/src/modules/module-virtual-surround-sink.c
@@ -533,7 +533,7 @@ int pa__init(pa_module*m) {

      const char *hrir_file;
      unsigned i, j, found_channel_left, found_channel_right;
-    float hrir_sum, hrir_max;
+    float hrir_sample, hrir_energy;
      float *hrir_data;

      pa_sample_spec hrir_ss;
@@ -758,21 +758,25 @@ int pa__init(pa_module*m) {
          goto fail;
      }

-    /* normalize hrir to avoid clipping */
-    hrir_max = 0;
+    /* normalizg hrir to avoid clipping on typical music */
+    hrir_energy = 0.0;
      for (i = 0; i < u->hrir_samples; i++) {
-        hrir_sum = 0;
-        for (j = 0; j < u->hrir_channels; j++)
-            hrir_sum += fabs(u->hrir_data[i * u->hrir_channels + j]);
+        for (j = 0; j < u->hrir_channels; j++) {
+            hrir_sample = u->hrir_data[i * u->hrir_channels + j];
+            hrir_energy += hrir_sample * hrir_sample;
+       }
+    }

-        if (hrir_sum > hrir_max)
-            hrir_max = hrir_sum;
+    if (hrir_energy < 0.0001) {
+        pa_log("hrir file is too quiet!");
+        goto fail;
      }
-    if (hrir_max > 1) {
-        for (i = 0; i < u->hrir_samples; i++) {
-            for (j = 0; j < u->hrir_channels; j++)
-                u->hrir_data[i * u->hrir_channels + j] /= hrir_max * 1.2;
-        }
+
+    hrir_sample = sqrt(hrir_energy) * 1.67;
+
+    for (i = 0; i < u->hrir_samples; i++) {
+        for (j = 0; j < u->hrir_channels; j++)
+            u->hrir_data[i * u->hrir_channels + j] /= hrir_sample;
      }

      /* create mapping between hrir and input */

Of course, the same criticism applies as for the second patch - why 
would the energy be relevant here?

Some final remarks.

I don't insist on the 3.724 scaling factor and can accept occasional 
clipping if it is not detectable by ear.

Due to the fact that we are dealing with "typical content", I understand 
that a fudge factor is needed anyway, to deal with its typicalness.

The code has to deal with HRIRs of arbitrary origin, including those 
from http://stuff.salscheider-online.de/hrir_listen.tar.gz , that's why 
a static factor to divide out won't cut it. But I haven't tested any 
other HRIRs yet.

Wrong equalization of the HRIRs (i.e. expensive headphones sounding like 
cheap plugs with this module) is a separate bug that will be addressed 
in a separate e-mail when I become able to discuss it.

Thoughts?

-- 
Alexander E. Patrakov