Re: [RFC PATCH 1/6] ALSA: compress: add Sample Rate Converter codec support

Jaroslav Kysela <perex@xxxxxxxx> · Thu, 8 Aug 2024 17:51:36 +0200

On 08. 08. 24 14:19, Pierre-Louis Bossart wrote:
  files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/
sound/compress_offload.h
index 98772b0cbcb7..8b2b72f94e26 100644
--- a/include/uapi/sound/compress_offload.h
+++ b/include/uapi/sound/compress_offload.h
@@ -112,10 +112,12 @@ struct snd_compr_codec_caps {
    * end of the track
    * @SNDRV_COMPRESS_ENCODER_DELAY: no of samples inserted by the
encoder at the
    * beginning of the track
+ * @SNDRV_COMPRESS_SRC_RATIO_MOD: Resampling Ratio Modifier for
sample rate converter
    */
   enum sndrv_compress_encoder {
        SNDRV_COMPRESS_ENCODER_PADDING = 1,
        SNDRV_COMPRESS_ENCODER_DELAY = 2,
+     SNDRV_COMPRESS_SRC_RATIO_MOD = 3,
   };

this sounds wrong to me. The sample rate converter is not an "encoder",
and the properties for padding/delay are totally specific to an encoder
function.

There is only decoder and encoder definition for compress,  I know
it is difficult to add SRC to encoder or decoder classification,
SRC is a Post Processing.  I hope you can have a recommandation

I don't. I think we're blurring layers in a really odd way.

The main reason why the compress API was added is to remove the
byte-to-time conversions. But that's clearly not useful for a
post-processing of PCM data, where the bitrate is constant. It really
feels like we're adding this memory-to-memory API to the compress
framework because we don't have anything else available, not because it
makes sense to do so.

It makes sense to offload decoder/encoder tasks as batch processing for
standard compress stream and return back result (PCM stream or encoded
stream) to user space. So it makes sense to use the compress interface
(parameters handshake) for it. Let's talk about the proper SRC extension.

For SRC and dynamic rate modification. I would just create an ALSA
control for this. We are already using the "PCM Rate Shift 100000"
control in the sound/drivers/aloop.c for this purpose (something like
pitch in MIDI) for example. There is no requirement to add this function
through metadata ioctls. As bonus, this control can be monitored with
multiple tasks.

this wouldn't work when the rate is estimated in firmware/hardware,
which is precisely what the 'asynchronous' part of ASRC does.


Then there's the issue of parameters, we chose to only add parameters
for standard encoders/decoders. Post-processing is highly specific and
the parameter definitions varies from one implementation to another -
and usually parameters are handled in an opaque way with binary
controls. This is best handled with a UUID that needs to be known only
to applications and low-level firmware/hardware, the kernel code should
not have to be modified for each and every processing and to add new
parameters. It just does not scale and it's unmaintainable.

At the very least if you really want to use this compress API, extend it
to use a non-descript "UUID-defined" type and an opaque set of
parameters with this UUID passed in a header.

We don't need to use UUID-defined scheme for simple (A)SRC
implementation. As I noted, the specific runtime controls may use
existing ALSA control API.

"Simple (A)SRC" is an oxymoron. There are multiple ways to define the
performance, and how the drift estimator is handled. There's nothing
simple if you look under the hood. The SOF implementation has for
example those parameters:

uint32_t source_rate;           /**< Define fixed source rate or */
				/**< use 0 to indicate need to get */
				/**< the rate from stream */
uint32_t sink_rate;             /**< Define fixed sink rate or */
				/**< use 0 to indicate need to get */
				/**< the rate from stream */
uint32_t asynchronous_mode;     /**< synchronous 0, asynchronous 1 */
				/**< When 1 the ASRC tracks and */
				/**< compensates for drift. */
uint32_t operation_mode;        /**< push 0, pull 1, In push mode the */
				/**< ASRC consumes a defined number */
				/**< of frames at input, with varying */
				/**< number of frames at output. */
				/**< In pull mode the ASRC outputs */
				/**< a defined number of frames while */
				/**< number of input frames varies. */

They are clearly different from what is suggested above with a 'ratio-mod'.

I don't think so. The proposed (A)SRC for compress-accel is just one case for 
the above configs where the input is known and output is controlled by the 
requested rate. The I/O mechanism is abstracted enough in this case and the 
driver/hardware/firmware must follow it.

Same if you have a 'simple EQ'. there are dozens of ways to implement
the functionality with FIR, IIR or a combination of the two, and
multiple bands.

The point is that you have to think upfront about a generic way to pass
parameters. We didn't have to do it for encoders/decoders because we
only catered to well-documented standard solutions only. By choosing to
support PCM processing, a new can of worms is now open.

I repeat: please do not make the mistake of listing all processing with
an enum and a new structure for parameters every time someone needs a
specific transform in their pipeline. We made that mistake with SOF and
had to backtrack rather quickly. The only way to scale is an identifier
that is NOT included in the kernel code but is known to higher and
lower-levels only.

There are two ways - black box (UUID - as you suggested) - or well defined 
purpose (abstraction). For your example 'simple EQ', the parameters should be 
the band (frequency range) volume values. It's abstract and the real filters 
(resp. implementation) used behind may depend on the hardware/driver capabilities.

From my view, the really special cases may be handled as black box, but 
others like (A)SRC should follow some well-defined abstraction IMHO to not 
force user space to handle all special cases.

						Jaroslav

--
Jaroslav Kysela <perex@xxxxxxxx>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.