Re: [RFC PATCH 1/6] ALSA: compress: add Sample Rate Converter codec support

Pierre-Louis Bossart <pierre-louis.bossart@xxxxxxxxxxxxxxx> · Thu, 8 Aug 2024 14:19:40 +0200

 files changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/
>>>>> sound/compress_offload.h
>>>>> index 98772b0cbcb7..8b2b72f94e26 100644
>>>>> --- a/include/uapi/sound/compress_offload.h
>>>>> +++ b/include/uapi/sound/compress_offload.h
>>>>> @@ -112,10 +112,12 @@ struct snd_compr_codec_caps {
>>>>>    * end of the track
>>>>>    * @SNDRV_COMPRESS_ENCODER_DELAY: no of samples inserted by the
>>>>> encoder at the
>>>>>    * beginning of the track
>>>>> + * @SNDRV_COMPRESS_SRC_RATIO_MOD: Resampling Ratio Modifier for
>>>>> sample rate converter
>>>>>    */
>>>>>   enum sndrv_compress_encoder {
>>>>>        SNDRV_COMPRESS_ENCODER_PADDING = 1,
>>>>>        SNDRV_COMPRESS_ENCODER_DELAY = 2,
>>>>> +     SNDRV_COMPRESS_SRC_RATIO_MOD = 3,
>>>>>   };
>>>>
>>>> this sounds wrong to me. The sample rate converter is not an "encoder",
>>>> and the properties for padding/delay are totally specific to an encoder
>>>> function.
>>>
>>> There is only decoder and encoder definition for compress,  I know
>>> it is difficult to add SRC to encoder or decoder classification,
>>> SRC is a Post Processing.  I hope you can have a recommandation
>>
>> I don't. I think we're blurring layers in a really odd way.
>>
>> The main reason why the compress API was added is to remove the
>> byte-to-time conversions. But that's clearly not useful for a
>> post-processing of PCM data, where the bitrate is constant. It really
>> feels like we're adding this memory-to-memory API to the compress
>> framework because we don't have anything else available, not because it
>> makes sense to do so.
> 
> It makes sense to offload decoder/encoder tasks as batch processing for
> standard compress stream and return back result (PCM stream or encoded
> stream) to user space. So it makes sense to use the compress interface
> (parameters handshake) for it. Let's talk about the proper SRC extension.
> 
> For SRC and dynamic rate modification. I would just create an ALSA
> control for this. We are already using the "PCM Rate Shift 100000"
> control in the sound/drivers/aloop.c for this purpose (something like
> pitch in MIDI) for example. There is no requirement to add this function
> through metadata ioctls. As bonus, this control can be monitored with
> multiple tasks.

this wouldn't work when the rate is estimated in firmware/hardware,
which is precisely what the 'asynchronous' part of ASRC does.

>> Then there's the issue of parameters, we chose to only add parameters
>> for standard encoders/decoders. Post-processing is highly specific and
>> the parameter definitions varies from one implementation to another -
>> and usually parameters are handled in an opaque way with binary
>> controls. This is best handled with a UUID that needs to be known only
>> to applications and low-level firmware/hardware, the kernel code should
>> not have to be modified for each and every processing and to add new
>> parameters. It just does not scale and it's unmaintainable.
>>
>> At the very least if you really want to use this compress API, extend it
>> to use a non-descript "UUID-defined" type and an opaque set of
>> parameters with this UUID passed in a header.
> 
> We don't need to use UUID-defined scheme for simple (A)SRC
> implementation. As I noted, the specific runtime controls may use
> existing ALSA control API.

"Simple (A)SRC" is an oxymoron. There are multiple ways to define the
performance, and how the drift estimator is handled. There's nothing
simple if you look under the hood. The SOF implementation has for
example those parameters:

uint32_t source_rate;           /**< Define fixed source rate or */
				/**< use 0 to indicate need to get */
				/**< the rate from stream */
uint32_t sink_rate;             /**< Define fixed sink rate or */
				/**< use 0 to indicate need to get */
				/**< the rate from stream */
uint32_t asynchronous_mode;     /**< synchronous 0, asynchronous 1 */
				/**< When 1 the ASRC tracks and */
				/**< compensates for drift. */
uint32_t operation_mode;        /**< push 0, pull 1, In push mode the */
				/**< ASRC consumes a defined number */
				/**< of frames at input, with varying */
				/**< number of frames at output. */
				/**< In pull mode the ASRC outputs */
				/**< a defined number of frames while */
				/**< number of input frames varies. */

They are clearly different from what is suggested above with a 'ratio-mod'.

Same if you have a 'simple EQ'. there are dozens of ways to implement
the functionality with FIR, IIR or a combination of the two, and
multiple bands.

The point is that you have to think upfront about a generic way to pass
parameters. We didn't have to do it for encoders/decoders because we
only catered to well-documented standard solutions only. By choosing to
support PCM processing, a new can of worms is now open.

I repeat: please do not make the mistake of listing all processing with
an enum and a new structure for parameters every time someone needs a
specific transform in their pipeline. We made that mistake with SOF and
had to backtrack rather quickly. The only way to scale is an identifier
that is NOT included in the kernel code but is known to higher and
lower-levels only.