Re: [PATCH v15 00/16] Add audio support in v4l2 framework

Amadeusz Sławiński <amadeuszx.slawinski@xxxxxxxxxxxxxxx> · Thu, 9 May 2024 11:50:19 +0200

On 5/9/2024 11:36 AM, Shengjiu Wang wrote:
On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński
<amadeuszx.slawinski@xxxxxxxxxxxxxxx> wrote:

On 5/8/2024 10:00 AM, Hans Verkuil wrote:
On 06/05/2024 10:49, Shengjiu Wang wrote:
On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> wrote:

On Fri, 3 May 2024 10:47:19 +0900
Mark Brown <broonie@xxxxxxxxxx> wrote:

On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
Mauro Carvalho Chehab <mchehab@xxxxxxxxxx> wrote:

There is still time control associated with it, as audio and video
need to be in sync. This is done by controlling the buffer size
and could be fine-tuned by checking when the buffer transfer is done.

...

Just complementing: on media, we do this per video buffer (or
per half video buffer). A typical use case on cameras is to have
buffers transferred 30 times per second when the video is streamed
at 30 frames per second.
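
As a rough illustration of the buffer sizing described above: if audio
buffers are paced to match one video buffer each, the number of audio
frames per buffer follows from the two rates. This is only a sketch; the
48 kHz audio rate and the function name are assumptions for illustration
and do not come from this thread.

```python
from fractions import Fraction


def audio_frames_per_video_buffer(sample_rate_hz: int, video_fps: Fraction) -> Fraction:
    """Audio frames that span exactly one video buffer period.

    Uses exact rational arithmetic so that fractional video rates
    (e.g. 30000/1001 fps) do not accumulate rounding error.
    """
    return Fraction(sample_rate_hz) / video_fps


# 30 buffers per second at an assumed 48 kHz audio rate.
print(audio_frames_per_video_buffer(48000, Fraction(30)))  # 1600
```

A non-integer result (as with 30000/1001 fps) means the audio buffer
size cannot stay constant if the two streams are to remain in sync.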

IIRC some big use case for this hardware was transcoding so there was a
desire to just go at whatever rate the hardware could support as there
is no interactive user consuming the output as it is generated.

Indeed, codecs could be used to just do transcoding, but I would
expect that to be a corner use case. Since the chipsets implementing
codecs are typically the ones used in mobile devices, I would expect
the major use cases to be playing audio and video and participating
in audio/video conferences.

Going further, the codec API may end up supporting not only transcoding
(which is something that a CPU can usually handle without too much
processing) but also audio processing that may require more
complex algorithms - even deep learning ones - like background noise
removal, echo detection/removal, volume auto-gain, audio enhancement
and such.

In other words, the typical use cases will have either the input
or the output be physical hardware (a microphone or a speaker).

All, thanks for taking the time to discuss this. It seems we are back
at the starting point of this topic again.

Our main requirement is this: there is a hardware sample rate converter
on the chip, and users should be able to use it from user space as a
component, just like a software sample rate converter. It will mostly
run as a GStreamer plugin, so it is a memory-to-memory component.
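
To illustrate what "memory to memory" means for a sample rate converter,
here is a minimal software sketch: a buffer goes in, a converted buffer
comes out, and no device clock is involved. Linear interpolation and the
function name are assumptions chosen for brevity; they are not how the
hardware converter discussed here works.

```python
def resample_linear(samples, in_rate, out_rate):
    """Convert one buffer from in_rate to out_rate by linear interpolation.

    Pure memory-to-memory: the caller decides when to call it and how
    much data to pass; nothing here is tied to wall-clock time.
    """
    if not samples:
        return []
    n_out = len(samples) * out_rate // in_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * in_rate / out_rate       # position in the input buffer
        idx = int(pos)
        frac = pos - idx
        nxt = samples[min(idx + 1, last)]  # clamp at the end of the buffer
        out.append(samples[idx] * (1 - frac) + nxt * frac)
    return out


# Upsample a 4-sample ramp by a factor of two.
print(resample_linear([0.0, 1.0, 2.0, 3.0], 1, 2))
```

A real converter would also carry filter state across buffers; this
sketch treats each buffer independently.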

I didn't find an API in ALSA for this purpose; the best option I found
in the kernel is the V4L2 memory-to-memory framework.
As Hans said, it is well designed for memory-to-memory use.

And I think audio is one kind of 'media'. As far as I can see, part of
the radio functionality is in ALSA and part is in V4L2; part of the HDMI
functionality is in DRM and part is in ALSA...
So using V4L2 for audio is not new from this point of view.

Even now I still think V4L2 is the best option, but it looks like there
are a lot of objections. Developing a new ALSA mem2mem framework would
also duplicate code (a bigger duplication than just adding audio support
to V4L2, I think).

After reading this thread I still believe that the mem2mem framework is
a reasonable option, unless someone can come up with a method that is
easy to implement in the alsa subsystem. From what I can tell from this
discussion no such method exists.

Hi,

my main question would be: how is the mem2mem use case different from
a loopback exposing playback and capture frontends in user space with a
DSP (or other piece of HW) in the middle?

I think loopback has timing control: the user needs to feed data to
playback at a fixed rate and get data from capture at a fixed rate.
Otherwise there is an xrun in playback and capture.

In the mem2mem case there is no such timing control: the user feeds data
to it and it generates output. If the user doesn't feed data, there is
no xrun. But mem2mem is just one of the components in the playback or
capture pipeline; overall there is still time control for the whole
pipeline.
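
The difference between the two models can be shown with a toy
simulation. This is only a sketch: the function names, the one-buffer-
per-period model, and the feed schedule are assumptions for
illustration, not a description of any real driver.

```python
def clocked_playback_xruns(feed_periods, total_periods):
    """Loopback-style model: the device consumes one buffer every period.

    feed_periods is the set of period indices at which the user writes
    one buffer. A period with nothing queued counts as an xrun.
    """
    queued = 0
    xruns = 0
    for t in range(total_periods):
        if t in feed_periods:
            queued += 1
        if queued > 0:
            queued -= 1   # the clock fires and a buffer is consumed
        else:
            xruns += 1    # the clock fires with no data available
    return xruns


def mem2mem_jobs(feed_periods, total_periods):
    """mem2mem-style model: a job runs only when a buffer is queued.

    Returns (buffers processed, xruns). With no clock driving
    consumption, idle periods are simply idle, so xruns is always 0.
    """
    processed = sum(1 for t in range(total_periods) if t in feed_periods)
    return processed, 0


feed = {0, 1, 4}                        # user skips periods 2, 3 and 5
print(clocked_playback_xruns(feed, 6))  # 3
print(mem2mem_jobs(feed, 6))            # (3, 0)
```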

Have you looked at compress streams? If I remember correctly they are 
not tied to time due to the fact that they can pass data in arbitrary 
formats?

From:
https://docs.kernel.org/sound/designs/compress-offload.html

"No notion of underrun/overrun. Since the bytes written are compressed 
in nature and data written/read doesn’t translate directly to rendered 
output in time, this does not deal with underrun/overrun and maybe dealt 
in user-library"

Amadeusz