Re: [PATCH v15 00/16] Add audio support in v4l2 framework

Jaroslav Kysela <perex@xxxxxxxx> · Thu, 16 May 2024 16:50:39 +0200

On 15. 05. 24 22:33, Nicolas Dufresne wrote:
Hi,

GStreamer hat on ...

On Wednesday, 15 May 2024 at 12:46 +0200, Jaroslav Kysela wrote:
On 15. 05. 24 12:19, Takashi Iwai wrote:
On Wed, 15 May 2024 11:50:52 +0200,
Jaroslav Kysela wrote:

On 15. 05. 24 11:17, Hans Verkuil wrote:
Hi Jaroslav,

On 5/13/24 13:56, Jaroslav Kysela wrote:
On 09. 05. 24 13:13, Jaroslav Kysela wrote:
On 09. 05. 24 12:44, Shengjiu Wang wrote:
mem2mem is just like the decoder in the compress pipeline, which is
one of the components in the pipeline.

I was thinking of loopback with endpoints using compress streams,
without physical endpoint, something like:

compress playback (to feed data from userspace) -> DSP (processing) ->
compress capture (send data back to userspace)

Unless I'm missing something, you should be able to process data as fast
as you can feed it and consume it in such case.

Actually, I tried this in the beginning, but it did not work well.
ALSA needs timing control for playback and capture, and playback and
capture need to be synchronized. Usually the playback and capture
pipelines are independent in the ALSA design, but in this case they
must be synchronized, so they are not independent.

The compress API core has no strict timing constraints. You could even
have two half-duplex compress devices if you want a really independent
mechanism. If something is missing in the API, you can extend it (for
example, to inform user space that this is producer/consumer processing
without any relation to real time). I like this idea.

I was thinking more about this. If I am right, the mentioned use in gstreamer
is supposed to run the conversion (DSP) job in "one shot" (can be handled
using one system call like blocking ioctl).  The goal is just to offload the
CPU work to the DSP (co-processor). If there are no requirements for the
queuing, we can implement this ioctl in the compress ALSA API easily using the
data management through the dma-buf API. We could define a new direction
(enum snd_compr_direction), such as SND_COMPRESS_CONVERT, to handle this
new data scheme. The API may be extended later on real demand, of
course.

Otherwise all pieces are already in the current ALSA compress API
(capabilities, params, enumeration). The realtime controls may be created
using ALSA control API.

So does this mean that Shengjiu should attempt to use this ALSA approach first?

I've not seen any argument that forces the use of the v4l2 mem2mem
buffer scheme for this data conversion. It looks like a simple job, and
the ALSA APIs may be extended for this simple purpose.

Shengjiu, what are your requirements for gstreamer support? Would a
new blocking ioctl be enough for the initial support in the compress
ALSA API?

If it works with compress API, it'd be great, yeah.
So, your idea is to open compress-offload devices for read and write,
and then let them convert a la batch jobs without timing control?

For full-duplex usage, we might need some more extensions, so that
both read and write parameters can be synchronized.  (So far the
compress stream is unidirectional, and the runtime buffer is for a
single stream.)

And the buffer management is based on fixed-size fragments.  I
hope this doesn't matter much for the intended operation?

The question is whether standard I/O is really required for this case. My
quick idea was to just implement a new "direction" for this job, supporting
only one ioctl for the data processing, which for now will execute the job in
"one shot". The I/O may be handled through the dma-buf API (which seems to be
standard nowadays for this purpose and allows future chaining).

So something like:

struct dsp_job {
     int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
     int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
     /* ... maybe some extra data size members here ... */
     /* ... maybe some special parameters here ... */
};

#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

This ioctl will be blocking (thus synced). My question is whether it's
feasible for gstreamer or not. For this particular case, if the rate
conversion is implemented in software, it will block the gstreamer data
processing, too.
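
As a rough user-space sketch of how the blocking variant above might be
called: nothing here exists in any kernel UAPI header. The struct carries only
the two FDs from the proposal (the elided members are left out), the ioctl
number is the one suggested above, and the helper name dsp_job_run() is made up
for illustration.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Assumption: only the two dma-buf FDs from the proposal; the members
 * elided in the original sketch are intentionally left out. */
struct dsp_job {
	int source_fd;	/* dma-buf FD with source data */
	int target_fd;	/* dma-buf FD for target data */
};

/* Proposed in this thread only; not defined by any kernel header. */
#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

/*
 * Hypothetical helper: run one conversion job and block until the DSP
 * has finished. Returns 0 on success or a negative errno value.
 */
static int dsp_job_run(int compr_fd, int source_fd, int target_fd)
{
	struct dsp_job job = {
		.source_fd = source_fd,
		.target_fd = target_fd,
	};

	if (ioctl(compr_fd, SNDRV_COMPRESS_DSPJOB, &job) < 0)
		return -errno;
	return 0;
}
```

The caller's thread simply sleeps inside ioctl() for the duration of the
conversion, which is the behaviour being asked about for gstreamer.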

Yes, GStreamer threading is using a push-back model, so blocking for the time of
the processing is fine. Note that the extra simplicity will suffer from ioctl()
latency.

In GFX, they solve this issue with fences. That allows setting up the next
operation in the chain before the data has been produced.

The fences look really nice and seem more modern. It should be possible with
the dma-buf/sync_file.c interface to handle multiple jobs simultaneously and
share the state between user space and the kernel driver.

In this case, I think that two non-blocking ioctls should be enough - add a 
new job with source/target dma buffers guarded by one fence and abort (flush) 
all active jobs.
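
A minimal user-space sketch of that two-ioctl idea, under loud assumptions:
the struct layout, the EXAMPLE_* ioctl names and numbers, and the helper
names are all hypothetical and exist nowhere in the kernel. The only real
behaviour relied on is that a sync_file FD reports POLLIN from poll() once
its fence has signaled.

```c
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>

/* HYPOTHETICAL layout: not part of any kernel UAPI. */
struct example_dsp_job_add {
	int source_fd;	/* in:  dma-buf FD with source data */
	int target_fd;	/* in:  dma-buf FD for target data */
	int fence_fd;	/* out: sync_file FD, signaled on job completion */
};

/* HYPOTHETICAL ioctl numbers, chosen only for this illustration. */
#define EXAMPLE_DSPJOB_ADD   _IOWR('C', 0x61, struct example_dsp_job_add)
#define EXAMPLE_DSPJOB_ABORT _IO('C', 0x62)

/* Queue a job without blocking. Returns the fence FD or -errno. */
static int dsp_job_submit(int compr_fd, int source_fd, int target_fd)
{
	struct example_dsp_job_add job = {
		.source_fd = source_fd,
		.target_fd = target_fd,
		.fence_fd = -1,
	};

	if (ioctl(compr_fd, EXAMPLE_DSPJOB_ADD, &job) < 0)
		return -errno;
	return job.fence_fd;
}

/* Abort (flush) all active jobs. Returns 0 or -errno. */
static int dsp_job_abort_all(int compr_fd)
{
	return ioctl(compr_fd, EXAMPLE_DSPJOB_ABORT) < 0 ? -errno : 0;
}

/* Wait on a fence FD: 1 = signaled, 0 = timed out, negative = error. */
static int dsp_fence_wait(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
	int ret;

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return 0;
	return (pfd.revents & POLLIN) ? 1 : -EIO;
}
```

With something of this shape, user space could queue several jobs back to
back and wait on the returned fence FDs (closing each one afterwards), instead
of sleeping in one ioctl() per buffer.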

I'll try to propose an API extension for ALSA's compress API on the
linux-sound mailing list soon.

					Jaroslav

--
Jaroslav Kysela <perex@xxxxxxxx>
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.