Re: [PATCH v15 00/16] Add audio support in v4l2 framework

Nicolas Dufresne <nicolas@xxxxxxxxxxxx> · Wed, 15 May 2024 16:33:48 -0400

Hi,

GStreamer hat on ...

Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit :
> On 15. 05. 24 12:19, Takashi Iwai wrote:
> > On Wed, 15 May 2024 11:50:52 +0200,
> > Jaroslav Kysela wrote:
> > > 
> > > On 15. 05. 24 11:17, Hans Verkuil wrote:
> > > > Hi Jaroslav,
> > > > 
> > > > On 5/13/24 13:56, Jaroslav Kysela wrote:
> > > > > On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> > > > > > On 09. 05. 24 12:44, Shengjiu Wang wrote:
> > > > > > > > > mem2mem is just like the decoder in the compress pipeline. which is
> > > > > > > > > one of the components in the pipeline.
> > > > > > > > 
> > > > > > > > I was thinking of loopback with endpoints using compress streams,
> > > > > > > > without physical endpoint, something like:
> > > > > > > > 
> > > > > > > > compress playback (to feed data from userspace) -> DSP (processing) ->
> > > > > > > > compress capture (send data back to userspace)
> > > > > > > > 
> > > > > > > > Unless I'm missing something, you should be able to process data as fast
> > > > > > > > as you can feed it and consume it in such case.
> > > > > > > > 
> > > > > > > 
> > > > > > > Actually in the beginning I tried this,  but it did not work well.
> > > > > > > ALSA needs time control for playback and capture, playback and capture
> > > > > > > needs to synchronize.  Usually the playback and capture pipeline is
> > > > > > > independent in ALSA design,  but in this case, the playback and capture
> > > > > > > should synchronize, they are not independent.
> > > > > > 
> > > > > > > The core compress API has no strict timing constraints. You can eventually
> > > > > > have two half-duplex compress devices, if you like to have really independent
> > > > > > mechanism. If something is missing in API, you can extend this API (like to
> > > > > > inform the user space that it's a producer/consumer processing without any
> > > > > > relation to the real time). I like this idea.
> > > > > 
> > > > > I was thinking more about this. If I am right, the mentioned use in gstreamer
> > > > > is supposed to run the conversion (DSP) job in "one shot" (can be handled
> > > > > using one system call like blocking ioctl).  The goal is just to offload the
> > > > > CPU work to the DSP (co-processor). If there are no requirements for the
> > > > > queuing, we can implement this ioctl in the compress ALSA API easily using the
> > > > > data management through the dma-buf API. We can eventually define a new
> > > > > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
> > > > > handle this new data scheme. The API may be extended later on real demand, of
> > > > > course.
> > > > > 
> > > > > Otherwise all pieces are already in the current ALSA compress API
> > > > > (capabilities, params, enumeration). The realtime controls may be created
> > > > > using ALSA control API.
> > > > 
> > > > So does this mean that Shengjiu should attempt to use this ALSA approach first?
> > > 
> > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> > > data conversion forcefully. It looks like a simple job and ALSA APIs
> > > may be extended for this simple purpose.
> > > 
> > > Shengjiu, what are your requirements for gstreamer support? Would be a
> > > new blocking ioctl enough for the initial support in the compress ALSA
> > > API?
> > 
> > If it works with compress API, it'd be great, yeah.
> > So, your idea is to open compress-offload devices for read and write,
> > and then let them convert a la batch jobs without timing control?
> > 
> > For full-duplex usages, we might need some more extensions, so that
> > both read and write parameters can be synchronized.  (So far the
> > compress stream is unidirectional, and the runtime buffer is for a
> > single stream.)
> > 
> > And the buffer management is based on the fixed size fragments.  I
> > hope this doesn't matter much for the intended operation?
> 
> It's a question, if the standard I/O is really required for this case. My 
> quick idea was to just implement a new "direction" for this job supporting 
> only one ioctl for the data processing which will execute the job in "one 
> shot" at the moment. The I/O may be handled through dma-buf API (which seems 
> to be standard nowadays for this purpose and allows future chaining).
> 
> So something like:
> 
> struct dsp_job {
>     int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
>     int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
>     ... maybe some extra data size members here ...
>     ... maybe some special parameters here ...
> };
> 
> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
> 
> This ioctl will be blocking (thus synced). My question is, if it's feasible 
> for gstreamer or not. For this particular case, if the rate conversion is 
> implemented in software, it will block the gstreamer data processing, too.

Yes, GStreamer threading is using a push-back model, so blocking for the time of
the processing is fine. Note that the extra simplicity will suffer from ioctl()
latency.
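
To make the blocking call concrete, here is a rough sketch of how a caller
would drive it. Note that SNDRV_COMPRESS_DSPJOB and struct dsp_job are only the
proposal quoted above and do not exist in any kernel today. The two-field
layout (the elided members are left out) and the run_dsp_job() helper are my
own assumptions for illustration:

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical: mirrors the struct proposed in this thread, minus the
 * elided size/parameter members. Not an existing UAPI. */
struct dsp_job {
    int source_fd;     /* dma-buf FD with source data */
    int target_fd;     /* dma-buf FD for target data */
};

/* Hypothetical ioctl number, taken as-is from the proposal. */
#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

/* Submit one job and block until the DSP is done.
 * Returns 0 on success, -errno on failure. */
static int run_dsp_job(int compress_fd, int source_fd, int target_fd)
{
    struct dsp_job job = {
        .source_fd = source_fd,
        .target_fd = target_fd,
    };

    if (ioctl(compress_fd, SNDRV_COMPRESS_DSPJOB, &job) < 0)
        return -errno;
    return 0;
}
```

An element would call this once per buffer from its streaming thread, so every
buffer pays one full ioctl() round trip before the next one can be submitted.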

In GFX, they solve this issue with fences. That allows setting up the next
operation in the chain before the data has been produced.
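
For reference, the consumer side of a fence can be sketched roughly as below.
This assumes the fence is exported to userspace as a sync_file FD, which
reports POLLIN once it has signalled. The wait_fence_fd() helper name is made
up, and nothing in this series exposes fences:

```c
#include <errno.h>
#include <poll.h>

/* Wait for a fence FD to signal.
 * Returns 1 if signalled, 0 on timeout, -errno on error. */
static int wait_fence_fd(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    /* Retry if a signal handler interrupted the wait. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -errno;
    return ret > 0 ? 1 : 0;
}
```

The point is that the fence FD can be handed to the next operation right away,
and only the final consumer has to wait on it.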

In V4L2, we solve this with queues. They allow preparing the next job while the
current job is still being processed. If you look at the v4l2convert code in
GStreamer (for simple m2m), it currently makes no use of the queues; it simply
processes the frames synchronously. There are two possible explanations: either
it does not matter that much, or no one is using it :-D Video decoders and
encoders (stateful) do run input / output from different threads to benefit
from the queues.

regards,
Nicolas

> 
> 						Jaroslav
>