Re: V4L2 M2M driver architecture question for a new hardware

Dave Stevenson <dave.stevenson@xxxxxxxxxxxxxxx> · Fri, 14 Oct 2022 15:49:38 +0100

Hi Karthik

On Fri, 14 Oct 2022 at 15:26, Karthik Poduval <karthik.poduval@xxxxxxxxx> wrote:
>
> Thanks for the reply Jacopo and Dave.
>
> On Thu, Oct 13, 2022, 2:25 AM Dave Stevenson <dave.stevenson@xxxxxxxxxxxxxxx> wrote:
>>
>> Hi Karthik and Jacopo
>>
>> On Thu, 13 Oct 2022 at 08:18, Jacopo Mondi <jacopo@xxxxxxxxxx> wrote:
>> >
>> > Hello Karthik
>> >
>> > On Wed, Oct 12, 2022 at 10:59:50PM -0700, Karthik Poduval wrote:
>> > > Hi All,
>> > >
>> > > I have hardware that does some sort of image manipulation. The
>> > > hardware takes 2 inputs.
>> > > - image buffer
>> > > - config param buffer
>> > > and generates one output which is also an image buffer.
>> > > The input and output images formats fall under standard image
>> > > definitions of V4L2 like various YUV/RGB formats (interleaved or
>> > > multiplanar).
>> > >
>> > > The config param buffer is kind of like a set of instructions for the
>> > > hardware that needs to be passed with every input and output image
>> > > which tells the hardware how to process the image.
>> > > The hardware will be given different input images and output images
>> > > every time and possibly different config param buffers too (in some
>> > > cases). The config param buffers may have variable sizes too based on
>> > > the nature of processing for that frame, but input and output images
>> > > are fixed in size for a given context. I should also mention that the
>> > > config param buffers are a few KBs in size so zero copy is a
>> > > requirement. The config params buffers are written by userspace
>> > > (possibly also driver in kernel space) and read by hardware.
>> > >
>> >
>> > This sounds very much like how a regular M2M ISP driver works. I can't
>> > speak for codecs as I'm no expert there, but I expect them to be similar,
>> > so your use case is covered by existing drivers.
>> >
>> > > Here are two mechanisms I had in mind while trying to design a V4L2
>> > > M2M driver for this hardware.
>> > > - Use a custom multiplanar input format where one plane is a config
>> > > param buffer with remaining planes for input images (in case the input
>> > > image is also multiplanar).
>> >
>> > If you're wondering how to pass parameters to the HW, I suggest
>> > registering an output video device node to which you simply
>> > queue buffers containing your parameters.
>> >
>> > Your HW could be modeled as a single subdevice with 3 video device
>> > nodes, one output device for input images, one output device for
>> > parameters, and one capture device for output images.
>> >
>> >                    +-----------+
>> >        +----+      | HW subdev |      +------+
>> >        | In | ---> 0           0  --> | out  |
>> >        +----+      |           |      +------+
>> >                    +-----0-----+
>> >                          ^
>> >                          |
>> >                      +--------+
>> >                      | params |
>> >                      +--------+
>>
>> The main drawback of this over the codec model of a single video
>> device with both an _OUTPUT and _CAPTURE queue is that you cannot run
>> multiple instances simultaneously - there is no way to tie the
>> relevant clients together. I don't know whether supporting
>> simultaneous multiple clients is a requirement in this case, but that
>> may be a key decision in choosing how to represent the device.
>
>
> Yes, the multi-context feature of V4L2 M2M is a requirement. Is it possible to have capture, output and param queues on an M2M device? It essentially fits the M2M architecture, but with a large control param, so we are looking for zero copy instead of relying on the ioctl-based V4L2 control approach.

AIUI you can't have multiple input (or output) queues as then it
becomes ambiguous as to which queue triggered a poll/select. With one
_OUTPUT queue signalling "write" (POLLOUT) and one _CAPTURE queue
signalling "read" (POLLIN), each event maps to exactly one queue. Add a
3rd queue and you don't know which one to query.
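To make the poll/select point concrete: on a single M2M fd, poll() has only two data directions, so userspace can decode revents unambiguously only while there are exactly two queues. A minimal sketch; the enum and function names are hypothetical:

```c
#include <poll.h>

/* Hypothetical decoding of poll() revents on one M2M fd. */
enum m2m_ready {
    M2M_READY_NONE    = 0,
    M2M_READY_OUTPUT  = 1 << 0, /* a source (_OUTPUT) buffer is done: POLLOUT */
    M2M_READY_CAPTURE = 1 << 1, /* a result (_CAPTURE) buffer is done: POLLIN */
    M2M_READY_EVENT   = 1 << 2, /* a V4L2 event is pending: POLLPRI */
};

static int m2m_ready_queues(short revents)
{
    int ready = M2M_READY_NONE;

    if (revents & POLLOUT)
        ready |= M2M_READY_OUTPUT;
    if (revents & POLLIN)
        ready |= M2M_READY_CAPTURE;
    if (revents & POLLPRI)
        ready |= M2M_READY_EVENT;
    /* A third (params) queue on the same fd would have no bit of its own. */
    return ready;
}
```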

Using dma-heaps does work - we've used that in the bcm2835-isp driver
[1] for passing in lens shading tables (again several kB, but there
they are largely static).
In that case we need to jump through a couple of hoops to map the
dmabuf into the ISP control software's memory space as well, but
fundamentally it's very similar.

V4L2_CTRL_FLAG_EXECUTE_ON_WRITE is necessary to handle the case where you:
- allocate a buffer from dma-heap and get fd N.
- pass the fd into the V4L2 driver, which acquires the underlying dmabuf.
- close the fd as userspace doesn't want it anymore.
- allocate a new buffer from dma-heap and get fd N again, but it is
referencing a new underlying dmabuf.
- pass the fd into V4L2 - the control framework would generally view
it as "no change" and not call your control handler :-(
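The failure mode in the list above comes down to the framework comparing the new value against the cached one. The toy model below is a hypothetical simplification, not the real v4l2-ctrls code; CTRL_FLAG_EXECUTE_ON_WRITE stands in for V4L2_CTRL_FLAG_EXECUTE_ON_WRITE:

```c
#include <stdbool.h>

/* Hypothetical stand-in for V4L2_CTRL_FLAG_EXECUTE_ON_WRITE. */
#define CTRL_FLAG_EXECUTE_ON_WRITE (1u << 0)

struct fake_ctrl {
    int cur;           /* last value handed to the driver (here: an fd) */
    unsigned flags;
    int handler_calls; /* how often the driver's handler ran */
};

static void fake_s_ctrl(struct fake_ctrl *c, int new_val)
{
    bool changed = new_val != c->cur;

    /* Same fd number as last time looks like "no change" ... */
    if (!changed && !(c->flags & CTRL_FLAG_EXECUTE_ON_WRITE))
        return; /* ... so the handler is skipped */
    c->cur = new_val;
    c->handler_calls++; /* a real handler would dma_buf_get() the fd here */
}
```

Writing fd N twice (with a different dmabuf behind it the second time) only reaches the handler when the execute-on-write flag is set.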

If you configure userspace to hang on to the same dmabuf and update
it, then you need to add in some method to ensure the config buffer
isn't in use by your driver at the point you update it. And don't
forget about cache management.
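On the cache management side, CPU writes to a mapped dmabuf are normally bracketed with DMA_BUF_IOCTL_SYNC so the exporter can do any cache maintenance. A minimal sketch with hypothetical helper names; it does not cover making sure the driver is finished with the buffer:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Issue DMA_BUF_IOCTL_SYNC, restarting on EINTR/EAGAIN as the dma-buf
 * documentation asks userspace to do. */
static int dmabuf_sync(int fd, __u64 flags)
{
    struct dma_buf_sync sync = { .flags = flags };
    int ret;

    do {
        ret = ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret < 0 && (errno == EINTR || errno == EAGAIN));
    return ret;
}

/* Hypothetical helper: copy a params blob into an mmap()ed dmabuf. */
static int write_params(int fd, void *map, const void *params, size_t len)
{
    if (dmabuf_sync(fd, DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE) < 0)
        return -1;
    memcpy(map, params, len);
    return dmabuf_sync(fd, DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE);
}
```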

I'll leave it for others to comment on whether it is really acceptable
to mainline to use a dmabuf fd in a control.

  Dave

[1] https://github.com/raspberrypi/linux/blob/rpi-5.15.y/drivers/staging/vc04_services/bcm2835-isp/bcm2835-v4l2-isp.c#L752

>> > The parameters buffer can be modeled using the v4l2_meta_format[1]
>> > interface. The data format of the buffer could be defined as a custom
>> > metadata format; you can see examples here [2]
>> >
>> > [1] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-meta.html#c.v4l2_meta_format
>> > [2] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/meta-formats.html#meta-formats
>> >
>> > I suggest to look at the IPU3 and RkISP1 drivers for reference.
>> >
>> > > - Use dmabuf heaps to allocate config param buffer. Tie this config
>> > > param buffer fd to an input buffer (using request API). Driver would
>> > > have to attach the config param buffer dmabuf fd, use it and detach.
>> > >
>> >
>> > You should be able to allocate buffers on the video device as you
>> > normally would, and export them as dmabuf fds by using
>> > VIDIOC_EXPBUF [3].
>> >
>> > Once you have them you can map them in your application code and
>> > write their content.
>> >
>> > [3] https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/vidioc-expbuf.html
>> >
>> > > Any comments/concerns about the above two mechanisms?
>> > > Any other better ideas?
>> > > Are there any existing V4L2 M2M mechanisms for per-frame param
>> > > buffers that are also zero copy?
>> > > Is the media request API able to do zero copy when setting large
>> > > (several KB) compound controls? (That would make the above dmabuf
>> > > heap approach unnecessary.)
>> >
>> > Now, all the above assumes your parameters buffer is modeled as a
>> > structure of parameters (and possibly data tables). If you are instead
>> > looking at something that can be modeled through controls you might
>> > have better guidance by looking at how codecs work, but there I can't
>> > help much ;)
>> >
>> > Hope it helps
>> >    j
>> > >
>> > > --
>> > > Regards,
>> > > Karthik Poduval