Re: RFC: add min_num_buffers and clarify V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Hans

On Tue, Oct 29, 2024 at 09:17:57AM +0100, Hans Verkuil wrote:
> On 28/10/2024 16:52, Laurent Pinchart wrote:
> > Hi Hans,
> >
> > On Mon, Oct 28, 2024 at 12:10:22PM +0100, Hans Verkuil wrote:
> >> Hi all,
> >>
> >> This mail thread uncovered some corner cases around how many buffers should be allocated
> >> if VIDIOC_REQBUFS with count = 1 is called:
> >>
> >> https://lore.kernel.org/linux-media/20241003-rp1-cfe-v6-0-d6762edd98a8@xxxxxxxxxxxxxxxx/T/#mc2210597d92b5a0f09fabdac2f7307128aaa9bd8
> >
> > I'll repeat below some comments I've made in that thread, as they're
> > better discussed in the context of this RFC.
> >
> >> When it comes to the minimum number of buffers there are a number of limitations:
> >>
> >> 1) The DMA engine needs at least N buffers to be queued before it can start. Typically
> >>    this is 0, 1 or 2, and a driver sets this via the vb2_queue min_queued_buffers field.
> >>    So if min_queued_buffers = 1, then the DMA engine needs one buffer at all times to
> >>    DMA to. Allocating just one buffer would mean the DMA engine can never return that
> >>    buffer to userspace (it would just keep recycling the same buffer over and over), so
> >>    the minimum must be min_queued_buffers + 1.
> >
> > I think you're mixing hardware and driver constraints here. Drivers can
> > use scratch buffers to relax the hardware requirements, and allow
> > userspace operation with less buffers than strictly required by the
> > hardware.
> >
> > The cost of allocating such scratch buffers vary depending on the
> > device. When an IOMMU is available, or when the device has a line stride
> > that can be set to 0 and supports race-free programming of the stride
> > and buffer addresses, the scratch buffer can be as small as a single
> > page or a single line. In other cases, a full-frame scratch buffer is
> > required, which is costly, and the decision on whether or not to
> > allocate such a scratch buffer should probably be taken with userspace
> > being involved.
>
> I honestly don't see why you would want to spend a lot of time on adding
> scratch buffer support just to save a bit of memory. Is the use-case of
> capturing just a single buffer so common? To me it seems that it only
> makes sense to spend effort on this if you only need to capture a single
> buffer and never need to stream more buffers.

I can give you two examples I'm currently working with

- A device with a "viewfinder" device node and a "still capture"
  capture device. We want to only queue one buffer to the "still
  capture" capture device when the user requires to (the usual "tap to
  capture"). Adam which was in cc to my patch for RkISP1 that removes
  min_queued_buffers was struggling to implement "tap to capture"
  support in his application has he had to queue 3 buffers before he
  could capture an image from the "still capture" pipe.

- A resource constrained device that only capture one frame
  sporadically because it needs to reduce memory pressure and can't
  allocate a number of buffers that allows it to keep the queued
  buffers queue populated to sustain high frame rates produced by the
  sensor

- In libcamera we want the image pipeline to be running even if no
  buffers are queued to the capture devices at its end. This means
  that we want statistics to be produced by the ISP and parameters to
  be consumed even if the frames produced by the ISP are actually
  discarded to the scratch buffers.

  We want this because we want the algorithms to keep running even if
  users are queuing capture buffers sporadically, to ensure a smaller as
  possible recovery period of the 3A algorithms (ideally, there
  shouldn't be nothing to recover from as the system is 'live' all the
  time) so we want stats to be generated for every frame produced by
  the sensor. And we ideally want this from frame#0 without waiting
  for users to queue a min number buffer to start the pipeline.

I'm sure in robotics/machine vision there are even more advanced use
cases for capturing single buffers in response to events from the
external world.

>
> Can you describe the use-case of capturing just a single buffer? Is that
> just for testing libcamera? Or is it something that happens all the time
> during normal libcamera operation?
>
> Supporting scratch buffers is a lot of effort for something that is not
> needed for normal streaming.
>
> >
> > min_queued_buffers describes how the device operates from a userspace
> > point of view, so I don't think it should be considered or documented as
> > being a hardware requirement, but a driver requirement.
>
> It's a hardware and/or driver requirement. It is absolutely not a userspace
> requirement. Normal userspace applications that use VIDIOC_REQBUFS and just
> stream video will never notice this.
>
> >
> >> 2) Historically VIDIOC_REQBUFS is expected to increase the count value to a number that
> >>    ensures the application can smoothly process the video stream. Typically this will
> >>    be 3 or 4 (if min_queued_buffers == 2): min_queued_buffers are used by the DMA engine,
> >>    one buffer is queued up in vb2, ready to be used by the DMA engine as soon as it
> >>    returns a filled buffer to userspace, and one buffer is processed by userspace.
> >>
> >>    This is to support applications that call VIDIOC_REQBUFS with count = 1 and leave it
> >>    to the driver to increment it to a workable value.
> >
> > Do we know what those applications are ? I'm not disputing the fact that
> > this may need to be supported to avoid breaking old userspace, but I
> > also think this feature should be phased out for new drivers, especially
> > drivers that require a device-specific userspace and therefore won't
> > work out of the box with old applications.
>
> xawtv is one: it will call REQBUFS with count = 2 (so this would fail for
> any driver that sets min_queued_buffers to 2), and with count = 1 if it wants
> to capture just a single frame.
>
> 'git grep min_queued_buffers|grep -v videobuf|wc' gives me 83 places where it is
> set. Some of those are likely wrong (min_queued_buffers has been abused as a
> replacement for min_reqbufs_allocation), but still that's quite a lot.
>
> Mostly these are older drivers for hardware without an IOMMU and typically for
> SDTV capture. So memory is not a consideration for those drivers since a
> SDTV buffer is quite small.
>
> >
> >> 3) Stateful codecs in particular have additional requirements beyond the DMA engine
> >>    limits due to the fact that they have to keep track of reference buffers and other
> >>    codec limitations. As such more buffers are needed, and that number might also vary
> >>    based on the specific codec used. The V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT
> >>    controls are used to report that. Support for this is required by the stateful codec
> >>    API.
> >>
> >>    The documentation of these controls suggest that these are generic controls, but
> >>    as of today they are only used by stateful codec drivers.
> >>
> >> 4) Some corner cases (mainly/only SDR, I think) where you need more than the usual
> >>    3 or 4 buffers since the buffers arrive at a high frequency.
> >
> > High frame rates is an important feature, but it's also a can of worms.
> > V4L2 is lacking the ability to batch multiple frames, we will have to
> > address that. Hopefully it could be decoupled from this RFC.
>
> It's a separate issue indeed. I just mentioned it because I know SDR drivers
> use this. They are rarely used, though.
>
> >
> >> Rather than have drivers try to correct the count value (typically incorrectly), the
> >> vb2_queue min_reqbufs_allocation field was added to set the minimum number of
> >> buffers that VIDIOC_REQBUFS should allocate if count is less than that.
> >
> > Even if I dislike this feature, I agree it's better implemented through
> > min_reqbufs_allocation than by manual calculations in drivers.
> >
> >> VIDIOC_CREATE_BUFS is not affected by that: if you use CREATE_BUFS you take full control
> >> of how many buffers you want to create. It might create fewer buffers if you run out of
> >> memory, but never more than requested.
> >>
> >> But what is missing is that if you use CREATE_BUFS you need to know the value of
> >> min_queued_buffers + 1, and that is not exposed.
> >>
> >> I would propose to add a min_num_buffers field to struct v4l2_create_buffers
> >> and add a V4L2_BUF_CAP_SUPPORTS_MIN_NUM_BUFFERS flag to signal the presence of
> >> that field. And vb2 can set it to min_queued_buffers + 1.
> >
> > This would require allocating a buffer first to get the value. Wouldn't
> > a read-only control be better ?
>
> No. You can call CREATE_BUFS with count = 0: in that case it does nothing,
> except filling in all those capabilities. It was designed with that in mind
> so you have an ioctl that can return all that information.
>
> >
> > Furthermore, I would rather provide the min_queued_buffers value instead
> > of min_queued_buffers + 1. The V4L2 API should provide userspace with
> > information it needs to make informed decisions, but not make those
> > decisions in behalf of userspace. It's up to applications to add 1 or
> > more buffers depending on their use case.
>
> I would definitely want more opinions on this. What's the point of returning
> min_queued_buffers and then creating that many buffers and still not be able
> to stream?

There are use cases for memory constrained systems where buffers are
only queued sporadically. It might come from the requirement of
allocating less buffers as possible [*] or because processing frames
takes a longer time and maintaing the buffer queue populated for sustained
frame-rate operations would require a lot of buffers to be reserved.

In general, we can't predict the use cases in which a driver will be
used, so informing user-space about the actual requirements without
trying to hint what they should do seems better to me,

[*] I understand that allocating a full scratch buffer in the driver
kind of goes in the opposite direction of "not wasting memory" but if
the DMA engine does not support discarding frames in HW, a single buffer
in kernel space avoids a larger allocation in user space

>
> Can you think of a scenario (e.g. in libcamera or elsewhere) where that makes
> sense?
>
> Also, will the average V4L2 user have the knowledge to understand that? You
> have that knowledge, but I think for anyone else it would be really confusing.
>
> >
> > I think we also need to discuss policies regarding scratch buffer
> > allocation in the context of this RFC. When the hardware supports small
> > scratch buffers, I would like to make it mandatory for drivers to do so
> > and support min_queued_buffers = 0.
>
> I would first like to know the use-case (as I mentioned above).
>
> For the type of drivers I mostly work with (video receivers), it would just
> be a lot of work for no gain. But perhaps for camera pipelines it does make
> sense?
>
> > When scratch buffers are expensive, do we want to still support them in
> > the kernel, perhaps in a way controlled by userspace ? A userspace that
> > can guarantee it will always provide min_queued_buffers + 1 buffers
> > could indicate so and avoid scratch buffer allocation, while a userspace
> > that can't provide that guarantee would get scratch buffers from the
> > kernel.
>
> That is really the difference between using VIDIOC_REQBUFS and VIDIOC_CREATE_BUFS.
> I.e., userspace can already choose this.
>
> Just to clarify the reason for this RFC: the current situation is messy. There
> is a lot of history and a lot of older drivers do not always do the right thing.
>
> With this RFC I would like to get a consensus of how it should work. After that
> I want to implement any missing bits and improve the documentation, and finally
> go through the drivers and at least try to make them behave consistently.

my2c: if CREATE_BUFFERS(0) allows to retrieve min_queued_buffers to
allow userspace make informed decisions about how many buffers to
allocate to at least get streaming going, I would be happy with such
API more than with a control.

When it comes to scratch buffers usage, I'm not sure we can enforce it
as a requirement (or even try to provide some helper in the core for
drivers) but I defintately see use cases for applicating queueing buffers
sporadically and for driver being ready to discard frames without
stalling or delaying the start of the capture pipeline operations.

Thanks
  j

>
> Also I want to improve v4l2-compliance to test more corner cases, especially
> if you use CREATE_BUFS instead of REQBUFS (I already have a patch for that
> ready).
>
> The work Benjamin did on increasing the max number of supported buffers and the
> REMOVE_BUFS ioctl uncovered a lot of that messy history, and it is clear we need
> to try and clarify how it should work.
>
> >> The second proposal is to explicitly document that the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT
> >> are for stateful codec support only, at least for now.
>
> I just discovered that v4l2-compliance and v4l2-ctl do not honor these controls
> for stateful codecs. That's something that needs to be fixed.
>
> There is also one other item that I would like to discuss: the vb2 queue_setup
> callback is currently used for both REQBUFS and CREATE_BUFS, and it remains
> confusing for drivers how to use it exactly. I am inclined to redesign that
> part, most likely splitting it in two: either one callback for REQBUFS and one
> for CREATE_BUFS, or alternatively, one callback when allocating buffers for
> the first time (so REQBUFS and when CREATE_BUFS is called for the first time,
> i.e. when no buffers are allocated yet), and one callback when adding additional
> buffers. I would have to think about this, and probably experiment a bit.
>
> Regards,
>
> 	Hans
>
> >>
> >> If this is in place, then min_reqbufs_allocation should be set to a sane number of
> >> buffers (i.e. typically 3 or 4), and if you want precise control, use VIDIOC_CREATE_BUFS.
> >
>
>




[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux