Re: RFC: add min_num_buffers and clarify V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28/10/2024 16:52, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Mon, Oct 28, 2024 at 12:10:22PM +0100, Hans Verkuil wrote:
>> Hi all,
>>
>> This mail thread uncovered some corner cases around how many buffers should be allocated
>> if VIDIOC_REQBUFS with count = 1 is called:
>>
>> https://lore.kernel.org/linux-media/20241003-rp1-cfe-v6-0-d6762edd98a8@xxxxxxxxxxxxxxxx/T/#mc2210597d92b5a0f09fabdac2f7307128aaa9bd8
> 
> I'll repeat below some comments I've made in that thread, as they're
> better discussed in the context of this RFC.
> 
>> When it comes to the minimum number of buffers there are a number of limitations:
>>
>> 1) The DMA engine needs at least N buffers to be queued before it can start. Typically
>>    this is 0, 1 or 2, and a driver sets this via the vb2_queue min_queued_buffers field.
>>    So if min_queued_buffers = 1, then the DMA engine needs one buffer at all times to
>>    DMA to. Allocating just one buffer would mean the DMA engine can never return that
>>    buffer to userspace (it would just keep recycling the same buffer over and over), so
>>    the minimum must be min_queued_buffers + 1.
> 
> I think you're mixing hardware and driver constraints here. Drivers can
> use scratch buffers to relax the hardware requirements, and allow
> userspace operation with less buffers than strictly required by the
> hardware.
> 
> The cost of allocating such scratch buffers vary depending on the
> device. When an IOMMU is available, or when the device has a line stride
> that can be set to 0 and supports race-free programming of the stride
> and buffer addresses, the scratch buffer can be as small as a single
> page or a single line. In other cases, a full-frame scratch buffer is
> required, which is costly, and the decision on whether or not to
> allocate such a scratch buffer should probably be taken with userspace
> being involved.

I honestly don't see why you would want to spend a lot of time on adding
scratch buffer support just to save a bit of memory. Is the use-case of
capturing just a single buffer so common? To me it seems that it only
makes sense to spend effort on this if you only need to capture a single
buffer and never need to stream more buffers.

Can you describe the use-case of capturing just a single buffer? Is that
just for testing libcamera? Or is it something that happens all the time
during normal libcamera operation?

Supporting scratch buffers is a lot of effort for something that is not
needed for normal streaming.

> 
> min_queued_buffers describes how the device operates from a userspace
> point of view, so I don't think it should be considered or documented as
> being a hardware requirement, but a driver requirement.

It's a hardware and/or driver requirement. It is absolutely not a userspace
requirement. Normal userspace applications that use VIDIOC_REQBUFS and just
stream video will never notice this.

> 
>> 2) Historically VIDIOC_REQBUFS is expected to increase the count value to a number that
>>    ensures the application can smoothly process the video stream. Typically this will
>>    be 3 or 4 (if min_queued_buffers == 2): min_queued_buffers are used by the DMA engine,
>>    one buffer is queued up in vb2, ready to be used by the DMA engine as soon as it
>>    returns a filled buffer to userspace, and one buffer is processed by userspace.
>>
>>    This is to support applications that call VIDIOC_REQBUFS with count = 1 and leave it
>>    to the driver to increment it to a workable value.
> 
> Do we know what those applications are ? I'm not disputing the fact that
> this may need to be supported to avoid breaking old userspace, but I
> also think this feature should be phased out for new drivers, especially
> drivers that require a device-specific userspace and therefore won't
> work out of the box with old applications.

xawtv is one: it will call REQBUFS with count = 2 (so this would fail for
any driver that sets min_queued_buffers to 2), and with count = 1 if it wants
to capture just a single frame.

'git grep min_queued_buffers|grep -v videobuf|wc' gives me 83 places where it is
set. Some of those are likely wrong (min_queued_buffers has been abused as a
replacement for min_reqbufs_allocation), but still that's quite a lot.

Mostly these are older drivers for hardware without an IOMMU and typically for
SDTV capture. So memory is not a consideration for those drivers since a
SDTV buffer is quite small.

> 
>> 3) Stateful codecs in particular have additional requirements beyond the DMA engine
>>    limits due to the fact that they have to keep track of reference buffers and other
>>    codec limitations. As such more buffers are needed, and that number might also vary
>>    based on the specific codec used. The V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT
>>    controls are used to report that. Support for this is required by the stateful codec
>>    API.
>>
>>    The documentation of these controls suggest that these are generic controls, but
>>    as of today they are only used by stateful codec drivers.
>>
>> 4) Some corner cases (mainly/only SDR, I think) where you need more than the usual
>>    3 or 4 buffers since the buffers arrive at a high frequency.
> 
> High frame rates is an important feature, but it's also a can of worms.
> V4L2 is lacking the ability to batch multiple frames, we will have to
> address that. Hopefully it could be decoupled from this RFC.

It's a separate issue indeed. I just mentioned it because I know SDR drivers
use this. They are rarely used, though.

> 
>> Rather than have drivers try to correct the count value (typically incorrectly), the
>> vb2_queue min_reqbufs_allocation field was added to set the minimum number of
>> buffers that VIDIOC_REQBUFS should allocate if count is less than that.
> 
> Even if I dislike this feature, I agree it's better implemented through
> min_reqbufs_allocation than by manual calculations in drivers.
> 
>> VIDIOC_CREATE_BUFS is not affected by that: if you use CREATE_BUFS you take full control
>> of how many buffers you want to create. It might create fewer buffers if you run out of
>> memory, but never more than requested.
>>
>> But what is missing is that if you use CREATE_BUFS you need to know the value of
>> min_queued_buffers + 1, and that is not exposed.
>>
>> I would propose to add a min_num_buffers field to struct v4l2_create_buffers
>> and add a V4L2_BUF_CAP_SUPPORTS_MIN_NUM_BUFFERS flag to signal the presence of
>> that field. And vb2 can set it to min_queued_buffers + 1.
> 
> This would require allocating a buffer first to get the value. Wouldn't
> a read-only control be better ?

No. You can call CREATE_BUFS with count = 0: in that case it does nothing,
except filling in all those capabilities. It was designed with that in mind
so you have an ioctl that can return all that information.

> 
> Furthermore, I would rather provide the min_queued_buffers value instead
> of min_queued_buffers + 1. The V4L2 API should provide userspace with
> information it needs to make informed decisions, but not make those
> decisions in behalf of userspace. It's up to applications to add 1 or
> more buffers depending on their use case.

I would definitely want more opinions on this. What's the point of returning
min_queued_buffers and then creating that many buffers and still not be able
to stream?

Can you think of a scenario (e.g. in libcamera or elsewhere) where that makes
sense?

Also, will the average V4L2 user have the knowledge to understand that? You
have that knowledge, but I think for anyone else it would be really confusing.

> 
> I think we also need to discuss policies regarding scratch buffer
> allocation in the context of this RFC. When the hardware supports small
> scratch buffers, I would like to make it mandatory for drivers to do so
> and support min_queued_buffers = 0.

I would first like to know the use-case (as I mentioned above).

For the type of drivers I mostly work with (video receivers), it would just
be a lot of work for no gain. But perhaps for camera pipelines it does make
sense?

> When scratch buffers are expensive, do we want to still support them in
> the kernel, perhaps in a way controlled by userspace ? A userspace that
> can guarantee it will always provide min_queued_buffers + 1 buffers
> could indicate so and avoid scratch buffer allocation, while a userspace
> that can't provide that guarantee would get scratch buffers from the
> kernel.

That is really the difference between using VIDIOC_REQBUFS and VIDIOC_CREATE_BUFS.
I.e., userspace can already choose this.

Just to clarify the reason for this RFC: the current situation is messy. There
is a lot of history and a lot of older drivers do not always do the right thing.

With this RFC I would like to get a consensus of how it should work. After that
I want to implement any missing bits and improve the documentation, and finally
go through the drivers and at least try to make them behave consistently.

Also I want to improve v4l2-compliance to test more corner cases, especially
if you use CREATE_BUFS instead of REQBUFS (I already have a patch for that
ready).

The work Benjamin did on increasing the max number of supported buffers and the
REMOVE_BUFS ioctl uncovered a lot of that messy history, and it is clear we need
to try and clarify how it should work.

>> The second proposal is to explicitly document that the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT
>> are for stateful codec support only, at least for now.

I just discovered that v4l2-compliance and v4l2-ctl do not honor these controls
for stateful codecs. That's something that needs to be fixed.

There is also one other item that I would like to discuss: the vb2 queue_setup
callback is currently used for both REQBUFS and CREATE_BUFS, and it remains
confusing for drivers how to use it exactly. I am inclined to redesign that
part, most likely splitting it in two: either one callback for REQBUFS and one
for CREATE_BUFS, or alternatively, one callback when allocating buffers for
the first time (so REQBUFS and when CREATE_BUFS is called for the first time,
i.e. when no buffers are allocated yet), and one callback when adding additional
buffers. I would have to think about this, and probably experiment a bit.

Regards,

	Hans

>>
>> If this is in place, then min_reqbufs_allocation should be set to a sane number of
>> buffers (i.e. typically 3 or 4), and if you want precise control, use VIDIOC_CREATE_BUFS.
> 





[Index of Archives]     [Linux Input]     [Video for Linux]     [Gstreamer Embedded]     [Mplayer Users]     [Linux USB Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [Yosemite Backpacking]

  Powered by Linux