Hi Hans, On Mon, Oct 28, 2024 at 12:10:22PM +0100, Hans Verkuil wrote: > Hi all, > > This mail thread uncovered some corner cases around how many buffers should be allocated > if VIDIOC_REQBUFS with count = 1 is called: > > https://lore.kernel.org/linux-media/20241003-rp1-cfe-v6-0-d6762edd98a8@xxxxxxxxxxxxxxxx/T/#mc2210597d92b5a0f09fabdac2f7307128aaa9bd8 I'll repeat below some comments I've made in that thread, as they're better discussed in the context of this RFC. > When it comes to the minimum number of buffers there are a number of limitations: > > 1) The DMA engine needs at least N buffers to be queued before it can start. Typically > this is 0, 1 or 2, and a driver sets this via the vb2_queue min_queued_buffers field. > So if min_queued_buffers = 1, then the DMA engine needs one buffer at all times to > DMA to. Allocating just one buffer would mean the DMA engine can never return that > buffer to userspace (it would just keep recycling the same buffer over and over), so > the minimum must be min_queued_buffers + 1. I think you're mixing hardware and driver constraints here. Drivers can use scratch buffers to relax the hardware requirements, and allow userspace operation with less buffers than strictly required by the hardware. The cost of allocating such scratch buffers vary depending on the device. When an IOMMU is available, or when the device has a line stride that can be set to 0 and supports race-free programming of the stride and buffer addresses, the scratch buffer can be as small as a single page or a single line. In other cases, a full-frame scratch buffer is required, which is costly, and the decision on whether or not to allocate such a scratch buffer should probably be taken with userspace being involved. min_queued_buffers describes how the device operates from a userspace point of view, so I don't think it should be considered or documented as being a hardware requirement, but a driver requirement. > 2) Historically VIDIOC_REQBUFS is expected to increase the count value to a number that > ensures the application can smoothly process the video stream. Typically this will > be 3 or 4 (if min_queued_buffers == 2): min_queued_buffers are used by the DMA engine, > one buffer is queued up in vb2, ready to be used by the DMA engine as soon as it > returns a filled buffer to userspace, and one buffer is processed by userspace. > > This is to support applications that call VIDIOC_REQBUFS with count = 1 and leave it > to the driver to increment it to a workable value. Do we know what those applications are ? I'm not disputing the fact that this may need to be supported to avoid breaking old userspace, but I also think this feature should be phased out for new drivers, especially drivers that require a device-specific userspace and therefore won't work out of the box with old applications. > 3) Stateful codecs in particular have additional requirements beyond the DMA engine > limits due to the fact that they have to keep track of reference buffers and other > codec limitations. As such more buffers are needed, and that number might also vary > based on the specific codec used. The V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT > controls are used to report that. Support for this is required by the stateful codec > API. > > The documentation of these controls suggest that these are generic controls, but > as of today they are only used by stateful codec drivers. > > 4) Some corner cases (mainly/only SDR, I think) where you need more than the usual > 3 or 4 buffers since the buffers arrive at a high frequency. High frame rates is an important feature, but it's also a can of worms. V4L2 is lacking the ability to batch multiple frames, we will have to address that. Hopefully it could be decoupled from this RFC. > Rather than have drivers try to correct the count value (typically incorrectly), the > vb2_queue min_reqbufs_allocation field was added to set the minimum number of > buffers that VIDIOC_REQBUFS should allocate if count is less than that. Even if I dislike this feature, I agree it's better implemented through min_reqbufs_allocation than by manual calculations in drivers. > VIDIOC_CREATE_BUFS is not affected by that: if you use CREATE_BUFS you take full control > of how many buffers you want to create. It might create fewer buffers if you run out of > memory, but never more than requested. > > But what is missing is that if you use CREATE_BUFS you need to know the value of > min_queued_buffers + 1, and that is not exposed. > > I would propose to add a min_num_buffers field to struct v4l2_create_buffers > and add a V4L2_BUF_CAP_SUPPORTS_MIN_NUM_BUFFERS flag to signal the presence of > that field. And vb2 can set it to min_queued_buffers + 1. This would require allocating a buffer first to get the value. Wouldn't a read-only control be better ? Furthermore, I would rather provide the min_queued_buffers value instead of min_queued_buffers + 1. The V4L2 API should provide userspace with information it needs to make informed decisions, but not make those decisions in behalf of userspace. It's up to applications to add 1 or more buffers depending on their use case. I think we also need to discuss policies regarding scratch buffer allocation in the context of this RFC. When the hardware supports small scratch buffers, I would like to make it mandatory for drivers to do so and support min_queued_buffers = 0. When scratch buffers are expensive, do we want to still support them in the kernel, perhaps in a way controlled by userspace ? A userspace that can guarantee it will always provide min_queued_buffers + 1 buffers could indicate so and avoid scratch buffer allocation, while a userspace that can't provide that guarantee would get scratch buffers from the kernel. > The second proposal is to explicitly document that the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE/OUTPUT > are for stateful codec support only, at least for now. > > If this is in place, then min_reqbufs_allocation should be set to a sane number of > buffers (i.e. typically 3 or 4), and if you want precise control, use VIDIOC_CREATE_BUFS. -- Regards, Laurent Pinchart