On Tue, Oct 29, 2024 at 10:54:31AM +0100, Hans Verkuil wrote:
> On 10/29/24 09:56, Laurent Pinchart wrote:
> > On Tue, Oct 29, 2024 at 09:26:58AM +0100, Hans Verkuil wrote:
> >> On 29/10/2024 08:01, Jacopo Mondi wrote:
> >>> On Mon, Oct 28, 2024 at 08:39:36PM +0200, Laurent Pinchart wrote:
> >>>> On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
> >>>>> On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
> >>>>>> On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> >>>>>>> On 28/10/2024 16:30, Jacopo Mondi wrote:
> >>>>>>>> On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> >>>>>>>>> On 28/10/2024 15:35, Jacopo Mondi wrote:
> >>>>>>>>>> There apparently is no reason to require 3 queued buffers to call streamon() for the RkISP1, as the driver operates with a scratch buffer that frames can be directed to if there's no available buffer provided by userspace.
> >>>>>>>>>>
> >>>>>>>>>> Reduce the number of required buffers to 0 to allow applications to operate by queueing capture buffers on demand.
> >>>>>>>>>>
> >>>>>>>>>> Tested with libcamera, by operating with a single capture request. The same request (and associated capture buffer) gets recycled once completed. This of course causes a frame rate drop but doesn't hinder operations.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@xxxxxxxxxxxxxxxx>
> >>>>>>>>>> ---
> >>>>>>>>>> The first version of this patch set min_queued_buffers to 1, but setting it to 0 doesn't compromise operations and is even better, as it allows applications to queue buffers to the capture devices on demand. If a buffer is not provided to the DMA engines, image data gets directed to the driver's internal scratch buffer.
> >>>>>>>>>> ---
> >>>>>>>>>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> >>>>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>>>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> >>>>>>>>>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>>>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>>>> @@ -35,8 +35,6 @@
> >>>>>>>>>>  #define RKISP1_SP_DEV_NAME RKISP1_DRIVER_NAME "_selfpath"
> >>>>>>>>>>  #define RKISP1_MP_DEV_NAME RKISP1_DRIVER_NAME "_mainpath"
> >>>>>>>>>>
> >>>>>>>>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>>>>>>>>> -
> >>>>>>>>>>  enum rkisp1_plane {
> >>>>>>>>>>  	RKISP1_PLANE_Y = 0,
> >>>>>>>>>>  	RKISP1_PLANE_CB = 1,
> >>>>>>>>>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> >>>>>>>>>>  	q->ops = &rkisp1_vb2_ops;
> >>>>>>>>>>  	q->mem_ops = &vb2_dma_contig_memops;
> >>>>>>>>>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> >>>>>>>>>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> >>>>>>>>>> +	q->min_queued_buffers = 0;
> >>>>>>>>>
> >>>>>>>>> You can probably just drop this since the vb2_queue struct is zeroed when it is allocated. So no need to set it to 0.
> >>>>>>>>
> >>>>>>>> I suspected so :)
> >>>>>>>>
> >>>>>>>>> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
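
For context, the scratch-buffer mechanism the commit message relies on boils down to the driver programming its memory interface with the address of an internally allocated dummy buffer whenever userspace has not queued a capture buffer. A simplified sketch of that fallback, loosely modelled on the driver's buffer handling (only the Y plane shown, names approximate):

/*
 * Simplified sketch: when programming the next DMA address, use the first
 * queued buffer if there is one, otherwise point the memory interface at
 * the driver's dummy buffer so the hardware always has somewhere to write.
 */
static void rkisp1_set_next_buf(struct rkisp1_capture *cap)
{
	cap->buf.curr = cap->buf.next;
	cap->buf.next = NULL;

	if (!list_empty(&cap->buf.queue)) {
		cap->buf.next = list_first_entry(&cap->buf.queue,
						 struct rkisp1_buffer, queue);
		list_del(&cap->buf.next->queue);

		rkisp1_write(cap->rkisp1, cap->config->mi.y_base_ad_init,
			     cap->buf.next->buff_addr[RKISP1_PLANE_Y]);
	} else {
		/* No buffer from userspace: capture into the scratch buffer. */
		rkisp1_write(cap->rkisp1, cap->config->mi.y_base_ad_init,
			     cap->buf.dummy.dma_addr);
	}
}
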
> >>>>>>>>
> >>>>>>>> No, and this patch removes it, in fact:
> >>>>>>>>
> >>>>>>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>>>>>>> -
> >>>>>>>
> >>>>>>> I should have checked the patch :-) Sorry for the noise.
> >>>>>>>
> >>>>>>>>> Also, see my RFC I posted today:
> >>>>>>>>>
> >>>>>>>>> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xxxxxxxxx/T/#u
> >>>>>>>>>
> >>>>>>>>> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1, and expect the driver to change that to a workable value, will, in fact, now just get one buffer. And streaming with that will cause lots of frame drops.
> >>>>>>>>>
> >>>>>>>>> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available, but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> >>>>>>>>
> >>>>>>>> Yeah, I've read the discussion between you and Tomi and it seemed like a good time to re-send this patch.
> >>>>>>>>
> >>>>>>>>> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in libcamera to get much better control over how many buffers should be allocated.
> >>>>>>>>
> >>>>>>>> In my understanding min_queued_buffers identifies how many buffers should be queued before calling start_streaming, and this comes directly from a hw/driver requirement. This doesn't mean that at least min_queued_buffers should be queued at all times during streaming; at least, I don't see how and where videobuf2 enforces this. Or does it?
> >>>>>>>
> >>>>>>> It's an intrinsic property of the HW/driver: e.g. if it needs two buffers queued up for the DMA engine to work, then it really is always holding on to two buffers. The only thing the framework does is postpone calling start_streaming until that number of buffers is queued, to ensure the DMA engine has what it needs to start. But after that vb2 doesn't check it.
> >>>>>>
> >>>>>> The "driver" part of "HW/driver" is important here, as drivers can influence this in multiple ways. One of them is usage of scratch buffers, but even without that, a DMA engine that requires two buffers can easily be operated with a single buffer by programming the DMA engine with the same buffer address twice. Drivers should really do so unless they really can't.
> >>>>>>
> >>>>>>>> If the above is correct, then the number of buffers to be queued during streaming is, in my opinion, less a hw/driver requirement and more an application decision.
> >>>>>>>
> >>>>>>> No, min_queued_buffers is a HW/driver property: the DMA engine can't start until that many buffers are queued up, and once it is started it will always hold on to that many buffers.
> >>>>>
> >>>>> I get it. My point was that once start_streaming has been called, even if min_queued_buffers=2, there is nothing preventing userspace from queuing one buffer at a time once the first two have completed. Sure, the hw/driver might not like it, but while delaying start_streaming prevents bad things from happening, there is nothing in the core that prevents applications from potentially stalling the capture operations.
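
For reference, min_queued_buffers is only consulted by the vb2 core when deciding whether start_streaming() can be called yet; roughly, the buffer-queueing path does something like the following (paraphrased from vb2_core_qbuf() in videobuf2-core.c, not the exact code):

/*
 * start_streaming() is deferred until enough buffers have been queued,
 * but once it has been called the core never looks at
 * min_queued_buffers again.
 */
if (q->streaming && !q->start_streaming_called &&
    q->queued_count >= q->min_queued_buffers)
	ret = vb2_start_streaming(q);
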
> >>>>>
> >>>>> But I get your point: if the system needs 2 buffers to start streaming, it will probably need two buffers to continue producing frames.
> >>
> >> Right, it won't be able to complete anything and return it to userspace until it gets a third buffer. Only then can it return one buffer to userspace.
> >>
> >>>>>> That's not always true. The imx7-media-csi driver, for instance, sets min_queued_buffers to 2, but allocates scratch buffers and uses them at runtime, so that it can return all queued buffers to userspace.
> >>>>>
> >>>>> That's interesting. From your mention of "scratch buffers" I gather there actually is a need to have 2 buffers queued to the HW? How does that work? After all, queuing a buffer to the DMA engine usually means pointing its write engine to one (set of) addresses.
> >>>>>
> >>>>> Or is it a driver-only requirement to ask for two buffers?
> >>>>
> >>>> The hardware has a ping-pong mechanism with two addresses and switches between them automatically.
> >>>>
> >>>>>> Grepping for min_queued_buffers I see drivers setting it to 4 (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9 (zoran_driver.c)! I doubt the zoran driver holds on to 9 buffers at runtime. Your statement is not universally true today.
> >>>>>>
> >>>>>> These could be considered driver issues, and the min_queued_buffers values should be fixed to match the runtime behaviour. In some cases I expect it will require more work than just changing the value, as drivers may implement the logic to operate with fewer buffers at runtime but not at start time. This would be fixable, but it may also call for asking whether the start-time and runtime behaviours need to be identical.
> >>>>>>
> >>>>>>> So the application has to know somehow how many buffers are needed to actually stream. One way is via VIDIOC_REQBUFS, since that is supposed to always return a workable number of buffers; the other is by actually reporting the minimum number of buffers as per my RFC.
> >>>>>>>
> >>>>>>>> As you said, an application should be good with 3 buffers (one queued, one currently being written to, one to be consumed by the application), but in very specific cases where an application retains a buffer for longer, for whatever reason, it might need a larger number of queued buffers to give the DMA engines somewhere to write data without frames being discarded (redirected to scratch buffers, or dropped by the DMA engine itself, if the HW supports that). Or maybe an application is fine with dropping frames and only queuing buffers sporadically (if the HW supports that, of course).
> >>>>>>>>
> >>>>>>>> For libcamera, and for this specific platform in particular, we're going to base new developments on the assumption that min_queued_buffers == 0, and it would be more convenient for us to be able to access its value from userspace to identify if we're running on a kernel with or without this patch applied.
> >>>>>>>
> >>>>>>> So my proposal in my RFC to expose min_num_buffers would work for libcamera? It sounds like that's what you need.
> >>>>>
> >>>>> My immediate need is to know if I'm running on a "legacy" version of this driver that still requires 3 buffers for no apparent reason, or on a new version.
> >>>>> Your proposal might work, but I still feel like we should report the HW/driver requirement (min_queued_buffers) instead of trying to suggest to applications how many buffers they need to allocate to get "smooth streaming" or similar, as the use cases might be different.
> >>>>>
> >>>>>> It may be useful, but I think we may also just require min_num_buffers == 0 for a device to be supported in libcamera. We have to implement
> >>>>>
> >>>>> While I concur this would be ideal, how would it work for existing rkisp1 implementations that do not include this patch? libcamera should be able to run on both, probably in two different "modes" /o\
> >>>>
> >>>> A simple option is to check the kernel version; we do that in a few places. Over time we'll increase the minimum kernel version and drop
> >>>
> >>> I considered that, but this patch is pretty easy to backport, so having something that tells userspace the value of min_queued_buffers might be useful indeed.
> >>
> >> The implementation is similar to that of the new max_num_buffers field in struct v4l2_create_buffers: it will be signaled by a new buffer capability flag. So you can check that in the code. Much better than relying on kernel versions.
> >
> > Sure, I agree that an explicit API is better. I'm not a big fan of adding a field to v4l2_create_buffers though (I didn't realize it was extended with a max_num_buffers field) as it would require calling VIDIOC_CREATE_BUFS to get the information. That can get quite awkward for userspace to use.
>
> Why would that be awkward? If called with count = 0, then it will just report the capabilities and it will not create any buffers.

I wasn't aware you could call it with count = 0 for that purpose. This addresses my concerns.

> > This being said, libcamera today won't behave properly with rkisp1, as it won't be able to give the application the last buffers, as guaranteed by the libcamera API. Relying on min_num_buffers == 0 will fix it and won't cause any regression on kernels that still use 3; it will just keep the current behaviour.
> >
> >>>> support for legacy APIs.
> >>>>
> >>>>>> APIs such as the Android camera HAL that has no concept of buffers being kept by the device. This could possibly be handled within libcamera by allocating scratch buffers in userspace, but that comes with other challenges. I would like to at least try to get help from the kernel until proven that it's a bad idea.

--
Regards,

Laurent Pinchart
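
As an illustration of the count = 0 behaviour mentioned above, a userspace query could look roughly like the sketch below. This is only a sketch: the helper name and error handling are mine, and it assumes a capture video node using the multiplanar API.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/*
 * Query the queue's buffer capabilities without allocating anything by
 * issuing VIDIOC_CREATE_BUFS with count = 0. The returned capabilities
 * field carries the V4L2_BUF_CAP_* flags (and, on kernels that support
 * it, max_num_buffers is filled in as well).
 */
static int query_buffer_caps(int fd, __u32 *caps)
{
	struct v4l2_create_buffers create;

	memset(&create, 0, sizeof(create));
	create.format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	if (ioctl(fd, VIDIOC_G_FMT, &create.format) < 0)
		return -1;

	create.count = 0;		/* report only, create nothing */
	create.memory = V4L2_MEMORY_MMAP;
	if (ioctl(fd, VIDIOC_CREATE_BUFS, &create) < 0)
		return -1;

	*caps = create.capabilities;
	return 0;
}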