Re: [PATCH v6 3/4] media: raspberrypi: Add support for RP1-CFE

Hi Laurent,

On 28/10/2024 17:17, Laurent Pinchart wrote:
On Mon, Oct 28, 2024 at 12:30:45PM +0100, Hans Verkuil wrote:
On 28/10/2024 12:25, Tomi Valkeinen wrote:
On 28/10/2024 13:13, Hans Verkuil wrote:
On 28/10/2024 12:05, Tomi Valkeinen wrote:
On 28/10/2024 12:11, Hans Verkuil wrote:
On 28/10/2024 10:21, Tomi Valkeinen wrote:
On 24/10/2024 11:20, Hans Verkuil wrote:
Hi Tomi,

I know this driver is already merged, but while checking for drivers that use
q->max_num_buffers I stumbled on this cfe code:

<snip>

+/*
+ * vb2 ops
+ */
+
+static int cfe_queue_setup(struct vb2_queue *vq, unsigned int *nbuffers,
+               unsigned int *nplanes, unsigned int sizes[],
+               struct device *alloc_devs[])
+{
+    struct cfe_node *node = vb2_get_drv_priv(vq);
+    struct cfe_device *cfe = node->cfe;
+    unsigned int size = is_image_node(node) ?
+                    node->vid_fmt.fmt.pix.sizeimage :
+                    node->meta_fmt.fmt.meta.buffersize;
+
+    cfe_dbg(cfe, "%s: [%s] type:%u\n", __func__, node_desc[node->id].name,
+        node->buffer_queue.type);
+
+    if (vq->max_num_buffers + *nbuffers < 3)
+        *nbuffers = 3 - vq->max_num_buffers;

This makes no sense: max_num_buffers is 32, unless explicitly set when vb2_queue_init
is called. So 32 + *nbuffers is never < 3.

If the idea is that at least 3 buffers should be allocated by REQBUFS, then set
q->min_reqbufs_allocation = 3; before calling vb2_queue_init and vb2 will handle this
for you.

Drivers shouldn't modify *nbuffers, except in very rare circumstances, especially
since the code is almost always wrong.
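
Something along these lines (untested sketch, going only by the identifiers
visible in the patch): drop the two lines above from cfe_queue_setup() and,
in the node setup path before vb2_queue_init() is called, do:

    struct vb2_queue *q = &node->buffer_queue;

    /* Let VIDIOC_REQBUFS allocate at least 3 buffers; vb2 enforces this. */
    q->min_reqbufs_allocation = 3;

    ret = vb2_queue_init(q);
    if (ret)
        return ret;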

Looking at this, the original code in the old BSP tree was the following, which somehow, along the way, got turned into the above:

if (vq->num_buffers + *nbuffers < 3)
        *nbuffers = 3 - vq->num_buffers;

So... I think that is the same as "q->min_reqbufs_allocation = 3"?

The distinction between min_queued_buffers and
min_reqbufs_allocation, or rather the need for the latter, still
escapes me. If the HW/SW requires N buffers to be queued, why
would we require allocating more than N buffers?

min_queued_buffers is easiest to explain: that represents the requirements of the DMA
engine, i.e. how many buffers must be queued before the DMA engine can be started.
Typically it is 0, 1 or 2.

That's only partly true. Even if the hardware requires 2 buffers, a
driver can allocate scratch buffers to lower the requirement for
userspace. Setting min_queued_buffers to 1 is usually fine, as there are
few use cases for userspace to start the hardware before a buffer is
available to capture a frame to. A value of 2 is much more problematic,
as it prevents operating with a single buffer. I know using a single
buffer results in frame drops, but there are resource-constrained
systems where applications don't always need all the frames (such as the
Raspberry Pi Zero for instance). I very strongly encourage drivers to
never set a min_queued_buffers value higher than 1.


min_reqbufs_allocation is the minimum number of buffers that will be allocated when
calling VIDIOC_REQBUFS in order for userspace to be able to stream without blocking
or dropping frames.

Typically this is 3 for video capture: one buffer is being DMAed, another is queued up
and the third is being processed by userspace. But sometimes drivers have other
requirements.

This is exactly why I dislike min_reqbufs_allocation when it is set based on
this logic: it encodes assumptions about userspace use cases that a capture
driver really shouldn't make.


The reason is that some applications will just call VIDIOC_REQBUFS with count=1 and
expect it to be rounded up to whatever makes sense. See the VIDIOC_REQBUFS doc in
https://hverkuil.home.xs4all.nl/spec/userspace-api/v4l/vidioc-reqbufs.html

"It can be smaller than the number requested, even zero, when the driver runs out of
    free memory. A larger number is also possible when the driver requires more buffers
    to function correctly."
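
In other words, an application doing something like the snippet below (purely
illustrative, not taken from any real application) relies on the count being
rounded up:

    struct v4l2_requestbuffers req = {
        .count = 1,
        .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        .memory = V4L2_MEMORY_MMAP,
    };

    if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0)
        return -errno;

    /* req.count now holds what was actually allocated, e.g. 3 if the
     * driver set min_reqbufs_allocation = 3. */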

How drivers implement this is a mess, and usually the code in the driver is wrong as
well. In particular they often did not take VIDIOC_CREATE_BUFS into account, i.e.
instead of 'if (vq->num_buffers + *nbuffers < 3)' they would do 'if (*nbuffers < 3)'.

Thanks, this was educational!

So. If I have a driver that has min_queued_buffers = 1, I can use
VIDIOC_CREATE_BUFS to allocate a single buffer, and then capture
just one buffer, right? Whereas VIDIOC_REQBUFS would give me
(probably) three (or two, if the driver does not set
min_reqbufs_allocation). Three buffers makes sense for full
streaming, of course.

When we worked on the support for more than 32 buffers we added min_reqbufs_allocation
to let the core take care of this. In addition, this only applies to VIDIOC_REQBUFS,

I agree it's better to handle it in the core than in drivers, even if I
dislike the feature in the first place.

if you want full control over the number of allocated buffers, then use VIDIOC_CREATE_BUFS;
with this ioctl the number of buffers will never be more than requested, although it
may be less if you run out of memory.
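
For example (again just an illustrative userspace sketch, reusing the current
format):

    struct v4l2_create_buffers create = {
        .count = 1,
        .memory = V4L2_MEMORY_MMAP,
    };

    create.format.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    if (ioctl(fd, VIDIOC_G_FMT, &create.format) < 0)
        return -errno;

    if (ioctl(fd, VIDIOC_CREATE_BUFS, &create) < 0)
        return -errno;

    /* create.count is at most 1 here: it is never rounded up, only
     * reduced if the allocation fails. */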

On a side note, we should transition libcamera to use VIDIOC_CREATE_BUFS
unconditionally.


I really should go through all existing drivers and fix them up if they try to
handle this in the queue_setup function; I suspect a lot of them are quite messy.

One thing that is missing in the V4L2 uAPI is a way to report the minimum number of
buffers that need to be allocated, i.e. min_queued_buffers + 1. Since if you want

Hmm, so what I wrote above is not correct? One needs min_queued_buffers + 1? Why is that?

The DMA engine always uses min_queued_buffers, so if there are only that many buffers,
then it can never return a buffer to userspace! So you need one more. That's the absolute
minimum. For smooth capture you need two more to allow time for userspace to process the
buffer.

Hmm, ok, I see. Well, I guess my "I want to capture just a single frame" is not a very common case.

It's not that uncommon, see above.


Can I queue one buffer, start streaming, stop streaming, and get the
filled buffer? But then I guess I don't know when the buffer has been
filled, i.e. when to call stop streaming.

Exactly. If you really want that, then the driver has to be adapted in the way that Laurent
suggested, i.e. with one or more scratch buffers. But that is not always possible, esp. with
older hardware without an IOMMU.

Drivers can always allocate a full-frame scratch buffer in the worst
case. That can waste memory though, which is less than ideal.

So, never mind, I don't actually have any use case for this, just wondering.


to use CREATE_BUFS you need that information so you know that you have to create
at least that number of buffers. We have the V4L2_CID_MIN_BUFFERS_FOR_CAPTURE control,
but it is effectively codec specific. This probably should be clarified.
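
For codecs, userspace can already query that today, something like the
illustrative snippet below; the problem is that nothing equivalent exists for
plain capture devices:

    struct v4l2_control ctrl = {
        .id = V4L2_CID_MIN_BUFFERS_FOR_CAPTURE,
    };
    unsigned int min_bufs = 1;

    if (ioctl(fd, VIDIOC_G_CTRL, &ctrl) == 0)
        min_bufs = ctrl.value;

    /* Request at least min_bufs buffers with VIDIOC_CREATE_BUFS. */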

I wonder if it wouldn't be better to add a min_num_buffers field to
struct v4l2_create_buffers and set it to min_queued_buffers + 1.

Don't add the +1. We should give userspace the information it needs to
make informed decisions, not make decisions on its behalf.


I think this makes sense (although I still don't get the +1).

However, based on the experiences from adding the streams features
to various ioctls, let's be very careful =). The new
'min_num_buffers' field can be filled with garbage by userspace. If
we define the field to be always filled by the kernel, with any value
provided by userspace ignored, I think it should work.

I've posted an RFC for this.

Thanks, I'll check it out.

For the original issue in this thread, I think the correct fix is to
remove the lines from cfe_queue_setup(), and add
"q->min_reqbufs_allocation = 3".

Or just don't set min_reqbufs_allocation ? This is a new driver, and it
requires a device-specific userspace to operate the ISP. I don't think
we need to care about applications blindly calling VIDIOC_REQBUFS(1) and
expecting to get more buffers.

It doesn't require a device-specific userspace for plain CSI-2 capture.

If I understood right, the expected behavior for VIDIOC_REQBUFS is to return enough buffers for "smooth streaming". So even if a device-specific userspace were required, doesn't it still make sense to have min_reqbufs_allocation = 3?

Or is your point that even a device-specific userspace, which knows exactly what it's doing, would use VIDIOC_REQBUFS, instead of VIDIOC_CREATE_BUFS?

Also, if I don't set min_reqbufs_allocation, VIDIOC_REQBUFS(1) would still allocate two buffers, not one.

 Tomi




