On 6/7/19 4:34 PM, Tomasz Figa wrote: > On Fri, Jun 7, 2019 at 11:11 PM Hans Verkuil <hverkuil@xxxxxxxxx> wrote: >> >> On 6/7/19 3:55 PM, Marek Szyprowski wrote: >>> Hi Hans, >>> >>> On 2019-06-07 15:40, Hans Verkuil wrote: >>>> On 6/7/19 2:47 PM, Hans Verkuil wrote: >>>>> On 6/7/19 2:23 PM, Hans Verkuil wrote: >>>>>> On 6/7/19 2:14 PM, Marek Szyprowski wrote: >>>>>>> On 2019-06-07 14:01, Hans Verkuil wrote: >>>>>>>> On 6/7/19 1:16 PM, Laurent Pinchart wrote: >>>>>>>>> Thank you for the patch. >>>>>>>>> >>>>>>>>> On Fri, Jun 07, 2019 at 10:45:31AM +0200, Hans Verkuil wrote: >>>>>>>>>> The __prepare_userptr() function made the incorrect assumption that if the >>>>>>>>>> same user pointer was used as the last one for which memory was acquired, then >>>>>>>>>> there was no need to re-acquire the memory. This assumption was never properly >>>>>>>>>> tested, and after doing that it became clear that this was in fact wrong. >>>>>>>>> Could you explain in the commit message why the assumption is not >>>>>>>>> correct ? >>>>>>>> You can free the memory, then allocate it again and you can get the same pointer, >>>>>>>> even though it is not necessarily using the same physical pages for the memory >>>>>>>> that the kernel is still using for it. >>>>>>>> >>>>>>>> Worse, you can free the memory, then allocate only half the memory you need and >>>>>>>> get back the same pointer. vb2 wouldn't notice this. And it seems to work (since >>>>>>>> the original mapping still remains), but this can corrupt userspace memory >>>>>>>> causing the application to crash. It's not quite clear to me how the memory can >>>>>>>> get corrupted. I don't know enough of those low-level mm internals to understand >>>>>>>> the sequence of events. >>>>>>>> >>>>>>>> I have test code for v4l2-compliance available if someone wants to test this. >>>>>>> I'm interested, I would really like to know what happens in the mm >>>>>>> subsystem in such case. >>>>>> Here it is: >>>>>> >>>>>> diff --git a/utils/v4l2-compliance/v4l2-test-buffers.cpp b/utils/v4l2-compliance/v4l2-test-buffers.cpp >>>>>> index be606e48..9abf41da 100644 >>>>>> --- a/utils/v4l2-compliance/v4l2-test-buffers.cpp >>>>>> +++ b/utils/v4l2-compliance/v4l2-test-buffers.cpp >>>>>> @@ -797,7 +797,7 @@ int testReadWrite(struct node *node) >>>>>> return 0; >>>>>> } >>>>>> >>>>>> -static int captureBufs(struct node *node, const cv4l_queue &q, >>>>>> +static int captureBufs(struct node *node, cv4l_queue &q, >>>>>> const cv4l_queue &m2m_q, unsigned frame_count, int pollmode, >>>>>> unsigned &capture_count) >>>>>> { >>>>>> @@ -962,6 +962,21 @@ static int captureBufs(struct node *node, const cv4l_queue &q, >>>>>> buf.s_flags(V4L2_BUF_FLAG_REQUEST_FD); >>>>>> buf.s_request_fd(buf_req_fds[req_idx]); >>>>>> } >>>>>> + if (v4l_type_is_capture(buf.g_type()) && q.g_memory() == V4L2_MEMORY_USERPTR) { >>>>>> + printf("\nidx: %d", buf.g_index()); >>>>>> + for (unsigned p = 0; p < q.g_num_planes(); p++) { >>>>>> + printf(" old buf[%d]: %p ", p, buf.g_userptr(p)); >>>>>> + fflush(stdout); >>>>>> + free(buf.g_userptr(p)); >>>>>> + void *m = calloc(1, q.g_length(p)/2); >>>>>> + >>>>>> + fail_on_test(m == NULL); >>>>>> + q.s_userptr(buf.g_index(), p, m); >>>>>> + printf("new buf[%d]: %p", p, m); >>>>>> + buf.s_userptr(m, p); >>>>>> + } >>>>>> + printf("\n"); >>>>>> + } >>>>>> fail_on_test(buf.qbuf(node, q)); >>>>>> fail_on_test(buf.g_flags() & V4L2_BUF_FLAG_DONE); >>>>>> if (buf.g_flags() & V4L2_BUF_FLAG_REQUEST_FD) { >>>>>> >>>>>> >>>>>> >>>>>> Load the vivid driver and just run 'v4l2-compliance -s10' and you'll see: >>>>>> >>>>>> ... >>>>>> Streaming ioctls: >>>>>> test read/write: OK >>>>>> test blocking wait: OK >>>>>> test MMAP (no poll): OK >>>>>> test MMAP (select): OK >>>>>> test MMAP (epoll): OK >>>>>> Video Capture: Frame #000 >>>>>> idx: 0 old buf[0]: 0x7f71c6e7c010 new buf[0]: 0x7f71c6eb4010 >>>>>> Video Capture: Frame #001 >>>>>> idx: 1 old buf[0]: 0x7f71c6e0b010 new buf[0]: 0x7f71c6e7b010 >>>>>> Video Capture: Frame #002 >>>>>> idx: 0 old buf[0]: 0x7f71c6eb4010 free(): invalid pointer >>>>>> Aborted >>>>> To clarify: two full size buffers are allocated and queued (that happens in setupUserPtr()), >>>>> then streaming starts and captureBufs is called which basically just calls dqbuf >>>>> and qbuf. >>>>> >>>>> Tomasz pointed out that all the pointers in this log are actually different. That's >>>>> correct, but here is a log where the old and new buf ptr are the same: >>>>> >>>>> Streaming ioctls: >>>>> test read/write: OK >>>>> test blocking wait: OK >>>>> test MMAP (no poll): OK >>>>> test MMAP (select): OK >>>>> test MMAP (epoll): OK >>>>> Video Capture: Frame #000 >>>>> idx: 0 old buf[0]: 0x7f1094e16010 new buf[0]: 0x7f1094e4e010 >>>>> Video Capture: Frame #001 >>>>> idx: 1 old buf[0]: 0x7f1094da5010 new buf[0]: 0x7f1094e15010 >>>>> Video Capture: Frame #002 >>>>> idx: 0 old buf[0]: 0x7f1094e4e010 new buf[0]: 0x7f1094e4e010 >>>>> Video Capture: Frame #003 >>>>> idx: 1 old buf[0]: 0x7f1094e15010 free(): invalid pointer >>>>> Aborted >>>>> >>>>> It's weird that the first log fails that way: if the pointers are different, >>>>> then vb2 will call get_userptr and it should discover that the buffer isn't >>>>> large enough, causing qbuf to fail. That doesn't seem to happen. >>>> I think that the reason for this corruption is that the memory pool used >>>> by glibc is now large enough for vb2 to think it can map the full length >>>> of the user pointer into memory, even though only the first half is actually >>>> from the buffer that's allocated. When you capture a frame you just overwrite >>>> a random part of the application's memory pool, causing this invalid pointer. >>>> >>>> But that's a matter of garbage in, garbage out. So that's not the issue here. >>>> >>>> The real question is what happens when you free the old buffer, allocate a >>>> new buffer, end up with the same userptr, but it's using one or more different >>>> pages for its memory compared to the mapping that the kernel uses. >>>> >>>> I managed to reproduce this with v4l2-ctl: >>>> >>>> diff --git a/utils/v4l2-ctl/v4l2-ctl-streaming.cpp b/utils/v4l2-ctl/v4l2-ctl-streaming.cpp >>>> index 28b2b3b9..8f2ed9b5 100644 >>>> --- a/utils/v4l2-ctl/v4l2-ctl-streaming.cpp >>>> +++ b/utils/v4l2-ctl/v4l2-ctl-streaming.cpp >>>> @@ -1422,6 +1422,24 @@ static int do_handle_cap(cv4l_fd &fd, cv4l_queue &q, FILE *fout, int *index, >>>> * has the size that fits the old resolution and might not >>>> * fit to the new one. >>>> */ >>>> + if (q.g_memory() == V4L2_MEMORY_USERPTR) { >>>> + printf("\nidx: %d", buf.g_index()); >>>> + for (unsigned p = 0; p < q.g_num_planes(); p++) { >>>> + unsigned *pb = (unsigned *)buf.g_userptr(p); >>>> + printf(" old buf[%d]: %p first pixel: 0x%x", p, buf.g_userptr(p), *pb); >>>> + fflush(stdout); >>>> + free(buf.g_userptr(p)); >>>> + void *m = calloc(1, q.g_length(p)); >>>> + >>>> + if (m == NULL) >>>> + return QUEUE_ERROR; >>>> + q.s_userptr(buf.g_index(), p, m); >>>> + if (m == buf.g_userptr(p)) >>>> + printf(" identical new buf"); >>>> + buf.s_userptr(m, p); >>>> + } >>>> + printf("\n"); >>>> + } >>>> if (fd.qbuf(buf) && errno != EINVAL) { >>>> fprintf(stderr, "%s: qbuf error\n", __func__); >>>> return QUEUE_ERROR; >>>> >>>> >>>> Load vivid, setup a pure white test pattern: >>>> >>>> v4l2-ctl -c test_pattern=6 >>>> >>>> Now run v4l2-ctl --stream-user and you'll see: >>>> >>>> idx: 0 old buf[0]: 0x7f91551cb010 first pixel: 0x80ea80ea identical new buf >>>> < >>>> idx: 1 old buf[0]: 0x7f915515a010 first pixel: 0x80ea80ea identical new buf >>>> < >>>> idx: 2 old buf[0]: 0x7f91550e9010 first pixel: 0x80ea80ea identical new buf >>>> < >>>> idx: 3 old buf[0]: 0x7f9155078010 first pixel: 0x80ea80ea identical new buf >>>> < >>>> idx: 0 old buf[0]: 0x7f91551cb010 first pixel: 0x0 identical new buf >>>> < >>>> idx: 1 old buf[0]: 0x7f915515a010 first pixel: 0x0 identical new buf >>>> < 5.00 fps >>>> >>>> idx: 2 old buf[0]: 0x7f91550e9010 first pixel: 0x0 identical new buf >>>> < >>>> idx: 3 old buf[0]: 0x7f9155078010 first pixel: 0x0 identical new buf >>>> >>>> The first four dequeued buffers are filled with data, after that the >>>> returned buffer is empty because vivid is actually writing to different >>>> memory pages. >>>> >>>> With this patch the first pixel is always non-zero. >>> >>> Good catch. The question is weather we treat that as undefined behavior >>> and keep the optimization for 'good applications' or assume that every >>> broken userspace code has to be properly handled. The good thing is that >>> there is still imho no security issue. The physical pages gathered by >> >> Yeah, that scared me for a bit, but it all looks secure. >> >>> vb2 in worst case belongs to noone else (vb2 is their last user, they >>> are not yet returned to free pages pool). >> >> I see three options: >> >> 1) just always reacquire the buffer, and if anyone complains about it >> being slower we point them towards DMABUF. >> >> 2) keep the current behavior, but document it. >> >> 3) as 2), but also add a new buffer flag that forces a reacquire of the >> buffer. This could be valid for DMABUF as well. E.g.: >> >> V4L2_BUF_FLAG_REACQUIRE >> >> I'm leaning towards the third option since it won't slow down existing >> implementations, yet if you do change the userptr every time, then you >> can now force this to work safely. >> > > I'd be for 1) or 3) as that would allow Chrome work on mainline. > > Also I believe there is still some bug when the pointers don't match, > even if you don't free those pages. I guess some more testing that > includes verifying the contents of previously dequeued buffers could > show something. I also realized that there is another problem with USERPTR: Suppose you queue buffer 1 with pointer X, then dequeue it (the mapping remains), then queue buffer 2 with the same pointer X: it will now be mapped again. I've no idea what the result of that will be. While DMABUF has the same behavior at first glance, after digging deeper into the dma_buf framework details I see that it is actually refcounting the mappings and so will not map again, instead it just returns the existing mapping. So DMABUF is fine. I think we should either go for option 1 or option 3. My preference is 3, but it depends on how often USERPTR is used in practice. Tomasz, Nicolas, do you guys have a better idea about that? Regards, Hans > >>>> I wonder if it isn't possible to just check the physical address of >>>> the received user pointer with the physical address of the previous >>>> user pointer. Or something like that. I'll dig around a bit more. >>> >>> Such check won't be so simple. Pages contiguous in the virtual memory >>> won't map to pages contiguous in the physical memory, so you would need >>> to check every single memory page. Make no sense. It is better to >>> reacquire buffer on every queue operation. This indeed show how broken >>> the USERPTR related part of v4l2 API is. >> >> OK, good to know. Then I'm not going to spend time on that. >> >> Regards, >> >> Hans