On 2/14/25 9:34 AM, Eric Sandeen wrote: > (Giant disclaimer: I had not even heard of watch_queue.c until this week. > So this might be garbage, but I think at least the bug is real and I think > the analysis is correct.) > > We got a bug report stating that doing this in a loop: > > pipe2(pipefd, O_NOTIFICATION_PIPE); > ioctl(pipefd[0], IOC_WATCH_QUEUE_SET_SIZE, BUF_SIZE); I should have specified that BUF_SIZE is 256, which is why watch_queue_set_size adjusts the accounting to (256 / WATCH_QUEUE_NOTES_PER_PAGE) == 8 in my analysis. The maximum allowed in IOC_WATCH_QUEUE_SET_SIZE is 512, which won't expose the bug, but slightly smaller numbers (~480, which leads to < 16 pages), will. Thanks, -Eric > close(pipefd[0]); > close(pipefd[1]); > > as an unprivileged user would eventually yield -EPERM. The bug actually > turned up when running > https://github.com/SELinuxProject/selinux-testsuite/tree/main/tests/watchkey > > Analysis: > > The -EPERM happens because the too_many_pipe_buffers_soft() test in > watch_queue_set_size() fails. That fails because user->pipe_bufs has > underflowed. user->pipe_bufs has underflowed because (abbreviated) ... > > sys_pipe2 > __do_pipe_flags > create_pipe_files > get_pipe_inode > alloc_pipe_info > pipe_bufs = PIPE_DEF_BUFFERS; // (16) > // charge 16 bufs to user > account_pipe_buffers(user, 0, pipe_bufs); > pipe->nr_accounted = pipe_bufs; // (16) > > so now the pipe has nr_accounted set, normally to PIPE_DEF_BUFFERS (16) > > Then, the ioctl: > > IOC_WATCH_QUEUE_SET_SIZE > watch_queue_set_size(nr_notes) > nr_pages = nr_notes / WATCH_QUEUE_NOTES_PER_PAGE; // (8) > // reduce bufs charged to user for this pipe from 16 to 8 > account_pipe_buffers(pipe->user, pipe->nr_accounted, nr_pages); > nr_notes = nr_pages * WATCH_QUEUE_NOTES_PER_PAGE; > pipe_resize_ring(pipe, roundup_pow_of_two(nr_slots == nr_notes)); > if (!pipe_has_watch_queue(pipe)) > pipe->nr_accounted = nr_slots; > > Two things to note here: > > Because pipe_has_watch_queue() is true, pipe->nr_accounted is not changed. > Also, pipe_resize_ring() resized to the number of notifications, not the > number of pages, though pipe_resize_ring() seems to expect a page count? > > At this point wve hae charged 8 pages to user->pipe_bufs, but > pipe->nr_accounted is still 16. And, it seems that we have actually > allocated 8*32 pages to the pipe by virtue of calling pipe_resize_ring() > with nr_notes not nr_pages. > > Finally, we close the pipes: > > pipe_release > put_pipe_info > free_pipe_info > account_pipe_buffers(pipe->user, pipe->nr_accounted, 0); > > This subtracts pipe->nr_accounted (16) from user->pipe_bufs, even though > per above we had only actually charged 8 to user->pipe_bufs, so it's a > net reduction. Do that in a loop, and eventually user->pipe_bufs underflows. > > Maybe all that should be in the commit messages, but I'll try to reduce it for > now, any comments are appreciated. > > (This does pass the handful of watch queue tests in LTP.) > > thanks, > -Eric > >