On Wed, Apr 20, 2022 at 3:25 PM Jason A. Donenfeld <Jason@xxxxxxxxx> wrote: > > Hey again, > > On Wed, Apr 20, 2022 at 02:15:45AM +0200, Jason A. Donenfeld wrote: > > Hi Jann, > > > > On Tue, Apr 19, 2022 at 9:45 PM Jann Horn <jannh@xxxxxxxxxx> wrote: > > > AFAIK this also means that if you make an epoll watch for > > > /proc/sys/kernel/random/fork_event, and then call poll() *on the epoll > > > fd* for some reason, that will probably already consume the event; and > > > if you then try to actually receive the epoll event via epoll_wait(), > > > it'll already be gone (because epoll tries to re-poll the "ready" > > > files to figure out what state those files are at now). Similarly if > > > you try to create an epoll watch for an FD that already has an event > > > pending: Installing the watch will call the ->poll handler once, > > > resetting the file's state, and the following epoll_wait() will call > > > ->poll again and think the event is already gone. See the call paths > > > to vfs_poll() in fs/eventpoll.c. > > > > > > Maybe we don't care about such exotic usage, and are willing to accept > > > the UAPI inconsistency and slight epoll breakage of plumbing > > > edge-triggered polling through APIs designed for level-triggered > > > polling. IDK. > > > > Hmm, I see. The thing is, this is _already_ what's done for > > domainname/hostname. It's how the sysctl poll handler was "designed". > > So our options here are: > > > > a) Remove this quirky behavior from domainname/hostname and start > > over. This would potentially break userspace, but maybe nobody uses > > this? No idea, but sounds risky. > > > > b) Apply this commit as-is, because it's using the API as the API was > > designed, and call it a day. > > > > c) Apply this commit as-is, because it's using the API as the API was > > designed, and then later try to fix up the epoll behavior on this. > > > > Of these, (a) seems like a non-starter. (c) is most appealing, but it > > sounds like it might not actually be possible? > > > > Jason > > I actually tried to verify your concern but didn't have success doing > so. My point is that when you run this code: $ cat edgepoll.c #include <time.h> #include <stdio.h> #include <fcntl.h> #include <err.h> #include <unistd.h> #include <poll.h> #include <sys/epoll.h> #define SYSCHK(x) ({ \ typeof(x) __res = (x); \ if (__res == (typeof(x))-1) \ err(1, "SYSCHK(" #x ")"); \ __res; \ }) int main(void) { int epfd = SYSCHK(epoll_create1(0)); int hostname_fd = SYSCHK(open("/proc/sys/kernel/hostname", O_RDONLY)); struct epoll_event event = { .events = EPOLLERR, .data = { .u32 = 1234 } }; SYSCHK(epoll_ctl(epfd, EPOLL_CTL_ADD, hostname_fd, &event)); while (1) { struct pollfd pollfds[1] = { { .fd = epfd, .events = POLLIN } }; int poll_res = poll(pollfds, 1, -1); if (poll_res == -1) { perror("poll() error"); continue; } if (poll_res == 0) { printf("poll(): no events ready (can't happen, we're using timeout=-1)\n"); continue; } struct epoll_event events[1]; int epoll_res = epoll_wait(epfd, events, 1, 0); if (epoll_res == -1) { perror("epoll error"); continue; } if (epoll_res == 0) { printf("spurious epoll readiness\n"); continue; } printf("got epoll fd readiness: events=0x%x, u32=%u\n", events[0].events, events[0].data.u32); } } $ gcc -o edgepoll edgepoll.c $ ./edgepoll and then change the hostname, you'll just get "spurious epoll readiness" logged - simply calling poll() on the epoll FD resets the state of the hostname file that is being polled, so when we then try to receive the epoll event with epoll_wait(), the event is gone. And the other case is this: $ cat edgepoll2.c #include <time.h> #include <stdio.h> #include <fcntl.h> #include <err.h> #include <unistd.h> #include <poll.h> #include <sys/epoll.h> #define SYSCHK(x) ({ \ typeof(x) __res = (x); \ if (__res == (typeof(x))-1) \ err(1, "SYSCHK(" #x ")"); \ __res; \ }) int main(void) { int epfd = SYSCHK(epoll_create1(0)); int hostname_fd = SYSCHK(open("/proc/sys/kernel/hostname", O_RDONLY)); printf("opened hostname fd, sleeping\n"); sleep(10); printf("done sleeping\n"); struct epoll_event event = { .events = EPOLLERR, .data = { .u32 = 1234 } }; SYSCHK(epoll_ctl(epfd, EPOLL_CTL_ADD, hostname_fd, &event)); struct epoll_event events[1]; int epoll_res = SYSCHK(epoll_wait(epfd, events, 1, 0)); if (epoll_res == 0) errx(1, "no epoll events ready"); printf("got epoll fd readiness: events=0x%x, u32=%u\n", events[0].events, events[0].data.u32); } $ gcc -o edgepoll2 edgepoll2.c $ ./edgepoll2 opened hostname fd, sleeping done sleeping edgepoll2: no epoll events ready $ If you change the hostname when "opened hostname fd, sleeping" is printed, it'll still say "edgepoll2: no epoll events ready", because the EPOLL_CTL_ADD basically consumed the event.