Hey again,

On Wed, Apr 20, 2022 at 02:15:45AM +0200, Jason A. Donenfeld wrote:
> Hi Jann,
>
> On Tue, Apr 19, 2022 at 9:45 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > AFAIK this also means that if you make an epoll watch for
> > /proc/sys/kernel/random/fork_event, and then call poll() *on the epoll
> > fd* for some reason, that will probably already consume the event; and
> > if you then try to actually receive the epoll event via epoll_wait(),
> > it'll already be gone (because epoll tries to re-poll the "ready"
> > files to figure out what state those files are at now). Similarly if
> > you try to create an epoll watch for an FD that already has an event
> > pending: Installing the watch will call the ->poll handler once,
> > resetting the file's state, and the following epoll_wait() will call
> > ->poll again and think the event is already gone. See the call paths
> > to vfs_poll() in fs/eventpoll.c.
> >
> > Maybe we don't care about such exotic usage, and are willing to accept
> > the UAPI inconsistency and slight epoll breakage of plumbing
> > edge-triggered polling through APIs designed for level-triggered
> > polling. IDK.
>
> Hmm, I see. The thing is, this is _already_ what's done for
> domainname/hostname. It's how the sysctl poll handler was "designed".
> So our options here are:
>
> a) Remove this quirky behavior from domainname/hostname and start
> over. This would potentially break userspace, but maybe nobody uses
> this? No idea, but sounds risky.
>
> b) Apply this commit as-is, because it's using the API as the API was
> designed, and call it a day.
>
> c) Apply this commit as-is, because it's using the API as the API was
> designed, and then later try to fix up the epoll behavior on this.
>
> Of these, (a) seems like a non-starter. (c) is most appealing, but it
> sounds like it might not actually be possible?
>
> Jason

I actually tried to verify your concern but didn't have success doing so.
Both of these worked:

    int efd = epoll_create1(0);
    assert(efd >= 0);
    struct epoll_event event = {
        .data.fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)
    };
    assert(event.data.fd >= 0);
    assert(epoll_ctl(efd, EPOLL_CTL_ADD, event.data.fd, &event) == 0);
    for (;;) {
        assert(epoll_wait(efd, &event, 1, -1) == 1);
        puts("vm fork detected");
    }

And:

    int efd = epoll_create1(0);
    assert(efd >= 0);
    struct epoll_event event = {
        .data.fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)
    };
    assert(event.data.fd >= 0);
    assert(epoll_ctl(efd, EPOLL_CTL_ADD, event.data.fd, &event) == 0);
    for (;;) {
        assert(poll(&(struct pollfd){ .fd = efd, .events = POLLIN }, 1, -1) == 1);
        puts("vm fork detected");
    }

It also worked if I added EPOLLET to the epoll_event. It did not work if I
removed POLLIN from the pollfd event.

Maybe I'm missing some subtlety. But what exactly is broken?

(Either way, it doesn't change the (a) vs (c) calculus in my previous email.)

Jason