On Mon, Nov 2, 2020 at 10:51 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > I'll go stare at it some more. That code is fundamentally broken in so many ways. Look at how ep_poll() does that ep_events_available() without actually holding the ep->lock (or the ep->mtx) in half the cases. End result: it works in 99.9% of all cases, but there's a race with somebody else doing WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR); /* * Quickly re-inject items left on "txlist". */ list_splice(&txlist, &ep->rdllist); when that code can see an empty rdllist and an inactive ovflist and thus decide that there are no events available. I think the "Quickly re-inject" comment may be because some people knew of that race. The signal handling is also odd and looks broken. The "short circuit on fatal signals" means that ep_send_events() isn't actually done on a SIGKILL, but the code also used an exclusive wait, so nobody else will be woken up either. Admittedly you can steal wakeups other ways, by simply not caring about the end result, so maybe that's all just inherent in epoll anyway. But it looks strange, and it seems pointless: the right thing to do would seem to be simply to have a regular check for signal_pending(), and returning -EINTR if rather than looping. And that do { } while (0) is entirely pointless. It seems to exist in order to use "break" instead of the goto that everything else does, which I guess is nice, except the whole need for that comes from how oddly the code is written. Why doesn't this all do something like the attached instead? NOTE! I did not bother to fix that ep_events_available() race. Linus
Attachment:
patch
Description: Binary data