On Mon, Nov 2, 2020 at 1:51 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Nov 2, 2020 at 9:49 AM Soheil Hassas Yeganeh <soheil@xxxxxxxxxx> wrote: > > > > Thank you for the suggestion. That was the first version I tried, and > > I can confirm it fixes the race because we call ep_send_events() once > > more before returning. Though, I believe, due to time_out=1, we won't > > goto fetch_events to call ep_events_available(): > > > > if (!res && eavail && > > !(res = ep_send_events(ep, events, maxevents)) && !timed_out) > > goto fetch_events; > > Right. We won't be repeating the loop, but we will do one final send_events. > > Which I think is really the point, no? Yes, absolutely, that's the point. > > You're spot on that the patch is more complicated than your > > suggestion. However, the downside I observed was a performance > > regression for the non-racy case: Suppose there are a few threads with > > a similar non-zero timeout and no ready event. They will all > > experience a noticeable contention in ep_scan_ready_list, by > > unconditionally calling ep_send_events(). The contention was large > > because there will be 2 write locks on ep->lock and one mutex lock on > > ep->mtx with a large critical section. > > Ugh. I really detest the eventpoll code. Afaik, it has no normal users > anywhere, so it gets no real coverage except for the odd cases > (presumably inside google?) > > A lot of the work over the last couple of years has been to try to > simplify the code and streamline it, and fix bugs due to recursion. > > I really wish we would continue that pattern of trying to simplify > this code rather than add more complexity on top of it, which is why I > reacted to strongly to that patch. > > And that whole ep_poll() function is written in just about the most > confusing way possible, with code that looks like loops but aren't > ("while (0)") and goto's that _are_ loops ("goto fetch_events"). I wholeheartedly agree. Due to its cryptic implementation, It was really difficult to think about the correctness of the fixes. On Mon, Nov 2, 2020 at 2:39 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Nov 2, 2020 at 10:51 AM Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > I'll go stare at it some more. > > That code is fundamentally broken in so many ways. > > Look at how ep_poll() does that ep_events_available() without > actually holding the ep->lock (or the ep->mtx) in half the cases. > > End result: it works in 99.9% of all cases, but there's a race with > somebody else doing > > WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR); > /* > * Quickly re-inject items left on "txlist". > */ > list_splice(&txlist, &ep->rdllist); > > when that code can see an empty rdllist and an inactive ovflist and > thus decide that there are no events available. > > I think the "Quickly re-inject" comment may be because some people > knew of that race. > > The signal handling is also odd and looks broken. > > The "short circuit on fatal signals" means that ep_send_events() isn't > actually done on a SIGKILL, but the code also used an exclusive wait, > so nobody else will be woken up either. > > Admittedly you can steal wakeups other ways, by simply not caring > about the end result, so maybe that's all just inherent in epoll > anyway. But it looks strange, and it seems pointless: the right thing > to do would seem to be simply to have a regular check for > signal_pending(), and returning -EINTR if rather than looping. > > And that do { } while (0) is entirely pointless. It seems to exist in > order to use "break" instead of the goto that everything else does, > which I guess is nice, except the whole need for that comes from how > oddly the code is written. > > Why doesn't this all do something like the attached instead? This looks really great to me! Thank you! I'll give it a try and get back to you soon. > NOTE! I did not bother to fix that ep_events_available() race. Given that you're calling ep_events_available() under lock, I think this should address the inefficiency for the non-racy timeout case, I mentioned above. The remaining races are preexisting and all result in spurious events, which should be fine. Thanks again! Soheil > Linus > I'll go stare at it some more. > > Linus