Re: [patch 13/15] epoll: check ep_events_available() upon timeout

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 2, 2020 at 1:51 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 2, 2020 at 9:49 AM Soheil Hassas Yeganeh <soheil@xxxxxxxxxx> wrote:
> >
> > Thank you for the suggestion. That was the first version I tried, and
> > I can confirm it fixes the race because we call ep_send_events() once
> > more before returning.  Though, I believe, due to time_out=1, we won't
> > goto fetch_events to call ep_events_available():
> >
> > if (!res && eavail &&
> >    !(res = ep_send_events(ep, events, maxevents)) && !timed_out)
> >  goto fetch_events;
>
> Right. We won't be repeating the loop, but we will do one final send_events.
>
> Which I think is really the point, no?

Yes, absolutely, that's the point.

> > You're spot on that the patch is more complicated than your
> > suggestion.  However, the downside I observed was a performance
> > regression for the non-racy case: Suppose there are a few threads with
> > a similar non-zero timeout and no ready event. They will all
> > experience a noticeable contention in ep_scan_ready_list, by
> > unconditionally calling ep_send_events(). The contention was large
> > because there will be 2 write locks on ep->lock and one mutex lock on
> > ep->mtx with a large critical section.
>
> Ugh. I really detest the eventpoll code. Afaik, it has no normal users
> anywhere, so it gets no real coverage except for the odd cases
> (presumably inside google?)
>
> A lot of the work over the last couple of years has been to try to
> simplify the code and streamline it, and fix bugs due to recursion.
>
> I really wish we would continue that pattern of trying to simplify
> this code rather than add more complexity on top of it, which is why I
> reacted to strongly to that patch.
>
> And that whole ep_poll() function is written in just about the most
> confusing way possible, with code that looks like loops but aren't
> ("while (0)") and goto's that _are_ loops ("goto fetch_events").

I wholeheartedly agree. Due to its cryptic implementation, It was
really difficult to think about the correctness of the fixes.

On Mon, Nov 2, 2020 at 2:39 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Nov 2, 2020 at 10:51 AM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > I'll go stare at it some more.
>
> That code is fundamentally broken in so many ways.
>
> Look at how  ep_poll() does that ep_events_available() without
> actually holding the ep->lock (or the ep->mtx) in half the cases.
>
> End result: it works in 99.9% of all cases, but there's a race with
> somebody else doing
>
>         WRITE_ONCE(ep->ovflist, EP_UNACTIVE_PTR);
>         /*
>          * Quickly re-inject items left on "txlist".
>          */
>         list_splice(&txlist, &ep->rdllist);
>
> when that code can see an empty rdllist and an inactive ovflist and
> thus decide that there are no events available.
>
> I think the "Quickly re-inject" comment may be because some people
> knew of that race.
>
> The signal handling is also odd and looks broken.
>
> The "short circuit on fatal signals" means that ep_send_events() isn't
> actually done on a SIGKILL, but the code also used an exclusive wait,
> so nobody else will be woken up either.
>
> Admittedly you can steal wakeups other ways, by simply not caring
> about the end result, so maybe that's all just inherent in epoll
> anyway. But it looks strange, and it seems pointless: the right thing
> to do would seem to be simply to have a regular check for
> signal_pending(), and returning -EINTR if rather than looping.
>
> And that do { } while (0) is entirely pointless. It seems to exist in
> order to use "break" instead of the goto that everything else does,
> which I guess is nice, except the whole need for that comes from how
> oddly the code is written.
>
> Why doesn't this all do something like the attached instead?

This looks really great to me! Thank you!  I'll give it a try and get
back to you soon.

> NOTE! I did not bother to fix that ep_events_available() race.

Given that you're calling ep_events_available() under lock, I think
this should address the inefficiency for the non-racy timeout case,  I
mentioned above. The remaining races are preexisting and all result in
spurious events, which should be fine.

Thanks again!
Soheil

>                       Linus



> I'll go stare at it some more.
>
>                    Linus



[Index of Archives]     [Kernel Archive]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]

  Powered by Linux