On Tue, Oct 27, 2020 at 09:32:11PM +0000, David Woodhouse wrote: > On Tue, 2020-10-27 at 21:30 +0100, Peter Zijlstra wrote: > > On Tue, Oct 27, 2020 at 07:27:59PM +0000, David Woodhouse wrote: > > > > > > While looking at this I found that weird __add_wait_queue_exclusive() > > > > which is used by fs/eventpoll.c and does something similar, except it > > > > doesn't keep the FIFO order. > > > > > > It does, doesn't it? Except those so-called "exclusive" entries end up > > > in FIFO order amongst themselves at the *tail* of the queue, to be > > > woken up only after all the other entries before them *haven't* been > > > excluded. > > > > __add_wait_queue_exclusive() uses __add_wait_queue() which does > > list_add(). It does _not_ add at the tail like normal exclusive users, > > and there is exactly _1_ user in tree that does this. > > > > I'm not exactly sure how this happened, but: > > > > add_wait_queue_exclusive() > > > > and > > > > __add_wait_queue_exclusive() > > > > are not related :-( > > I think that goes all the way back to here: > > https://lkml.org/lkml/2007/5/4/530 > > It was rounded up in commit d47de16c72and subsequently "cleaned up" > into an inline in wait.h, but I don't think there was ever a reason for > it to be added to the head of the list instead of the tail. Maybe, I'm not sure I can tell in a hurry. I've opted to undo the above 'cleanups' > So I think we can reasonably make __add_wait_queue_exclusive() do > precisely the same thing as add_wait_queue_exclusive() does (modulo > locking). Aye, see below. > And then potentially rename them both to something that isn't quite > such a lie. And give me the one I want that *does* actually exclude > other waiters :) I don't think we want to do that; people are very much used to the current semantics. I also very much want to do: s/__add_wait_queue_entry_tail/__add_wait_queue_tail/ on top of all this. Anyway, I'll agree to your patch. How do we route this? Shall I take the waitqueue thing and stick it in a topic branch for Paolo so he can then merge that and the kvm bits on top into the KVM tree? --- diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 4df61129566d..a2a7e1e339f6 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1895,10 +1895,12 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, */ eavail = ep_events_available(ep); if (!eavail) { - if (signal_pending(current)) + if (signal_pending(current)) { res = -EINTR; - else - __add_wait_queue_exclusive(&ep->wq, &wait); + } else { + wq_entry->flags |= WQ_FLAG_EXCLUSIVE; + __add_wait_queue(wq_head, wq_entry); + } } write_unlock_irq(&ep->lock); diff --git a/fs/orangefs/orangefs-bufmap.c b/fs/orangefs/orangefs-bufmap.c index 538e839590ef..8cac3589f365 100644 --- a/fs/orangefs/orangefs-bufmap.c +++ b/fs/orangefs/orangefs-bufmap.c @@ -86,7 +86,7 @@ static int wait_for_free(struct slot_map *m) do { long n = left, t; if (likely(list_empty(&wait.entry))) - __add_wait_queue_entry_tail_exclusive(&m->q, &wait); + __add_wait_queue_exclusive(&m->q, &wait); set_current_state(TASK_INTERRUPTIBLE); if (m->c > 0) diff --git a/include/linux/wait.h b/include/linux/wait.h index 27fb99cfeb02..4b8c4ece13f7 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -171,23 +171,13 @@ static inline void __add_wait_queue(struct wait_queue_head *wq_head, struct wait list_add(&wq_entry->entry, &wq_head->head); } -/* - * Used for wake-one threads: - */ -static inline void -__add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) -{ - wq_entry->flags |= WQ_FLAG_EXCLUSIVE; - __add_wait_queue(wq_head, wq_entry); -} - static inline void __add_wait_queue_entry_tail(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { list_add_tail(&wq_entry->entry, &wq_head->head); } static inline void -__add_wait_queue_entry_tail_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) +__add_wait_queue_exclusive(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry) { wq_entry->flags |= WQ_FLAG_EXCLUSIVE; __add_wait_queue_entry_tail(wq_head, wq_entry);