On 12/5/18 6:16 AM, Roman Penyaev wrote: > On 2018-12-04 18:23, Jason Baron wrote: >> On 12/3/18 6:02 AM, Roman Penyaev wrote: > > [...] > >>> >>> ep_set_busy_poll_napi_id(epi); >>> >>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v >>> */ >>> if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) { >>> if (epi->next == EP_UNACTIVE_PTR) { >>> - epi->next = ep->ovflist; >>> - ep->ovflist = epi; >>> + /* Atomically exchange tail */ >>> + epi->next = xchg(&ep->ovflist, epi); >> >> This also relies on the fact that the same epi can't be added to the >> list in parallel, because the wait queue doing the wakeup will have the >> wait_queue_head lock. That was a little confusing for me b/c we only had >> the read_lock() above. > > Yes, that is indeed not obvious path, but wq is locked by wake_up_*_poll() > call or caller of wake_up_locked_poll() has to be sure wq.lock is taken. > > I'll add an explicit comment here, thanks for pointing out. > >> >>> if (epi->ws) { >>> /* >>> * Activate ep->ws since epi->ws may get >>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v >>> >>> /* If this file is already in the ready list we exit soon */ >>> if (!ep_is_linked(epi)) { >>> - list_add_tail(&epi->rdllink, &ep->rdllist); >>> + list_add_tail_lockless(&epi->rdllink, &ep->rdllist); >>> ep_pm_stay_awake_rcu(epi); >>> } >> >> same for this. > > ... and an explicit comment here. > >> >>> >>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v >>> break; >>> } >>> } >>> - wake_up_locked(&ep->wq); >>> + wake_up(&ep->wq); >> >> why the switch here to the locked() variant? Shouldn't the 'reader' >> side, in this case which takes the rwlock for write see all updates in a >> coherent state at this point? > > lockdep inside __wake_up_common expects wq_head->lock is taken, and > seems this is not a good idea to leave wq naked on wake up path, > when several CPUs can enter wake function. Although default_wake_function > is protected by spinlock inside try_to_wake_up(), but for example > autoremove_wake_function() can't be called concurrently for the same wq > (it removes wq entry from the list). Also in case of bookmarks > __wake_up_common adds an entry to the list, thus can't be called without > any locks. > > I understand you concern and you are right saying that read side sees > wq entries as stable, but that will work only if __wake_up_common does > not modify anything, that is seems true now, but of course it is > too scary to rely on that in the future. I think it might be interesting for, at least testing, to see if not grabbing wq.lock improves your benchmarks any further? fwiw, epoll only recently started grabbing wq.lock bc lockdep required it. Thanks, -Jason