On 03/05/2015 04:15 AM, Ingo Molnar wrote:
> * Jason Baron <jbaron@xxxxxxxxxx> wrote:
>
>> 2) We are using the wakeup in this case to 'assign' work more
>> permanently to the thread. That is, in the case of a listen socket
>> we then add the connected socket to the woken up threads local set
>> of epoll events. So the load persists past the wake up. And in this
>> case, doing the round robin wakeups, simply allows us to access more
>> cpu bandwidth. (I'm also looking into potentially using cpu affinity
>> to do the wakeups as well as you suggested.)
> So this is the part that I still don't understand.
>
> What difference does LIFO versus FIFO wakeups make to CPU utilization:
> a thread waiting for work is idle, no matter whether it ran most
> recently or least recently.
>
> Once an idle worker thread is woken it will compute its own work, for
> whatever time it needs to, and won't be bothered by epoll again until
> it finished its work and starts waiting again.
>
> So regardless the wakeup order it's the same principal bandwidth
> utilization, modulo caching artifacts [*] and modulo scheduling
> artifacts [**]:

So just adding the wakeup source as 'exclusive' would, I think, give
much of the desired behavior, as you point out. In the first patch
posting I separated 'exclusive' from 'rotate' (where rotate depended on
exclusive), since the idle threads will tend to get assigned the new
work rather than the busy threads, as you point out, and the workload
naturally spreads out (modulo the artifacts you mentioned).

However, I added the 'rotate' because I'm assigning work via the wakeup,
and that work persists past the wakeup point. Without the rotate I
might end up always assigning a lot of work to, say, the first thread
if it's always idle, and then I might get a large burst of work queued
to it at some later point. The rotate is intended to address this case.

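As a rough illustration of that imbalance, here is a toy userspace model
(not the kernel code, and not taken from the patch). It assumes the worst
case described above: every worker is idle each time a connection
arrives, and an exclusive waiter keeps its position in the wait queue
unless it is rotated to the tail. The names NWORKERS and
simulate_wakeups() are hypothetical and exist only for this sketch.

```c
#include <assert.h>
#include <string.h>

#define NWORKERS 4  /* hypothetical number of epoll worker threads */

/*
 * Toy model of exclusive wakeups on a shared listen socket.
 * queue[] holds worker ids in wait-queue order; each arriving
 * connection wakes only the worker at the head of the queue, and the
 * connection stays with that worker afterwards (counts[] records how
 * many connections each worker ends up owning).
 *
 * rotate == 0: the woken worker keeps its place at the head.
 * rotate != 0: the woken worker is moved to the tail (round robin).
 */
static void simulate_wakeups(int nconn, int rotate, int counts[NWORKERS])
{
    int queue[NWORKERS];

    assert(nconn >= 0);

    for (int i = 0; i < NWORKERS; i++) {
        queue[i] = i;
        counts[i] = 0;
    }

    for (int c = 0; c < nconn; c++) {
        int woken = queue[0];

        counts[woken]++;  /* the work persists past the wakeup */

        if (rotate) {
            /* shift everyone up one slot, put the woken worker last */
            memmove(&queue[0], &queue[1],
                    (NWORKERS - 1) * sizeof(queue[0]));
            queue[NWORKERS - 1] = woken;
        }
    }
}
```

With 4 workers and 1000 connections, this model hands all 1000
connections to worker 0 when rotate is 0, and 250 to each worker when
rotate is 1. Real workloads fall somewhere in between, since busy
threads are not waiting and so do not get woken.
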
To use some pseudo-code in hopes of clarifying things, each thread is
roughly doing:

    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd...);

    while(1) {
        epoll_wait(epfd...);
        fd = accept(listen_fd...);
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd...);
        ...do any additional desired fd processing...
    }

So since the work persists past the wakeup point (after the 'fd' has
been assigned to the epfd set of the local thread), I am trying to
balance out future load.

This is an issue that current userspace has to address in various ways.
In our case, we periodically remove all epfds from the listen socket and
then re-add them in a different order. Another alternative, suggested by
Eric, was to have one or more dedicated threads do the assignment. These
approaches can work to an extent, but they seem, at least to me, to
complicate userspace somewhat. And at least in our case, they do not
balance the load as well as this approach does.

So I am trying to use epoll in a special way to do work assignment. I
think the model is different from the standard waker/wakee model. To
that end, in this v3 version, I've attempted to keep all the changes
contained within epoll to reflect that fact.

Thanks,

-Jason

>
> [*] Caching artifacts: in that sense Andrew's point stands: given
>     multiple equivalent choices it's more beneficial to pick a thread
>     that was most recently used (and is thus most cache-hot - i.e.
>     the current wakeup behavior), versus a thread that was least
>     recently used (and is thus the most cache-cold - i.e. the
>     round-robin wakeup you introduce).
>
> [**] The hack patch I posted in my previous reply.
>
> Thanks,
>
> Ingo