On Tue, 7 Dec 2021 at 15:25, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> FIFO means the thread used longest ago gets to go first. If your threads
> are an idempotent workers, FIFO might not be the best option. But I'm
> not much familiar with the FUSE code or it's design.

Okay.  Did some experiments, but couldn't see
wake_up_interruptible_sync() actually migrate the woken task; the
behavior was identical to wake_up_interruptible().  I guess this is
the "less" part in "more or less", but it would be good to see more
clearly what is happening.

I'll try to describe the design to give more context:

- FUSE is similar to a network filesystem in that there's a server and
  a client, except both are on the same host.  The client lives in the
  kernel and the server lives in userspace.

- Communication between them is done with read and write syscalls.

- Usually the server has multiple threads.  When a server thread is
  idle it is blocking in sys_read -> ... -> fuse_dev_do_read ->
  wait_event_interruptible_exclusive(fiq->waitq,...).

- When a filesystem request comes in (e.g. mkdir) a request is
  constructed, put on the input queue (fiq->pending) and fiq->waitq is
  woken up.  After this the client task goes to sleep in
  request_wait_answer -> wait_event_interruptible(req->waitq, ...).

- The server thread takes the request off the pending list, copies the
  data to the userspace buffer and puts the request on the processing
  list.

- The userspace part interprets the read buffer, performs the fs
  operation, and writes the reply.

- During the write(2) the reply is copied to the kernel and the
  request is looked up on the processing list.  The client is woken up
  through req->waitq.  After returning from write(2) the server thread
  again calls read(2) to get the next request.

- After being woken up, the client task returns with the result of the
  operation.

- The above example is for synchronous requests.  There are also async
  requests like readahead or buffered writes.  In that case the client
  does not call request_wait_answer() but returns immediately, and the
  result is processed from the server thread using a callback function
  (req->args->end()).

From a scheduling perspective it would be ideal if the server thread's
CPU was matched to the client thread's CPU, since that would make the
data stay local, and for synchronous requests a _sync type wakeup is
perfect, since the client goes to sleep just as the server starts
processing and vice versa.

Always migrating the woken server thread to the client's CPU is not
going to be good, since this would result in too many migrations and
would lose locality for the server's stack.

Another idea is to add per-cpu input queues.  The client would then
queue the request on the pending queue corresponding to its CPU and
wake up the server thread blocked on that queue.

What happens though if this particular queue has no servers?  Or if a
queue is starved because it's served by fewer threads than another?
Handling these cases seems really complicated.

Is there a simpler way?

Thanks,
Miklos
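
P.S. In case it helps to follow the synchronous request flow described
in the list above, here is a rough single-threaded userspace model of
it.  This is only a sketch and not the kernel code: the toy_* names,
the state enum and the list handling are all made up for illustration,
and the sleeping and waking on fiq->waitq and req->waitq appears only
as comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; these are NOT the kernel's FUSE structures. */
enum toy_state { TOY_INIT, TOY_PENDING, TOY_PROCESSING, TOY_FINISHED };

struct toy_req {
	int opcode;		/* arbitrary illustrative value */
	int result;		/* filled in by the server's reply */
	enum toy_state state;
	struct toy_req *next;
};

/* Simple FIFO list, standing in for the pending and processing lists. */
struct toy_queue {
	struct toy_req *head, *tail;
};

static void toy_enqueue(struct toy_queue *q, struct toy_req *r)
{
	r->next = NULL;
	if (q->tail)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

static struct toy_req *toy_dequeue(struct toy_queue *q)
{
	struct toy_req *r = q->head;

	if (!r)
		return NULL;
	q->head = r->next;
	if (!q->head)
		q->tail = NULL;
	r->next = NULL;
	return r;
}

/*
 * Client side: put the request on the pending list.  The real client
 * would wake fiq->waitq here and then sleep on req->waitq.
 */
static void toy_client_send(struct toy_queue *pending, struct toy_req *r,
			    int opcode)
{
	r->opcode = opcode;
	r->result = 0;
	r->state = TOY_PENDING;
	toy_enqueue(pending, r);
}

/*
 * Server read(2): take the oldest pending request and move it to the
 * processing list.  A real server thread would block when this is empty.
 */
static struct toy_req *toy_server_read(struct toy_queue *pending,
				       struct toy_queue *processing)
{
	struct toy_req *r = toy_dequeue(pending);

	if (!r)
		return NULL;
	r->state = TOY_PROCESSING;
	toy_enqueue(processing, r);
	return r;
}

/*
 * Server write(2): look the request up on the processing list, unlink
 * it and store the reply.  The real code would wake req->waitq here.
 * Returns 0 on success, -1 if the request is not being processed.
 */
static int toy_server_write(struct toy_queue *processing, struct toy_req *r,
			    int result)
{
	struct toy_req *prev = NULL, *cur = processing->head;

	while (cur && cur != r) {
		prev = cur;
		cur = cur->next;
	}
	if (!cur)
		return -1;
	if (prev)
		prev->next = cur->next;
	else
		processing->head = cur->next;
	if (processing->tail == cur)
		processing->tail = prev;
	cur->next = NULL;
	cur->result = result;
	cur->state = TOY_FINISHED;
	return 0;
}
```

A round trip is then toy_client_send(), toy_server_read() and
toy_server_write() called in that order on the same request.  With
per-cpu input queues there would be one such pending list per CPU,
which is where the "queue with no servers" question comes from.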