On 5/28/24 7:59 PM, Miklos Szeredi wrote:
> On Tue, 28 May 2024 at 10:52, Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>
>> Hi Peter-Jan,
>>
>> Thanks for the amazing work.
>>
>> I'd just like to know if you have any plan of making fiq and fiq->lock
>> more scalable, e.g. make fiq a per-CPU software queue?
>
> Doing a per-CPU queue is not necessarily a good idea: async requests
> could overload one queue while others are idle.

That is true in plain FUSE scenarios (as opposed to virtiofs). There is no
1:1 mapping between CPUs and daemon threads: requests submitted from all
CPUs are enqueued on one global pending list, and all daemon threads fetch
the requests to be processed from that global pending list.

> One idea is to allow request to go through a per-CPU fast path if the
> respective listener is idle. Otherwise the request would enter the
> default slow queue, where idle listeners would pick requests (like
> they do now).

I guess "listener" refers to one thread of the fuse daemon.

In virtiofs scenarios, however, there is a 1:1 mapping between CPUs and
hardware queues. A request is only ever routed to the hardware queue that
the submitting CPU maps to, so there is little point in still managing
pending requests on one global pending list. In this case a per-CPU pending
list could, at least in theory, reduce the lock contention on the global
pending list.

I believe we have an internal RFC for a per-CPU pending list for virtiofs.
Let me check whether it really improves performance in tests.

-- 
Thanks,
Jingbo