On Wed, 2024-05-29 at 09:52 +0800, Jingbo Xu wrote:
> 
> On 5/28/24 10:21 PM, Peter-Jan Gootzen wrote:
> > On Tue, 2024-05-28 at 21:35 +0800, Jingbo Xu wrote:
> > > 
> > > On 5/28/24 7:59 PM, Miklos Szeredi wrote:
> > > > On Tue, 28 May 2024 at 10:52, Jingbo Xu
> > > > <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Hi Peter-Jan,
> > > > >
> > > > > Thanks for the amazing work.
> > > > >
> > > > > I'd just like to know if you have any plan of making fiq and
> > > > > fiq->lock more scalable, e.g. making fiq a per-CPU software
> > > > > queue?
> > > >
> > > > Doing a per-CPU queue is not necessarily a good idea: async
> > > > requests could overload one queue while others are idle.
> > >
> > > It is in FUSE scenarios (as opposed to virtiofs). There's no 1:1
> > > mapping between CPUs and daemon threads. All requests submitted
> > > from all CPUs are enqueued in one global pending list, and all
> > > daemon threads fetch requests to be processed from this global
> > > pending list.
> > 
> > Virtiofsd also works in this thread pool manner, with a global
> > pending list and many daemon threads fetching from said list.
> > 
> > > > One idea is to allow a request to go through a per-CPU fast
> > > > path if the respective listener is idle. Otherwise the request
> > > > would enter the default slow queue, where idle listeners would
> > > > pick requests (like they do now).
> > >
> > > I guess "listener" refers to one thread of the fuse daemon.
> > >
> > > When it comes to virtiofs scenarios, there's a 1:1 mapping
> > > between CPUs and hardware queues. The requests will only be
> > > routed to the hardware queue to which the submitter CPU maps, so
> > > there's no point in still managing pending requests in one
> > > global pending list. In this case, a per-CPU pending list can
> > > theoretically reduce the lock contention on the global pending
> > > list.
> > >
> > > I think we have an internal RFC on a per-CPU pending list for
> > > virtiofs. Let me see if this really improves the performance in
> > > testing.
> > 
> > I am a bit confused about the software layers we are talking about
> > now. We have:
> > - FUSE driver (so the kernel FUSE fs driver)
> > - virtio-fs driver
> > - virtiofsd as a software virtio-fs device
> > - Hardware virtio-fs device implementations (e.g. BlueField)
> > - libfuse (connecting /dev/fuse and a FS implementation)
> > - File systems that build upon: libfuse, virtiofsd or some hardware
> >   virtio-fs implementation
> > 
> > So you have a patch at hand that fixes this fiq and fiq->lock
> > bottleneck in the FUSE driver, correct? In the virtio-fs driver
> > there is no pending list; it just pops from the pending list of
> > the FUSE driver.
> 
> I mean the fiq->pending list (in struct fuse_iqueue) when I refer to
> the software pending list.
> 
> __fuse_request_send
>     spin_lock(&fiq->lock)   <-- lock contention if multiple CPUs submit IOs
>     queue_request_and_unlock
>         # enqueue request to fiq->pending list
>         fiq->ops->wake_pending_and_unlock(),
>         i.e. virtio_fs_wake_pending_and_unlock() for virtiofs
>             # fetch one request from fiq->pending list
>             spin_unlock(&fiq->lock)
> 
> Both the regular /dev/fuse interface and virtiofs use "struct
> fuse_iqueue" as the pending list.

Okay, I see now.
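To make sure I follow, here is a rough, purely illustrative sketch of
the per-CPU shape being discussed. None of the names below are the
real kernel structures or your internal RFC (today struct fuse_iqueue
keeps a single pending list protected by fiq->lock); it only shows how
giving each CPU its own list and lock would stop submitters from
serializing on one global lock:

/*
 * Illustrative sketch only -- not the real struct fuse_iqueue and not
 * the RFC patch.  All names here are made up.  The point is just the
 * shape: one pending list + lock per CPU instead of one global pair.
 */
#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct sketch_req {                     /* stand-in for struct fuse_req */
	struct list_head list;
};

struct sketch_pcpu_pending {
	spinlock_t lock;                /* protects this CPU's list only */
	struct list_head pending;       /* this CPU's pending requests   */
};

struct sketch_iqueue {                  /* stand-in for struct fuse_iqueue */
	struct sketch_pcpu_pending __percpu *pcpu;
};

static int sketch_iqueue_init(struct sketch_iqueue *fiq)
{
	int cpu;

	fiq->pcpu = alloc_percpu(struct sketch_pcpu_pending);
	if (!fiq->pcpu)
		return -ENOMEM;

	for_each_possible_cpu(cpu) {
		struct sketch_pcpu_pending *q = per_cpu_ptr(fiq->pcpu, cpu);

		spin_lock_init(&q->lock);
		INIT_LIST_HEAD(&q->pending);
	}
	return 0;
}

/* Submission path: only the submitting CPU's lock is taken. */
static void sketch_queue_request(struct sketch_iqueue *fiq,
				 struct sketch_req *req)
{
	struct sketch_pcpu_pending *q = get_cpu_ptr(fiq->pcpu);

	spin_lock(&q->lock);
	list_add_tail(&req->list, &q->pending);
	spin_unlock(&q->lock);
	put_cpu_ptr(fiq->pcpu);
	/* the real code would now kick the matching virtqueue / waiter */
}

The obvious cost, as Miklos points out, is that async requests could
pile up on one CPU's list while other queues sit idle, so some
fallback or work-stealing path (like his default slow queue) would
still be needed.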
I am very eager to hear your result; I will take a look at the patch
and see how it evaluates on our system :)

> --
> Thanks,
> Jingbo

- Peter-Jan