On 5/28/24 10:21 PM, Peter-Jan Gootzen wrote:
> On Tue, 2024-05-28 at 21:35 +0800, Jingbo Xu wrote:
>> On 5/28/24 7:59 PM, Miklos Szeredi wrote:
>>> On Tue, 28 May 2024 at 10:52, Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hi Peter-Jan,
>>>>
>>>> Thanks for the amazing work.
>>>>
>>>> I'd just like to know if you have any plan of making fiq and
>>>> fiq->lock more scalable, e.g. making fiq a per-CPU software queue?
>>>
>>> Doing a per-CPU queue is not necessarily a good idea: async requests
>>> could overload one queue while others are idle.
>>
>> That is true in FUSE scenarios (as opposed to virtiofs).  There is no
>> 1:1 mapping between CPUs and daemon threads.  All requests submitted
>> from all CPUs are enqueued on one global pending list, and all daemon
>> threads fetch requests to be processed from that global pending list.
> Virtiofsd also works in this thread-pool manner, with a global pending
> list and many daemon threads fetching from said list.
>
>>
>>>
>>> One idea is to allow a request to go through a per-CPU fast path if
>>> the respective listener is idle.  Otherwise the request would enter
>>> the default slow queue, where idle listeners would pick up requests
>>> (like they do now).
>>
>> I guess "listener" refers to one thread of the FUSE daemon.
>>
>> When it comes to virtiofs scenarios, there is a 1:1 mapping between
>> CPUs and hardware queues.  Requests are only routed to the hardware
>> queue to which the submitting CPU maps, so there is no point in still
>> managing pending requests on one global pending list.  In this case a
>> per-CPU pending list can, at least in theory, reduce the lock
>> contention on the global pending list.
>>
>> I think we have an internal RFC on a per-CPU pending list for
>> virtiofs.  Let me see whether it really improves performance in tests.
> I am a bit confused about the software layers we are talking about now.
> We have:
> - FUSE driver (so the kernel FUSE fs driver)
> - virtio-fs driver
> - virtiofsd as a software virtio-fs device
> - Hardware virtio-fs device implementations (e.g. BlueField)
> - libfuse (connecting /dev/fuse and a FS implementation)
> - File systems that build upon: libfuse, virtiofsd or some hardware
>   virtio-fs implementation
>
> So you have a patch at hand that fixes this fiq and fiq->lock
> bottleneck in the FUSE driver, correct?  In the virtio-fs driver there
> is no pending list; it just pops from the pending list of the FUSE
> driver.

I mean the fiq->pending list (struct fuse_iqueue) when I refer to the
software pending list:

__fuse_request_send
    spin_lock(&fiq->lock)   <-- lock contention if multiple CPUs submit IOs
    queue_request_and_unlock
        # enqueue request to fiq->pending list
        fiq->ops->wake_pending_and_unlock(), i.e.
        virtio_fs_wake_pending_and_unlock() for virtiofs
            # fetch one request from fiq->pending list
            spin_unlock(&fiq->lock)

Both the regular /dev/fuse interface and virtiofs use "struct
fuse_iqueue" as the pending list.

-- 
Thanks,
Jingbo
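
For illustration, here is a minimal sketch of the per-CPU pending-list
idea discussed above; it is not the internal RFC and not actual kernel
code.  The names fuse_iqueue_pcpu, fuse_iqueue_sketch and
fuse_pcpu_enqueue are hypothetical.  The only point it shows is that
each submitting CPU enqueues onto its own list/lock pair instead of
contending on the single fiq->lock:

/*
 * Hypothetical sketch of a per-CPU pending list.  Instead of one global
 * fiq->pending protected by fiq->lock, each CPU gets its own list and
 * spinlock, so concurrent submitters do not contend on one global lock.
 */
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>

struct fuse_iqueue_pcpu {
	spinlock_t		lock;		/* protects this CPU's list only */
	struct list_head	pending;	/* per-CPU pending requests */
};

struct fuse_iqueue_sketch {
	struct fuse_iqueue_pcpu __percpu *queues;
};

/* Enqueue a request entry on the submitting CPU's local pending list. */
static void fuse_pcpu_enqueue(struct fuse_iqueue_sketch *fiq,
			      struct list_head *req_entry)
{
	/* get_cpu_ptr() disables preemption and returns this CPU's queue. */
	struct fuse_iqueue_pcpu *q = get_cpu_ptr(fiq->queues);

	spin_lock(&q->lock);
	list_add_tail(req_entry, &q->pending);
	spin_unlock(&q->lock);

	/*
	 * For virtiofs, this is also where the virtqueue mapped to the
	 * submitting CPU would be kicked (cf. wake_pending_and_unlock).
	 */
	put_cpu_ptr(fiq->queues);
}

A real design would still have to address Miklos's concern above: with
purely per-CPU queues, async requests could pile up on one queue while
listeners on other queues sit idle, so some fallback (a shared slow
queue or work stealing) would likely be needed for the /dev/fuse case.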