On Fri, Sep 27, 2019 at 10:52 AM Roman Penyaev <r.peniaev@xxxxxxxxx> wrote:
>
> No, it seems this thingy is a bit different. According to my
> understanding, patches 3 and 4 from this patchset do the
> following: 1# split the whole queue depth equally across the
> hardware queues, and 2# return a tag number which is unique
> host-wide (more or less similar to unique_tag, right?).
>
> 2# is not needed for ibtrs, and 1# can easily be done by dividing
> queue_depth by the number of hw queues at tag set allocation, e.g.
> something like the following:
>
>     ...
>     tags->nr_hw_queues = num_online_cpus();
>     tags->queue_depth = sess->queue_depth / tags->nr_hw_queues;
>
>     blk_mq_alloc_tag_set(tags);
>
>
> But this trick won't work out for performance. The ibtrs client
> has a single resource: the set of buffer chunks received from the
> server side. These buffers should be distributed dynamically
> between IO producers according to the load. With a hard split of
> the whole queue depth between hw queues we can forget about
> dynamic load distribution; here is an example:
>
>   - say the server shares 1024 buffer chunks for a session (I do
>     not remember the actual number).
>
>   - the 1024 buffers are divided equally between the hw queues,
>     say 64 of them (the number of CPUs), so each queue is 16
>     requests deep.
>
>   - only a few CPUs produce IO, and instead of occupying the whole
>     "bandwidth" of the session, i.e. 1024 buffer chunks, we limit
>     ourselves to the small queue depth of each hw queue.
>
> Performance drops significantly when the number of IO producers is
> smaller than the number of hw queues (CPUs), and this can easily
> be tested and proved.
>
> So for this particular ibtrs case the tags should be shared
> globally, and unfortunately there seem to be no similar
> requirements from other block devices.

I don't see any difference between what you describe here and 100 dm
volumes sitting on top of a single NVMe device.
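
For reference, a minimal sketch of the "hard split" tag set setup that the
quoted mail argues against (this is not the actual ibtrs code; the names
ibtrs_session, ibtrs_mq_ops and the sess->queue_depth field are assumptions
made up for illustration):

    #include <linux/blk-mq.h>
    #include <linux/cpumask.h>
    #include <linux/numa.h>
    #include <linux/string.h>

    /* Hypothetical session object carrying the server-granted depth. */
    struct ibtrs_session {
            unsigned int queue_depth;   /* e.g. 1024 buffer chunks */
    };

    /*
     * Statically partition the session-wide depth across hw queues at
     * tag set allocation time.  Each hctx gets a fixed 1/nr_hw_queues
     * share of the tags, regardless of which CPUs actually submit IO.
     */
    static int setup_hard_split_tags(struct blk_mq_tag_set *tags,
                                     struct ibtrs_session *sess,
                                     const struct blk_mq_ops *ibtrs_mq_ops)
    {
            memset(tags, 0, sizeof(*tags));
            tags->ops          = ibtrs_mq_ops;
            tags->nr_hw_queues = num_online_cpus();
            /* hard split: per-hctx depth is fixed at allocation time */
            tags->queue_depth  = sess->queue_depth / tags->nr_hw_queues;
            tags->numa_node    = NUMA_NO_NODE;
            tags->flags        = BLK_MQ_F_SHOULD_MERGE;

            return blk_mq_alloc_tag_set(tags);
    }

With the numbers from the example above (1024 chunks, 64 CPUs) this yields a
per-hctx depth of 16, so a single busy CPU can never have more than 16
requests in flight even though the session as a whole could sustain 1024,
which is exactly the lost dynamic distribution the quoted mail describes.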