Re: [RFC/PATCH net-next 0/9] net/dim: Support for multiple implementations

I agree that blk is not the most successful name; we were trying to find
something that would work for general storage applications. I think
rdma_dim would work, since it is completion based, but when we later want
to use it for nvme it will probably require code duplication.
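For what it's worth, the completion-based variant is the same feedback
loop as net/dim, just fed by completion counts instead of packet/byte
counts. A standalone sketch of that loop (struct, helper names, thresholds
and profile values are all mine for illustration, not taken from the
patches):

#include <stdint.h>

struct cdim {
	uint64_t prev_time_ns;	/* timestamp of the previous sample */
	uint64_t prev_comps;	/* completion counter at the previous sample */
	uint64_t prev_rate;	/* completions/sec over the previous interval */
	unsigned int profile_ix;	/* index into the moderation profile table */
};

/* Hypothetical moderation profiles: { usec before firing, max completions }. */
static const struct { uint16_t usec, comps; } cdim_profiles[] = {
	{ 1, 1 }, { 8, 16 }, { 32, 64 }, { 64, 128 }, { 128, 256 },
};
#define CDIM_NPROFILES (sizeof(cdim_profiles) / sizeof(cdim_profiles[0]))

/*
 * Called from the completion path with a timestamp and the running
 * completion count.  Compares the completion rate of this interval with
 * the previous one and nudges the profile index: a rising rate tolerates
 * more aggressive moderation, a falling rate backs off toward low latency.
 */
static void cdim_step(struct cdim *dim, uint64_t now_ns, uint64_t comps)
{
	uint64_t dt = now_ns - dim->prev_time_ns;
	uint64_t rate;

	if (!dt)
		return;

	rate = (comps - dim->prev_comps) * 1000000000ULL / dt;

	if (rate > dim->prev_rate + dim->prev_rate / 10 &&
	    dim->profile_ix < CDIM_NPROFILES - 1)
		dim->profile_ix++;	/* traffic rising: moderate harder */
	else if (rate + rate / 10 < dim->prev_rate && dim->profile_ix > 0)
		dim->profile_ix--;	/* traffic falling: favor latency */

	dim->prev_time_ns = now_ns;
	dim->prev_comps = comps;
	dim->prev_rate = rate;
}

The point is only that the sample source changes, not the stepping logic,
which is why sharing the core with net/dim looks feasible.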

Let's worry about reuse for NVMe when it is actually applicable.

Also, in the internal review Yamin added a table that summarizes all the
testing that was done using NVMeoF (I guess it somehow didn't make it
into this RFC).

I guess we can do the same for iSER to gain more confidence and then
set both to create a modifiable CQ (if the HCA supports it, of course).

Agreed?
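For the "if the HCA supports it" part, I imagine something along these
lines would do; I'm assuming the moderation limits are exposed via
ib_device_attr.cq_caps and that rdma_set_cq_moderation() is the right
entry point, so take this as a rough sketch rather than the patchset's
code:

#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

/*
 * Only arm (adaptive) moderation on CQs whose device reports CQ
 * moderation support; clamp the requested values to the HCA limits.
 */
static int maybe_enable_cq_moderation(struct ib_cq *cq, u16 count, u16 usec)
{
	const struct ib_device_attr *attrs = &cq->device->attrs;

	/* HCA does not support CQ moderation at all -> silently skip. */
	if (!attrs->cq_caps.max_cq_moderation_count)
		return 0;

	count = min_t(u16, count, attrs->cq_caps.max_cq_moderation_count);
	usec = min_t(u16, usec, attrs->cq_caps.max_cq_moderation_period);

	return rdma_set_cq_moderation(cq, count, usec);
}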

I think that adding a flag to create_cq would be less clean, as it would
require extra work from anyone writing applications, who should not have
to consider this feature.

Based on the results I saw during testing, I would enable it by default,
as I could not find a use case where it significantly reduces
performance, and in many cases it is a large improvement. It should be
more of an opt-out situation.
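Concretely, the opt-out I have in mind is nothing more than a default-on
module parameter in the ULP; the parameter name and the helper call below
are hypothetical:

#include <linux/module.h>
#include <rdma/ib_verbs.h>

/* Default on: users who hit a regression can opt out, not the reverse. */
static bool adaptive_moderation = true;
module_param(adaptive_moderation, bool, 0444);
MODULE_PARM_DESC(adaptive_moderation,
		 "Use adaptive (DIM) completion moderation when the HCA supports it (default: Y)");

/* Called when the ULP sets up its completion queues. */
static void ulp_setup_cq_moderation(struct ib_cq *cq)
{
	if (!adaptive_moderation)
		return;

	/* hypothetical helper, see the capability check sketched earlier:
	 * start from an arbitrary 64 completions / 16 usec profile
	 */
	maybe_enable_cq_moderation(cq, 64, 16);
}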

By detailed performance results I meant:
1. Full latency histogram at QD=1, both for single queue and multi-queue
   (including max, 99.99% and 99.999% percentiles)
2. Latency vs. IOPS graph/table for both single queue and multi-queue
3. At least some measurement/analysis of how well and how quickly the
   algorithm adapts to workload changes.
4. Tests with real NVMe devices as well.

Also, we need to separate the host-side moderation from the target-side
moderation to understand if/how they affect each other.

It's very easy to show an improvement for high-stress workloads, as there
is an obvious win for interrupt moderation there; however, if that were
the only interesting metric, we wouldn't need it to be adaptive.

As I said before, this adds entropy to the equation, which in certain use
cases can do more harm than good. We need to quantify where the impact is
and understand how important those cases are compared to the extremely
niche use case of a single host pushing 2M-8M IOPS.

Performance improvement (ConnectX-5 100GbE, x86) running an FIO benchmark over
      NVMf between two identical end-hosts with 56 cores, across a Mellanox
      switch, using a null_blk device:

      IO READS before:
      blk size | BW      | IOPS | 99th percentile latency
      512B     | 3.2GiB  | 6.6M | 1549  usec
      4k       | 7.2GiB  | 1.8M | 7177  usec
      64k      | 10.7GiB | 176k | 82314 usec

I've seen this before; why are we not getting 100Gb/s for 4k with CX5?
I recall we used to get it with CX4.

      IO READS after:
      blk size | BW      | IOPS | 99th percentile latency
      512B     | 4.2GiB  | 8.6M | 1729   usec
      4k       | 10.5GiB | 2.7M | 5669   usec
      64k      | 10.7GiB | 176k | 102000 usec

      IO WRITES before:
      blk size | BW      | IOPS | 99th percentile latency
      512B     | 3GiB    | 6.2M | 2573  usec
      4k       | 7.2GiB  | 1.8M | 5342  usec
      64k      | 10.7GiB | 176k | 62129 usec

      IO WRITES after:
      blk size | BW      | IOPS  | 99th percentile latency
      512B     | 4.2GiB  | 8.6M  | 938   usec
      4k       | 10.2GiB | 2.68M | 2769  usec
      64k      | 10.6GiB | 173k  | 87557 usec

The fact that the 64k 99th percentile latency is substantially higher
(20+ milliseconds) without any BW benefit, while not a very interesting
measurement in itself, indicates to me that a more detailed analysis is
needed here to understand where the trade-offs are.

It doesn't really make a difference to me how the option is implemented,
but I think it makes more sense for it to be handled by us, e.g. in a
module parameter, rather than by something like a flag that has a larger
radius of effect.

I was suggesting a sysctl parameter for the global behavior, and if
someone wants to override it they can add a CQ flag (which follows the
common net params exactly).
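
To spell out the precedence I have in mind (the flag values and the sysctl
variable below are made up for illustration): a per-CQ flag, when given,
wins; otherwise the global sysctl decides.

#include <linux/types.h>

/* Per-CQ override carried at creation time; names are illustrative. */
enum cq_dim_flag {
	CQ_DIM_DEFAULT = 0,	/* follow the global setting */
	CQ_DIM_FORCE_ON,	/* per-CQ opt-in override */
	CQ_DIM_FORCE_OFF,	/* per-CQ opt-out override */
};

/* Global default, imagined as exposed through a sysctl knob. */
static int sysctl_rdma_cq_dim_enable = 1;

static bool cq_dim_enabled(enum cq_dim_flag flag)
{
	switch (flag) {
	case CQ_DIM_FORCE_ON:
		return true;
	case CQ_DIM_FORCE_OFF:
		return false;
	case CQ_DIM_DEFAULT:
	default:
		return sysctl_rdma_cq_dim_enable;
	}
}

That way the vast majority of ULPs never touch the flag and simply follow
the global default, which is exactly the behavior I want.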


