On 9/29/2022 12:59 PM, Sagi Grimberg wrote:
Hi Sagi,
On 9/28/2022 10:55 PM, Sagi Grimberg wrote:
Our mpath stack device is just a shim that selects a bottom namespace
and submits the bio to it without any fancy splitting. This also means
that we don't clone the bio or have any context to the bio beyond
submission. However, it really sucks that we don't see the mpath
device's IO stats.
Given that the mpath device can't do that without adding some context
to it, we let the bottom device do it on its behalf (somewhat similar
to the approach taken in nvme_trace_bio_complete).
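To make the idea concrete, here is a rough, non-compilable sketch of
what "the bottom device accounts on behalf of the mpath disk" could
look like. This is not the actual patch: helper names follow the
in-tree bdev accounting style (bdev_start_io_acct/bdev_end_io_acct),
but exact signatures vary across kernel versions and the start_time
field is illustrative.

```c
/*
 * Sketch only: the bottom (blk-mq) device starts and completes IO
 * accounting against the hidden mpath gendisk on the request's behalf.
 */
static inline void nvme_mpath_start_request(struct request *rq)
{
	struct nvme_ns *ns = rq->q->queuedata;
	struct gendisk *disk = ns->head->disk;	/* the mpath stack device */

	if (!disk || blk_rq_is_passthrough(rq))
		return;

	/* record the start time so completion can compute latency */
	nvme_req(rq)->start_time = bdev_start_io_acct(disk->part0,
					blk_rq_bytes(rq) >> SECTOR_SHIFT,
					req_op(rq), jiffies);
}

static inline void nvme_mpath_end_request(struct request *rq)
{
	struct nvme_ns *ns = rq->q->queuedata;

	/* complete the accounting started above against the mpath disk */
	bdev_end_io_acct(ns->head->disk->part0, req_op(rq),
			 nvme_req(rq)->start_time);
}
```

The transports (pci/rdma/tcp/fc/apple/loop, matching the diffstat)
would then call these hooks around request submission and completion.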
Can you please paste the output of the application that shows the
benefit of this commit?
What do you mean? There is no noticeable effect on the application here.
With this patch applied, /sys/block/nvmeXnY/stat is not zeroed out,
sysstat and friends can monitor IO stats, as well as other observability
tools.
I meant the output of the iostat application/tool.
This will show us the double accounting I mentioned below.
I guess it's the same situation we have today with /dev/dm-0 and its
underlying devices /dev/sdb and /dev/sdc for example.
This should be explained IMO.
Signed-off-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
---
drivers/nvme/host/apple.c | 2 +-
drivers/nvme/host/core.c | 10 ++++++++++
drivers/nvme/host/fc.c | 2 +-
drivers/nvme/host/multipath.c | 18 ++++++++++++++++++
drivers/nvme/host/nvme.h | 12 ++++++++++++
drivers/nvme/host/pci.c | 2 +-
drivers/nvme/host/rdma.c | 2 +-
drivers/nvme/host/tcp.c | 2 +-
drivers/nvme/target/loop.c | 2 +-
9 files changed, 46 insertions(+), 6 deletions(-)
Several questions:
1. I guess that for the non-mpath case we get this for free from the
block layer for each bio?
blk-mq provides all IO stat accounting, hence it is on by default.
2. Now we have doubled the accounting, haven't we ?
Yes. But as I listed in the cover letter, I've been getting complaints
about how IO stats appear only for the hidden devices (the blk-mq
devices), and there is non-trivial logic to map those back to the mpath
device, which can also depend on the path selection logic...
I think that this is very much justified, the observability experience
sucks. IMO we should have done it since introducing nvme-multipath.
3. Do you have some performance numbers (we're touching the fast path
here)?
This is pretty lightweight: accounting is per-cpu and only wrapped by
preemption disable. This is a very small price to pay for what we gain.
I don't have any performance numbers beyond a run on my laptop VM,
which did not record any noticeable difference; I don't expect one.
4. Should we enable this by default?
Yes. There is no reason why nvme-mpath should be the only block device
that does not account for and expose IO stats.