Re: [RFC PATCH 3/3] nvme: add the "debug" host driver

Javier González <javier@xxxxxxxxxxx> · Fri, 4 Feb 2022 12:34:23 +0100

On 04.02.2022 09:58, Chaitanya Kulkarni wrote:
On 2/4/22 12:24 AM, Javier González wrote:
On 04.02.2022 07:58, Chaitanya Kulkarni wrote:
On 2/3/22 22:28, Damien Le Moal wrote:
On 2/4/22 12:12, Chaitanya Kulkarni wrote:

One can instantiate scsi devices with qemu by using fake scsi devices,
but one can also just use scsi_debug to do the same. I see both efforts
as desirable, so long as someone maintains this.

Why do you think both efforts are desirable ?

When testing code using the functionality, it is far easier to get said
functionality by doing a simple "modprobe" rather than having to set up a
VM. Cf. running blktests or fstests.

Agreed on simplicity, but then why do we have QEMU implementations of
NVMe features (e.g. ZNS, NVMe Simple Copy)? We could just build a
memory-backed NVMeOF test target for NVMe controller features.

Also, recognizing the simplicity, I initially proposed NVMe ZNS
fabrics-based emulation over QEMU (I think I still have the initial state
machine implementation code for ZNS somewhere). Those proposals were
"nacked" for the right reason: we decided to go with QEMU and use it as
the primary platform for testing. So I fail to understand what has
changed, given that QEMU already supports NVMe Simple Copy...

I was not part of this conversation, but as I see it each approach gives
a benefit. QEMU is fantastic for compliance testing, and I am not sure
you get the same level of command analysis anywhere else; at least not
without writing dedicated code for this in a target.

This said, when we want to test for race conditions, QEMU is very slow.

Can you please elaborate on the scenario and give numbers for QEMU's
slowness?

QEMU is an emulator, not a simulator. So we will not be able to stress
the host stack in the same way the null_blk device does. If we want to
test code in the NVMe driver, then we need an NVMe equivalent of
null_blk. It seems like the nvme-loop target can achieve this.

Does this answer your concern?

For race condition testing, we can build an error injection framework
around the implementation, as is already done throughout the kernel.
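As one example of that existing infrastructure, the kernel's generic fault-injection framework already has an NVMe host hook: with debugfs fault-injection support enabled, each NVMe namespace gets a fault_inject directory (e.g. /sys/kernel/debug/nvme0n1/fault_inject) carrying the standard `probability` (percent) and `times` (maximum number of failures, -1 for no limit) knobs. The Python sketch below only computes and performs those debugfs writes. The function names are hypothetical; it assumes debugfs is mounted at /sys/kernel/debug and root privileges. Note that it forces selected requests to complete with an error; it does not by itself provoke races.

```python
import os
from typing import List, Tuple

# Assumption: debugfs is mounted here and the kernel has NVMe fault injection.
DEBUGFS = "/sys/kernel/debug"


def nvme_fault_plan(disk: str, probability: int = 100, times: int = 1,
                    root: str = DEBUGFS) -> List[Tuple[str, str]]:
    """Return the (path, value) debugfs writes that arm fault injection.

    `probability` is a percentage; `times` caps the number of injected
    failures (-1 means no limit), mirroring the generic fault-attr knobs.
    """
    if not 0 <= probability <= 100:
        raise ValueError("probability must be a percentage in [0, 100]")
    if times < -1:
        raise ValueError("times must be -1 (unlimited) or >= 0")
    base = os.path.join(root, disk, "fault_inject")
    # Set the failure budget first, then the probability that arms it.
    return [
        (os.path.join(base, "times"), str(times)),
        (os.path.join(base, "probability"), str(probability)),
    ]


def apply_writes(writes: List[Tuple[str, str]]) -> None:
    """Perform each write in order (needs root on a real system)."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run: show what would be written for nvme0n1.
    for path, value in nvme_fault_plan("nvme0n1"):
        print(f"echo {value} > {path}")
```

Keeping the plan separate from the writes lets a test script log or review the exact knobs before touching a live namespace.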

True. This is also a good way to do this.

For a software-only solution, we have experimented with something
similar to the nvme-debug code that Mikulas is proposing. Adam pointed to
the nvme-loop target as an alternative, and this seems to work pretty
nicely. I do not believe many changes would be needed to support copy
offload this way.
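As a rough illustration of that software-only setup, the nvme-loop target can export a memory-backed null_blk device (e.g. /dev/nullb0 after loading null_blk) as a local NVMe controller through the nvmet configfs tree. The sketch assumes configfs is mounted at /sys/kernel/config, the nvmet and nvme-loop modules are loaded, and it runs as root; the helper names (`nvme_loop_plan`, `apply_plan`) and the subsystem NQN "testnqn" are hypothetical. It builds the list of configfs operations first so they can be reviewed or printed as a dry run before being applied.

```python
import os
from typing import List, Tuple

# Assumption: configfs is mounted at /sys/kernel/config with nvmet loaded.
NVMET = "/sys/kernel/config/nvmet"

Op = Tuple[str, ...]


def nvme_loop_plan(nqn: str, device_path: str, nsid: int = 1,
                   port_id: int = 1, root: str = NVMET) -> List[Op]:
    """Return the configfs operations exporting device_path via nvme-loop."""
    subsys = f"{root}/subsystems/{nqn}"
    ns = f"{subsys}/namespaces/{nsid}"
    port = f"{root}/ports/{port_id}"
    return [
        ("mkdir", subsys),                              # create subsystem
        ("write", f"{subsys}/attr_allow_any_host", "1"),  # no host ACL
        ("mkdir", ns),                                  # create namespace
        ("write", f"{ns}/device_path", device_path),    # backing block dev
        ("write", f"{ns}/enable", "1"),                 # enable after path
        ("mkdir", port),                                # create port
        ("write", f"{port}/addr_trtype", "loop"),       # loop transport
        ("symlink", subsys, f"{port}/subsystems/{nqn}"),  # expose on port
    ]


def apply_plan(plan: List[Op]) -> None:
    """Execute the operations in order (needs root on a real system)."""
    for op in plan:
        kind = op[0]
        if kind == "mkdir":
            os.mkdir(op[1])
        elif kind == "write":
            with open(op[1], "w") as f:
                f.write(op[2])
        elif kind == "symlink":
            os.symlink(op[1], op[2])  # symlink(target, link_name)
        else:
            raise ValueError(f"unknown operation {kind!r}")


if __name__ == "__main__":
    # Dry run: print the operations instead of applying them.
    for op in nvme_loop_plan("testnqn", "/dev/nullb0"):
        print(*op)
```

After applying the plan, the host side would connect with nvme-cli, e.g. `nvme connect -t loop -n testnqn`; the resulting namespace is driven by the regular NVMe host stack while the backing I/O is served by null_blk.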

If QEMU is so incompetent, do we then need to add every big feature to
the NVMeOF test target so that we can test it better? Is that what you
are proposing? If we implement one feature, it will be hard to nack any
new features that people come up with using the same rationale: "QEMU
is slow and it is hard to test race conditions, etc."

In my opinion, if people want this and are willing to maintain it, there
is a case for it.

And if that is the case, why don't we have a memory-backed ZNS
emulation in the NVMeOF target? Isn't that a bigger and more
complicated feature than Simple Copy, since it involves controller
states and AENs?

I think this is a good idea.

ZNS kernel code testing is also done on QEMU; I've fixed bugs in the
ZNS kernel code that were discovered on QEMU, and I've not seen any
issues with that. Given that the Simple Copy feature is way smaller
than ZNS, it is less likely to suffer from the slowness etc. (listed
above) in QEMU.

QEMU is super useful: it is easy and it helps identify many issues.
But it is for compliance, not for performance. There was an effort to
make FEMU, but this seems to be an abandoned project.

My point is that if we allow one, we will be opening the floodgates, and
we need to be careful not to bloat the code unless it is _absolutely
necessary_, which I don't think it is, based on the Simple Copy
specification.

I understand, and this is a very valid point. It seems like the
nvme-loop device can give a lot of what we need; all the necessary extra
logic can go into null_blk, and then we do not need NVMe-specific
code.

Do you see any drawback to this approach?

So in my view, having both is not replication, and it gives more
flexibility for validation, which I believe is always good.