On 10/17/2023 10:27 PM, Jason Wang wrote:
On Wed, Oct 18, 2023 at 12:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On 10/16/2023 7:35 PM, Jason Wang wrote:
On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote:
On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
On 10/12/2023 8:01 PM, Jason Wang wrote:
On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote:
Devices with on-chip IOMMU or vendor specific IOTLB implementation
may need to restore iotlb mapping to the initial or default state
using the .reset_map op, as it's desirable for some parent devices
to manipulate mappings solely on their own, independent of virtio device
state. For instance, device reset does not cause mappings to go away on
such an IOTLB model in need of persistent mapping. Before vhost-vdpa
goes away, give them a chance to reset iotlb back to the initial
state in vhost_vdpa_cleanup().
Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx>
---
drivers/vhost/vdpa.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 851535f..a3f8160 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
 	return vhost_vdpa_alloc_as(v, asid);
 }

+static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	if (ops->reset_map)
+		ops->reset_map(vdpa, asid);
+}
+
 static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
 {
 	struct vhost_vdpa_as *as = asid_to_as(v, asid);
@@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
 	hlist_del(&as->hash_link);
 	vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
+	/*
+	 * Devices with vendor specific IOMMU may need to restore
+	 * iotlb to the initial or default state which is not done
+	 * through device reset, as the IOTLB mapping manipulation
+	 * could be decoupled from the virtio device life cycle.
+	 */
Should we do this according to whether IOTLB_PERSIST is set?
Well, in theory it seems so, but it's actually an unnecessary code
change, as that is how a vDPA parent behind a platform IOMMU works
today, and userspace doesn't break as of today. :)
Well, this is a question I've asked before. You have explained
that one of the reasons we don't break userspace is that it may
couple IOTLB reset with vDPA reset as well. One example is QEMU.
As explained in previous threads [1][2], when IOTLB_PERSIST is not set
it doesn't necessarily mean the iotlb will definitely be destroyed
across reset (think about the platform IOMMU case), so userspace today
already tolerates either a good or a bad IOMMU.
I'm confused; how do we define tolerating here?
Tolerating is defined as QEMU having to proactively unmap before reset
just to work around the driver bug (on-chip maps out of sync),
unconditionally for platform or on-chip. We all know it doesn't have to
do so for platform IOMMU, but userspace has no means to distinguish.
That said, userspace is sacrificing reset time performance on platform
IOMMU setups just to work around a buggy implementation in the other setup.
Ok, so what you actually mean is that userspace can tolerate the "bug"
with the performance penalty.
Right.
For example, if it has tolerance, why bother?
I'm not sure I get the question. But I think the fact that userspace is
compromising because of a buggy implementation in a few drivers doesn't
mean we should uniformly enforce such behavior for all set_map/dma_map
implementations.
This is not my point. I meant that we can fix it, but we need a
negotiation in order to let some "buggy" old userspace survive the changes.
Userspace is not buggy today, so how do we define "buggy"? Userspace with
tolerance could survive just fine whether or not this negotiation or the
buggy driver behavior emulation is around. If any userspace doesn't
tolerate it, it can still work fine on a good on-chip IOMMU or platform
IOMMU, whether or not the negotiation is around.
Not checking whether IOTLB_PERSIST is set in this code is intentional;
there's no point in emulating bad IOMMU behavior even for older
userspace (if the emulation were done improperly it would result in
even worse performance).
I can easily imagine a case:
The old Qemu that works only with a setup like mlx5_vdpa.
Noted, seems to me there's no such case of a userspace implementation
that only works with mlx5_vdpa or its friends, but doesn't work with the
others e.g. platform IOMMU, or well behaving on-chip IOMMU
implementations.
It's not hard to think of a case where:
1) the environment has mlx5_vdpa only
2) the kernel doc can't have endless details, so when developing an
application, the author notices that the IOTLB is cleared during reset
I get it, but my question was: even if the author had noticed that the
IOTLB is cleared during reset, does he care or not about getting the
IOTLB working again? My point is that, if this old setup is supposed to
"work" on mlx5_vdpa, then the developer must come up with some sort of
"quirk" to recover the IOTLB and get it back to a working state after
the reset. It would be more justified to come up with the proper fix for
compatibility/emulation only once we know what should be expected to
work and through which possible means it can be made to work again,
rather than blindly emulating the buggy behavior based solely on a few
drivers' own implementations. I'm pretty sure there are multiple ways to
implement the buggy reset behavior in the driver; does that mean we have
to emulate various corrupted mapping states in each individual on-chip
iommu? How does it help the developer or user if we replicate the same
corrupted mapping state in the on-chip iommu after reset? Does any
real-life user care only about the mapping being corrupted in the same
way, rather than about the quirk sequence or workaround needed to get
the iotlb maps out of the broken state?
Only if the userspace is something like a test facility that expects
some test case to fail on mlx5_vdpa after reset -- I assume that is not
a real-life user at all.
The unmap+remap trick around vdpa reset works totally
fine for platform IOMMU, except with sub-optimal performance. Other than
this trick, I cannot easily think of another means or iotlb message
sequence for userspace to recover from the bogus state and get the iotlb
working again after reset.
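To make the cost of the unmap+remap trick concrete, here is a minimal user-space model in C. It is only a sketch under stated assumptions: `struct mock_dev`, `mock_map()`, `mock_unmap()`, `mock_reset()` and `reset_with_workaround()` are hypothetical names invented for illustration, not kernel or QEMU code, and a map or unmap request is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a vDPA device's IOTLB; not a kernel structure. */
struct mock_dev {
	bool clears_on_reset;	/* true: on-chip IOMMU that drops maps on reset */
	int nr_maps;		/* mappings currently programmed */
	int map_ops;		/* map/unmap requests issued, a proxy for cost */
};

static void mock_map(struct mock_dev *d)
{
	d->nr_maps++;
	d->map_ops++;
}

static void mock_unmap(struct mock_dev *d)
{
	d->nr_maps--;
	d->map_ops++;
}

/* Reset only touches the mappings on the model that loses them. */
static void mock_reset(struct mock_dev *d)
{
	if (d->clears_on_reset)
		d->nr_maps = 0;
}

/*
 * Userspace cannot tell the two models apart, so it unmaps everything
 * it knows about before reset and maps it all again afterwards.
 */
static void reset_with_workaround(struct mock_dev *d, int shadow_maps)
{
	for (int i = 0; i < shadow_maps; i++)
		mock_unmap(d);
	mock_reset(d);
	for (int i = 0; i < shadow_maps; i++)
		mock_map(d);
}
```

In this model both kinds of device end up with the same mappings after `reset_with_workaround()`, but the one that would have kept its mappings anyway still pays for every unmap and map request, which is the sub-optimal performance mentioned above.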
Yes for sure, but we can't audit every user space, no?
We don't have to, as userspace here has no bug at all. The bug exists in
the driver not in userspace. Real life userspace app only cares about
making things work not asserting something must be broken.
Are we talking about a hypothesis that has no real
basis to exist in the real world?
Instead of trying to answer these hard questions, I would go another
way. That is, stick to the old behaviour when IOTLB_PERSIST is not
set by the backend. This is much easier.
Please note that the old (broken) behavior can vary between different
parent driver implementations. It's the driver's own specific problem: if
there are N ways for a driver to implement a buggy .reset, do we have to
emulate N flavors of different vdpa reset behavior?
If we do
this without a negotiation, the IOTLB will not be cleared but QEMU will
try to re-program the IOTLB after reset. Which will break?
1) stick the exact old behaviour with just one line of check
It's not just one line of check here, the old behavior emulation has to
be done as Eugenio illustrated in the other email.
For vhost-vDPA it's just

    if (IOTLB_PERSIST is acked by userspace)
        reset_map()

For parent, it's somehow similar:

    during .reset()
        if (IOTLB_PERSIST is not acked by userspace)
            reset_vendor_mappings()

Anything I missed here?
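
The two checks proposed above can be modelled in a small user-space sketch in C. This is an illustration only, under loud assumptions: `MOCK_F_IOTLB_PERSIST`, `struct mock_parent`, `struct mock_vhost_vdpa`, `mock_parent_reset()` and `mock_remove_as()` are hypothetical names, the feature bit value is a placeholder, and the sketch assumes the acked backend features are visible at both call sites, which is exactly the point under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration; not the real feature bit number. */
#define MOCK_F_IOTLB_PERSIST (1ULL << 0)

/* Hypothetical parent device state: just a count of programmed mappings. */
struct mock_parent {
	int nr_maps;
	int reset_map_calls;
};

struct mock_vhost_vdpa {
	uint64_t acked_backend_features;
	struct mock_parent parent;
};

static bool persist_acked(const struct mock_vhost_vdpa *v)
{
	return (v->acked_backend_features & MOCK_F_IOTLB_PERSIST) != 0;
}

/* Parent side: keep the old behaviour unless persistence was acked. */
static void mock_parent_reset(struct mock_vhost_vdpa *v)
{
	if (!persist_acked(v))
		v->parent.nr_maps = 0;	/* stands in for reset_vendor_mappings() */
}

/* vhost-vDPA side: only call reset_map when persistence was acked. */
static void mock_remove_as(struct mock_vhost_vdpa *v)
{
	if (persist_acked(v)) {
		v->parent.nr_maps = 0;	/* stands in for ops->reset_map() */
		v->parent.reset_map_calls++;
	}
}
```

With the bit not acked, `mock_parent_reset()` drops the mappings as before and `mock_remove_as()` does nothing extra; with the bit acked, mappings survive `mock_parent_reset()` and are only restored to the initial state in `mock_remove_as()`.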
First, the ideal fix would be to leave this reset_vendor_mappings()
emulation code in the individual driver itself, which already has the
broken behavior. But today there's no backend feature negotiation
between vhost-vdpa and the parent driver. Do we want to send down the
acked_backend_features to parent drivers?