On Wed, Oct 18, 2023 at 12:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > > > On 10/16/2023 7:35 PM, Jason Wang wrote: > > On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >> > >> > >> On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote: > >>> On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>> On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >>>>> > >>>>> On 10/12/2023 8:01 PM, Jason Wang wrote: > >>>>>> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >>>>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation > >>>>>>> may need to restore iotlb mapping to the initial or default state > >>>>>>> using the .reset_map op, as it's desirable for some parent devices > >>>>>>> to solely manipulate mappings by its own, independent of virtio device > >>>>>>> state. For instance, device reset does not cause mapping go away on > >>>>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa > >>>>>>> is going away, give them a chance to reset iotlb back to the initial > >>>>>>> state in vhost_vdpa_cleanup(). > >>>>>>> > >>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++ > >>>>>>> 1 file changed, 16 insertions(+) > >>>>>>> > >>>>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > >>>>>>> index 851535f..a3f8160 100644 > >>>>>>> --- a/drivers/vhost/vdpa.c > >>>>>>> +++ b/drivers/vhost/vdpa.c > >>>>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v, > >>>>>>> return vhost_vdpa_alloc_as(v, asid); > >>>>>>> } > >>>>>>> > >>>>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid) > >>>>>>> +{ > >>>>>>> + struct vdpa_device *vdpa = v->vdpa; > >>>>>>> + const struct vdpa_config_ops *ops = vdpa->config; > >>>>>>> + > >>>>>>> + if (ops->reset_map) > >>>>>>> + ops->reset_map(vdpa, asid); > >>>>>>> +} > >>>>>>> + > >>>>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > >>>>>>> { > >>>>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid); > >>>>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > >>>>>>> > >>>>>>> hlist_del(&as->hash_link); > >>>>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid); > >>>>>>> + /* > >>>>>>> + * Devices with vendor specific IOMMU may need to restore > >>>>>>> + * iotlb to the initial or default state which is not done > >>>>>>> + * through device reset, as the IOTLB mapping manipulation > >>>>>>> + * could be decoupled from the virtio device life cycle. > >>>>>>> + */ > >>>>>> Should we do this according to whether IOTLB_PRESIST is set? > >>>>> Well, in theory this seems like so but it's unnecessary code change > >>>>> actually, as that is the way how vDPA parent behind platform IOMMU works > >>>>> today, and userspace doesn't break as of today. :) > >>>> Well, this is one question I've ever asked before. You have explained > >>>> that one of the reason that we don't break userspace is that they may > >>>> couple IOTLB reset with vDPA reset as well. One example is the Qemu. > >>>> > >>>>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set > >>>>> it doesn't necessarily mean the iotlb will definitely be destroyed > >>>>> across reset (think about the platform IOMMU case), so userspace today > >>>>> is already tolerating enough with either good or bad IOMMU. > > I'm confused, how to define tolerating here? > > Tolerating defined as QEMU has to proactively unmap before reset just to > workaround the driver bug (on-chip maps out of sync), unconditionally > for platform or on-chip. While we all know it doesn't have to do so for > platform IOMMU, though userspace has no means to distinguish. That said, > userspace is sacrificing reset time performance on platform IOMMU setup > just for working around buggy implementation in the other setup. Ok, so what you actually mean is that userspace can tolerate the "bug" with the performance penalty. > > > For example, if it has tolerance, why bother? > I'm not sure I get the question. But I think userspace is compromising > because of buggy implementation in a few drivers doesn't mean we should > uniformly enforce such behavior for all set_map/dma_map implementations. This is not my point. I meant, we can fix we need a negotiation in order to let some "buggy" old user space to survive from the changes. > > > > >>>> This code of > >>>>> not checking IOTLB_PERSIST being set is intentional, there's no point to > >>>>> emulate bad IOMMU behavior even for older userspace (with improper > >>>>> emulation to be done it would result in even worse performance). > > I can easily imagine a case: > > > > The old Qemu that works only with a setup like mlx5_vdpa. > Noted, seems to me there's no such case of a userspace implementation > that only works with mlx5_vdpa or its friends, but doesn't work with the > others e.g. platform IOMMU, or well behaving on-chip IOMMU > implementations. It's not hard to think of a case where: 1) the environment has mlx5_vdpa only 2) kernel doc can't have endless details, so when developing application, the author notice IOTLB is cleared during reset > The Unmap+remap trick around vdpa reset works totally > fine for platform IOMMU, except with sub-optimal performance. Other than > this trick, I cannot easily think of other means or iotlb message > sequence for userspace to recover the bogus state and make iotlb back to > work again after reset. Yes for sure, but we can't audit every user space, no? > Are we talking about hypnosis that has no real > basis to exist in the real world? Instead of trying to answer these hard questions, I would go another way. That is, stick to the old behaviour when IOTLB_PRESISIT is not set by the backend. This is much easier. > > > If we do > > this without a negotiation, IOTLB will not be clear but the Qemu will > > try to re-program the IOTLB after reset. Which will break? > > > > 1) stick the exact old behaviour with just one line of check > It's not just one line of check here, the old behavior emulation has to > be done as Eugenio illustrated in the other email. For vhost-vDPA it's just if (IOTLB_PERSIST is acked by userspace) reset_map() For parent, it's somehow similar: during .reset() if (IOTLB_PERSIST is not acked by userspace) reset_vendor_mappings() Anything I missed here? > In addition, the > emulation has to limit to those buggy drivers as I don't feel this > emulation should apply uniformly to all future set_map/dma_map > implementations. Unfortunately, it's a must to stick to ABI. I agree it's a mess but we don't have a better choice. Or we can fail the probe if userspace doesn't ack this feature. > > 2) audit all the possible cases to avoid a one line of code > > > > 1) seems much easier than 2) > You see it's more than just one line of code, and I'm uncertain if the > additional complexity is warranted or necessary, particularly if added > this piece of compatibility code will linger for quite a long time. This is a must as long as it can be noticed by userspace. Doing something conservative makes more sense to me. > Instead of adding hypothetical code change for no specific good reason > and no real use case, It's not adding something new or new behaviours, it's just making the IOTLB reset conditional based on vDPA reset. > I'd like to add the code when we find out a > specific use case that may get impacted or already being affected, It doesn't conflict with what you proposed here. Old behaviours have their users, no? > then > we will have good understanding how to code up the fix and emulate > properly for compatibility, while not affecting other good implementations. The issue is, even if we can't find a userspace now. It doesn't mean we can't have one in the future. Then it might be too late or too tricky to fix them. We had a lot of lessons in the past. Thanks > > Thanks, > -Siwe/i/ > > > > >>>> For two reasons: > >>>> > >>>> 1) backend features need acked by userspace this is by design > >>>> 2) keep the odd behaviour seems to be more safe as we can't audit > >>>> every userspace program > >>>> > >>> The old behavior (without flag ack) cannot be trusted already, as: > > Possibly but the point is to unbreak userspace no matter how weird the > > behaviour we've ever had. > > > >>> * Devices using platform IOMMU (in other words, not implementing > >>> neither .set_map nor .dma_map) does not unmap memory at virtio reset. > >>> * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do > >>> reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb > >>> called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset, > >>> now all backends work the same as far as I know., which was (and is) > >>> the way devices using the platform IOMMU works. > >>> > >>> The difference in behavior did not matter as QEMU unmaps all the > >>> memory unregistering the memory listener at vhost_vdpa_dev_start(..., > >>> started = false), > >> Exactly. It's not just QEMU, but any (older) userspace manipulates > >> mappings through the vhost-vdpa iotlb interface has to unmap all > >> mappings to workaround the vdpa parent driver bug. > > Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI. > > > >> If they don't do > >> explicit unmap, it would cause state inconsistency between vhost-vdpa > >> and parent driver, then old mappings can't be restored, and new mapping > >> can be added to iotlb after vDPA reset. There's no point to preserve > >> this broken and inconsistent behavior between vhost-vdpa and parent > >> driver, as userspace doesn't care at all! > > It's a userspace notice change so we can't fix it silently: > > > > https://lkml.org/lkml/2012/12/23/75 > > > > Another example which is related to vhost-vDPA: > > > > https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@xxxxxxxxxx/T/ > > > > Thanks > > > >>> but the backend acknowledging this feature flag > >>> allows QEMU to make sure it is safe to skip this unmap & map in the > >>> case of vhost stop & start cycle. > >>> > >>> In that sense, this feature flag is actually a signal for userspace to > >>> know that the bug has been solved. > >> Right, I couldn't say it better than you do, thanks! The feature flag is > >> more of an unusual means to indicating kernel bug having been fixed, > >> rather than introduce a new feature or new kernel behavior ending up in > >> change of userspace's expectation. > >> > >>> Not offering it indicates that > >>> userspace cannot trust the kernel will retain the maps. > >>> > >>> Si-Wei or Dragos, please correct me if I've missed something. Feel > >>> free to use the text in case you find more clear in doc or patch log. > >> Sure, will do, thank you! Will post v2 adding these to the log. > >> > >> Thanks, > >> -Siwei > >> > >> > >> > >>> Thanks! > >>> > >>>> Thanks > >>>> > >>>>> I think > >>>>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100% > >>>>> certainty of persistent iotlb mapping not getting lost across vdpa reset. > >>>>> > >>>>> Thanks, > >>>>> -Siwei > >>>>> > >>>>> [1] > >>>>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/ > >>>>> [2] > >>>>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/ > >>>>>> Otherwise > >>>>>> we may break old userspace. > >>>>>> > >>>>>> Thanks > >>>>>> > >>>>>>> + vhost_vdpa_reset_map(v, asid); > >>>>>>> kfree(as); > >>>>>>> > >>>>>>> return 0; > >>>>>>> -- > >>>>>>> 1.8.3.1 > >>>>>>> > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization