On Wed, Oct 18, 2023 at 1:27 PM Jason Wang <jasowang@xxxxxxxxxx> wrote: > > On Wed, Oct 18, 2023 at 12:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > > > > > > > On 10/16/2023 7:35 PM, Jason Wang wrote: > > > On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > >> > > >> > > >> On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote: > > >>> On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote: > > >>>> On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > >>>>> > > >>>>> On 10/12/2023 8:01 PM, Jason Wang wrote: > > >>>>>> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > >>>>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation > > >>>>>>> may need to restore iotlb mapping to the initial or default state > > >>>>>>> using the .reset_map op, as it's desirable for some parent devices > > >>>>>>> to solely manipulate mappings by its own, independent of virtio device > > >>>>>>> state. For instance, device reset does not cause mapping go away on > > >>>>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa > > >>>>>>> is going away, give them a chance to reset iotlb back to the initial > > >>>>>>> state in vhost_vdpa_cleanup(). > > >>>>>>> > > >>>>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx> > > >>>>>>> --- > > >>>>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++ > > >>>>>>> 1 file changed, 16 insertions(+) > > >>>>>>> > > >>>>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > > >>>>>>> index 851535f..a3f8160 100644 > > >>>>>>> --- a/drivers/vhost/vdpa.c > > >>>>>>> +++ b/drivers/vhost/vdpa.c > > >>>>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v, > > >>>>>>> return vhost_vdpa_alloc_as(v, asid); > > >>>>>>> } > > >>>>>>> > > >>>>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid) > > >>>>>>> +{ > > >>>>>>> + struct vdpa_device *vdpa = v->vdpa; > > >>>>>>> + const struct vdpa_config_ops *ops = vdpa->config; > > >>>>>>> + > > >>>>>>> + if (ops->reset_map) > > >>>>>>> + ops->reset_map(vdpa, asid); > > >>>>>>> +} > > >>>>>>> + > > >>>>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > > >>>>>>> { > > >>>>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid); > > >>>>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > > >>>>>>> > > >>>>>>> hlist_del(&as->hash_link); > > >>>>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid); > > >>>>>>> + /* > > >>>>>>> + * Devices with vendor specific IOMMU may need to restore > > >>>>>>> + * iotlb to the initial or default state which is not done > > >>>>>>> + * through device reset, as the IOTLB mapping manipulation > > >>>>>>> + * could be decoupled from the virtio device life cycle. > > >>>>>>> + */ > > >>>>>> Should we do this according to whether IOTLB_PRESIST is set? > > >>>>> Well, in theory this seems like so but it's unnecessary code change > > >>>>> actually, as that is the way how vDPA parent behind platform IOMMU works > > >>>>> today, and userspace doesn't break as of today. :) > > >>>> Well, this is one question I've ever asked before. You have explained > > >>>> that one of the reason that we don't break userspace is that they may > > >>>> couple IOTLB reset with vDPA reset as well. One example is the Qemu. > > >>>> > > >>>>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set > > >>>>> it doesn't necessarily mean the iotlb will definitely be destroyed > > >>>>> across reset (think about the platform IOMMU case), so userspace today > > >>>>> is already tolerating enough with either good or bad IOMMU. > > > I'm confused, how to define tolerating here? > > > > Tolerating defined as QEMU has to proactively unmap before reset just to > > workaround the driver bug (on-chip maps out of sync), unconditionally > > for platform or on-chip. While we all know it doesn't have to do so for > > platform IOMMU, though userspace has no means to distinguish. That said, > > userspace is sacrificing reset time performance on platform IOMMU setup > > just for working around buggy implementation in the other setup. > > Ok, so what you actually mean is that userspace can tolerate the "bug" > with the performance penalty. > > > > > > > For example, if it has tolerance, why bother? > > I'm not sure I get the question. But I think userspace is compromising > > because of buggy implementation in a few drivers doesn't mean we should > > uniformly enforce such behavior for all set_map/dma_map implementations. > > This is not my point. I meant, we can fix we need a negotiation in > order to let some "buggy" old user space to survive from the changes. > > > > > > > > >>>> This code of > > >>>>> not checking IOTLB_PERSIST being set is intentional, there's no point to > > >>>>> emulate bad IOMMU behavior even for older userspace (with improper > > >>>>> emulation to be done it would result in even worse performance). > > > I can easily imagine a case: > > > > > > The old Qemu that works only with a setup like mlx5_vdpa. > > Noted, seems to me there's no such case of a userspace implementation > > that only works with mlx5_vdpa or its friends, but doesn't work with the > > others e.g. platform IOMMU, or well behaving on-chip IOMMU > > implementations. > > It's not hard to think of a case where: > > 1) the environment has mlx5_vdpa only > 2) kernel doc can't have endless details, so when developing > application, the author notice IOTLB is cleared during reset > > > The Unmap+remap trick around vdpa reset works totally > > fine for platform IOMMU, except with sub-optimal performance. Other than > > this trick, I cannot easily think of other means or iotlb message > > sequence for userspace to recover the bogus state and make iotlb back to > > work again after reset. > > Yes for sure, but we can't audit every user space, no? > > > Are we talking about hypnosis that has no real > > basis to exist in the real world? > > Instead of trying to answer these hard questions, I would go another > way. That is, stick to the old behaviour when IOTLB_PRESISIT is not > set by the backend. This is much easier. > > > > > > If we do > > > this without a negotiation, IOTLB will not be clear but the Qemu will > > > try to re-program the IOTLB after reset. Which will break? > > > > > > 1) stick the exact old behaviour with just one line of check > > It's not just one line of check here, the old behavior emulation has to > > be done as Eugenio illustrated in the other email. > > For vhost-vDPA it's just > > if (IOTLB_PERSIST is acked by userspace) > reset_map() > > For parent, it's somehow similar: > > during .reset() > > if (IOTLB_PERSIST is not acked by userspace) > reset_vendor_mappings() > > Anything I missed here? > > > In addition, the > > emulation has to limit to those buggy drivers as I don't feel this > > emulation should apply uniformly to all future set_map/dma_map > > implementations. > > Unfortunately, it's a must to stick to ABI. I agree it's a mess but we > don't have a better choice. Or we can fail the probe if userspace > doesn't ack this feature. Antoher idea we can just do the following in vhost_vdpa reset? config->reset() if (IOTLB_PERSIST is not set) { config->reset_map() } Then we don't have the burden to maintain them in the parent? Thanks > > > > 2) audit all the possible cases to avoid a one line of code > > > > > > 1) seems much easier than 2) > > You see it's more than just one line of code, and I'm uncertain if the > > additional complexity is warranted or necessary, particularly if added > > this piece of compatibility code will linger for quite a long time. > > This is a must as long as it can be noticed by userspace. Doing > something conservative makes more sense to me. > > > Instead of adding hypothetical code change for no specific good reason > > and no real use case, > > It's not adding something new or new behaviours, it's just making the > IOTLB reset conditional based on vDPA reset. > > > I'd like to add the code when we find out a > > specific use case that may get impacted or already being affected, > > It doesn't conflict with what you proposed here. Old behaviours have > their users, no? > > > then > > we will have good understanding how to code up the fix and emulate > > properly for compatibility, while not affecting other good implementations. > > The issue is, even if we can't find a userspace now. It doesn't mean > we can't have one in the future. Then it might be too late or too > tricky to fix them. We had a lot of lessons in the past. > > Thanks > > > > > Thanks, > > -Siwe/i/ > > > > > > > >>>> For two reasons: > > >>>> > > >>>> 1) backend features need acked by userspace this is by design > > >>>> 2) keep the odd behaviour seems to be more safe as we can't audit > > >>>> every userspace program > > >>>> > > >>> The old behavior (without flag ack) cannot be trusted already, as: > > > Possibly but the point is to unbreak userspace no matter how weird the > > > behaviour we've ever had. > > > > > >>> * Devices using platform IOMMU (in other words, not implementing > > >>> neither .set_map nor .dma_map) does not unmap memory at virtio reset. > > >>> * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do > > >>> reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb > > >>> called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset, > > >>> now all backends work the same as far as I know., which was (and is) > > >>> the way devices using the platform IOMMU works. > > >>> > > >>> The difference in behavior did not matter as QEMU unmaps all the > > >>> memory unregistering the memory listener at vhost_vdpa_dev_start(..., > > >>> started = false), > > >> Exactly. It's not just QEMU, but any (older) userspace manipulates > > >> mappings through the vhost-vdpa iotlb interface has to unmap all > > >> mappings to workaround the vdpa parent driver bug. > > > Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI. > > > > > >> If they don't do > > >> explicit unmap, it would cause state inconsistency between vhost-vdpa > > >> and parent driver, then old mappings can't be restored, and new mapping > > >> can be added to iotlb after vDPA reset. There's no point to preserve > > >> this broken and inconsistent behavior between vhost-vdpa and parent > > >> driver, as userspace doesn't care at all! > > > It's a userspace notice change so we can't fix it silently: > > > > > > https://lkml.org/lkml/2012/12/23/75 > > > > > > Another example which is related to vhost-vDPA: > > > > > > https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@xxxxxxxxxx/T/ > > > > > > Thanks > > > > > >>> but the backend acknowledging this feature flag > > >>> allows QEMU to make sure it is safe to skip this unmap & map in the > > >>> case of vhost stop & start cycle. > > >>> > > >>> In that sense, this feature flag is actually a signal for userspace to > > >>> know that the bug has been solved. > > >> Right, I couldn't say it better than you do, thanks! The feature flag is > > >> more of an unusual means to indicating kernel bug having been fixed, > > >> rather than introduce a new feature or new kernel behavior ending up in > > >> change of userspace's expectation. > > >> > > >>> Not offering it indicates that > > >>> userspace cannot trust the kernel will retain the maps. > > >>> > > >>> Si-Wei or Dragos, please correct me if I've missed something. Feel > > >>> free to use the text in case you find more clear in doc or patch log. > > >> Sure, will do, thank you! Will post v2 adding these to the log. > > >> > > >> Thanks, > > >> -Siwei > > >> > > >> > > >> > > >>> Thanks! > > >>> > > >>>> Thanks > > >>>> > > >>>>> I think > > >>>>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100% > > >>>>> certainty of persistent iotlb mapping not getting lost across vdpa reset. > > >>>>> > > >>>>> Thanks, > > >>>>> -Siwei > > >>>>> > > >>>>> [1] > > >>>>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/ > > >>>>> [2] > > >>>>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/ > > >>>>>> Otherwise > > >>>>>> we may break old userspace. > > >>>>>> > > >>>>>> Thanks > > >>>>>> > > >>>>>>> + vhost_vdpa_reset_map(v, asid); > > >>>>>>> kfree(as); > > >>>>>>> > > >>>>>>> return 0; > > >>>>>>> -- > > >>>>>>> 1.8.3.1 > > >>>>>>> > > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization