On Tue, Oct 17, 2023 at 4:30 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > > > On 10/16/2023 4:28 AM, Eugenio Perez Martin wrote: > > On Mon, Oct 16, 2023 at 8:33 AM Jason Wang <jasowang@xxxxxxxxxx> wrote: > >> On Fri, Oct 13, 2023 at 3:36 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >>> > >>> > >>> On 10/12/2023 8:01 PM, Jason Wang wrote: > >>>> On Tue, Oct 10, 2023 at 5:05 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >>>>> Devices with on-chip IOMMU or vendor specific IOTLB implementation > >>>>> may need to restore iotlb mapping to the initial or default state > >>>>> using the .reset_map op, as it's desirable for some parent devices > >>>>> to solely manipulate mappings by its own, independent of virtio device > >>>>> state. For instance, device reset does not cause mapping go away on > >>>>> such IOTLB model in need of persistent mapping. Before vhost-vdpa > >>>>> is going away, give them a chance to reset iotlb back to the initial > >>>>> state in vhost_vdpa_cleanup(). > >>>>> > >>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@xxxxxxxxxx> > >>>>> --- > >>>>> drivers/vhost/vdpa.c | 16 ++++++++++++++++ > >>>>> 1 file changed, 16 insertions(+) > >>>>> > >>>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > >>>>> index 851535f..a3f8160 100644 > >>>>> --- a/drivers/vhost/vdpa.c > >>>>> +++ b/drivers/vhost/vdpa.c > >>>>> @@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v, > >>>>> return vhost_vdpa_alloc_as(v, asid); > >>>>> } > >>>>> > >>>>> +static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid) > >>>>> +{ > >>>>> + struct vdpa_device *vdpa = v->vdpa; > >>>>> + const struct vdpa_config_ops *ops = vdpa->config; > >>>>> + > >>>>> + if (ops->reset_map) > >>>>> + ops->reset_map(vdpa, asid); > >>>>> +} > >>>>> + > >>>>> static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > >>>>> { > >>>>> struct vhost_vdpa_as *as = asid_to_as(v, asid); > >>>>> @@ -140,6 +149,13 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid) > >>>>> > >>>>> hlist_del(&as->hash_link); > >>>>> vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid); > >>>>> + /* > >>>>> + * Devices with vendor specific IOMMU may need to restore > >>>>> + * iotlb to the initial or default state which is not done > >>>>> + * through device reset, as the IOTLB mapping manipulation > >>>>> + * could be decoupled from the virtio device life cycle. > >>>>> + */ > >>>> Should we do this according to whether IOTLB_PRESIST is set? > >>> Well, in theory this seems like so but it's unnecessary code change > >>> actually, as that is the way how vDPA parent behind platform IOMMU works > >>> today, and userspace doesn't break as of today. :) > >> Well, this is one question I've ever asked before. You have explained > >> that one of the reason that we don't break userspace is that they may > >> couple IOTLB reset with vDPA reset as well. One example is the Qemu. > >> > >>> As explained in previous threads [1][2], when IOTLB_PERSIST is not set > >>> it doesn't necessarily mean the iotlb will definitely be destroyed > >>> across reset (think about the platform IOMMU case), so userspace today > >>> is already tolerating enough with either good or bad IOMMU. I'm confused, how to define tolerating here? For example, if it has tolerance, why bother? > >>This code of > >>> not checking IOTLB_PERSIST being set is intentional, there's no point to > >>> emulate bad IOMMU behavior even for older userspace (with improper > >>> emulation to be done it would result in even worse performance). I can easily imagine a case: The old Qemu that works only with a setup like mlx5_vdpa. If we do this without a negotiation, IOTLB will not be clear but the Qemu will try to re-program the IOTLB after reset. Which will break? 1) stick the exact old behaviour with just one line of check 2) audit all the possible cases to avoid a one line of code 1) seems much easier than 2) > >> For two reasons: > >> > >> 1) backend features need acked by userspace this is by design > >> 2) keep the odd behaviour seems to be more safe as we can't audit > >> every userspace program > >> > > The old behavior (without flag ack) cannot be trusted already, as: Possibly but the point is to unbreak userspace no matter how weird the behaviour we've ever had. > > * Devices using platform IOMMU (in other words, not implementing > > neither .set_map nor .dma_map) does not unmap memory at virtio reset. > > * Devices that implement .set_map or .dma_map (vdpa_sim, mlx5) do > > reset IOTLB, but in their parent ops (vdpasim_do_reset, prune_iotlb > > called from mlx5_vdpa_reset). With vdpa_sim patch removing the reset, > > now all backends work the same as far as I know., which was (and is) > > the way devices using the platform IOMMU works. > > > > The difference in behavior did not matter as QEMU unmaps all the > > memory unregistering the memory listener at vhost_vdpa_dev_start(..., > > started = false), > Exactly. It's not just QEMU, but any (older) userspace manipulates > mappings through the vhost-vdpa iotlb interface has to unmap all > mappings to workaround the vdpa parent driver bug. Just to clarify, from userspace, it's the (odd) behaviour of the current uAPI. > If they don't do > explicit unmap, it would cause state inconsistency between vhost-vdpa > and parent driver, then old mappings can't be restored, and new mapping > can be added to iotlb after vDPA reset. There's no point to preserve > this broken and inconsistent behavior between vhost-vdpa and parent > driver, as userspace doesn't care at all! It's a userspace notice change so we can't fix it silently: https://lkml.org/lkml/2012/12/23/75 Another example which is related to vhost-vDPA: https://lore.kernel.org/netdev/20230927140544.205088-1-eric.auger@xxxxxxxxxx/T/ Thanks > > > but the backend acknowledging this feature flag > > allows QEMU to make sure it is safe to skip this unmap & map in the > > case of vhost stop & start cycle. > > > > In that sense, this feature flag is actually a signal for userspace to > > know that the bug has been solved. > Right, I couldn't say it better than you do, thanks! The feature flag is > more of an unusual means to indicating kernel bug having been fixed, > rather than introduce a new feature or new kernel behavior ending up in > change of userspace's expectation. > > > Not offering it indicates that > > userspace cannot trust the kernel will retain the maps. > > > > Si-Wei or Dragos, please correct me if I've missed something. Feel > > free to use the text in case you find more clear in doc or patch log. > Sure, will do, thank you! Will post v2 adding these to the log. > > Thanks, > -Siwei > > > > > > > Thanks! > > > >> Thanks > >> > >>> I think > >>> the purpose of the IOTLB_PERSIST flag is just to give userspace 100% > >>> certainty of persistent iotlb mapping not getting lost across vdpa reset. > >>> > >>> Thanks, > >>> -Siwei > >>> > >>> [1] > >>> https://lore.kernel.org/virtualization/9f118fc9-4f6f-dd67-a291-be78152e47fd@xxxxxxxxxx/ > >>> [2] > >>> https://lore.kernel.org/virtualization/3364adfd-1eb7-8bce-41f9-bfe5473f1f2e@xxxxxxxxxx/ > >>>> Otherwise > >>>> we may break old userspace. > >>>> > >>>> Thanks > >>>> > >>>>> + vhost_vdpa_reset_map(v, asid); > >>>>> kfree(as); > >>>>> > >>>>> return 0; > >>>>> -- > >>>>> 1.8.3.1 > >>>>> > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization