On Fri, Oct 20, 2023 at 6:28 AM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > > > > On 10/19/2023 7:39 AM, Eugenio Perez Martin wrote: > > On Thu, Oct 19, 2023 at 10:27 AM Jason Wang <jasowang@xxxxxxxxxx> wrote: > >> On Thu, Oct 19, 2023 at 2:47 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >>> > >>> > >>> On 10/18/2023 7:53 PM, Jason Wang wrote: > >>>> On Wed, Oct 18, 2023 at 4:49 PM Si-Wei Liu <si-wei.liu@xxxxxxxxxx> wrote: > >>>>> > >>>>> On 10/18/2023 12:00 AM, Jason Wang wrote: > >>>>>>> Unfortunately, it's a must to stick to ABI. I agree it's a mess but we > >>>>>>> don't have a better choice. Or we can fail the probe if userspace > >>>>>>> doesn't ack this feature. > >>>>>> Antoher idea we can just do the following in vhost_vdpa reset? > >>>>>> > >>>>>> config->reset() > >>>>>> if (IOTLB_PERSIST is not set) { > >>>>>> config->reset_map() > >>>>>> } > >>>>>> > >>>>>> Then we don't have the burden to maintain them in the parent? > >>>>>> > >>>>>> Thanks > >>>>> Please see my earlier response in the other email, thanks. > >>>>> > >>>>> ----------------%<----------------%<---------------- > >>>>> > >>>>> First, the ideal fix would be to leave this reset_vendor_mappings() > >>>>> emulation code on the individual driver itself, which already has the > >>>>> broken behavior. > >>>> So the point is, not about whether the existing behavior is "broken" > >>>> or not. > >>> Hold on, I thought earlier we all agreed upon that the existing behavior > >>> of vendor driver self-clearing maps during .reset violates the vhost > >>> iotlb abstraction and also breaks the .set_map/.dma_map API. This is > >>> 100% buggy driver implementation itself that we should discourage or > >>> eliminate as much as possible (that's part of the goal for this series), > >> I'm not saying it's not an issue, what I'm saying is, if the fix > >> breaks another userspace, it's a new bug in the kernel. See what Linus > >> said in [1] > >> > >> "If a change results in user programs breaking, it's a bug in the kernel." > >> > >>> but here you seem to go existentialism and suggests the very opposite > >>> that every .set_map/.dma_map driver implementation, regardless being the > >>> current or the new/upcoming, should unconditionally try to emulate the > >>> broken reset behavior for the sake of not breaking older userspace. > >> Such "emulation" is not done at the parent level. New parents just > >> need to implement reset_map() or not. everything could be done inside > >> vhost-vDPA as pseudo code that is shown above. > >> > >>> Set > >>> aside the criteria and definition for how userspace can be broken, can > >>> we step back to the original question why we think it's broken, and what > >>> we can do to promote good driver implementation instead of discuss the > >>> implementation details? > >> I'm not sure I get the point of this question. I'm not saying we don't > >> need to fix, what I am saying is that such a fix must be done in a > >> negotiable way. And it's better if parents won't get any burden. It > >> can just decide to implement reset_map() or not. > >> > >>> Reading the below response I found my major > >>> points are not heard even if written for quite a few times. > >> I try my best to not ignore any important things, but I can't promise > >> I will not miss any. I hope the above clarifies my points. > >> > >>> It's not > >>> that I don't understand the importance of not breaking old userspace, I > >>> appreciate your questions and extra patience, however I do feel the > >>> "broken" part is very relevant to our discussion here. > >>> If it's broken (in the sense of vhost IOTLB API) that you agree, I think > >>> we should at least allow good driver implementations; and when you think > >>> about the possibility of those valid good driver cases > >>> (.set_map/.dma_map implementations that do not clear maps in .reset), > >>> you might be able to see why it's coded the way as it is now. > >>> > >>>> It's about whether we could stick to the old behaviour without > >>>> too much cost. And I believe we could. > >>>> > >>>> And just to clarify here, reset_vendor_mappings() = config->reset_map() > >>>> > >>>>> But today there's no backend feature negotiation > >>>>> between vhost-vdpa and the parent driver. Do we want to send down the > >>>>> acked_backend_features to parent drivers? > >>>> There's no need to do that with the above code, or anything I missed here? > >>>> > >>>> config->reset() > >>>> if (IOTLB_PERSIST is not set) { > >>>> config->reset_map() > >>>> } > >>> Implementation issue: this implies reset_map() has to be there for every > >>> .set_map implementations, but vendor driver implementation for custom > >>> IOMMU could well implement DMA ops by itself instead of .reset_map. This > >>> won't work for every set_map driver (think about the vduse case). > >> Well let me do it once again, reset_map() is not mandated: > >> > >> config->reset() > >> if (IOTLB_PERSIST is not set) { > >> if (config->reset_map) > >> config->reset_map() > > To avoid new parent drivers > I am afraid it's not just new parent drivers, but any well behaved > driver today may well break userspace if go with this forced emulation > code, if they have to implement reset_map for some reason (e.g. restored > to 1:1 passthrough mapping or other default state in mapping). For new > userspace and user driver we can guard against it using the > IOTLB_PERSIST flag, but the above code would get a big chance to break > setup with good driver and older userspace in practice. > > And .reset_map implementation doesn't necessarily need to clear maps. > For e.g. IOMMU API compliant driver that only needs simple DMA model for > passthrough, all .reset_map has to do is toggle to 1:1 mapping mode to > the default/initial state without taking care of maps, as > vhost_vdpa_unmap(0, -1ULL) earlier should have done the map cleaning job > already. Ok, finally, it takes me a while to understand the issue :) Actually, there are two things: 1) stick the IOTLB mappings across the reset 2) reset the vendor specific mappings to whatever the parent think it's comfort (like 1:1) So I think my suggestion doesn't work. > > > > to have this behavior if they need to > > implement reset_map, > > > > What if we add a new callback like "config->buggy_virtio_reset_map", > > different from regular reset_map callback at cleanup? > Right, separating out the need for old behavior emulation from > .reset_map is much cleaner, and this is what individual broken driver > has to maintain without penalizing other good drivers. Good to see what > I said earlier is heard. > > > Only mlx5 and > > vdpa_sim need to implement it, with a big warning, and new parent > > drivers can trust they'll never have the old bad behavior. > Let's see what Jason will say about it and try to converge on this > first, I think he seemed to imply that this is part of ABI that every > driver has to make compromise for. I'd better get it ack'ed before > proceeding to the rest. Thanks for your patience. I think we have some choices: (All of the below can work, but we need to choose the best) 1) module parameter: this turns out to be hard for the management as it requires the subtle knowledge of a specific user space which turns out to be hard 2) buggy_virtio_reset_map: seems like somehow a pollution of the config ops, I think we can do this only if we have other choice 3) set_backend_features: I understand the concern that we should not propagate the vhost level feature to parent, the reason is most of them are irrelevant to the parent. I think the right way is to introduce get_parent_features()/set_parent_features() then we can choose to map some parent feature to vhost like (ENALBE_AFTER_DRIVER_OK and IOTLB_PERSIST) 4) piggy backing whether we need to clean vendor specific IOTLB in config->reset(bool clean_map) Siwei, Eugenio, what's your opinion here? Thanks > > Thanks, > -Siwei > > > > >> } > >> > >> Did you see any issue with VDUSE in this case? > >> > >>> But this is not the the point I was making. I think if you agree this is > >>> purely buggy driver implementation of its own, we should try to isolate > >>> this buggy behavior to individual driver rather than overload vhost-vdpa > >>> or vdpa core's role to help implement the emulation of broken driver > >>> behavior. > >> As I pointed out, if it is not noticeable in the userspace, that's > >> fine but it's not. > >> > >>> I don't get why .reset is special here, the abuse of .reset to > >>> manipulate mapping could also happen in other IOMMU unrelated driver > >>> entries like in .suspend, or in queue_reset. > >> Who can abuse reset here? It is totally under the control of > >> vhost-vDPA and it's not visible to uAPI. And we can fully control the > >> behaviour of vhost-vDPA. > >> > >>> If someday userspace is > >>> found coded around similar buggy driver implementation in other driver > >>> ops, do we want to follow and duplicate the same emulation in vdpa core > >>> as the precedent is already set here around .reset? > >> I think so, have you seen the links I give you? If you want to go > >> through the one from Linus thread[1], you can see the one that unbreak > >> virtio-IOMMU[2]: > >> > >> 1) Someday, we spot invalidate with size 0 is a bug > >> 2) We fix this bug by not allowing this > >> 3) But virtio-IOMMU userspace find that size 0 actually clean all the > >> IOTLB so it depends on the behaviour > >> 4) So the virtio-IOMMU userspace find it can't work after 2) > >> 5) Then we recover the behaviour before 2) via [2] > >> > >> Another example is the IOTLB_MSG_V2, V1 suffers from in-stable ABI in > >> 32bit archs, most of the userspace survives since it never runs on > >> 32bit archs. The fix is to introduce a V2 but we will stick to V1 by > >> default if V2 is not acknowledged by the userspace. > >> > >> I think the above 2 examples are sufficient for us to understand the > >> case. If not, I can help to clarify more since I'm involved in those 2 > >> fixes. > >> > >>> The buggy driver can fail in a lot of other ways indefinitely during > >>> reset, if there's a buggy driver that's already broken the way as how it > >>> is and happens to survive with all userspace apps, we just don't care > >>> and let it be. > >> Without IOTLB_PRESIST it doesn't break. With IOTLB_PERSIST and if the > >> reset_map() is done unconditionally, it can break. That's my point. > >> > >>> There's no way we can enumerate all those buggy behaviors > >>> in .reset_map itself, it's overloading that driver API too much. > >> If it is not noticeable by userspace, we can do any fix at will. But > >> it is not, we don't have another choice. Especially considering the > >> cost is rather low. > >> > >>>>> Second, IOTLB_PERSIST is needed but not sufficient. Due to lack of > >>>>> backend feature negotiation in parent driver, if vhost-vdpa has to > >>>>> provide the old-behaviour emulation for compatibility on driver's > >>>>> behalf, it needs to be done per-driver basis. There could be good > >>>>> on-chip or vendor IOMMU implementation which doesn't clear the IOTLB in > >>>>> .reset, and vendor specific IOMMU doesn't have to provide .reset_map, > >>>> Then we just don't offer IOTLB_PRESIST, isn't this by design? > >>> Think about the vduse case, it can work with DMA ops directly so doesn't > >>> have to implement .reset_map, unless for some specific good reason. > >>> Because it's a conforming and valid/good driver implementation, we may > >>> still allow it to advertise IOTLB_PERSIST to userspace. > >> I would like to know why this can't work in this case: > >> > >> config->reset() > >> if (IOTLB_PERSIST is not set) { > >> if (config->reset_map) > >> config->reset_map() > >> } > >> > >>> Which belongs to > >>> the 3rd bullet below: > >>> > >>> https://lore.kernel.org/virtualization/1696928580-7520-4-git-send-email-si-wei.liu@xxxxxxxxxx/ > >>> > >>> There are 3 cases that backend may claim this feature bit on: > >>> > >>> - parent device that has to work with platform IOMMU > >>> - parent device with on-chip IOMMU that has the expected > >>> .reset_map support in driver > >>> - parent device with vendor specific IOMMU implementation > >>> that explicitly declares the specific backend feature > >>> > >>>>> we > >>>>> should allow these good driver implementations rather than > >>>>> unconditionally stick to some specific problematic behavior for every > >>>>> other good driver. > >>>> Then you can force reset_map() with set_map() that is what I suggest > >>>> in another thread, no? > >>> This is exactly what I was afraid of that broken behavior emulation may > >>> become a dangerous slippery slope - in principle we should encourage > >>> good driver implementation, as they can work totally fine with older > >>> userspace. Why do they have to bother emulating broken behavior just > >>> because some other driver's misbehaving? > >> Please read the link [1], Linus has explained it. > >> > >>> And what's the boundary for > >>> this hack, do drivers backed by platform IOMMU even have to emulate (if > >>> not why not, and is there substantial difference in between)? > >> The boundary is whether the behaviour change could be noticed but > >> userspace. And I've shown you it's not a burden with the pseudo codes. > >> If not, please explain why. > >> > >>> After > >>> getting through all of this, do you still believe everything is just as > >>> easy and simple as what thought to be? > >> The truth is that bugs exist everywhere. We can't promise there's no > >> bug when developing an uAPI or subsystem. For kernel code, the bug > >> that touches uAPI might be fixed in a way that doesn't break existing > >> userspace. If you look at how downstream to maintain kABI, you will be > >> supersized furtherly. > >> > >>> Btw, I thought I was expecting but still haven't got the clear answers > >>> to what was the goal to do all this, we spent a lot of time trying to > >>> unbreak userspace, > >> The code is pretty simple. But yes, the time spent on justifying it > >> might take some time. That's the community. People need time to > >> understand each other's points. > >> > >>> but looks to me as if we were trying every possible > >>> way to break userspace > >> How could my suggestions break a userspace? > >> > >>> or try to approximate to the same brokenness > >>> mlx5_vdpa may have caused to the userspace. What we will get eventually > >>> from these lengthy discussions? > >> Siwei, I'd really suggest you read the link I gave you. You may get > >> the answer. What's more, It doesn't cost too much then we know for > >> sure there would not be any issue, why not choose the hard way? > >> > >>> On the other hand, if you think it from > >>> vhost-vdpa user perspective, you'll clearly see there's just a couple of > >>> ways to unbreak userspace from the internal broken map which is out of > >>> sync with vhost-vdpa iotlb after device reset. > >> Patches are more than welcomed. > >> > >>> If this brokenness was > >>> something universally done from the vhost-vdpa layer itself, I'd feel > >>> it's more of a shared problem, but this is not the case I see it here. > >>> While the long standing mlx5_vdpa/vdpa_sim issue is 100% misuse of > >>> .reset op in a wrong way per IOMMU API definition. Why leaving this > >>> discrepancy to the individual driver is not even an option, I'm still > >>> not sure? > >> Sorry? I start with a switch in the driver, and then I try to avoid > >> that. And it seems you don't want a burden on the driver as well. > >> Where did you see I say we can't do that in the driver? What I > >> disagree with is to use a module parameter. > >> > >> Even if I fail, it doesn't mean we can't do that in the driver code. > >> If you read the link[1] you can see the offending commit is a change > >> in uvcvideo driver. > >> > >> Thanks > >> > >>> > >>> Thanks, > >>> -Siwei > >>> > >>>>> Then we need a set of device flags (backend_features > >>>>> bit again?) to indicate the specific driver needs upper layer's help on > >>>>> old-behaviour emulation. > >>>>> > >>>>> Last but not least, I'm not sure how to properly emulate > >>>>> reset_vendor_mappings() from vhost-vdpa layer. If a vendor driver has no > >>>>> .reset_map op implemented, or if .reset_map has a slightly different > >>>>> implementation than what it used to reset the iotlb in the .reset op, > >>>> See above, for reset_vendor_mappings() I meant config->reset_map() exactly. > >>>> > >>>> Thanks > >>>> > >>>>> then this either becomes effectively dead code if no one ends up using, > >>>>> or the vhost-vdpa emulation is helpless and limited in scope, unable to > >>>>> cover all the cases. > >>>>> > >>>>> ----------------%<----------------%<---------------- > >>>>> > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization