On Fri, Aug 30, 2024 at 02:04:26PM -0300, Jason Gunthorpe wrote: > On Fri, Aug 30, 2024 at 04:09:04PM +0000, Mostafa Saleh wrote: > > Hi Jason, > > > > On Tue, Aug 27, 2024 at 12:51:38PM -0300, Jason Gunthorpe wrote: > > > For SMMUv3 a IOMMU_DOMAIN_NESTED is composed of a S2 iommu_domain acting > > > as the parent and a user provided STE fragment that defines the CD table > > > and related data with addresses translated by the S2 iommu_domain. > > > > > > The kernel only permits userspace to control certain allowed bits of the > > > STE that are safe for user/guest control. > > > > > > IOTLB maintenance is a bit subtle here, the S1 implicitly includes the S2 > > > translation, but there is no way of knowing which S1 entries refer to a > > > range of S2. > > > > > > For the IOTLB we follow ARM's guidance and issue a CMDQ_OP_TLBI_NH_ALL to > > > flush all ASIDs from the VMID after flushing the S2 on any change to the > > > S2. > > > > > > Similarly we have to flush the entire ATC if the S2 is changed. > > > > > > > I am still reviewing this patch, but just some quick questions. > > > > 1) How does userspace do IOTLB maintenance for S1 in that case? > > See > > https://lore.kernel.org/linux-iommu/cover.1724776335.git.nicolinc@xxxxxxxxxx > > Patch 17 Thanks, I had this series on my radar, I will check it by this week. > > Really, this series and that series must be together. We have a patch > planning issue to sort out here as well, all 27 should go together > into the same merge window. > > > 2) Is there a reason the UAPI is designed this way? > > The way I imagined this, is that userspace will pass the pointer to the CD > > (+ format) not the STE (or part of it). > > Yes, we need more information from the STE than just that. EATS and > STALL for instance. And the cachability below. Who knows what else in > the future. But for example if that was extended later, how can user space know which fields are allowed and which are not? > > We also want to support the V=0, Bypass and Abort STE configurations > under the nesting domain (V, CFG required) so that the VIOMMU can > remain affiliated with the STE in all cases. This is necessary to > allow VIOMMU event reporting to always work. > > Looking at the masks: > > STRTAB_STE_0_NESTING_ALLOWED = 0xf80fffffffffffff > STRTAB_STE_1_NESTING_ALLOWED = 0x380000ff > > So we do use alot of the bits. Reformatting from the native HW format > into something else doesn't seem better for VMM or kernel.. > > This is similar to the invalidation design where we also just forward > the invalidation command as is in native HW format, and how IDR is > done the same. > > Overall this sort of direct transparency is how I prefer to see these > kinds of iommufd HW specific interfaces designed. From a lot of > experience here, arbitary marshall/unmarshall is often an > antipattern :) Is there any documentation for the (proposed) SMMUv3 UAPI for IOMMUFD? I can understand reading IDRs from userspace (with some sanitation), but adding some more logic to map vSTE to STE needs more care of what kind of semantics are provided. Also, I am working on similar interface for pKVM where we “paravirtualize” the SMMU access for guests, it’s different semantics, but I hope we can align that with IOMMUFD (but it’s nowhere near upstream now) I see you are talking in LPC about IOMMUFD: https://lore.kernel.org/linux-iommu/0-v1-01fa10580981+1d-iommu_pt_jgg@xxxxxxxxxx/T/#m2dbb08f3bf8506a492bc7dda2de662e42371e683 Do you have any plans to talk about this also? Thanks, Mostafa > > > Making user space messing with shareability and cacheability of S1 CD access > > feels odd. (Although CD configure page table access which is similar). > > As I understand it, the walk of the CD table will be constrained by > the S2FWB, just like all the other accesses by the guest. > > So we just take a consistent approach of allowing the guest to provide > memattrs in the vSTE, CD, and S1 page table and rely on the HW's S2FWB > to police it. > > As you say there are lots of memattr type bits under direct guest > control, it doesn't necessarily make alot of sense to permit > everything in those contexts and then add extra code to do something > different here. > > Though I agree it looks odd, it is self-consistent. > > Jason