Hi Zhou,

On Fri, Feb 26, 2021 at 05:43:27PM +0800, Zhou Wang wrote:
> On 2021/2/1 19:14, Jean-Philippe Brucker wrote:
> > Hi Zhou,
> >
> > On Mon, Feb 01, 2021 at 09:18:42AM +0800, Zhou Wang wrote:
> >>> @@ -1033,8 +1076,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid,
> >>>  FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
> >>>  CTXDESC_CD_0_V;
> >>>
> >>> - /* STALL_MODEL==0b10 && CD.S==0 is ILLEGAL */
> >>> - if (smmu->features & ARM_SMMU_FEAT_STALL_FORCE)
> >>> + if (smmu_domain->stall_enabled)
> >>
> >> Could we add ssid checking here? like: if (smmu_domain->stall_enabled && ssid).
> >> The reason is if not CD.S will also be set when ssid is 0, which is not needed.
> >
> > Some drivers may want to get stall events on SSID 0:
> > https://lore.kernel.org/kvm/20210125090402.1429-1-lushenming@xxxxxxxxxx/#t
> >
> > Are you seeing an issue with stall events on ssid 0? Normally there
> > shouldn't be any fault on this context, but if they happen and no handler
> > is registered, the SMMU driver will just abort them and report them like a
> > non-stall event.
>
> Hi Jean,
>
> I notice that there is problem. In my case, I expect that CD0 is for kernel
> and other CDs are for user space. Normally there shouldn't be any fault in
> kernel, however, we have RAS case which is for some reason there may has
> invalid address access from hardware device.
>
> So at least there are two different address access failures: 1. hardware RAS problem;
> 2. software fault fail(e.g. kill process when doing DMA). Handlings for these
> two are different: for 1, we should reset hardware device; for 2, stop related
> DMA is enough.

Right, and in case 2 there should be no report printed since it can be
triggered by user, while you probably want to be loud in case 1.

> Currently if SMMU returns the same signal(by SMMU resume abort), master device
> driver can not tell these two kinds of cases.

This part I don't understand.
So the SMMU sends a RESUME(abort) command, and then the master reports the
DMA error to the device driver, which cannot differentiate 1 from 2? (I
guess there is no SSID in this report?) But how does disabling stall
change this? The invalid DMA access will still be aborted by the SMMU.

Hypothetically, would it work if all stall events that could not be
handled went to the device driver? Those reports would contain the SSID
(or lack thereof), so you could reset the device in case 1 and ignore
case 2. Though resetting the device in the middle of a stalled transaction
probably comes with its own set of problems.

> From the basic concept, if a CD is used for kernel, its S bit should not be set.
> How about we add iommu domain check here too, if DMA domain we do not set S bit for
> CD0, if unmanaged domain we set S bit for all CDs?

I think disabling stall for CD0 of a DMA domain makes sense in general,
even though I don't really understand how that fixes your issue. But
someone might come up with a good use-case for receiving stall events on
DMA mappings, so I'm wondering whether the alternative solution where we
report unhandled stall events to the device driver would also work for
you.

Thanks,
Jean
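
For illustration only, the two options discussed in this thread could be
sketched roughly as below. This is a standalone sketch, not the SMMUv3
driver code: every sketch_ name is hypothetical, the CD.S bit position is
a placeholder, and option B assumes that a report without an SSID (or with
SSID 0) corresponds to case 1 while any other SSID corresponds to case 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not kernel definitions. */
enum sketch_domain_type { SKETCH_DOMAIN_DMA, SKETCH_DOMAIN_UNMANAGED };
enum sketch_action { SKETCH_IGNORE, SKETCH_RESET_DEVICE };

/* Placeholder for the CD.S bit; the position is an assumption. */
#define SKETCH_CD_0_S (UINT64_C(1) << 44)

/*
 * Option A: leave CD.S clear for CD0 of a DMA domain, set it for every
 * other context descriptor when stall is enabled for the domain.
 */
static bool sketch_cd_wants_stall(bool stall_enabled,
				  enum sketch_domain_type type,
				  unsigned int ssid)
{
	if (!stall_enabled)
		return false;
	return !(type == SKETCH_DOMAIN_DMA && ssid == 0);
}

/* Bits to OR into the first CD word under option A. */
static uint64_t sketch_cd_stall_bits(bool stall_enabled,
				     enum sketch_domain_type type,
				     unsigned int ssid)
{
	return sketch_cd_wants_stall(stall_enabled, type, ssid) ?
	       SKETCH_CD_0_S : 0;
}

/*
 * Option B: unhandled stall events are forwarded to the device driver,
 * which uses the SSID (or its absence) to pick a reaction.
 * Assumption: no SSID or SSID 0 means the kernel context (case 1).
 */
static enum sketch_action sketch_on_unhandled_stall(bool has_ssid,
						    unsigned int ssid)
{
	if (!has_ssid || ssid == 0)
		return SKETCH_RESET_DEVICE;	/* case 1: be loud, reset */
	return SKETCH_IGNORE;			/* case 2: stop DMA quietly */
}
```

Option A changes what gets programmed into the CD, while option B keeps
CD.S set everywhere and moves the decision into the device driver, which
is why it would still allow stall events on DMA mappings.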