Re: [PATCH] arm64: mm: drop tlb flush operation when clearing the access bit

Alistair Popple <apopple@xxxxxxxxxx> · Thu, 26 Oct 2023 10:32:28 +1100

Yu Zhao <yuzhao@xxxxxxxxxx> writes:

> On Wed, Oct 25, 2023 at 4:16 AM Alistair Popple <apopple@xxxxxxxxxx> wrote:
>> >> > >> Now consider the case where some external device is accessing mappings
>> >> > >> via the SMMU. The access flag will be cached in the SMMU TLB. If we
>> >> > >> clear the access flag without a TLB invalidate the access flag in the
>> >> > >> CPU page table will not get updated because it's already set in the SMMU
>> >> > >> TLB.
>> >> > >> As an aside access flag updates happen in one of two ways. If the SMMU
>> >> > >> HW supports hardware translation table updates (HTTU) then hardware will
>> >> > >> manage updating access/dirty flags as required. If this is not supported
>> >> > >> then SW is relied on to update these flags which in practice means
>> >> > >> taking a minor fault. But I don't think that is relevant here - in
>> >> > >> either case without a TLB invalidate neither of those things will
>> >> > >> happen.
>> >> > >> I suppose drivers could implement the clear_flush_young() MMU notifier
>> >> > >> callback (none do at the moment AFAICT) but then won't that just lead to
>> >> > >> the opposite problem - that every page ever used by an external device
>> >> > >> remains active and unavailable for reclaim because the access flag never
>> >> > >> gets cleared? I suppose they could do the flush then which would ensure
>> >> > >
>> >> > > Yes, I think so too. The reason there is currently no problem, perhaps
>> >> > > I think, there are no actual use cases at the moment? At least on our
>> >> > > Alibaba's fleet, SMMU and MMU do not share page tables now.
>> >> >
>> >> > We have systems that do.
>> >>
>> >> Just curious: do those systems run the Linux kernel? If so, are pages
>> >> shared with SMMU pinned? If not, then how are IO PFs handled after
>> >> pages are reclaimed?
>>
>> Yes, these systems all run Linux. Pages shared with SMMU aren't pinned
>> and fault handling works as Barry notes below - a driver is notified of
>> a fault and calls handle_mm_fault() in response.
>>
>> > it will call handle_mm_fault(vma, prm->addr, fault_flags, NULL); in
>> > I/O PF, so finally
>> > it runs the same codes to get page back just like CPU's PF.
>> >
>> > years ago, we recommended a pin solution, but obviously there were lots of
>> > push backs:
>> > https://lore.kernel.org/linux-mm/1612685884-19514-1-git-send-email-wangzhou1@xxxxxxxxxxxxx/
>>
>> Right. Having to pin pages defeats the whole point of having hardware
>> that can handle page faults.
>
> Thanks. How would a DMA transaction be retried after the kernel
> resolves an IO PF? I.e., does the h/w (PCIe spec, etc.) support auto
> retries or is the s/w responsible for doing so? At least when I worked
> on the PCI subsystem, I didn't know any device that was capable of
> doing auto retries. (Pasha and I will have a talk on IOMMU at the
> coming LPC, so this might be an interesting intersection between IOMMU
> and MM to discuss.)

Generally what happens if a device encounters a page fault is that it
notifies the kernel or driver (e.g. via an interrupt) that it has
faulted. It is then up to SW to resolve the fault and tell HW to retry
the translation request once SW thinks the fault is resolved. I'm not
aware of HW that does automatic retries (although I'm a little unclear
what exactly is meant by automatic retry).

In the case of an IOMMU faulting (e.g. SMMU on ARM) on a DMA access I
believe it stalls the transaction and SW is responsible for processing
the fault and signalling that the translation should be retried.
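
To make the flow above concrete, here is a rough userspace sketch of
the stall / resolve / retry loop. It is a toy model only: every name
in it (toy_mm, toy_resolve_fault(), toy_dma_access(), the RETRY/ABORT
responses) is made up for illustration and is not a real kernel, SMMU
or driver interface. toy_resolve_fault() just stands in for whatever
the driver does in response to the fault, eg. calling
handle_mm_fault().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are real kernel or SMMU interfaces. */
#define TOY_NPAGES 8

/* What SW tells HW to do with the stalled transaction. */
enum toy_resp { TOY_RESP_RETRY, TOY_RESP_ABORT };

struct toy_mm {
	bool valid[TOY_NPAGES];   /* may page p be mapped at all? (stands in for a VMA) */
	bool present[TOY_NPAGES]; /* is page p currently mapped? */
};

/*
 * SW side: stand-in for the driver resolving the fault (eg. by calling
 * handle_mm_fault()). Maps the page if the access is legal and asks HW
 * to retry, otherwise asks HW to abort the transaction.
 */
static enum toy_resp toy_resolve_fault(struct toy_mm *mm, size_t page)
{
	if (page >= TOY_NPAGES || !mm->valid[page])
		return TOY_RESP_ABORT;
	mm->present[page] = true;
	return TOY_RESP_RETRY;
}

/*
 * HW side: one DMA access. If translation fails the transaction is
 * stalled, the fault is reported to SW, and translation is only retried
 * once SW says so. Returns true if the access eventually completes;
 * *faults counts how many faults were reported to SW.
 */
static bool toy_dma_access(struct toy_mm *mm, size_t page, int *faults)
{
	assert(mm != NULL && faults != NULL);

	for (;;) {
		if (page < TOY_NPAGES && mm->present[page])
			return true;	/* translation hit, DMA proceeds */

		(*faults)++;		/* stall and notify SW */
		if (toy_resolve_fault(mm, page) == TOY_RESP_ABORT)
			return false;	/* SW gave up, transaction fails */
		/* SW asked for a retry: loop and redo the translation */
	}
}
```

The point of the sketch is just that the retry is driven by SW's
response rather than by the device retrying on its own. A second access
to the same page completes without a further fault, and an access SW
can't resolve is aborted rather than retried forever.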

It's also possible for the device itself to detect a fault prior to
issuing a DMA request if it's using something like PCIe page request
services. Note my experience with this is more with non-PCIe devices
that are coherently attached, but the concepts are all much the same as
they all channel through the same IOMMU.

Unfortunately it doesn't look like I will be at LPC this year, otherwise
it would have been good to discuss. Happy to continue the discussion here
or via some other channel though. Hopefully I will be able to see your
talk online.