Re: [BUG] ARM64 KVM: Data abort executing post-indexed LDR on MMIO address

Marc Zyngier <maz@xxxxxxxxxx> · Sun, 06 Oct 2024 11:28:23 +0100

On Sun, 06 Oct 2024 08:59:56 +0100,
Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx> wrote:
> 
> Hello Marc,
> 
> On 05.10.24 23:35, Marc Zyngier wrote:
> > On Sat, 05 Oct 2024 19:38:23 +0100,
> > Ahmad Fatoum <a.fatoum@xxxxxxxxxxxxxx> wrote:
> >>> IIRC, the QEMU flash is implemented as a read-only memslot. A data
> >>> cache invalidation is a *write*, as it can be (and is) upgraded to a
> >>> clean+invalidate by the HW.
> >>
> >> So it's a write, even if there are no dirty cache lines?
> > 
> > Yes.
> > 
> > At the point where this is handled, the CPU has no clue about the
> > dirty state of an arbitrary cache line, at an arbitrary cache level.
> > The CPU simply forward the CMO downstream, and the permission check
> > happens way before that.
> 
> I see. When I added that invalidation, it was suggested[1] that instead
> of invalidation after the remapping, I would clean the range before
> remapping. Cleaning the zero page triggered a data abort, which I didn't
> understand as there should be no dirty lines, but now I get it.
> 
> One more question: This upgrading of DC IVAC to DC CIVAC is because
> the code is run under virtualization, right?

Not necessarily. Virtualisation mandates the upgrade, but CIVAC is
also a perfectly valid implementation of both IVAC and CVAC.  And it
isn't uncommon that CPUs implement everything the same way.

This is why, if you dare looking at the definition of ESR_ELx for a
data abort, you will notice that ESR_ELx.WnR (which indicates whether
the abort was triggered by a read or a write) is always 1 (denoting a
write) on a CMO, irrespective of what CMO triggered it.

Additionally, FEAT_CMOW requires both read and write permissions to
not trigger faults on CMOs, but that's a bit orthogonal to your
problem.

> 
> [1]: https://lore.barebox.org/barebox/9809c04c-58c5-6888-2b14-cf17fa7c4b1a@xxxxxxxxxxxxxx/
> 
> >> The routine[1] that does this remapping invalidates the virtual address range,
> >> because the attributes may change[2]. This invalidate also happens for cfi-flash,
> >> but we should never encounter dirty cache lines there as the remap is done
> >> before driver probe.
> >>
> >> Can you advise what should be done differently?
> > 
> > If you always map the flash as Device memory, there is no need for
> > CMOs. Same thing if you map it as NC.
> 
> Everything, except for RAM, is mapped Device-nGnRnE.

Right, no problem then.

> 
> > And even if you did map it as
> > Cacheable, it wouldn't matter. KVM already handles coherency when the
> > flash is switching between memory-mapped and programming mode, as the
> > physical address space changes (the flash literally drops from the
> > memory map).
> > 
> > In general, issuing CMOs to a device is a bizarre concept, because it
> > is pretty rare that a device can handle a full cache-line as
> > write-back. Devices tend to handle smaller, register-sized accesses,
> > not a full 64-byte eviction.
> 
> The same remap_range function is used to:
> 
>   - remap normal memory:
>     - Mapping memory regions uncached for memory test
>     - Mapping memory buffers coherent or write-combine
>   - remap ROMs: BootROMs at address zero may be hidden behind the NULL
>     page and need to be mapped differently when calling/peeking into it.
>   - remap device memory, e.g. in the case of the cfi-flash here
> 
> The invalidation is done unconditionally for all of them, although it
> makes only the sense in the first case.

Indeed.

> 
> > Now, I'm still wondering whether we should simply forward the CMO to
> > userspace as we do for other writes, and let the VMM deal with it. The
> > main issue is that there is no current way to describe this.
> > 
> > The alternative would be to silently handle the trap and pretend it
> > never occurred, as we do for other bizarre behaviours. But that'd be
> > something only new kernels would support, and I guess you'd like your
> > guest to work today, not tomorrow.
> 
> I think following fix on the barebox side may work:
> 
>   - Walk all pages about to be remapped
>   - Execute the AT instruction on the page's base address

Why do you need AT if you are walking the PTs? If you walk, you
already have access to the memory attributes. In general, AT can be
slower than an actual walk.

Or did you actually mean iterating over the VA range? Even in that
case, AT can be a bad idea, as you are likely to iterate in page-size
increments even if you have a block mapping. Walking the PTs tells you
immediately how much a leaf is mapping (assuming you don't have any
other tracking).

>   - Only if the page was previously mapped cacheable, clean + invalidate
>     the cache
>   - Remove the current cache invalidation after remap
> 
> Does that sound sensible?

This looks reasonable (apart from the AT thingy).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.