Hi Alex, On Mon, 08 Mar 2021 16:53:09 +0000, Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote: > > Hello, > > It's not clear to me why this patch is needed. If one VCPU in the VM is generating > code, is it not the software running in the VM responsible for keeping track of > the MMU state of the other VCPUs and making sure the new code is executed > correctly? Why should KVM get involved? > > I don't see how this is different than running on bare metal (no > hypervisor), and one CPU with the MMU on generates code that another > CPU with the MMU off must execute. The difference is that so far, we have always considered i-caches to be private to each CPU. With a hypervisor that allows migration of vcpus from one physical CPU to another, the i-cache isn't private anymore from the perspective of the vcpus. > > Some comments below. > > On 3/6/21 2:15 PM, Catalin Marinas wrote: > > On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote: > >> On Fri, 05 Mar 2021 19:07:09 +0000, > >> Catalin Marinas <catalin.marinas@xxxxxxx> wrote: > >>> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote: > >>>> It recently became apparent that the ARMv8 architecture has interesting > >>>> rules regarding attributes being used when fetching instructions > >>>> if the MMU is off at Stage-1. > >>>> > >>>> In this situation, the CPU is allowed to fetch from the PoC and > >>>> allocate into the I-cache (unless the memory is mapped with > >>>> the XN attribute at Stage-2). > >>> Digging through the ARM ARM is hard. Do we have this behaviour with FWB > >>> as well? > >> The ARM ARM doesn't seem to mention FWB at all when it comes to > >> instruction fetch, which is sort of expected as it only covers the > >> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set > >> though, as the I-side would then snoop the D-side. > > Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to > > PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I > > don't think I-cache snooping the D-cache would happen to the PoU when > > the S1 MMU is off. > > FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0} which means > that no dcache clean is required for instruction to data coherence > (page D13-3086). I interpret that as saying that with FEAT_FWB, > CTR_EL0.IDC is effectively 1, which means that dcache clean is not > required for instruction generation, and icache invalidation is > required only if CTR_EL0.DIC = 0 (according to B2-158). > > > My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses > > are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable > > accesses may be held in the I-cache. > > Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 means that > instruction fetches are to Normal Cacheable, Inner and Outer > Read-Allocate memory (ARM DDI 0487G.a, pages D5-2709 and indirectly > at D13-3586). I think that's the allocation in unified caches, and not necessarily the i-cache, given that it also mention things such as "Inner Write-Through", which makes no sense for the i-cache. > Like you've pointed out, as mentioned in D4.4.6, it is always > possible that instruction fetches are held in the instruction cache, > regardless of the state of the SCTLR_EL1.M bit. Exactly, and that's what breaks things. > > The FWB rules on combining S1 and S2 says that Normal Non-cacheable at > > S1 is "upgraded" to cacheable. This should happen irrespective of > > whether the S1 MMU is on or off and should apply to both I and D > > accesses (since it does not explicitly says). So I think we could skip > > this IC IALLU when FWB is present. > > > > The same logic should apply when the VMM copies the VM text. With FWB, > > we probably only need D-cache maintenance to PoU and only if > > CTR_EL0.IDC==0. I haven't checked what the code currently does. > > When FEAT_FWB, CTR_EL0.IDC is effectively 1 (see above), so we don't > need a dcache clean in this case. But that isn't what concerns me. FWB is exclusively documented in terms of d-cache, and doesn't describe how that affects the instruction fetch (which is why I'm reluctant to attribute any effect to it). Thanks, M. -- Without deviation from the norm, progress is not possible.