Hello,

It's not clear to me why this patch is needed. If one VCPU in the VM is
generating code, isn't the software running in the VM responsible for
keeping track of the MMU state of the other VCPUs and making sure the
new code is executed correctly? Why should KVM get involved?

I don't see how this is different from running on bare metal (no
hypervisor), where one CPU with the MMU on generates code that another
CPU with the MMU off must execute.

Some comments below.

On 3/6/21 2:15 PM, Catalin Marinas wrote:
> On Sat, Mar 06, 2021 at 10:54:47AM +0000, Marc Zyngier wrote:
>> On Fri, 05 Mar 2021 19:07:09 +0000,
>> Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>>> On Wed, Mar 03, 2021 at 04:45:05PM +0000, Marc Zyngier wrote:
>>>> It recently became apparent that the ARMv8 architecture has interesting
>>>> rules regarding attributes being used when fetching instructions
>>>> if the MMU is off at Stage-1.
>>>>
>>>> In this situation, the CPU is allowed to fetch from the PoC and
>>>> allocate into the I-cache (unless the memory is mapped with
>>>> the XN attribute at Stage-2).
>>> Digging through the ARM ARM is hard. Do we have this behaviour with FWB
>>> as well?
>> The ARM ARM doesn't seem to mention FWB at all when it comes to
>> instruction fetch, which is sort of expected as it only covers the
>> D-side. I *think* we could sidestep this when CTR_EL0.DIC is set
>> though, as the I-side would then snoop the D-side.
> Not sure this helps. CTR_EL0.DIC refers to the need for maintenance to
> PoU while the SCTLR_EL1.M == 0 causes the I-cache to fetch from PoC. I
> don't think I-cache snooping the D-cache would happen to the PoU when
> the S1 MMU is off.

FEAT_FWB requires that CLIDR_EL1.{LoUIS, LoUU} = {0, 0}, which means
that no dcache clean is required for instruction to data coherence
(page D13-3086). I interpret that as saying that with FEAT_FWB,
CTR_EL0.IDC is effectively 1, which means that a dcache clean is not
required for instruction generation, and icache invalidation is
required only if CTR_EL0.DIC = 0 (according to B2-158).

> My reading of D4.4.4 is that when SCTLR_EL1.M == 0 both I and D accesses
> are Normal Non-cacheable with a note in D4.4.6 that Non-cacheable
> accesses may be held in the I-cache.

Nitpicking, but SCTLR_EL1.M == 0 and SCTLR_EL1.I == 1 means that
instruction fetches are to Normal Cacheable, Inner and Outer
Read-Allocate memory (ARM DDI 0487G.a, pages D5-2709 and indirectly at
D13-3586).

Like you've pointed out, as mentioned in D4.4.6, it is always possible
that instruction fetches are held in the instruction cache, regardless
of the state of the SCTLR_EL1.M bit.

> The FWB rules on combining S1 and S2 says that Normal Non-cacheable at
> S1 is "upgraded" to cacheable. This should happen irrespective of
> whether the S1 MMU is on or off and should apply to both I and D
> accesses (since it does not explicitly says). So I think we could skip
> this IC IALLU when FWB is present.
>
> The same logic should apply when the VMM copies the VM text. With FWB,
> we probably only need D-cache maintenance to PoU and only if
> CTR_EL0.IDC==0. I haven't checked what the code currently does.

With FEAT_FWB, CTR_EL0.IDC is effectively 1 (see above), so we don't
need a dcache clean in this case.

Thanks,
Alex
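
P.S. To make my reading of the FEAT_FWB / CTR_EL0 interaction concrete, here is a small truth-table model of it. This is only a sketch of the interpretation given above (FEAT_FWB implies an effective IDC of 1; icache invalidation depends only on DIC), not what KVM or the kernel actually implements. The names `CacheFeatures`, `effective_idc`, `needs_dcache_clean` and `needs_icache_invalidate` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheFeatures:
    """Hypothetical snapshot of the relevant ID bits (illustrative only)."""
    fwb: bool  # FEAT_FWB implemented
    idc: bool  # CTR_EL0.IDC as reported by the hardware
    dic: bool  # CTR_EL0.DIC as reported by the hardware


def effective_idc(f: CacheFeatures) -> bool:
    # Assumption from the email: with FEAT_FWB, LoUIS and LoUU are 0,
    # so IDC behaves as if it were 1 even if the register bit is 0.
    return f.idc or f.fwb


def needs_dcache_clean(f: CacheFeatures) -> bool:
    # A dcache clean to PoU is needed only when IDC is (effectively) 0.
    return not effective_idc(f)


def needs_icache_invalidate(f: CacheFeatures) -> bool:
    # Icache invalidation is needed only when DIC is 0.
    return not f.dic


if __name__ == "__main__":
    # Print the full truth table for the three input bits.
    for fwb in (False, True):
        for idc in (False, True):
            for dic in (False, True):
                f = CacheFeatures(fwb=fwb, idc=idc, dic=dic)
                print(f, needs_dcache_clean(f), needs_icache_invalidate(f))
```

Under this model the FWB case never asks for a dcache clean, and whether IC IALLU can be skipped still hinges on DIC, which is the part Catalin and I read differently.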