On Fri, Aug 16, 2013 at 10:32 AM, Anup Patel <anup@xxxxxxxxxxxxxx> wrote:
> On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall
> <christoffer.dall@xxxxxxxxxx> wrote:
>> On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
>>> On 2013-08-15 16:13, Anup Patel wrote:
>>> > Hi Marc,
>>> >
>>> > On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>> >> On 2013-08-15 14:31, Anup Patel wrote:
>>> >>> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>> >>>> On 2013-08-15 07:26, Anup Patel wrote:
>>> >>>>> Hi Marc,
>>> >>>>>
>>> >>>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>> >>>>>> Hi Anup,
>>> >>>>>>
>>> >>>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>> >>>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>> >>>>>>>> Hi Pranav,
>>> >>>>>>>>
>>> >>>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>> >>>>>>>>> Systems with a large external L3 cache (a few MBs) might have
>>> >>>>>>>>> dirty content belonging to a guest page in the L3 cache. To
>>> >>>>>>>>> tackle this, we need to flush such dirty content from the
>>> >>>>>>>>> d-cache so that the guest sees the correct contents of a guest
>>> >>>>>>>>> page while the guest MMU is disabled.
>>> >>>>>>>>>
>>> >>>>>>>>> This patch fixes coherent_icache_guest_page() for external
>>> >>>>>>>>> L3 caches.
>>> >>>>>>>>>
>>> >>>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@xxxxxxxxxx>
>>> >>>>>>>>> Signed-off-by: Anup Patel <anup.patel@xxxxxxxxxx>
>>> >>>>>>>>> ---
>>> >>>>>>>>>  arch/arm64/include/asm/kvm_mmu.h |    2 ++
>>> >>>>>>>>>  1 file changed, 2 insertions(+)
>>> >>>>>>>>>
>>> >>>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> index efe609c..5129038 100644
>>> >>>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>> >>>>>>>>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>> >>>>>>>>>  	if (!icache_is_aliasing()) {	/* PIPT */
>>> >>>>>>>>>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>>> >>>>>>>>>  		flush_icache_range(hva, hva + PAGE_SIZE);
>>> >>>>>>>>> +		/* Flush d-cache for systems with external caches. */
>>> >>>>>>>>> +		__flush_dcache_area((void *) hva, PAGE_SIZE);
>>> >>>>>>>>>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>>> >>>>>>>>>  		/* any kind of VIPT cache */
>>> >>>>>>>>>  		__flush_icache_all();
>>> >>>>>>>>
>>> >>>>>>>> [adding Will to the discussion as we talked about this in the past]
>>> >>>>>>>>
>>> >>>>>>>> That's definitely an issue, but I'm not sure the fix is to hit the
>>> >>>>>>>> data cache on each page mapping. It looks like overkill.
>>> >>>>>>>>
>>> >>>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>> >>>>>>>> kvmtools knows which bits of the guest memory have been touched,
>>> >>>>>>>> and can do a "DC DVAC" on this region.
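For illustration, a hedged sketch of the userspace cleaning suggested
above (the mnemonic is presumably DC CVAC, clean by VA to the Point of
Coherency). The helper below is made up, not actual kvmtools code, and
it assumes the kernel has enabled EL0 cache maintenance and CTR_EL0
reads (SCTLR_EL1.UCI and SCTLR_EL1.UCT):

#include <stdint.h>

/* Sketch: clean a guest memory region to PoC from userspace. */
static void clean_guest_region(void *buf, uint64_t len)
{
        uint64_t ctr, line, addr;
        uint64_t end = (uint64_t)buf + len;

        /* CTR_EL0.DminLine (bits [19:16]) is log2 of the D-cache
         * minimum line size in words, so 4 << DminLine is bytes. */
        asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
        line = 4UL << ((ctr >> 16) & 0xf);

        /* Clean each line of the region to the Point of Coherency. */
        for (addr = (uint64_t)buf & ~(line - 1); addr < end; addr += line)
                asm volatile("dc cvac, %0" : : "r" (addr) : "memory");
        asm volatile("dsb sy" : : : "memory");
}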
>>> >>>>>>>
>>> >>>>>>> It seems a bit unnatural to have cache cleaning in user-space. I am
>>> >>>>>>> sure other architectures don't have such cache cleaning in
>>> >>>>>>> user-space for KVM.
>>> >>>>>>>
>>> >>>>>>>> The alternative is to do it in the kernel before running any vcpu -
>>> >>>>>>>> but that's not very nice either (we'd have to clean the whole of
>>> >>>>>>>> guest memory, which makes a full dcache clean more appealing).
>>> >>>>>>>
>>> >>>>>>> Actually, cleaning the full d-cache by set/way upon the first run of
>>> >>>>>>> a VCPU was our second option, but the current approach seemed very
>>> >>>>>>> simple, hence we went for it.
>>> >>>>>>>
>>> >>>>>>> If more people vote for a full d-cache clean upon the first run of a
>>> >>>>>>> VCPU, then we should revise this patch.
>>> >>>>>>
>>> >>>>>> Can you please give the attached patch a spin on your HW? I've
>>> >>>>>> boot-tested it on a model, but of course I can't really verify that
>>> >>>>>> it fixes your issue.
>>> >>>>>>
>>> >>>>>> As far as I can see, it should fix it without any additional
>>> >>>>>> flushing.
>>> >>>>>>
>>> >>>>>> Please let me know how it goes.
>>> >>>>>
>>> >>>>> HCR_EL2.DC=1 means all memory accesses with the Stage-1 MMU off are
>>> >>>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>>> >>>>> Outer Write-Back Write-Allocate memory".
>>> >>>>>
>>> >>>>> HCR_EL2.DC=0 means all memory accesses with the Stage-1 MMU off are
>>> >>>>> treated as "Strongly-ordered device memory".
>>> >>>>>
>>> >>>>> Now if the Guest/VM accesses hardware MMIO devices directly (such as
>>> >>>>> the VGIC CPU interface) with the MMU off, then the MMIO devices will
>>> >>>>> be accessed as normal memory when HCR_EL2.DC=1.
>>> >>>>
>>> >>>> I don't think so. Stage-2 still applies, and should force MMIO to be
>>> >>>> accessed as device memory.
>>> >>>>
>>> >>>>> HCR_EL2.DC=1 makes sense only if we have all software-emulated
>>> >>>>> devices for the Guest/VM, which is not true for KVM ARM or KVM ARM64
>>> >>>>> because we use the VGIC.
>>> >>>>>
>>> >>>>> IMHO, this patch enforces an incorrect memory attribute for the
>>> >>>>> Guest/VM when the Stage-1 MMU is off.
>>> >>>>
>>> >>>> See above. My understanding is that HCR.DC controls the default
>>> >>>> output of Stage-1, and Stage-2 overrides still apply.
>>> >>>
>>> >>> You had mentioned that the PAGE_S2_DEVICE attribute was redundant and
>>> >>> wanted the guest to decide the memory attribute. In other words, you
>>> >>> did not want to enforce any memory attribute in Stage-2.
>>> >>>
>>> >>> Please refer to this patch: https://patchwork.kernel.org/patch/2543201/
>>> >>
>>> >> This patch has never been merged. If you carry on following the
>>> >> discussion, you will certainly notice it was dropped for a very good
>>> >> reason:
>>> >> https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
>>> >>
>>> >> So Stage-2 memory attributes are used, they are not going away, and
>>> >> they are essential to the patch I sent this morning.
>>> >>
>>> >> M.
>>> >> --
>>> >> Fast, cheap, reliable. Pick two.
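For readers following the exchange above, a minimal sketch of the
HCR_EL2.DC control under debate. The bit position is from the ARMv8 ARM
(HCR_EL2 bit 12); the HCR_DC name follows the kernel's HCR_* naming,
but the helper itself is hypothetical and is not the patch Marc
attached:

/* HCR_EL2.DC ("Default Cacheable"), bit 12 in the ARMv8 ARM. */
#define HCR_DC          (1UL << 12)

/*
 * Hypothetical helper: with DC set, guest accesses made while the
 * Stage-1 MMU is off default to Normal Write-Back cacheable memory
 * instead of Strongly-ordered, which is what makes the uncached
 * bootloader/DMA case below problematic.
 */
static inline unsigned long hcr_enable_default_cacheable(unsigned long hcr)
{
        return hcr | HCR_DC;
}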
>>> >
>>> > HCR_EL2.DC=1 will break the Guest/VM bootloader/firmware if the
>>> > Guest/VM is provided a DMA-capable device in pass-through mode. The
>>> > reason is that bootloaders/firmware typically don't enable the MMU,
>>> > and such a bootloader/firmware will program a pass-through
>>> > DMA-capable device without any flushes to guest RAM (because it has
>>> > not enabled the MMU).
>>> >
>>> > A good example of such a device would be a SATA AHCI controller given
>>> > to a Guest/VM as direct access (using the SystemMMU), with the guest
>>> > bootloader/firmware accessing this SATA AHCI controller to load
>>> > kernel images from a SATA disk. In this situation, we would have to
>>> > hack the guest bootloader/firmware AHCI driver to do explicit flushes
>>> > to guest RAM (because we have HCR_EL2.DC=1).
>>>
>>> OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of
>>> curiosity: is that a made-up example or something you actually have?
>>>
>>> Back to square one:
>>> Can you please benchmark the various cache cleaning options (global at
>>> startup time, per-page on S2 translation fault, and user-space)?
>>>
>> Eh, why is this a more valid argument than the vgic? The device
>> passthrough Stage-2 mappings would still have the Stage-2 memory
>> attributes to configure that memory region as device memory. Why is it
>> relevant if the device is DMA-capable in this context?
>>
>> -Christoffer
>
> Most ARM bootloaders/firmware run with the MMU off; hence, they will
> not do explicit cache flushes when programming a DMA-capable device.
> Now if HCR_EL2.DC=1, then guest RAM updates done by the
> bootloader/firmware will not be visible to the DMA-capable device.
>
> --Anup

The approach of flushing the d-cache by set/way upon the first run of a
VCPU will not work, because for set/way operations the ARM ARM says:
"For set/way operations, and for All (entire cache) operations, the
point is defined to be to the next level of caching". In other words,
set/way operations only work up to the point of unification.

Also, guest Linux already does __flush_dcache_all() from __cpu_setup()
at boot time, which does not work for us when the L3 cache is enabled,
so there is no point in adding one more __flush_dcache_all() upon the
first run of a VCPU in KVM ARM64.

IMHO, we are left with the following options:
1. Flush all RAM regions of a VCPU using __flush_dcache_range() upon
   the first run of that VCPU.
2. Implement an outer-cache framework for ARM64 and flush all caches
   plus the outer cache (i.e. the L3 cache) upon the first run of a
   VCPU.
3. Use an alternate version of flush_icache_range() which flushes the
   d-cache to PoC instead of PoU (a sketch follows below). We can also
   ensure that coherent_icache_guest_page() is called upon Stage-2
   prefetch aborts only.

Regards,
Anup
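As a rough illustration of option 3, a hedged sketch of what a
PoC-cleaning variant of flush_icache_range() might look like. The
function name is made up (the real routine lives in assembly in
arch/arm64/mm/cache.S), and the only intended difference from the stock
behaviour is "dc cvac" (clean to the Point of Coherency) in place of
"dc cvau" (clean to the Point of Unification), so that dirty lines in an
external L3 cache are also written back:

/* Sketch only: clean D-cache to PoC, then invalidate I-cache by VA. */
static void __coherent_guest_page_range(unsigned long start, unsigned long end)
{
        unsigned long ctr, dline, iline, addr;

        /* CTR_EL0: DminLine in bits [19:16], IminLine in bits [3:0],
         * both log2 of the respective line size in words. */
        asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
        dline = 4UL << ((ctr >> 16) & 0xf);
        iline = 4UL << (ctr & 0xf);

        /* Clean D-cache lines to the Point of Coherency. */
        for (addr = start & ~(dline - 1); addr < end; addr += dline)
                asm volatile("dc cvac, %0" : : "r" (addr) : "memory");
        asm volatile("dsb sy" : : : "memory");

        /* Invalidate I-cache lines by VA, as flush_icache_range() does. */
        for (addr = start & ~(iline - 1); addr < end; addr += iline)
                asm volatile("ic ivau, %0" : : "r" (addr) : "memory");
        asm volatile("dsb sy\n\tisb" : : : "memory");
}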