Re: [kvm-unit-tests PATCH v3 15/18] arm/arm64: Perform dcache clean + invalidate after turning MMU off

Andrew Jones <drjones@xxxxxxxxxx> · Mon, 6 Jan 2020 17:28:35 +0100

On Mon, Jan 06, 2020 at 02:27:31PM +0000, Alexandru Elisei wrote:
> Hi,
> 
> On 1/3/20 4:49 PM, Andre Przywara wrote:
> > On Tue, 31 Dec 2019 16:09:46 +0000
> > Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
> >
> > Hi,
> >
> >> When the MMU is off, data accesses are to Device nGnRnE memory on arm64 [1]
> >> or to Strongly-Ordered memory on arm [2]. This means that the accesses are
> >> non-cacheable.
> >>
> >> Perform a dcache clean to PoC so we can read the newer values from the
> >> cache after we turn the MMU off, instead of the stale values from memory.
> > Wow, did we really not do this before?
> >  
> >> Perform an invalidation so we can access the data written to memory after
> >> we turn the MMU back on. This prevents reading back the stale values we
> >> cleaned from the cache when we turned the MMU off.
> >>
> >> Data caches are PIPT and the VAs are translated using the current
> >> translation tables, or an identity mapping (what Arm calls a "flat
> >> mapping") when the MMU is off [1, 2]. Do the clean + invalidate when the
> >> MMU is off so we don't depend on the current translation tables and we can
> >> make sure that the operation applies to the entire physical memory.
> > The intention of the patch is very much valid, I am just wondering if
> > there is any reason why you do the cache line size determination in
> > (quite some lines of) C?
> > Given that you only use that in asm, wouldn't it be much easier to read
> > the CTR register there, just before you actually use it? The actual CTR
> > read is (inline) assembly anyway, so you just need the mask/shift/add in
> > asm as well. You could draw inspiration from here, for instance:
> > https://gitlab.denx.de/u-boot/u-boot/blob/master/arch/arm/cpu/armv8/cache.S#L132
> 
> Computing the dcache line size in assembly is how Linux does it as well. I chose
> to do it in C because I like to avoid using assembly as much as possible. But I
> have no strong preference in keeping it in C. Andrew, what do you think? Should
> the cache line size be computed in C or in assembly, in asm_mmu_disable?

I also prefer to minimize the amount of assembly, and the amount of code
in general. For something like this I probably wouldn't have introduced
the macros, unless there's reason to believe unit tests will also make
use of them. Instead, I would just introduce get_ctr() to avoid #ifdefs
and then put the calculation directly into cpu_init(), like below.
However, I don't have a strong opinion here, so whatever makes you guys
happy :-)

Thanks,
drew

diff --git a/lib/arm/asm/processor.h b/lib/arm/asm/processor.h
index a8c4628da818..ae7e3816e676 100644
--- a/lib/arm/asm/processor.h
+++ b/lib/arm/asm/processor.h
@@ -64,6 +64,7 @@ extern bool is_user(void);
 
 #define CNTVCT		__ACCESS_CP15_64(1, c14)
 #define CNTFRQ		__ACCESS_CP15(c14, 0, c0, 0)
+#define CTR		__ACCESS_CP15(c0, 0, c0, 1)
 
 static inline u64 get_cntvct(void)
 {
@@ -76,4 +77,11 @@ static inline u32 get_cntfrq(void)
 	return read_sysreg(CNTFRQ);
 }
 
+static inline u32 get_ctr(void)
+{
+	return read_sysreg(CTR);
+}
+
+extern u32 dcache_line_size;
+
 #endif /* _ASMARM_PROCESSOR_H_ */
diff --git a/lib/arm/setup.c b/lib/arm/setup.c
index 4f02fca85607..11b9cc9602ea 100644
--- a/lib/arm/setup.c
+++ b/lib/arm/setup.c
@@ -35,6 +35,8 @@ int nr_cpus;
 struct mem_region mem_regions[NR_MEM_REGIONS];
 phys_addr_t __phys_offset, __phys_end;
 
+u32 dcache_line_size;
+
 int mpidr_to_cpu(uint64_t mpidr)
 {
 	int i;
@@ -59,6 +61,8 @@ static void cpu_init(void)
 {
 	int ret;
 
+	dcache_line_size = 4 << ((get_ctr() >> 16) & 0xf);
+
 	nr_cpus = 0;
 	ret = dt_for_each_cpu_node(cpu_set, NULL);
 	assert(ret == 0);
diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 1d9223f728a5..5cba1591eda7 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -105,5 +105,12 @@ static inline u32 get_cntfrq(void)
 	return read_sysreg(cntfrq_el0);
 }
 
+static inline u32 get_ctr(void)
+{
+	return read_sysreg(ctr_el0);
+}
+
+extern u32 dcache_line_size;
+
 #endif /* !__ASSEMBLY__ */
 #endif /* _ASMARM64_PROCESSOR_H_ */
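
For reference, the cpu_init() calculation above relies on the DminLine
field of CTR (bits [19:16]) holding log2 of the number of 4-byte words in
the smallest data cache line. Below is a minimal standalone sketch of that
decode, plus the line-base rounding that a clean + invalidate loop would
typically need. Both helper names are hypothetical and not part of the
patch, and the register value is passed in as a plain integer so the
arithmetic can be checked off-target.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: decode the smallest
 * data cache line size, in bytes, from a raw CTR / CTR_EL0 value.
 */
static inline uint32_t dcache_line_size_from_ctr(uint32_t ctr)
{
	/* DminLine, bits [19:16]: log2 of the line size in 4-byte words */
	uint32_t dminline = (ctr >> 16) & 0xf;

	/* Same arithmetic as the cpu_init() hunk: 4 bytes per word */
	return 4u << dminline;
}

/*
 * Hypothetical helper: round an address down to the start of its
 * cache line. Assumes line_size is a power of two, which the decode
 * above always produces.
 */
static inline uint64_t dcache_line_base(uint64_t addr, uint32_t line_size)
{
	return addr & ~((uint64_t)line_size - 1);
}
```

With a sample register value of 0x8444c004 (DminLine = 4), the first
helper returns 64, and rounding 0x40001234 down to a 64-byte line gives
0x40001200.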