Hello Andre,
On 2024-04-29 12:33, Andre Przywara wrote:
On Sun, 28 Apr 2024 13:40:35 +0200
Dragan Simic <dsimic@xxxxxxxxxxx> wrote:
thanks for taking care of this!
Thank you for reviewing my patch!
Add missing cache information to the Allwinner A64 SoC dtsi, to allow
the userspace, which includes lscpu(1) that uses the virtual files provided
by the kernel under the /sys/devices/system/cpu directory, to display the
proper A64 cache information.
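
As a rough illustration of what userspace sees, here is a minimal Python
sketch that reads the per-CPU cache attributes from sysfs. The function name
read_cpu_caches and its sysfs_root parameter are made up for this example;
the attribute file names are the ones the kernel exposes under
cpuN/cache/indexM, and any of them may be missing when the kernel lacks that
information.

```python
from pathlib import Path

# Attribute files the kernel exposes for each cache under cpuN/cache/indexM.
CACHE_FIELDS = (
    "level",
    "type",
    "size",
    "coherency_line_size",
    "number_of_sets",
    "ways_of_associativity",
)


def read_cpu_caches(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return one dict per cache that the kernel reports for the given CPU."""
    cache_dir = Path(sysfs_root) / f"cpu{cpu}" / "cache"
    caches = []
    for index_dir in sorted(cache_dir.glob("index*")):
        entry = {}
        for name in CACHE_FIELDS:
            attr = index_dir / name
            # A missing file means the kernel has no value for that attribute.
            entry[name] = attr.read_text().strip() if attr.is_file() else None
        caches.append(entry)
    return caches
```

Calling read_cpu_caches(0) on a running system returns the raw strings from
sysfs, so the values can be compared by eye with what lscpu(1) prints.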
While there, use a more self-descriptive label for the L2 cache node, which
also makes it more consistent with other SoC dtsi files.
The cache parameters for the A64 dtsi were obtained and partially derived
by hand from the cache size and layout specifications found in the following
datasheets and technical reference manuals:
- Allwinner A64 datasheet, version 1.1
- ARM Cortex-A53 revision r0p3 TRM, version E
For future reference, here's a brief summary of the documentation:
- All caches employ the 64-byte cache line length
- Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
  cache and 32 KB of L1 4-way, set-associative data cache
- The entire SoC has 512 KB of unified L2 16-way, set-associative cache
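
The "partially derived by hand" part presumably refers to the number of
sets, which follows from the size, line length and associativity listed
above as sets = size / (line size * ways). A small Python sketch of that
arithmetic, with cache_sets being a helper name invented for this example:

```python
def cache_sets(size_bytes, line_size_bytes, ways):
    """Number of sets = cache size / (line size * associativity)."""
    sets, remainder = divmod(size_bytes, line_size_bytes * ways)
    if remainder:
        raise ValueError("cache size is not a multiple of line size * ways")
    return sets


KIB = 1024

print(cache_sets(32 * KIB, 64, 2))    # L1 instruction cache: 256
print(cache_sets(32 * KIB, 64, 4))    # L1 data cache: 128
print(cache_sets(512 * KIB, 64, 16))  # unified L2 cache: 512
```

These results agree with the i-cache-sets, d-cache-sets and cache-sets
values used in the patch.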
So that looks correct when checking the manuals, and the per-CPU
entries below match both between themselves and with that description
above.
However I have some level of distrust towards the Allwinner manuals,
regarding the cache sizes (which are chosen by Allwinner).
Quite frankly, I was surprised a bit to see that the A64 contains
512 KB of L2 cache. IMHO, that's quite a lot for an SoC that was
advertised primarily as a cost-effective solution.
So while I haven't measured this myself, nor checked the cache type
registers, tinymembench's memory latency test suggests that those sizes
are correct:
https://github.com/ssvb/tinymembench/wiki/PINE64-(Allwinner-A64)
Ah, that's a nice benchmark report. Let me copy & paste the most
relevant part of that report below, just for future reference in
case that web page becomes inaccessible at some point:
==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================
block size : single random read / dual random read, [MADV_NOHUGEPAGE]
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 5.9 ns / 10.0 ns
131072 : 9.1 ns / 14.0 ns
262144 : 10.7 ns / 15.5 ns
524288 : 12.7 ns / 17.7 ns
1048576 : 92.8 ns / 143.2 ns
2097152 : 134.9 ns / 184.4 ns
4194304 : 163.5 ns / 207.1 ns
8388608 : 178.6 ns / 217.6 ns
16777216 : 187.5 ns / 223.7 ns
33554432 : 192.8 ns / 228.0 ns
67108864 : 195.8 ns / 230.7 ns
block size : single random read / dual random read, [MADV_HUGEPAGE]
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 5.9 ns / 10.0 ns
131072 : 9.1 ns / 14.0 ns
262144 : 10.7 ns / 15.6 ns
524288 : 12.6 ns / 17.8 ns
1048576 : 92.7 ns / 142.6 ns
2097152 : 134.7 ns / 184.3 ns
4194304 : 155.8 ns / 198.4 ns
8388608 : 166.4 ns / 203.8 ns
16777216 : 171.6 ns / 206.0 ns
33554432 : 174.2 ns / 206.9 ns
67108864 : 175.4 ns / 207.4 ns
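
To make the reading of the table explicit, here is a rough Python sketch
that applies two simple heuristics to the single random read column of the
MADV_NOHUGEPAGE run: the largest buffer with zero extra latency still fits
into the L1 data cache (see Note 1), and the largest latency step marks
where the buffer stops fitting into the L2 cache. The function names are
made up for this example and the heuristics are only an informal check,
not a replacement for reading the cache type registers.

```python
# Single random read latencies in ns, copied from the MADV_NOHUGEPAGE table.
LATENCY_NS = {
    1024: 0.0, 2048: 0.0, 4096: 0.0, 8192: 0.0, 16384: 0.0, 32768: 0.0,
    65536: 5.9, 131072: 9.1, 262144: 10.7, 524288: 12.7,
    1048576: 92.8, 2097152: 134.9, 4194304: 163.5, 8388608: 178.6,
    16777216: 187.5, 33554432: 192.8, 67108864: 195.8,
}


def l1_fit_limit(latency):
    """Largest buffer size that shows no extra latency on top of L1."""
    return max(size for size, ns in latency.items() if ns == 0.0)


def largest_jump_after(latency):
    """Buffer size after which the latency rises by the largest amount."""
    sizes = sorted(latency)
    pairs = zip(sizes, sizes[1:])
    return max(pairs, key=lambda p: latency[p[1]] - latency[p[0]])[0]


print(l1_fit_limit(LATENCY_NS) // 1024, "KiB")        # 32 KiB
print(largest_jump_after(LATENCY_NS) // 1024, "KiB")  # 512 KiB
```

Both results line up with the 32 KB L1 data cache and the 512 KB L2 cache
described in the patch.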
Signed-off-by: Dragan Simic <dsimic@xxxxxxxxxxx>
Reviewed-by: Andre Przywara <andre.przywara@xxxxxxx>
Thanks!
---
 arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 ++++++++++++++++---
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
index 57ac18738c99..86074d03afa9 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
@@ -51,49 +51,76 @@ cpu0: cpu@0 {
 			device_type = "cpu";
 			reg = <0>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

 		cpu1: cpu@1 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <1>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

 		cpu2: cpu@2 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <2>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

 		cpu3: cpu@3 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <3>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};

-		L2: l2-cache {
+		l2_cache: l2-cache {
 			compatible = "cache";
 			cache-level = <2>;
 			cache-unified;
+			cache-size = <0x80000>;
+			cache-line-size = <64>;
+			cache-sets = <512>;
 		};
 	};