Re: [PATCH] arm64: dts: allwinner: Add cache information to the SoC dtsi for H616

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri,  3 May 2024 11:09:41 +0200
Dragan Simic <dsimic@xxxxxxxxxxx> wrote:

> Add missing cache information to the Allwinner H616 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H616 cache information.
> 
> Adding the cache information to the H616 SoC dtsi also makes the following
> warning message in the kernel log go away:
> 
>   cacheinfo: Unable to detect cache hierarchy for CPU 0
> 
> Rather conspicuously, almost no cache-related information is available in
> the publicly available Allwinner H616 datasheet (version 1.0) and H616 user
> manual (version 1.0).  Thus, the cache parameters for the H616 SoC dtsi were
> obtained and derived by hand from the cache size and layout specifications
> found in the following technical reference manual, and from the cache size
> and die revision hints available from the following community-provided data
> and memory subsystem benchmarks:
> 
>   - ARM Cortex-A53 revision r0p4 TRM, version J
>   - Summary of the two available H616 die revisions and their differences
>     in cache sizes observed from the CSSIDR_EL1 register readouts, provided
>     by Andre Przywara [1][2]
>   - Tinymembench benchmark results of the H616-based OrangePi Zero 2 SBC,
>     provided by Thomas Kaiser [3]
> 
> For future reference, here's a brief summary of the available documentation
> and the community-provided data and memory subsystem benchmarks:
> 
>   - All caches employ the 64-byte cache line length
>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>     cache and 32 KB of L1 4-way, set-associative data cache
>   - The size of the L2 cache depends on the actual H616 die revision (there
>     are two die revisions), so the entire SoC can have either 256 KB or 1 MB
>     of unified L2 16-way, set-associative cache [1]
> 
> Also for future reference, here's the relevant excerpt from the community-
> provided H616 memory subsystem benchmark, [3] which confirms that 32 KB and
> 256 KB are the L1 data and L2 cache sizes, respectively:
> 
>     block size : single random read / dual random read
>           1024 :    0.0 ns          /     0.0 ns
>           2048 :    0.0 ns          /     0.0 ns
>           4096 :    0.0 ns          /     0.0 ns
>           8192 :    0.0 ns          /     0.0 ns
>          16384 :    0.0 ns          /     0.0 ns
>          32768 :    0.0 ns          /     0.0 ns
>          65536 :    4.3 ns          /     7.3 ns
>         131072 :    6.6 ns          /    10.5 ns
>         262144 :    9.8 ns          /    15.2 ns
>         524288 :   91.8 ns          /   142.9 ns
>        1048576 :  138.6 ns          /   188.3 ns
>        2097152 :  163.0 ns          /   204.8 ns
>        4194304 :  178.8 ns          /   213.5 ns
>        8388608 :  187.1 ns          /   217.9 ns
>       16777216 :  192.2 ns          /   220.9 ns
>       33554432 :  196.5 ns          /   224.0 ns
>       67108864 :  215.7 ns          /   259.5 ns

Thanks for dumping the elaborate information here!

> The changes introduced to the H616 SoC dtsi by this patch specify 256 KB as
> the L2 cache size.  As outlined by Andre Przywara, [2] a follow-up TF-A patch
> will perform runtime adjustment of the device tree data, making the correct
> L2 cache size of 1 MB present in the device tree for the boards based on the
> revision of H616 that actually provides 1 MB of L2 cache.

I pushed that TF-A patch for review now: 
https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/28694/1
On my OrangePi Zero3 (with an 1MB H618 SoC) the size and number of sets
get adjusted to describe 1MB:
=> fdt list /cpus/l2-cache
l2-cache {
        compatible = "cache";
        cache-level = <0x00000002>;
        cache-unified;
        cache-size = <0x00100000>;
        cache-line-size = <0x00000040>;
        cache-sets = <0x00000400>;
        phandle = <0x00000003>;
};

> [1] https://lore.kernel.org/linux-sunxi/20240430114627.0cfcd14a@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/linux-sunxi/20240501103059.10a8f7de@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [3] https://raw.githubusercontent.com/ThomasKaiser/sbc-bench/master/results/4knM.txt
> 
> Suggested-by: Andre Przywara <andre.przywara@xxxxxxx>
> Helped-by: Andre Przywara <andre.przywara@xxxxxxx>
> Signed-off-by: Dragan Simic <dsimic@xxxxxxxxxxx>

So I can confirm that the information above is correct, and also matches
the DT properties added below.

Reviewed-by: Andre Przywara <andre.przywara@xxxxxxx>

Thanks!
Andre

> ---
>  .../arm64/boot/dts/allwinner/sun50i-h616.dtsi | 37 +++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
> index b2e85e52d1a1..4faed88d8909 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h616.dtsi
> @@ -26,30 +26,67 @@ cpu0: cpu@0 {
>  			reg = <0>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu1: cpu@1 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <1>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu2: cpu@2 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <2>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu3: cpu@3 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <3>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
> +		};
> +
> +		l2_cache: l2-cache {
> +			compatible = "cache";
> +			cache-level = <2>;
> +			cache-unified;
> +			cache-size = <0x40000>;
> +			cache-line-size = <64>;
> +			cache-sets = <256>;
>  		};
>  	};
>  





[Index of Archives]     [Device Tree Compilter]     [Device Tree Spec]     [Linux Driver Backports]     [Video for Linux]     [Linux USB Devel]     [Linux PCI Devel]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Yosemite Backpacking]


  Powered by Linux