On 2021-11-03 12:15:49 [+0900], Chanho Park wrote:
> Dear RT folks,

Hi,

> I found an unexpected hackbench score while I was testing the preempt-rt
> patches on my ARM64 (Cortex-A76 x 8) target.
> So I decided to check it on QEMU x86_64 KVM with Yocto. I started both
> images with the command below:
>
> $ runqemu qemux86-64 kvm nographic qemuparams="-smp cores=4"
>
> I was able to get score values similar to those on my arm64 target. The
> score was about half that of the 5.10.75 kernel, as shown below.
> Any idea about this? Actually, I'm not sure whether it is a regression or not.
>
> <5.10.75-rt47>
> root@qemux86-64:~# hackbench -l 10000
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 100 bytes
> Time: 49.898
>
> <5.14.14-rt21>
> root@qemux86-64:~# hackbench -l 10000
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 100 bytes
> Time: 96.973

The 5.14 series has a different SLUB implementation. Could you please
make sure that SLUB_CPU_PARTIAL is disabled? A quick way to check this
is sketched at the end of this mail.

So v5.14-rc3-rt1 should be worse than v5.10. Then v5.14-rc3-rt2
introduced adaptive spinning, which should improve the situation.
However, it is still slightly worse than v5.10, but it should have
improved. Could you verify that?

Also, could you double-check this on real hardware? I have no idea how
well the adaptive spinning works in KVM, and hackbench is a micro
benchmark for the memory allocator/SLUB, so any spinning or guest
preemption can have a visible effect on the outcome. While I saw worse
numbers here (with hackbench), I did not observe this in a real-world
workload such as a kernel build. A small loop for collecting several
hackbench runs is also sketched at the end of this mail.

> Best Regards,
> Chanho Park

Sebastian
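
Regarding the config check mentioned above: a quick way, assuming
CONFIG_IKCONFIG_PROC is enabled so the running kernel exposes
/proc/config.gz (otherwise grep the .config the image was built from),
is:

  $ zcat /proc/config.gz | grep SLUB_CPU_PARTIAL
  # CONFIG_SLUB_CPU_PARTIAL is not set

The "is not set" output is what you should see if it is disabled.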
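
And since hackbench scores can be rather noisy, in particular inside
KVM, comparing several runs per kernel gives a better picture than a
single run. A minimal loop, reusing the parameters from the report above
(the run count of five is an arbitrary choice):

  $ for i in 1 2 3 4 5; do hackbench -l 10000; done | grep ^Time: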