On 2021-11-03 12:15:49 [+0900], Chanho Park wrote:
> Dear RT folks,

Hi,

> I found an unexpected hackbench score while I was testing the preempt-rt
> patches on my ARM64 (Cortex-A76 x 8) target.
> So I decided to check it on QEMU x86_64 KVM with Yocto. I started both
> images with the command below:
>
> $ runqemu qemux86-64 kvm nographic qemuparams="-smp cores=4"
>
> I was able to get score values similar to those on my arm64 target. The
> score was about half that of the 5.10.75 kernel, as shown below.
> Any idea about this? Actually, I'm not sure whether it is a regression or not.
>
> <5.10.75-rt47>
> root@qemux86-64:~# hackbench -l 10000
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 100 bytes
> Time: 49.898
>
> <5.14.14-rt21>
> root@qemux86-64:~# hackbench -l 10000
> Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> Each sender will pass 10000 messages of 100 bytes
> Time: 96.973

The 5.14 series has a different SLUB implementation. Could you please
make sure that SLUB_CPU_PARTIAL is disabled? A quick way to check this
is sketched at the end of this mail.

So v5.14-rc3-rt1 should be worse than v5.10. Then v5.14-rc3-rt2
introduced adaptive spinning, which should improve the situation.
However, it is still slightly worse than v5.10, but it should have
improved. Could you verify that?

Also, could you double-check this on real hardware? I have no idea how
well the adaptive spinning works in KVM, and hackbench is a micro
benchmark for the memory allocator/SLUB, so any spinning or guest
preemption can have a visible effect on the outcome. While I saw worse
numbers here (with hackbench), I did not observe this in a real-world
workload such as a kernel build. A small loop for collecting several
hackbench runs is also sketched at the end of this mail.

> Best Regards,
> Chanho Park

Sebastian
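
Regarding the config check mentioned above: a quick way, assuming
CONFIG_IKCONFIG_PROC is enabled so the running kernel exposes
/proc/config.gz (otherwise grep the .config the image was built from),
is:

  $ zcat /proc/config.gz | grep SLUB_CPU_PARTIAL
  # CONFIG_SLUB_CPU_PARTIAL is not set

The "is not set" output is what you should see if it is disabled.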
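
And since hackbench scores can be rather noisy, in particular inside
KVM, comparing several runs per kernel gives a better picture than a
single run. A minimal loop, reusing the parameters from the report above
(the run count of five is an arbitrary choice):

  $ for i in 1 2 3 4 5; do hackbench -l 10000; done | grep ^Time: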