Hi,

> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Sent: Wednesday, November 3, 2021 6:14 PM
> To: Chanho Park <chanho61.park@xxxxxxxxxxx>
> Cc: linux-rt-users@xxxxxxxxxxxxxxx; 'Thomas Gleixner' <tglx@xxxxxxxxxxxxx>
> Subject: Re: hackbench score comparison between 5.10.75-rt47 and 5.14.14-rt21
>
> On 2021-11-03 12:15:49 [+0900], Chanho Park wrote:
> > Dear RT folks,
> Hi,
>
> > I found a puzzling hackbench result while testing the preempt-rt
> > patches on my ARM64 (Cortex-A76 x 8) target.
> > So I decided to check it on QEMU x86_64 KVM with Yocto. I booted
> > both images with the command below:
> >
> > $ runqemu qemux86-64 kvm nographic qemuparams="-smp cores=4"
> >
> > I got scores similar to those on my arm64 target: roughly half the
> > performance of the 5.10.75 kernel, as shown below.
> > Any idea about this? Actually, I'm not sure whether it is a regression
> > or not.
> >
> > <5.10.75-rt47>
> > root@qemux86-64:~# hackbench -l 10000
> > Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> > Each sender will pass 10000 messages of 100 bytes
> > Time: 49.898
> >
> > <5.14.14-rt21>
> > root@qemux86-64:~# hackbench -l 10000
> > Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> > Each sender will pass 10000 messages of 100 bytes
> > Time: 96.973
>
> The 5.14 series has a different SLUB implementation. Could you please
> make sure that SLUB_CPU_PARTIAL is disabled?
> So v5.14-rc3-rt1 should be worse than v5.10. Then v5.14-rc3-rt2 introduced
> adaptive spinning, which should improve the situation. However, it remains
> slightly worse than v5.10, even though it should have improved.
> Could you verify that?
>
> Also, could you double-check this on hardware? I have no idea how well
> the adaptive spinning works in KVM, and this (hackbench) is a micro
> benchmark for the memory allocator/SLUB, so any spin/guest preemption
> can have a visible outcome.
> While I saw worse numbers here (hackbench), I didn't observe it in a
> real-world workload such as a kernel build.

I ran the same test on my aarch64 target. I'll do a more realistic
benchmark such as compilebench next.

<5.10.73-rt54 aarch64>
root@euto-v9-sadk:~# hackbench -l 10000
Time: 24.994

<5.15.0-rt17 aarch64 w/o CONFIG_SLUB_CPU_PARTIAL>
root@euto-v9-sadk:~# hackbench -l 10000
Time: 31.372

<5.15.0-rt17 aarch64 w/ CONFIG_SLUB_CPU_PARTIAL>
root@euto-v9-sadk:~# hackbench -l 10000
Time: 35.269

Best Regards,
Chanho Park
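
P.S. For anyone reproducing this, one quick way to confirm whether
SLUB_CPU_PARTIAL is actually set on the running kernel (this assumes
CONFIG_IKCONFIG_PROC is enabled; otherwise the .config used for the build
has to be checked directly):

  # zcat /proc/config.gz | grep SLUB_CPU_PARTIAL
  CONFIG_SLUB_CPU_PARTIAL=y

If it is disabled, the line reads "# CONFIG_SLUB_CPU_PARTIAL is not set"
instead.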