Hi,

> -----Original Message-----
> From: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
> Sent: Wednesday, November 3, 2021 6:14 PM
> To: Chanho Park <chanho61.park@xxxxxxxxxxx>
> Cc: linux-rt-users@xxxxxxxxxxxxxxx; 'Thomas Gleixner' <tglx@xxxxxxxxxxxxx>
> Subject: Re: hackbench score comparison between 5.10.75-rt47 and 5.14.14-rt21
>
> On 2021-11-03 12:15:49 [+0900], Chanho Park wrote:
> > Dear RT folks,
> Hi,
>
> > I found a puzzling hackbench result while testing the preempt-rt
> > patches on my ARM64 (Cortex-A76 x 8) target.
> > So I decided to check it on QEMU x86_64 KVM with Yocto. I booted
> > both images with the command below:
> >
> > $ runqemu qemux86-64 kvm nographic qemuparams="-smp cores=4"
> >
> > I got scores similar to those on my arm64 target: roughly half the
> > performance of the 5.10.75 kernel, as shown below.
> > Any idea about this? Actually, I'm not sure whether it is a regression
> > or not.
> >
> > <5.10.75-rt47>
> > root@qemux86-64:~# hackbench -l 10000
> > Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> > Each sender will pass 10000 messages of 100 bytes
> > Time: 49.898
> >
> > <5.14.14-rt21>
> > root@qemux86-64:~# hackbench -l 10000
> > Running in process mode with 10 groups using 40 file descriptors each (== 400 tasks)
> > Each sender will pass 10000 messages of 100 bytes
> > Time: 96.973
>
> The 5.14 series has a different SLUB implementation. Could you please
> make sure that SLUB_CPU_PARTIAL is disabled?
> So v5.14-rc3-rt1 should be worse than v5.10. Then v5.14-rc3-rt2 introduced
> adaptive spinning, which should improve the situation. However, it remains
> slightly worse than v5.10, even though it should have improved.
> Could you verify that?
>
> Also, could you double-check this on hardware? I have no idea how well
> the adaptive spinning works in KVM, and this (hackbench) is a micro
> benchmark for the memory allocator/SLUB, so any spin/guest preemption
> can have a visible outcome.
> While I saw worse numbers here (hackbench), I didn't observe it in a
> real-world workload such as a kernel build.

I ran the same test on my aarch64 target. I'll do a more realistic
benchmark such as compilebench next.

<5.10.73-rt54 aarch64>
root@euto-v9-sadk:~# hackbench -l 10000
Time: 24.994

<5.15.0-rt17 aarch64 w/o CONFIG_SLUB_CPU_PARTIAL>
root@euto-v9-sadk:~# hackbench -l 10000
Time: 31.372

<5.15.0-rt17 aarch64 w/ CONFIG_SLUB_CPU_PARTIAL>
root@euto-v9-sadk:~# hackbench -l 10000
Time: 35.269

Best Regards,
Chanho Park
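
P.S. For anyone reproducing this, one quick way to confirm whether
SLUB_CPU_PARTIAL is actually set on the running kernel (this assumes
CONFIG_IKCONFIG_PROC is enabled; otherwise the .config used for the build
has to be checked directly):

  # zcat /proc/config.gz | grep SLUB_CPU_PARTIAL
  CONFIG_SLUB_CPU_PARTIAL=y

If it is disabled, the line reads "# CONFIG_SLUB_CPU_PARTIAL is not set"
instead.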