On Fri, 2 Feb 2024 at 17:35, Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx> wrote:
>
> On Fri, Feb 2, 2024 at 2:00 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> >
> > > Maybe we can try something else?
> >
> > That's strange - the patches at [1] definitely revert the change you
> > bisected to. It's possible there is some other strange side-effect. (I
> > assume that you are still running all this with a KASAN kernel.)
>
> Yes. build .config not changed between kernel builds.
>
> > Just so I understand it right:
> > You say before commit cc478e0b6bdffd20561e1a07941a65f6c8962cab the
> > game's FPS were good. But that is strange, because at that point we're
> > already doing stackdepot refcounting, i.e. after commit
> > 773688a6cb24b0b3c2ba40354d883348a2befa38 which you reported as the
> > initial performance regression. The patches at [2] fixed that problem.
> >
> > So now it's unclear to me how the simple change in
> > cc478e0b6bdffd20561e1a07941a65f6c8962cab causes the performance
> > problem, when in fact this is already with KASAN stackdepot
> > refcounting enabled but without the performance fixes from [1] and
> > [2].
> >
> > [2] https://lore.kernel.org/all/20240118110216.2539519-2-elver@xxxxxxxxxx/
> >
> > My questions now would be:
> > - What was the game's FPS in the last stable kernel (v6.7)?
>
> [6.7] - 83 FPS - 13060 frames during benchmark.
>
> > - Can you collect another set of performance profiles between good and
> > bad? Maybe it would show where the time in the kernel is spent.
>
> Yes,
> please look at [aaa2c9a97c22 perf] and [cc478e0b6bdf perf]
>
> > perf diff perf-git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.data perf-git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.data
> No kallsyms or vmlinux with build-id de2a040f828394c5ce34802389239c2a0668fcc7 was found
> No kallsyms or vmlinux with build-id 33ab1cd545f96f5ffc2a402a4c4cfa647fd727a0 was found
> # Event 'cycles:P'
> #
> # Baseline  Delta Abs  Shared Object          Symbol
> # ........  .........  .....................  .....................................
> #
>     48.48%    +21.75%  [kernel.kallsyms]      [k] 0xffffffff860065c0
>     36.13%    -16.49%  ShadowOfTheTombRaider  [.] 0x00000000001d7f5e
>      4.43%     -2.10%  libvulkan_radeon.so    [.] 0x000000000006b870
>      3.28%     -0.63%  libcef.so              [.] 0x00000000021720e0
>      1.11%     -0.53%  libc.so.6              [.] syscall
>      0.65%     -0.24%  libc.so.6              [.] __memmove_avx512_unaligned_erms
>      0.31%     -0.14%  libc.so.6              [.] __memset_avx512_unaligned_erms
>      0.26%     -0.13%  libm.so.6              [.] __powf_fma
>      0.20%     -0.10%  [amdgpu]               [k] amdgpu_bo_placement_from_domain
>      0.22%     -0.09%  [amdgpu]               [k] amdgpu_vram_mgr_compatible
>      0.67%     -0.09%  armada-drm_dri.so      [.] 0x00000000000192b4
>      0.15%     -0.08%  libc.so.6              [.] sem_post@GLIBC_2.2.5
>      0.16%     -0.07%  [amdgpu]               [k] amdgpu_vm_bo_update
>      0.14%     -0.07%  [amdgpu]               [k] amdgpu_bo_list_entry_cmp
>      0.13%     -0.06%  libm.so.6              [.] powf@GLIBC_2.2.5
>      0.14%     -0.06%  libMangoHud.so         [.] 0x000000000001c4c0
>      0.10%     -0.06%  libc.so.6              [.] __futex_abstimed_wait_common
>      0.19%     -0.05%  libGLESv2.so           [.] 0x0000000000160a11
>      0.07%     -0.04%  libc.so.6              [.] __new_sem_wait_slow64.constprop.0
>      0.10%     -0.04%  radeonsi_dri.so        [.] 0x0000000000019454
>      0.05%     -0.03%  [amdgpu]               [k] optc1_get_position
>      0.05%     -0.03%  libc.so.6              [.] sem_wait@@GLIBC_2.34
>      0.22%     -0.02%  [vdso]                 [.] 0x00000000000005a0
>      0.10%     -0.02%  libc.so.6              [.] __memcmp_evex_movbe
>                +0.02%  [JIT] tid 8383         [.] 0x00007f2de0052823
>
>
> > - Could it be an inconclusive bisection?
>
> I checked twice:
> [6.7] - 83 FPS
> [aaa2c9a97c22] - 111 FPS
> [cc478e0b6bdf] - 64 FPS
> [6.8-rc2 with patches] - 82 FPS
>
>
> [6.7] https://i.postimg.cc/15yyzZBr/v6-7.png
> [6.7 perf] https://mega.nz/file/QwJ3hbob#RslLFVYgz1SWMcPR3eF9uEpFuqxdgkwXSatWts-1wVA
>
> [aaa2c9a97c22] https://i.postimg.cc/Sxv4VYhg/git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.png
> [aaa2c9a97c22 perf] https://mega.nz/file/dwQxha4J#2_nBF6uNzY11VX-T-Lr_-60WIMrbl1YEvPgY4CuXqEc
>
> [cc478e0b6bdf] https://i.postimg.cc/W3cQfMfw/git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.png
> [cc478e0b6bdf perf] https://mega.nz/file/hl5kwLTC#_4Fg1KBXCnQ-8OElY7EYmPOoDG6ZeZYnKFjamWpklWw
>
> [6.8-rc2 with patches] https://i.postimg.cc/26dPpVsR/v6-8-rc2-with-patches.png
> [6.8-rc2 with patches perf] https://mega.nz/file/NxgTAb4L#0KO_WU-svpDw60Y3148RZhELPcUtFg3_VCDzJqSyz34

Thanks a lot for these results. There's definitely something strange
going on - I'll try to have a detailed look some time next week.

In the meantime, this much is clear: there does not seem to be a
regression between 6.7 and 6.8-rc with the patches, which is what I was
expecting.

The fact that aaa2c9a97c22 is so much better could indicate that until
cc478e0b6bdf there was either a bug which turned something into a no-op
- or the memset() calls were acting as some kind of prefetching hint to
the CPU, which in turn caused a significant reduction in cache misses.

I think at this point we're not trying to fix a regression, because
we're on par with 6.7, but trying to make sense of this information so
that we can optimize the code properly rather than by luck (though I'm
not sure that is feasible). Hrm....
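
For reference, the relative differences between the FPS figures reported above can be worked out with a short Python sketch. This is only illustrative arithmetic on the four numbers quoted in this mail; the helper name `relative_change` is made up here and is not part of any tool used in the thread.

```python
# FPS figures exactly as reported in the quoted benchmark results.
fps = {
    "6.7": 83,
    "aaa2c9a97c22": 111,
    "cc478e0b6bdf": 64,
    "6.8-rc2 with patches": 82,
}


def relative_change(new, old):
    """Return the change from `old` to `new` as a percentage of `old`."""
    return (new - old) / old * 100.0


# Compare every build against the v6.7 baseline.
baseline = fps["6.7"]
for name, value in fps.items():
    change = relative_change(value, baseline)
    print(f"{name:>22}: {value:3d} FPS ({change:+.1f}% vs 6.7)")

# Compare the two commits on either side of the bisection result.
drop = relative_change(fps["cc478e0b6bdf"], fps["aaa2c9a97c22"])
print(f"aaa2c9a97c22 -> cc478e0b6bdf: {drop:+.1f}%")
```

With these inputs, 6.8-rc2 with the patches is within about 1% of 6.7, while the step from aaa2c9a97c22 to cc478e0b6bdf is a drop of roughly 42%.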