On Fri, 2 Feb 2024 at 17:47, Marco Elver <elver@xxxxxxxxxx> wrote:
>
> On Fri, 2 Feb 2024 at 17:35, Mikhail Gavrilov
> <mikhail.v.gavrilov@xxxxxxxxx> wrote:
> >
> > On Fri, Feb 2, 2024 at 2:00 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> > >
> > > > Maybe we can try something else?
> > >
> > > That's strange - the patches at [1] definitely revert the change you
> > > bisected to. It's possible there is some other strange side-effect. (I
> > > assume that you are still running all this with a KASAN kernel.)
> >
> > Yes. build .config not changed between kernel builds.
> >
> > > Just so I understand it right:
> > > You say before commit cc478e0b6bdffd20561e1a07941a65f6c8962cab the
> > > game's FPS were good. But that is strange, because at that point we're
> > > already doing stackdepot refcounting, i.e. after commit
> > > 773688a6cb24b0b3c2ba40354d883348a2befa38 which you reported as the
> > > initial performance regression. The patches at [2] fixed that problem.
> > >
> > > So now it's unclear to me how the simple change in
> > > cc478e0b6bdffd20561e1a07941a65f6c8962cab causes the performance
> > > problem, when in fact this is already with KASAN stackdepot
> > > refcounting enabled but without the performance fixes from [1] and
> > > [2].
> > >
> > > [2] https://lore.kernel.org/all/20240118110216.2539519-2-elver@xxxxxxxxxx/
> > >
> > > My questions now would be:
> > > - What was the game's FPS in the last stable kernel (v6.7)?
> >
> > [6.7] - 83 FPS - 13060 frames during benchmark.
> >
> > > - Can you collect another set of performance profiles between good and
> > > bad? Maybe it would show where the time in the kernel is spent.
> >
> > Yes,
> > please look at [aaa2c9a97c22 perf] and [cc478e0b6bdf perf]
> >
> > > perf diff perf-git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.data perf-git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.data
> > No kallsyms or vmlinux with build-id
> > de2a040f828394c5ce34802389239c2a0668fcc7 was found
> > No kallsyms or vmlinux with build-id
> > 33ab1cd545f96f5ffc2a402a4c4cfa647fd727a0 was found
> > # Event 'cycles:P'
> > #
> > # Baseline  Delta Abs  Shared Object          Symbol
> > # ........  .........  .....................  .................................
> > #
> >     48.48%    +21.75%  [kernel.kallsyms]      [k] 0xffffffff860065c0
> >     36.13%    -16.49%  ShadowOfTheTombRaider  [.] 0x00000000001d7f5e
> >      4.43%     -2.10%  libvulkan_radeon.so    [.] 0x000000000006b870
> >      3.28%     -0.63%  libcef.so              [.] 0x00000000021720e0
> >      1.11%     -0.53%  libc.so.6              [.] syscall
> >      0.65%     -0.24%  libc.so.6              [.] __memmove_avx512_unaligned_erms
> >      0.31%     -0.14%  libc.so.6              [.] __memset_avx512_unaligned_erms
> >      0.26%     -0.13%  libm.so.6              [.] __powf_fma
> >      0.20%     -0.10%  [amdgpu]               [k] amdgpu_bo_placement_from_domain
> >      0.22%     -0.09%  [amdgpu]               [k] amdgpu_vram_mgr_compatible
> >      0.67%     -0.09%  armada-drm_dri.so      [.] 0x00000000000192b4
> >      0.15%     -0.08%  libc.so.6              [.] sem_post@GLIBC_2.2.5
> >      0.16%     -0.07%  [amdgpu]               [k] amdgpu_vm_bo_update
> >      0.14%     -0.07%  [amdgpu]               [k] amdgpu_bo_list_entry_cmp
> >      0.13%     -0.06%  libm.so.6              [.] powf@GLIBC_2.2.5
> >      0.14%     -0.06%  libMangoHud.so         [.] 0x000000000001c4c0
> >      0.10%     -0.06%  libc.so.6              [.] __futex_abstimed_wait_common
> >      0.19%     -0.05%  libGLESv2.so           [.] 0x0000000000160a11
> >      0.07%     -0.04%  libc.so.6              [.] __new_sem_wait_slow64.constprop.0
> >      0.10%     -0.04%  radeonsi_dri.so        [.] 0x0000000000019454
> >      0.05%     -0.03%  [amdgpu]               [k] optc1_get_position
> >      0.05%     -0.03%  libc.so.6              [.] sem_wait@@GLIBC_2.34
> >      0.22%     -0.02%  [vdso]                 [.] 0x00000000000005a0
> >      0.10%     -0.02%  libc.so.6              [.] __memcmp_evex_movbe
> >                +0.02%  [JIT] tid 8383         [.] 0x00007f2de0052823
> >
> >
> > > - Could it be an inconclusive bisection?
> >
> > I checked twice:
> > [6.7] - 83 FPS
> > [aaa2c9a97c22] - 111 FPS
> > [cc478e0b6bdf] - 64 FPS
> > [6.8-rc2 with patches] - 82 FPS
> >
> >
> > [6.7] https://i.postimg.cc/15yyzZBr/v6-7.png
> > [6.7 perf] https://mega.nz/file/QwJ3hbob#RslLFVYgz1SWMcPR3eF9uEpFuqxdgkwXSatWts-1wVA
> >
> > [aaa2c9a97c22] https://i.postimg.cc/Sxv4VYhg/git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.png
> > [aaa2c9a97c22 perf]
> > https://mega.nz/file/dwQxha4J#2_nBF6uNzY11VX-T-Lr_-60WIMrbl1YEvPgY4CuXqEc
> >
> > [cc478e0b6bdf] https://i.postimg.cc/W3cQfMfw/git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.png
> > [cc478e0b6bdf perf]
> > https://mega.nz/file/hl5kwLTC#_4Fg1KBXCnQ-8OElY7EYmPOoDG6ZeZYnKFjamWpklWw
> >
> > [6.8-rc2 with patches] https://i.postimg.cc/26dPpVsR/v6-8-rc2-with-patches.png
> > [6.8-rc2 with patches perf]
> > https://mega.nz/file/NxgTAb4L#0KO_WU-svpDw60Y3148RZhELPcUtFg3_VCDzJqSyz34
>
> Thanks a lot for these results. There's definitely something strange
> going - I'll try to have a detailed look some time next week.
>
> In the meantime, this is clear: there does not seem to be a regression
> between 6.7 and 6.8-rc with the patches, which is what I was
> expecting. The fact that aaa2c9a97c22 is so much better could indicate
> that until cc478e0b6bdf there was either a bug which turned something
> into a no-op - or, the memsets() were acting as some kind of
> prefetching hint to the CPU, which in turn caused a significant
> reduction in cache misses.
> I think at this point we're not trying to
> fix a regression, because we're on par with 6.7, but trying to make
> sense of this information to optimize the code properly without luck
> (but not sure if feasible).

Hrm.... Your config has lockdep enabled, right? Because cc478e0b6bdf
was fixing an issue with lockdep, does your kernel before that commit
show some lockdep errors? Because if lockdep encounters an error it
usually turns itself off right away, which would explain the improved
performance. :-)
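
One way to check the lockdep theory on a running kernel, as a rough
sketch only: it assumes /proc/lockdep_stats is present on the kernel
under test and contains a "debug_locks:" line whose value drops to 0
once lockdep has disabled itself. The helper name
lockdep_still_enabled is made up for this illustration.

```python
import re
from typing import Optional


def lockdep_still_enabled(stats_text: str) -> Optional[bool]:
    """Interpret the 'debug_locks' line of lockdep stats text.

    Returns True if the value is non-zero, False if it is zero, and
    None if no such line is found (assumption: the line looks like
    ' debug_locks:      1').
    """
    match = re.search(r"^\s*debug_locks:\s*(\d+)\s*$", stats_text, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) != 0


if __name__ == "__main__":
    try:
        # Assumed location; may be missing or unreadable on some kernels.
        with open("/proc/lockdep_stats") as f:
            state = lockdep_still_enabled(f.read())
    except OSError:
        state = None

    messages = {
        True: "lockdep still active",
        False: "lockdep has turned itself off",
        None: "could not determine lockdep state",
    }
    print(messages[state])
```

Running this right after the benchmark on the aaa2c9a97c22 build and
again on the cc478e0b6bdf build would show whether lockdep was still
active in both cases; any lockdep splat in dmesg would be the more
direct evidence.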