Re: regression/bisected commit 773688a6cb24b0b3c2ba40354d883348a2befa38 make my system completely unusable under high load

On Fri, 2 Feb 2024 at 17:35, Mikhail Gavrilov
<mikhail.v.gavrilov@xxxxxxxxx> wrote:
>
> On Fri, Feb 2, 2024 at 2:00 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> >
> > > Maybe we can try something else?
> >
> > That's strange - the patches at [1] definitely revert the change you
> > bisected to. It's possible there is some other strange side-effect. (I
> > assume that you are still running all this with a KASAN kernel.)
>
> Yes. build .config not changed between kernel builds.
>
> > Just so I understand it right:
> > You say before commit cc478e0b6bdffd20561e1a07941a65f6c8962cab the
> > game's FPS were good. But that is strange, because at that point we're
> > already doing stackdepot refcounting, i.e. after commit
> > 773688a6cb24b0b3c2ba40354d883348a2befa38 which you reported as the
> > initial performance regression. The patches at [2] fixed that problem.
> >
> > So now it's unclear to me how the simple change in
> > cc478e0b6bdffd20561e1a07941a65f6c8962cab causes the performance
> > problem, when in fact this is already with KASAN stackdepot
> > refcounting enabled but without the performance fixes from [1] and
> > [2].
> >
> > [2] https://lore.kernel.org/all/20240118110216.2539519-2-elver@xxxxxxxxxx/
> >
> > My questions now would be:
> > - What was the game's FPS in the last stable kernel (v6.7)?
>
> [6.7] - 83 FPS - 13060 frames during benchmark.
>
> > - Can you collect another set of performance profiles between good and
> > bad? Maybe it would show where the time in the kernel is spent.
>
> Yes,
> please look at [aaa2c9a97c22 perf] and [cc478e0b6bdf perf]
>
> > perf diff perf-git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.data perf-git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.data
> No kallsyms or vmlinux with build-id de2a040f828394c5ce34802389239c2a0668fcc7 was found
> No kallsyms or vmlinux with build-id 33ab1cd545f96f5ffc2a402a4c4cfa647fd727a0 was found
> # Event 'cycles:P'
> #
> # Baseline  Delta Abs  Shared Object          Symbol
> # ........  .........  .....................  .....................................
> #
>     48.48%    +21.75%  [kernel.kallsyms]      [k] 0xffffffff860065c0
>     36.13%    -16.49%  ShadowOfTheTombRaider  [.] 0x00000000001d7f5e
>      4.43%     -2.10%  libvulkan_radeon.so    [.] 0x000000000006b870
>      3.28%     -0.63%  libcef.so              [.] 0x00000000021720e0
>      1.11%     -0.53%  libc.so.6              [.] syscall
>      0.65%     -0.24%  libc.so.6              [.] __memmove_avx512_unaligned_erms
>      0.31%     -0.14%  libc.so.6              [.] __memset_avx512_unaligned_erms
>      0.26%     -0.13%  libm.so.6              [.] __powf_fma
>      0.20%     -0.10%  [amdgpu]               [k] amdgpu_bo_placement_from_domain
>      0.22%     -0.09%  [amdgpu]               [k] amdgpu_vram_mgr_compatible
>      0.67%     -0.09%  armada-drm_dri.so      [.] 0x00000000000192b4
>      0.15%     -0.08%  libc.so.6              [.] sem_post@GLIBC_2.2.5
>      0.16%     -0.07%  [amdgpu]               [k] amdgpu_vm_bo_update
>      0.14%     -0.07%  [amdgpu]               [k] amdgpu_bo_list_entry_cmp
>      0.13%     -0.06%  libm.so.6              [.] powf@GLIBC_2.2.5
>      0.14%     -0.06%  libMangoHud.so         [.] 0x000000000001c4c0
>      0.10%     -0.06%  libc.so.6              [.] __futex_abstimed_wait_common
>      0.19%     -0.05%  libGLESv2.so           [.] 0x0000000000160a11
>      0.07%     -0.04%  libc.so.6              [.] __new_sem_wait_slow64.constprop.0
>      0.10%     -0.04%  radeonsi_dri.so        [.] 0x0000000000019454
>      0.05%     -0.03%  [amdgpu]               [k] optc1_get_position
>      0.05%     -0.03%  libc.so.6              [.] sem_wait@@GLIBC_2.34
>      0.22%     -0.02%  [vdso]                 [.] 0x00000000000005a0
>      0.10%     -0.02%  libc.so.6              [.] __memcmp_evex_movbe
>                +0.02%  [JIT] tid 8383         [.] 0x00007f2de0052823
>
>
> > - Could it be an inconclusive bisection?
>
> I checked twice:
> [6.7] - 83 FPS
> [aaa2c9a97c22] - 111 FPS
> [cc478e0b6bdf] - 64 FPS
> [6.8-rc2 with patches] - 82 FPS
>
>
> [6.7] https://i.postimg.cc/15yyzZBr/v6-7.png
> [6.7 perf] https://mega.nz/file/QwJ3hbob#RslLFVYgz1SWMcPR3eF9uEpFuqxdgkwXSatWts-1wVA
>
> [aaa2c9a97c22] https://i.postimg.cc/Sxv4VYhg/git-aaa2c9a97c22af5bf011f6dd8e0538219b45af88.png
> [aaa2c9a97c22 perf]
> https://mega.nz/file/dwQxha4J#2_nBF6uNzY11VX-T-Lr_-60WIMrbl1YEvPgY4CuXqEc
>
> [cc478e0b6bdf] https://i.postimg.cc/W3cQfMfw/git-cc478e0b6bdffd20561e1a07941a65f6c8962cab.png
> [cc478e0b6bdf perf]
> https://mega.nz/file/hl5kwLTC#_4Fg1KBXCnQ-8OElY7EYmPOoDG6ZeZYnKFjamWpklWw
>
> [6.8-rc2 with patches] https://i.postimg.cc/26dPpVsR/v6-8-rc2-with-patches.png
> [6.8-rc2 with patches perf]
> https://mega.nz/file/NxgTAb4L#0KO_WU-svpDw60Y3148RZhELPcUtFg3_VCDzJqSyz34

Thanks a lot for these results. There's definitely something strange
going on - I'll try to have a detailed look some time next week.

In the meantime, this much is clear: there does not seem to be a
regression between 6.7 and 6.8-rc2 with the patches (83 vs. 82 FPS),
which is what I was expecting. The fact that aaa2c9a97c22 is so much
better could indicate that until cc478e0b6bdf there was either a bug
which turned something into a no-op, or the memset()s were acting as
some kind of prefetching hint to the CPU, which in turn caused a
significant reduction in cache misses. I think at this point we're no
longer trying to fix a regression, because we're on par with 6.7, but
trying to make sense of this information so that the code can be
optimized deliberately rather than by luck (though I'm not sure that is
feasible). Hrm....
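
To put the reported numbers in proportion, here is a minimal sketch that
expresses each FPS figure relative to the 6.7 baseline. The FPS values are
the ones quoted in this thread; the dictionary layout and the helper name
relative_change are illustrative only, not part of any tooling used here.

```python
# FPS figures as reported in this thread (same benchmark run on each kernel).
fps = {
    "6.7": 83,
    "aaa2c9a97c22": 111,
    "cc478e0b6bdf": 64,
    "6.8-rc2 with patches": 82,
}


def relative_change(value, baseline):
    """Return the change of `value` against `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Treat the last stable kernel (6.7) as the reference point.
baseline = fps["6.7"]
changes = {name: relative_change(value, baseline) for name, value in fps.items()}

for name, pct in changes.items():
    print(f"{name:>22}: {fps[name]:3d} FPS ({pct:+.1f}% vs 6.7)")
```

Read this way, 6.8-rc2 with the patches is within about one percent of
6.7, while aaa2c9a97c22 is the outlier on the fast side and cc478e0b6bdf
on the slow side.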




