Hi all, Greetings! I'm using a laptop with Ryzen 7 Pro 3700U APU with xorg. However, I'm facing a high latency on those GPU-accelerated programs. My system config: --- Kernel: 5.6.2-arch1-2 x86_64 bits: 64 compiler: gcc v: 9.3.0 Desktop: i3 4.18 dm: startx Distro: Arch Linux CPU: Topology: Quad Core model: AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx bits: 64 type: MT MCP arch: Zen+ rev: 1 L2 cache: 2048 KiB flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 36751 Speed: 1183 MHz min/max: 1400/2300 MHz Core speeds (MHz): 1: 1177 2: 1176 3: 1177 4: 1177 5: 1178 6: 1177 7: 2931 8: 2944 Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Picasso vendor: Lenovo driver: amdgpu v: kernel bus ID: 05:00.0 chip ID: 1002:15d8 Display: server: X.Org 1.20.8 driver: amdgpu resolution: 3840x2160~60Hz OpenGL: renderer: AMD RAVEN (DRM 3.36.0 5.6.2-arch1-2 LLVM 9.0.1) v: 4.6 Mesa 20.0.4 direct render: Yes --- xorg config: --- Section "Device" Identifier "AMD" Driver "amdgpu" Option "TearFree" "true" Option "VariableRefresh" "true" EndSection --- When I'm using GPU-accelerated terminal emulators (e.g. alacritty and kitty), the initial window rendering time is significantly high. For instance, `time alacritty -e false` gave 0.24s user 0.04s system 54% cpu 0.531 total, which means the window rendering costs ~500ms. This is weird for a GPU-accelerated application. In comparison, a laptop with Intel Core i7-1065G7 using the same config could render the window of this kind of applications within ~130ms, which is snappy. For other programs, like the picom compositor, if I use "glx" as its backend, the whole rendering is extremely laggy. If I use "xrender" instead (which means no hw-acceleration I guess), it runs snappy. After some basic profiling I thought this is a amdgpu-xorg related issue. I used callgrind to record the trace of `alacritty -e false` with xorg-amdgpu, xorg-modesetting and wayland and got different results (I will just show the first few lines, which is the hot spot in the program I think): xorg-amdgpu: --- 280,109,580 ???:0x00000000008cd930 [/usr/lib/dri/radeonsi_dri.so] 107,423,760 ???:0x00000000008cd9e0 [/usr/lib/dri/radeonsi_dri.so] 99,890,145 ???:do_lookup_x [/usr/lib/ld-2.31.so] 71,160,050 ???:0x0000000000042210 [/usr/lib/libGLX_mesa.so.0.0.0] 54,633,060 ???:0x00000000008acca0 [/usr/lib/dri/radeonsi_dri.so] 50,114,882 ???:0x00000000000251c0 [/usr/lib/libfontconfig.so.1.12.0] 49,050,274 ???:0x000000000001d8b0 [/usr/lib/libfontconfig.so.1.12.0] 45,988,140 ???:0x00000000008a7600 [/usr/lib/dri/radeonsi_dri.so] 40,037,150 ???:0x000000000001d790 [/usr/lib/libfontconfig.so.1.12.0] 34,129,137 ???:__memcpy_avx_unaligned_erms [/usr/lib/libc-2.31.so] 30,595,302 ???:_dl_lookup_symbol_x [/usr/lib/ld-2.31.so] --- xorg-modesetting (similar results): --- 281,058,300 ???:0x00000000008cd930 [/usr/lib/dri/radeonsi_dri.so] 107,787,600 ???:0x00000000008cd9e0 [/usr/lib/dri/radeonsi_dri.so] 99,890,145 ???:do_lookup_x [/usr/lib/ld-2.31.so] 71,160,050 ???:0x0000000000042210 [/usr/lib/libGLX_mesa.so.0.0.0] 54,818,100 ???:0x00000000008acca0 [/usr/lib/dri/radeonsi_dri.so] 50,112,204 ???:0x00000000000251c0 [/usr/lib/libfontconfig.so.1.12.0] 49,050,274 ???:0x000000000001d8b0 [/usr/lib/libfontconfig.so.1.12.0] 46,143,900 ???:0x00000000008a7600 [/usr/lib/dri/radeonsi_dri.so] 40,037,150 ???:0x000000000001d790 [/usr/lib/libfontconfig.so.1.12.0] 35,147,037 ???:__memcpy_avx_unaligned_erms [/usr/lib/libc-2.31.so] 30,595,302 ???:_dl_lookup_symbol_x [/usr/lib/ld-2.31.so] --- wayland: --- 69,189,362 ???:do_lookup_x [/usr/lib/ld-2.31.so] 50,111,905 ???:0x00000000000251c0 [/usr/lib/libfontconfig.so.1.12.0] 49,050,274 ???:0x000000000001d8b0 [/usr/lib/libfontconfig.so.1.12.0] 40,946,389 ???:__memcpy_avx_unaligned_erms [/usr/lib/libc-2.31.so] 40,037,150 ???:0x000000000001d790 [/usr/lib/libfontconfig.so.1.12.0] 37,270,021 ???:fread [/usr/lib/libc-2.31.so] 30,019,946 ???:_dl_lookup_symbol_x [/usr/lib/ld-2.31.so] 23,285,006 /build/rust/src/rustc-1.42.0-src/src/libcore/iter/range.rs:core::iter::range::<impl core::iter::traits::iterator::Iterator for core::ops::range::Range<A>>::next [/home/chen/Downloads/alacritty-master/target/debug/alacritty] 22,951,556 ???:_IO_file_xsgetn [/usr/lib/libc-2.31.so] 21,423,383 ???:_int_malloc [/usr/lib/libc-2.31.so] 20,091,007 /build/rust/src/rustc-1.42.0-src/src/libcore/slice/mod.rs:<core::ops::range::Range<usize> as core::slice::SliceIndex<[T]>>::index [/home/chen/Downloads/alacritty-master/target/debug/alacritty] 19,161,509 ???:__strchr_avx2 [/usr/lib/libc-2.31.so] ... 2,936,304 ???:0x0000000000632910 [/usr/lib/dri/radeonsi_dri.so] --- Sorry I am not using a source based distro so the debug symbols are not available. What I could see here is that when I'm running GPU-accelerated programs under xorg a same symbol in radeonsi_dri is overwhelmingly called. However, I do not know where the issue is. I did not see many similar issues so if this is related to xorg and amdgpu, this might be model-specific. I don't know if this is a correct way to use this mailing list. If you have any advice, please let me know. I am also happy to provide more information if needed. Thanks! _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx