Hi Thomas, Jocelyn
Starting with 6.1.128 longterm kernel failed and it seems to be a 'drm
error'
Xorg error :
(==) Log file: "/var/log/Xorg.0.log", Time: Fri Feb 14 17:32:59 2025
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
(==) No Layout section. Using the first Screen section.
(==) No screen section available. Using defaults.
(**) |-->Screen "Default Screen Section" (0)
(**) | |-->Monitor "<default monitor>"
(==) No monitor specified for screen "Default Screen Section".
a default monitor configuration.
(==) Automatically adding devices
(==) Automatically enabling devices
(==) Automatically adding GPU devices
(==) Automatically binding GPU devices
(==) Max clients allowed: 256, resource mask: 0x1fffff
(WW) The directory "/usr/share/fonts/X11/cyrillic" does not exist.
Entry deleted from font path.
(==) FontPath set to:
/usr/share/fonts/X11/misc,
/usr/share/fonts/X11/100dpi/:unscaled,
/usr/share/fonts/X11/75dpi/:unscaled,
/usr/share/fonts/X11/Type1,
/usr/share/fonts/X11/100dpi,
/usr/share/fonts/X11/75dpi,
built-ins
(==) ModulePath set to "/usr/lib/xorg/modules"
(II) The server relies on udev to provide the list of input devices.
no devices become available, reconfigure udev or disable
AutoAddDevices.
(II) Loader magic: 0x561372a83f00
(II) Module ABI versions:
X.Org ANSI C Emulation: 0.4
X.Org Video Driver: 25.2
X.Org XInput driver : 24.4
X.Org Server Extension : 10.0
(++) using VT number 1
(II) systemd-logind: took control of session
/org/freedesktop/login1/session/c13
(II) xfree86: Adding drm device (/dev/dri/card1)
(II) Platform probe for
/sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/drm/card1
(II) systemd-logind: got fd for /dev/dri/card1 226:1 fd 14 paused 0
(EE)
(EE) Backtrace:
(EE) 0: /usr/lib/xorg/Xorg (OsLookupColor+0x139) [0x5613729f7f79]
(EE) 1: /lib/x86_64-linux-gnu/libc.so.6 (__sigaction+0x40)
[0x7f0dad05b050]
(EE) 2: /lib/x86_64-linux-gnu/libc.so.6 (__nss_database_lookup+0xcd19)
[0x7f0dad17m.so.2 (drmGetVe728e67a4]
(EE) 6: /usr/lib/xorg/Xorg (xf86PlatformDeviceCheckBusID+0x1bb)
[0x5613728e6aab]
(EE) 7: /usr/lib/xorg/Xorg (config_fini+0x19b7) [0x5613728e3a97]
(EE) 8: /usr/lib/xorg/Xorg (xf86PlatformMatchDriver+0x1b5)
[0x5613728e0615]
(EE) 9: /usr/lib/xorg/Xorg (xf86BusProbe+0x9) [0x5613728b9329]
(EE) 10: /usr/lib/xorg/Xorg (InitOutput+0x69a) [0x5613728c72ca]
(EE) 11: /usr/lib/xorg/Xorg (InitFonts+0x1ce) [0x56137288866e]
(EE) 12: /lib/x86_64-linux-gnu/libc.so.6 (__libc_init_first+0x8a)
[0x7f0dad04624a]
(EE) 13: /lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main+0x85)
[0x7f0dad046305]
(EE) 14: /usr/lib/xorg/Xorg (_start+0x21) [0x561372871b71]
(EE)
(EE) Segmentation fault at address 0x0
(EE)
server error:
(EE) Caught signal 11 (Segmentation fault). Server aborting
(EE)
(EE)
consult the The X.Org Foundation support
at http://wiki.x.org
for help.
(EE) Please also check the log file at "/var/log/Xorg.0.log" for
additional information.
(EE)
(EE) Server terminated with error (1). Closing log file.
Kernel trace :
------------[ cut here ]------------
BUG: the value to copy was not set!
WARNING: CPU: 10 PID: 6240 at drivers/gpu/drm/drm_ioctl.c:478
drm_copy_field+0xa2/0xb0 [drm]
Modules linked in: xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E)
ipt_REJECT(E) nf_reject_ipv4(E) xt_tcpudp(E) nft_compat(E)
nft_chain_nat(E) nf_tables(E) nls_utf8(E) nfnetlink(E)
cpufreq_userspace(E) cifs(E) l2tp_ppp(E) cifs_arc4(E) l2tp_netlink(E)
cpufreq_ondemand(E) rdma_cm(E) l2tp_core(E) iw_cm(E) ip6_udp_tunnel(E)
udp_tunnel(E) cpufreq_conservative(E) ib_cm(E) pppox(E) ppp_generic(E)
slhc(E) ib_core(E) cifs_md4(E) dns_resolver(E) cpufreq_powersave(E)
xfrm_user(E) xfrm_algo(E) scsi_transport_iscsi(E) nvme_fabrics(E)
team_mode_loadbalance(E) 8021q(E) garp(E) mrp(E) team(E) bridge(E)
stp(E) llc(E) qrtr(E) openvswitch(E) nsh(E) nf_conncount(E) nf_nat(E)
nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) cmac(E)
algif_hash(E) algif_skcipher(E) af_alg(E) bnep(E) binfmt_misc(E)
nls_ascii(E) nls_cp437(E) vfat(E) fat(E) ext4(E) mbcache(E) jbd2(E)
intel_rapl_msr(E) intel_rapl_common(E) nvidia_drm(POE)
snd_hda_codec_hdmi(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E)
nvidia_modeset(POE) intel_uncore_frequency(E)
intel_uncore_frequency_common(E) sb_edac(E) btusb(E) snd_hda_intel(E)
btrtl(E) snd_usb_audio(E) x86_pkg_temp_thermal(E) btbcm(E)
snd_intel_dspcfg(E) snd_intel_sdw_acpi(E) intel_powerclamp(E)
snd_usbmidi_lib(E) btintel(E) snd_rawmidi(E) coretemp(E) btmtk(E)
snd_seq_device(E) snd_hda_codec(E) nvidia(POE) eeepc_wmi(E)
snd_hda_core(E) mc(E) snd_pcsp(E) snd_hwdep(E) rapl(E) asus_wmi(E)
ipmi_ssif(E) battery(E) iTCO_wdt(E) bluetooth(E) intel_cstate(E)
snd_pcm(E) sparse_keymap(E) ledtrig_audio(E) snd_timer(E)
intel_pmc_bxt(E) acpi_ipmi(E) platform_profile(E) intel_uncore(E)
crc16(E) wmi_bmof(E) rfkill(E) mei_me(E) iTCO_vendor_support(E)
ipmi_si(E) snd(E) watchdog(E) mei(E) video(E) soundcore(E)
ipmi_devintf(E) ipmi_msghandler(E) joydev(E) evdev(E) sg(E) msr(E)
parport_pc(E) ppdev(E) nfsd(E) lp(E) parport(E) auth_rpcgss(E)
nfs_acl(E) lockd(E) grace(E) loop(E) efi_pstore(E) configfs(E)
sunrpc(E) ip_tables(E) x_tables(E) autofs4(E)
btrfs(E) blake2b_generic(E) zstd_compress(E) efivarfs(E) raid10(E)
sr_mod(E) cdrom(E) hid_logitech_hidpp(E) hid_plantronics(E)
hid_logitech_dj(E) hid_generic(E) uas(E) usbhid(E) hid(E)
usb_storage(E) raid456(E) async_raid6_recov(E) async_memcpy(E)
async_pq(E) async_xor(E) async_tx(E) xor(E) raid1(E) raid0(E)
raid6_pq(E) dm_mod(E) crc32c_intel(E) md_mod(E) sd_mod(E) ast(E)
drm_vram_helper(E) drm_ttm_helper(E) ghash_clmulni_intel(E) ttm(E)
sha512_ssse3(E) sha256_ssse3(E) drm_kms_helper(E) sha1_ssse3(E) ahci(E)
libahci(E) xhci_pci(E) ehci_pci(E) xhci_hcd(E) ehci_hcd(E)
aesni_intel(E) nvme(E) mxm_wmi(E) igb(E) libata(E) crypto_simd(E)
i2c_i801(E) cryptd(E) drm(E) dca(E) i2c_smbus(E) lpc_ich(E) usbcore(E)
scsi_mod(E) i2c_algo_bit(E) nvme_core(E) t10_pi(E) usb_common(E)
scsi_common(E) i40e(E) wmi(E) button(E)
CPU: 10 PID: 6240 Comm: Xorg Tainted: P OE 6.1.128-amd64
#0
Hardware name: ASUS All Series/X99-WS/IPMI, BIOS 4001 05/28/2019
RIP: 0010:drm_copy_field+0xa2/0xb0 [drm]
Code: 00 00 74 13 49 c7 45 00 00 00 00 00 eb e0 0f 0b b8 f2 ff ff ff eb
d9 48 c7 c7 70 cb 17 c1 c6 05 43 17 07 00 01 e8 2e 09 57 dd <0f> 0b eb
d6 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00
RSP: 0018:ffffbd9941b0fba8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffffbd9941b0fc60 RCX: 0000000000000027
RDX: ffff9ec3bf6a13a8 RSI: 0000000000000001 RDI: ffff9ec3bf6a13a0
RBP: ffff9e8487476800 R08: 0000000000000000 R09: ffffbd9941b0fa20
R10: 0000000000000003 R11: ffff9ec3bff15ee8 R12: ffffffffc1132570
R13: ffffbd9941b0fc80 R14: ffff9e85881d7200 R15: 0000000000000040
FS: 00007f59dd209ac0(0000) GS:ffff9ec3bf680000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056272a2123d0 CR3: 000000010c396005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __warn+0x81/0xd0
? drm_copy_field+0xa2/0xb0 [drm]
? report_bug+0xe6/0x150
? handle_bug+0x41/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? drm_ioctl_flags+0x50/0x50 [drm]
? drm_copy_field+0xa2/0xb0 [drm]
? drm_copy_field+0xa2/0xb0 [drm]
? drm_ioctl_flags+0x50/0x50 [drm]
drm_version+0x73/0xa0 [drm]
drm_ioctl_kernel+0xcd/0x170 [drm]
drm_ioctl+0x233/0x410 [drm]
? drm_ioctl_flags+0x50/0x50 [drm]
__x64_sys_ioctl+0x94/0xd0
do_syscall_64+0x59/0xb0
? vfs_write+0x2b1/0x3f0
? vfs_write+0x2b1/0x3f0
? ksys_write+0x6f/0xf0
? exit_to_user_mode_prepare+0x40/0x1e0
? syscall_exit_to_user_mode+0x22/0x40
? do_syscall_64+0x65/0xb0
? __x64_sys_fcntl+0x94/0xc0
? exit_to_user_mode_prepare+0x40/0x1e0
? syscall_exit_to_user_mode+0x22/0x40
? do_syscall_64+0x65/0xb0
? syscall_exit_to_user_mode+0x22/0x40
? do_syscall_64+0x65/0xb0
? exit_to_user_mode_prepare+0x40/0x1e0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0033:0x7f59dd31ccdb
Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89
44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d
00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
RSP: 002b:00007fff22f5d440 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000056272a211f10 RCX: 00007f59dd31ccdb
RDX: 000056272a211f10 RSI: 00000000c0406400 RDI: 000000000000000e
RBP: 000056272a211f10 R08: 00007f59dd3f1cc0 R09: 0000000000000070
R10: 00007f59dd236378 R11: 0000000000000246 R12: 00000000c0406400
R13: 000000000000000e R14: 000000000000000e R15: 000056272a211510
</TASK>
---[ end trace 0000000000000000 ]---
Maybe my Xorg is too recent (but I hope not) as I don't want to
downgrade Xorg (nor reinstall an older debian version) so I will try
another kernel version...
6.1.128 was build by me and maybe the 'make olddefconfig' from mainline
to 6.1.128 lost too many options (for ex
device-drivers/graphic-support>drm does not exist in menuconfig and I
found AST module directly in device-drivers/graphic-support ...)
6.1.124 exist prepackaged by Debian so it .config should be more
generic so I will test it
Thanks again for help,
Kind regards
Nicolas Baranger
Le 2025-02-14 16:01, Nicolas Baranger a écrit :
Hi Thomas
Thanks again for help
Nicolas, if you find an old kernel version that works correctly, and if
you know how to git-bisect the kernel, it would be helpful if you could
bisect to the commit that introduced the problem.
Ok, I will try to find a working kernel and to git bisect to find the
commit which introduce the problem.
I will start with longterm 6.1.128
Kind regards
Nicolas
Le 2025-02-14 13:36, Thomas Zimmermann a écrit :
Hi Jocelyn
Am 14.02.25 um 10:11 schrieb Jocelyn Falempe: On 13/02/2025 10:27,
Nicolas Baranger wrote: Dear Thomas
Thanks for answer and help.
Yes, due to .date total removal in linux 6.14 (https://github.com/
torvalds/linux/commit/cb2e1c2136f71618142557ceca3a8802e87a44cd
<https:// github.com/torvalds/linux/commit/
cb2e1c2136f71618142557ceca3a8802e87a44cd>) the last DKMS sources are :
https://xba.soartist.net/ast-drm_nba_20250211/nba-dkms/
nba_last_src_20250212/src/ <https://xba.soartist.net/ast-
drm_nba_20250211/nba-dkms/nba_last_src_20250212/src/>
You can also find this sources in directory drivers/gpu/drm/ast_new of
the tarball https://xba.soartist.net/ast-drm_nba_20250211/nba-kernel/
linux-6.14.0.1-ast1.15.1-rc2_nba0_20250212.tar.gz <https://
xba.soartist.net/ast-drm_nba_20250211/nba-kernel/linux-6.14.0.1-
ast1.15.1-rc2_nba0_20250212.tar.gz>
I'm surprised by the fact the in-kernel driver 0.1.0 is more advanced
than Aspeed version 1.15.1 because on my system it has very poor
rendering and is very slow, twinkle is high and had poor colors.
The screen flickering is high and it's like if I was using a very old
cathode ray tube monitor (In fact I'm using a SAMSUNG LCD monitor which
is perfectly functionnal and which display a nice and eyes confortable
picture when using ast 1.15.1 driver or the video output of the Nvidia
GPU ).
My testing system is a test Xeon server with an AST2400 BMC with its
AST VGA card as the main video output (to be able to have a screen on
the BMC KVM) +a discrete NVIDIA GPU I'm using for GPGPU and 3D
rendering with Nvidia prime render offload.
What I constat with embed kernel driver 0.1.0 is that the Xeon
processor is doing the video job for example when watching a video, and
it's not the case with version 1.15.1 even when displaying on the AST
VGA card a vulkan rotating cube (compute by nvidia GPU with nvidia
prime but display by the AST VGA card of the AST2400).
Note that with in-kernel version 0.1.0 it's nearly impossible to make
round the vulkan cube at more than half a round by second where it's
working (very) fine for a 32MB video memory card with version 1.15.1 as
you can see in the video present in the online directory
I'm not developer or kernel developer so be sure that I wouldn't have
done all this work if the in-kernel ast version 0.1.0 was usable
out-of- the-box
Sure you can give me a patch I will test on this server (building
mainline+ast_new yesterday tooks 19 minutes on this server)
PS:
here is a 'git diff linux-6.14.0.1-ast-rc2/drivers/gpu/drm/ast
linux-6.14.0.1-ast-rc2/drivers/gpu/drm/ast_new'
https://xba.soartist.net/ast-drm_nba_20250211/nba-dump/ast-
fullpatch.patch
<https://xba.soartist.net/ast-drm_nba_20250211/nba-dump/
ast-fullpatch.patch>
Diff is about 250+ kb so the 2 drivers seems to have nothing to do with
each others...
Thanks again for help
Kind regards
Nicolas
Le 2025-02-13 08:57, Thomas Zimmermann a écrit :
Hi Nicolas
Am 12.02.25 um 19:58 schrieb Nicolas Baranger: Dear maintener
That's mostly me and Jocelyn.
I did include ast-drm driver version 1.15.1 (in replacement of version
0.1.0) on the new mainline kernel too (6.14.0-rc2) and I issue a new
dkms patch
Last DKMS patch had been sucessfully tested on mainline.
And last ast.ko version 1.15.1 included in linux tree had also been
sucessfully tested
Online directory is updated with :
- new DKMS patch
- new DKMS srouces
- new DKMS debian package
- new tarball of mainline included ast_new ported in kernel tree
- new kernel debian package (mainline with ast_new)
NB: online directory is here: https://xba.soartist.net/ast-
drm_nba_20250211/ <https://xba.soartist.net/ast-drm_nba_20250211/>
Please let me know what I should do to see this change in linux-next
I'm having a little trouble with figuring out which of the many driver
sources is the relevant one. Am I correct to assume it's the one at
https://xba.soartist.net/ast-drm_nba_20250211/nba-dkms/
nba_last_src_20250212/src/ <https://xba.soartist.net/ast-
drm_nba_20250211/nba-dkms/nba_last_src_20250212/src/>
About that driver: Although the official driver reports an ancient
version number, it is an up-to-date driver. It is actually more up-to-
date than Aspeed's package. Both drivers share source code and a few
years ago there was an effort to bring the kernel's driver up to the
same feature set. Since then, the kernel's driver has been updated,
reworked and improved.
About the performance: From what I can tell, the only significant
difference in these drivers is memory management. Your ast_new driver
uses an older algorithm that we replaced quite a few releases ago. The
old version was unreliable on systems with little video memory, so we
had to replace it. I don't know why the new code should be slower
though.
Regarding the performances of ast driver, I remember doing profiling
some times ago, and when running glxgears (with llvmpipe), 65% of the
CPU time was wasted in page fault
(https://elixir.bootlin.com/linux/v6.13.2/source/drivers/gpu/drm/drm_gem_shmem_helper.c#L534)
But as this driver is mostly used for console/basic desktop usage, I
didn't investigate more.
Now that's an interesting find. The GEM shmem helpers vunmap ASAP to
make pages swappable, I think. IIRC there was a patchset circulating
that implements a shrinker [1] for shmem helpers. With that in place,
we'd only update the page tables if necessary. If it's really that
easy, we should try to merge that.
[1]
https://elixir.bootlin.com/linux/v6.13.2/source/include/linux/shrinker.h#L82
If I remember correctly, the switch to shmem, is because some devices
have only 16MB of memory, and 1920x1200x32bits takes ~9MB, so it's not
possible to have double buffering in this case. (And this is required
by most desktop environment).
Exactly. There are ast devices with as little as 8 MiB of video memory.
But FullHD@32bit already requires ~8 MiB. Atomic modesetting with the
old memory manager requires overcommitting by a factor of 3 (to ~24
MiB) to account for all corner cases. Hence we sometimes had failed
display updates with lower-end devices.
The switch to shmem was done with "f2fa5a99ca81c drm/ast: Convert ast
to SHMEM", and introduced in v6.2. So maybe if you can try with a v6.1
kernel, using the built-in ast driver and report if it has better
performances.
Nicolas, if you find an old kernel version that works correctly, and if
you know how to git-bisect the kernel, it would be helpful if you could
bisect to the commit that introduced the problem.
Best regards
Thomas
Best regards,