On 11/4/2022 5:38 PM, Ceraolo Spurio, Daniele wrote:
On 11/4/2022 4:26 PM, Brian Norris wrote:
Hi,
On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
Don't know if this is real or not yet, hit it while running
selftests a bit. Something to keep an eye on.
[ 2928.370577] ODEBUG: init destroyed (active state 0) object type:
i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502
debug_print_object+0x6b/0x90
[ 2928.370984] Modules linked in: i915(+) drm_display_helper
drm_kms_helper netconsole cmac algif_hash algif_skcipher af_alg bnep
nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec
snd_hwdep snd_hda_core snd_pcm intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp snd_seq_midi
snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel
btmtk btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer
rapl intel_cstate snd_seq_device input_leds mac80211 ecdh_generic
libarc4 ath snd ecc serio_raw intel_wmi_thunderbolt at24 soundcore
cfg80211 mei_me intel_xhci_usb_role_switch mei ideapad_laptop
intel_pch_thermal platform_profile sparse_keymap acpi_pad
sch_fq_codel msr efi_pstore ip_tables x_tables autofs4
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3
aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd
vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci
syscopyarea ahci
[ 2928.371145] xhci_pci_renesas sysfillrect sysimgblt libahci
fb_sys_fops video wmi [last unloaded: drm_kms_helper]
[ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U
W 6.1.0-rc1 #196
[ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS
DCCN34WW(V2.03) 12/01/2015
[ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
[ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb
8b 4b 14 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec
5b 60 00 <0f> 0b 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a
5a 3e 01
[ 2928.371782] RSP: 0018:ffff9ed841607a18 EFLAGS: 00010286
[ 2928.371841] RAX: 0000000000000000 RBX: ffff9208116a1d48 RCX:
0000000000000000
[ 2928.371909] RDX: 0000000000000001 RSI: ffffffffbbd277d2 RDI:
00000000ffffffff
[ 2928.372024] RBP: ffffffffc176a540 R08: 0000000000000000 R09:
ffffffffbc07a1e0
[ 2928.372128] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff9208122da830
[ 2928.372192] R13: ffff92080089b000 R14: ffff9208122da770 R15:
0000000000000000
[ 2928.372259] FS: 00007f53e7617c40(0000) GS:ffff92086e500000(0000)
knlGS:0000000000000000
[ 2928.372365] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2928.372425] CR2: 000055cd28b33070 CR3: 0000000110dbd006 CR4:
00000000003706e0
[ 2928.372526] Call Trace:
[ 2928.372568] <TASK>
[ 2928.372614] ? intel_guc_hang_check+0xb0/0xb0 [i915]
[ 2928.373001] __i915_sw_fence_init+0x2b/0x50 [i915]
[ 2928.373374] intel_huc_init_early+0x75/0xb0 [i915]
[ 2928.373868] intel_uc_init_early+0x4e/0x210 [i915]
[ 2928.374241] intel_gt_common_init_early+0x16f/0x180 [i915]
[ 2928.374718] intel_root_gt_init_early+0x49/0x60 [i915]
[ 2928.375074] i915_driver_probe+0x917/0xed0 [i915]
...
Did you track this down? Or consider reverting? This is tripping me up
on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
any subsequent tests, because of the kernel taint.
Brian
No. I didn't manage to repro locally after Tvrtko reported it (I ran
the full selftest suite twice on both ADL-S and DG2 with the debug
config enabled), so I was keeping an eye out as suggested to see if it
popped up again. If you can repro this consistently, can you share
your setup info? What platform you're running on, if you're using the
latest drm-tip, any non-default params you're using, etc. Dmesg would
also be useful to see if there are other errors before this one.
Thanks,
Daniele
Just to further clarify, this issue is also not showing up in our CI
runs (which do have both the DEBUG_OBJECTS kconfigs you pointed out
enabled), hence why I'm suspecting that this is only happening on
specific setups, potentially due to a different kconfig or modparam
being involved.
Daniele