Re: [CI 11/15] drm/i915/huc: track delayed HuC load with a fence

On 11/4/2022 5:38 PM, Ceraolo Spurio, Daniele wrote:


On 11/4/2022 4:26 PM, Brian Norris wrote:
Hi,

On Wed, Oct 19, 2022 at 10:54:34AM +0100, Tvrtko Ursulin wrote:
Don't know if this is real or not yet, hit it while running selftests a bit. Something to keep an eye on.

[ 2928.370577] ODEBUG: init destroyed (active state 0) object type: i915_sw_fence hint: sw_fence_dummy_notify+0x0/0x10 [i915]
[ 2928.370903] WARNING: CPU: 2 PID: 1113 at lib/debugobjects.c:502 debug_print_object+0x6b/0x90
[ 2928.370984] Modules linked in: i915(+) drm_display_helper drm_kms_helper netconsole cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp snd_seq_midi snd_seq_midi_event coretemp snd_rawmidi btusb btrtl btbcm kvm_intel btmtk btintel ath10k_pci snd_seq kvm ath10k_core bluetooth snd_timer rapl intel_cstate snd_seq_device input_leds mac80211 ecdh_generic libarc4 ath snd ecc serio_raw intel_wmi_thunderbolt at24 soundcore cfg80211 mei_me intel_xhci_usb_role_switch mei ideapad_laptop intel_pch_thermal platform_profile sparse_keymap acpi_pad sch_fq_codel msr efi_pstore ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha512_ssse3 aesni_intel prime_numbers crypto_simd atkbd drm_buddy cryptd vivaldi_fmap r8169 ttm i2c_i801 i2c_smbus cec realtek xhci_pci syscopyarea ahci
[ 2928.371145]  xhci_pci_renesas sysfillrect sysimgblt libahci fb_sys_fops video wmi [last unloaded: drm_kms_helper]
[ 2928.371489] CPU: 2 PID: 1113 Comm: modprobe Tainted: G U  W          6.1.0-rc1 #196
[ 2928.371550] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
[ 2928.371615] RIP: 0010:debug_print_object+0x6b/0x90
[ 2928.371664] Code: 49 89 c1 8b 43 10 83 c2 01 48 c7 c7 e8 be d6 bb 8b 4b 14 89 15 ca be b4 02 4c 8b 45 00 48 8b 14 c5 40 56 a8 bb e8 ec 5b 60 00 <0f> 0b 83 05 28 5a 3e 01 01 48 83 c4 08 5b 5d c3 83 05 1a 5a 3e 01
[ 2928.371782] RSP: 0018:ffff9ed841607a18 EFLAGS: 00010286
[ 2928.371841] RAX: 0000000000000000 RBX: ffff9208116a1d48 RCX: 0000000000000000
[ 2928.371909] RDX: 0000000000000001 RSI: ffffffffbbd277d2 RDI: 00000000ffffffff
[ 2928.372024] RBP: ffffffffc176a540 R08: 0000000000000000 R09: ffffffffbc07a1e0
[ 2928.372128] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9208122da830
[ 2928.372192] R13: ffff92080089b000 R14: ffff9208122da770 R15: 0000000000000000
[ 2928.372259] FS:  00007f53e7617c40(0000) GS:ffff92086e500000(0000) knlGS:0000000000000000
[ 2928.372365] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2928.372425] CR2: 000055cd28b33070 CR3: 0000000110dbd006 CR4: 00000000003706e0
[ 2928.372526] Call Trace:
[ 2928.372568]  <TASK>
[ 2928.372614]  ? intel_guc_hang_check+0xb0/0xb0 [i915]
[ 2928.373001]  __i915_sw_fence_init+0x2b/0x50 [i915]
[ 2928.373374]  intel_huc_init_early+0x75/0xb0 [i915]
[ 2928.373868]  intel_uc_init_early+0x4e/0x210 [i915]
[ 2928.374241]  intel_gt_common_init_early+0x16f/0x180 [i915]
[ 2928.374718]  intel_root_gt_init_early+0x49/0x60 [i915]
[ 2928.375074]  i915_driver_probe+0x917/0xed0 [i915]
...

Did you track this down? Or consider reverting? This is tripping me up

No. I didn't manage to repro locally after Tvrtko reported it (I ran the full selftest suite twice on both ADL-S and DG2 with the debug config enabled), so I was keeping an eye out, as suggested, to see if it popped up again. If you can repro this consistently, can you share your setup info? That is: which platform you're running on, whether you're using the latest drm-tip, any non-default params you're using, etc. Dmesg would also be useful to see if there are other errors before this one.


Just to further clarify, this issue is also not showing up in our CI runs (which do have both of the DEBUG_OBJECTS kconfigs you pointed out enabled), which is why I suspect that this is only happening on specific setups, potentially due to a different kconfig or modparam being involved.

Daniele

Thanks,
Daniele

on drm-tip now when running selftests with CONFIG_DEBUG_OBJECTS=y /
CONFIG_DRM_I915_SW_FENCE_DEBUG_OBJECTS=y. It means I can't actually run
any subsequent tests, because of the kernel taint.

Brian




