Hi Andi,

> Hi Nitin,
>
> On Mon, Feb 24, 2025 at 12:01:04PM +0530, Nitin Gote wrote:
> > Sometimes engine reset fails because the engine resumes from an
> > incorrect RING_HEAD. Engine head failed to set to zero even after
> > writing into it. This is a timing issue and we experimented different
> > values and found out that 20ms delay works best based on testing.
> >
> > So, add a 20ms delay to let engine resumes from correct RING_HEAD.
> >
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13650
> > Signed-off-by: Nitin Gote <nitin.r.gote@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_ring_submission.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 6e9977b2d180..5576f000e965 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -365,6 +365,13 @@ static void reset_prepare(struct intel_engine_cs *engine)
> >  			     ENGINE_READ_FW(engine, RING_HEAD),
> >  			     ENGINE_READ_FW(engine, RING_TAIL),
> >  			     ENGINE_READ_FW(engine, RING_START));
> > +		/*
> > +		 * Sometimes engine head failed to set to zero even after writing into it.
> > +		 * Use 20ms delay to let engine resumes from correct RING_HEAD.
> > +		 * Experimented different values and determined that 20ms works best
> > +		 * based on testing.
> > +		 */
> > +		mdelay(20);
>
> Is there any extremely strong reason for using mdelay here, rather than any other
> delay function?
>
> Andi

Yes. I first tried udelay(20000), and over 1000 runs of the test I hit
"BUG: scheduling while atomic: i915_selftest/10313/0x00000201" from the
scheduler a couple of times. I am including the stack trace of that failure
here in case you want to take a look.

That is why I switched to mdelay(20), with which I have not seen this issue.
I have tested mdelay(20) thousands of times and it worked.

Stack trace:

i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
BUG: scheduling while atomic: i915_selftest/10313/0x00000201
1 lock held by i915_selftest/10313:
 #0: ffff888102e011b0 (&dev->mutex){....}-{3:3}, at: __device_driver_lock+0x43/0x60
CPU: 4 UID: 0 PID: 10313 Comm: i915_selftest Tainted: G U 6.14.0-rc3-ci-drm-16154+ #1
Tainted: [U]=USER
Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
Call Trace:
 <TASK>
 dump_stack_lvl+0xa0/0xc0
 dump_stack+0x10/0x20
 __schedule_bug+0x6c/0x90
 __schedule+0x1a04/0x21a0
 ? lock_acquire+0xc7/0x300
 ? find_held_lock+0x31/0x90
 ? lock_release+0xd1/0x2a0
 schedule+0x40/0x130
 schedule_timeout+0x82/0x100
 ? __pfx_process_timeout+0x10/0x10
 ? msleep+0x13/0x50
 msleep+0x3b/0x50
 reset_prepare+0x10b/0x1d0 [i915]
 reset_prepare_engine+0x31/0x40 [i915]
 __intel_engine_reset_bh+0xac/0x230 [i915]
 ? intel_engine_reset+0x21/0x60 [i915]
 intel_engine_reset+0x34/0x60 [i915]
 igt_reset_nop_engine+0x22e/0x4e0 [i915]
 __i915_subtests+0xb3/0x230 [i915]
 ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
 ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
 intel_hangcheck_live_selftests+0xc0/0x110 [i915]
 __run_selftests+0xd4/0x1d0 [i915]
 ? acpi_dev_found+0x68/0x80
 i915_live_selftests+0x53/0x90 [i915]
 i915_pci_probe+0x118/0x210 [i915]
 local_pci_probe+0x4b/0xb0
 pci_device_probe+0xe7/0x270
 really_probe+0xfb/0x390
 __driver_probe_device+0x8a/0x170
 driver_probe_device+0x23/0xb0
 __driver_attach+0xc7/0x190
 ? __pfx___driver_attach+0x10/0x10
 bus_for_each_dev+0x7f/0xd0
 driver_attach+0x1e/0x30
 bus_add_driver+0x146/0x280
 driver_register+0x64/0x130
 __pci_register_driver+0x7d/0x90
 i915_pci_register_driver+0x23/0x30 [i915]
 i915_init+0x37/0x120 [i915]
 ? __pfx_i915_init+0x10/0x10 [i915]
 do_one_initcall+0x63/0x3d0
 do_init_module+0x99/0x2b0
 load_module+0x2313/0x27d0
 init_module_from_file+0x9c/0xe0
 ? init_module_from_file+0x9c/0xe0
 idempotent_init_module+0x1a5/0x2b0
 __x64_sys_finit_module+0x63/0xc0
 x64_sys_call+0x1b6f/0x2140
 do_syscall_64+0x8f/0x170
 ? syscall_exit_to_user_mode+0x11a/0x300
 ? do_syscall_64+0x9b/0x170
 ? __fput+0x1cb/0x2f0
 ? syscall_exit_to_user_mode+0x11a/0x300
 ? do_syscall_64+0x9b/0x170
 ? ksys_read+0x70/0xf0
 ? syscall_exit_to_user_mode+0x11a/0x300
 ? do_syscall_64+0x9b/0x170
 ? seq_read_iter+0x216/0x470
 ? lock_release+0xd1/0x2a0
 ? __mutex_unlock_slowpath+0x41/0x300
 ? mutex_unlock+0x12/0x20
 ? seq_read_iter+0x216/0x470
 ? vfs_read+0x139/0x360
 ? vfs_read+0x139/0x360
 ? ksys_read+0x70/0xf0
 ? syscall_exit_to_user_mode+0x11a/0x300
 ? do_syscall_64+0x9b/0x170
 ? sysvec_apic_timer_interrupt+0x56/0xb0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7ab0b172725d

- Nitin

>
> >  		if (!stop_ring(engine)) {
> >  			drm_err(&engine->i915->drm,
> >  				"failed to set %s head to zero "
> > --
> > 2.25.1