RE: [PATCH] drm/i915/gt: Add a delay to let engine resumes correctly

Hi Andi,

> Hi Nitin,
> 
> On Mon, Feb 24, 2025 at 12:01:04PM +0530, Nitin Gote wrote:
> > Sometimes engine reset fails because the engine resumes from an
> > incorrect RING_HEAD. Engine head failed to set to zero even after
> > writing into it. This is a timing issue and we experimented different
> > values and found out that 20ms delay works best based on testing.
> >
> > So, add a 20ms delay to let engine resumes from correct RING_HEAD.
> >
> > Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13650
> > Signed-off-by: Nitin Gote <nitin.r.gote@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_ring_submission.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_ring_submission.c b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > index 6e9977b2d180..5576f000e965 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_ring_submission.c
> > @@ -365,6 +365,13 @@ static void reset_prepare(struct intel_engine_cs *engine)
> >  			     ENGINE_READ_FW(engine, RING_HEAD),
> >  			     ENGINE_READ_FW(engine, RING_TAIL),
> >  			     ENGINE_READ_FW(engine, RING_START));
> > +		/*
> > +		 * Sometimes engine head failed to set to zero even after writing into it.
> > +		 * Use 20ms delay to let engine resumes from correct RING_HEAD.
> > +		 * Experimented different values and determined that 20ms works best
> > +		 * based on testing.
> > +		 */
> > +		mdelay(20);
> 
> Is there any extremely strong reason for using mdelay here, rather than any other
> delay function?
> 
> Andi

Yes. I first tried udelay(20000), and over 1000 runs of the test I hit
"BUG: scheduling while atomic: i915_selftest/10313/0x00000201" from the scheduler a couple of times.
I have added the failure stack trace below in case you want to take a look.

That is why I used mdelay(20), with which I have not seen this issue. I have tested mdelay(20) thousands of times and it worked.
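
For context on the delay helpers: mdelay() and udelay() busy-wait and never enter the scheduler, while msleep() sleeps through schedule_timeout(), which is the path visible in the trace in this message. The following is only a userspace sketch of the busy-wait idea, not kernel code. busy_wait_ms() is a hypothetical name, and the C library clock() (CPU time) stands in for the kernel's calibrated delay loop.

```c
#include <time.h>

/*
 * Userspace analogue of a busy-wait delay (hypothetical helper, not a
 * kernel API). It spins until at least `ms` milliseconds of CPU time
 * have passed and never yields the CPU voluntarily, which is the
 * property that makes mdelay() usable where sleeping is not allowed.
 *
 * Returns the measured elapsed time in milliseconds.
 */
long busy_wait_ms(long ms)
{
	clock_t start = clock();
	/* Assumes CLOCKS_PER_SEC is a multiple of 1000 (POSIX: 1000000). */
	clock_t budget = (clock_t)ms * (CLOCKS_PER_SEC / 1000);

	while (clock() - start < budget)
		; /* spin: no sleep, no blocking call */

	return (long)((clock() - start) * 1000 / CLOCKS_PER_SEC);
}
```

Calling busy_wait_ms(20) burns roughly 20ms of CPU, which is the trade-off of a busy-wait: it is safe where sleeping is forbidden, but it keeps the CPU occupied for the whole delay.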

stack trace:
i915: Running intel_hangcheck_live_selftests/igt_reset_nop_engine
BUG: scheduling while atomic: i915_selftest/10313/0x00000201
1 lock held by i915_selftest/10313:
 #0: ffff888102e011b0 (&dev->mutex){....}-{3:3}, at: __device_driver_lock+0x43/0x60
 CPU: 4 UID: 0 PID: 10313 Comm: i915_selftest Tainted: G     U             6.14.0-rc3-ci-drm-16154+ #1
 Tainted: [U]=USER
 Hardware name: LENOVO 10AGS00601/SHARKBAY, BIOS FBKT34AUS 04/24/2013
 Call Trace:
  <TASK>
  dump_stack_lvl+0xa0/0xc0
  dump_stack+0x10/0x20
  __schedule_bug+0x6c/0x90
  __schedule+0x1a04/0x21a0
  ? lock_acquire+0xc7/0x300
  ? find_held_lock+0x31/0x90
  ? lock_release+0xd1/0x2a0
  schedule+0x40/0x130
  schedule_timeout+0x82/0x100
  ? __pfx_process_timeout+0x10/0x10
  ? msleep+0x13/0x50
  msleep+0x3b/0x50
  reset_prepare+0x10b/0x1d0 [i915]
  reset_prepare_engine+0x31/0x40 [i915]
  __intel_engine_reset_bh+0xac/0x230 [i915]
  ? intel_engine_reset+0x21/0x60 [i915]
  intel_engine_reset+0x34/0x60 [i915]
  igt_reset_nop_engine+0x22e/0x4e0 [i915]
  __i915_subtests+0xb3/0x230 [i915]
  ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915]
  ? __pfx___intel_gt_live_setup+0x10/0x10 [i915]
  intel_hangcheck_live_selftests+0xc0/0x110 [i915]
  __run_selftests+0xd4/0x1d0 [i915]
  ? acpi_dev_found+0x68/0x80
  i915_live_selftests+0x53/0x90 [i915]
  i915_pci_probe+0x118/0x210 [i915]
  local_pci_probe+0x4b/0xb0
  pci_device_probe+0xe7/0x270
  really_probe+0xfb/0x390
  __driver_probe_device+0x8a/0x170
  driver_probe_device+0x23/0xb0
  __driver_attach+0xc7/0x190
  ? __pfx___driver_attach+0x10/0x10
  bus_for_each_dev+0x7f/0xd0
  driver_attach+0x1e/0x30
  bus_add_driver+0x146/0x280
  driver_register+0x64/0x130
  __pci_register_driver+0x7d/0x90
  i915_pci_register_driver+0x23/0x30 [i915]
  i915_init+0x37/0x120 [i915]
  ? __pfx_i915_init+0x10/0x10 [i915]
  do_one_initcall+0x63/0x3d0
  do_init_module+0x99/0x2b0
  load_module+0x2313/0x27d0
  init_module_from_file+0x9c/0xe0
  ? init_module_from_file+0x9c/0xe0
  idempotent_init_module+0x1a5/0x2b0
  __x64_sys_finit_module+0x63/0xc0
  x64_sys_call+0x1b6f/0x2140
  do_syscall_64+0x8f/0x170
  ? syscall_exit_to_user_mode+0x11a/0x300
  ? do_syscall_64+0x9b/0x170
  ? __fput+0x1cb/0x2f0
  ? syscall_exit_to_user_mode+0x11a/0x300
  ? do_syscall_64+0x9b/0x170
  ? ksys_read+0x70/0xf0
  ? syscall_exit_to_user_mode+0x11a/0x300
  ? do_syscall_64+0x9b/0x170
  ? seq_read_iter+0x216/0x470
  ? lock_release+0xd1/0x2a0
  ? __mutex_unlock_slowpath+0x41/0x300
  ? mutex_unlock+0x12/0x20
  ? seq_read_iter+0x216/0x470
  ? vfs_read+0x139/0x360
  ? vfs_read+0x139/0x360
  ? ksys_read+0x70/0xf0
  ? syscall_exit_to_user_mode+0x11a/0x300
  ? do_syscall_64+0x9b/0x170
  ? sysvec_apic_timer_interrupt+0x56/0xb0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7ab0b172725d
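
The last field of the BUG line (0x00000201) is the preempt count at the time of the failed schedule. A small sketch for decoding it, assuming the field layout from include/linux/preempt.h in recent kernels (8 preempt bits, 8 softirq bits, 4 hardirq bits, 4 NMI bits). The struct and function names are hypothetical, and the layout should be checked against the kernel actually in use.

```c
#include <stdint.h>

/* Assumed field layout, following include/linux/preempt.h. */
#define PC_PREEMPT_MASK 0x000000ffu
#define PC_SOFTIRQ_MASK 0x0000ff00u
#define PC_HARDIRQ_MASK 0x000f0000u
#define PC_NMI_MASK     0x00f00000u

/* Hypothetical container for the decoded fields. */
struct preempt_fields {
	unsigned int preempt; /* preempt_disable() nesting depth */
	unsigned int softirq; /* softirq / bottom-half disable count */
	unsigned int hardirq; /* hard interrupt nesting */
	unsigned int nmi;     /* NMI nesting */
};

/* Split a raw preempt count, as printed by the scheduler, into fields. */
struct preempt_fields decode_preempt_count(uint32_t pc)
{
	struct preempt_fields f;

	f.preempt = pc & PC_PREEMPT_MASK;
	f.softirq = (pc & PC_SOFTIRQ_MASK) >> 8;
	f.hardirq = (pc & PC_HARDIRQ_MASK) >> 16;
	f.nmi = (pc & PC_NMI_MASK) >> 20;
	return f;
}
```

For 0x00000201 this gives preempt = 1 and softirq = 2, with hardirq and nmi both 0. If I read the header correctly, a softirq field of 2 is what local_bh_disable() adds, so the task appears to have been running with bottom halves disabled, which would be consistent with the __intel_engine_reset_bh frame in the trace.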

- Nitin


> 
> >  		if (!stop_ring(engine)) {
> >  			drm_err(&engine->i915->drm,
> >  				"failed to set %s head to zero "
> > --
> > 2.25.1



