Re: [Intel-gfx] [PATCH v3 0/2] Fix error propagation amongst request

Hi Andi,

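After applying these two patches, lockdep reports a possible recursive locking deadlock on the migrate lock (call trace below). The lock is taken in intel_context_migrate_copy() and then again in i915_request_create(). Could you please check whether the changes to intel_context_migrate_copy() introduced this?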


https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_114451v1/bat-dg2-8/igt@i915_module_load@xxxxxxxxx#dmesg-warnings1037

<4> [33.070967] ============================================
<4> [33.070968] WARNING: possible recursive locking detected
<4> [33.070969] 6.2.0-Patchwork_114451v1-g8589fd9227ca+ #1 Not tainted
<4> [33.070970] --------------------------------------------
<4> [33.070971] i915_module_loa/948 is trying to acquire lock:
<4> [33.070972] ffff8881127f0478 (migrate){+.+.}-{3:3}, at: i915_request_create+0x1c6/0x230 [i915]
<4> [33.071215]
but task is already holding lock:
<4> [33.071235] ffff8881127f0478 (migrate){+.+.}-{3:3}, at: intel_context_migrate_copy+0x1b3/0xa80 [i915]
<4> [33.071484]
other info that might help us debug this:
<4> [33.071504]  Possible unsafe locking scenario:
<4> [33.071522]        CPU0
<4> [33.071532]        ----
<4> [33.071541]   lock(migrate);
<4> [33.071554]   lock(migrate);
<4> [33.071567]
 *** DEADLOCK ***
<4> [33.071585]  May be due to missing lock nesting notation
<4> [33.071606] 3 locks held by i915_module_loa/948:
<4> [33.071622] #0: ffffc90001eb7b70 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0xae2/0x21c0 [i915]
<4> [33.071893] #1: ffff8881127b9c28 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __intel_context_do_pin_ww+0x7a/0xa30 [i915]
<4> [33.072133] #2: ffff8881127f0478 (migrate){+.+.}-{3:3}, at: intel_context_migrate_copy+0x1b3/0xa80 [i915]
<4> [33.072384]
stack backtrace:
<4> [33.072399] CPU: 7 PID: 948 Comm: i915_module_loa Not tainted 6.2.0-Patchwork_114451v1-g8589fd9227ca+ #1
<4> [33.072428] Hardware name: Intel Corporation CoffeeLake Client Platform/CoffeeLake S UDIMM RVP, BIOS CNLSFWR1.R00.X220.B00.2103302221 03/30/2021
<4> [33.072465] Call Trace:
<4> [33.072475]  <TASK>
<4> [33.072486]  dump_stack_lvl+0x5b/0x85
<4> [33.072503]  __lock_acquire.cold+0x158/0x33b
<4> [33.072524]  lock_acquire+0xd6/0x310
<4> [33.072541]  ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.072812]  __mutex_lock+0x95/0xf40
<4> [33.072829]  ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.073093]  ? rcu_read_lock_sched_held+0x55/0x80
<4> [33.073112]  ? __mutex_lock+0x133/0xf40
<4> [33.073128]  ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.073388]  ? intel_context_migrate_copy+0x1b3/0xa80 [i915]
<4> [33.073619]  ? i915_request_create+0x1c6/0x230 [i915]
<4> [33.073876]  i915_request_create+0x1c6/0x230 [i915]
<4> [33.074135]  intel_context_migrate_copy+0x1d0/0xa80 [i915]
<4> [33.074360]  __i915_ttm_move+0x7a8/0x940 [i915]
<4> [33.074538]  ? _raw_spin_unlock_irqrestore+0x41/0x70
<4> [33.074552]  ? dma_resv_iter_next+0x91/0xb0
<4> [33.074564]  ? dma_resv_iter_first+0x42/0xb0
<4> [33.074576]  ? i915_deps_add_resv+0x4c/0xc0 [i915]
<4> [33.074744]  i915_ttm_move+0x2ac/0x430 [i915]
<4> [33.074910]  ttm_bo_handle_move_mem+0xb5/0x140 [ttm]
<4> [33.074930]  ttm_bo_validate+0xe9/0x1a0 [ttm]
<4> [33.074947]  __i915_ttm_get_pages+0x4e/0x190 [i915]
<4> [33.075112]  i915_ttm_get_pages+0xf3/0x160 [i915]
<4> [33.075280]  ____i915_gem_object_get_pages+0x36/0xb0 [i915]
<4> [33.075446]  __i915_gem_object_get_pages+0x95/0xa0 [i915]
<4> [33.075608]  i915_vma_get_pages+0xfa/0x160 [i915]
<4> [33.075779]  i915_vma_pin_ww+0xdc/0xb50 [i915]
<4> [33.075953]  eb_validate_vmas+0x1c6/0xac0 [i915]
<4> [33.076114]  i915_gem_do_execbuffer+0xb2a/0x21c0 [i915]
<4> [33.076276]  ? __stack_depot_save+0x3f/0x4e0
<4> [33.076292]  ? 0xffffffff81000000
<4> [33.076301]  ? _raw_spin_unlock_irq+0x41/0x50
<4> [33.076312]  ? lockdep_hardirqs_on+0xc3/0x140
<4> [33.076325]  ? set_track_update+0x25/0x50
<4> [33.076338]  ? __lock_acquire+0x5f2/0x2130
<4> [33.076356]  i915_gem_execbuffer2_ioctl+0x123/0x2e0 [i915]
<4> [33.076519]  ? __pfx_i915_gem_execbuffer2_ioctl+0x10/0x10 [i915]
<4> [33.076679]  drm_ioctl_kernel+0xb4/0x150
<4> [33.076692]  drm_ioctl+0x21d/0x420
<4> [33.076703]  ? __pfx_i915_gem_execbuffer2_ioctl+0x10/0x10 [i915]
<4> [33.076864]  ? __vm_munmap+0xd3/0x170
<4> [33.076877]  __x64_sys_ioctl+0x76/0xb0
<4> [33.076889]  do_syscall_64+0x3c/0x90
<4> [33.076900]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
<4> [33.076913] RIP: 0033:0x7f304aa903ab
<4> [33.076923] Code: 0f 1e fa 48 8b 05 e5 7a 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 7a 0d 00 f7 d8 64 89 01 48
<4> [33.076957] RSP: 002b:00007fffb1424cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4> [33.076975] RAX: ffffffffffffffda RBX: 00007fffb1424da0 RCX: 00007f304aa903ab
<4> [33.076990] RDX: 00007fffb1424da0 RSI: 0000000040406469 RDI: 0000000000000005
<4> [33.077004] RBP: 0000000040406469 R08: 0000000000000005 R09: 0000000100003000
<4> [33.077019] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000010000
<4> [33.077034] R13: 0000000000000005 R14: 00000000ffffffff R15: 00000000000056a0
<4> [33.077052]  </TASK>
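
To illustrate the pattern lockdep is flagging, here is a minimal userspace sketch. It is only an analogy and not i915 code: the names migrate_copy(), create_request() and relock_while_held() are made up for illustration, and a POSIX error-checking mutex stands in for the migrate timeline mutex. The outer function takes the lock and, while still holding it, calls an inner function that tries to take the same lock again.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK under strict -std= modes */
#endif

#include <errno.h>
#include <pthread.h>

/*
 * Userspace analogy only: an error-checking mutex stands in for the
 * "migrate" timeline mutex named in the lockdep report. All function
 * names below are hypothetical.
 */

/* Inner step: tries to take the lock that its caller already holds. */
static int create_request(pthread_mutex_t *timeline)
{
	int err = pthread_mutex_lock(timeline); /* second acquisition */

	if (err)
		return err; /* EDEADLK: this thread already owns the mutex */

	pthread_mutex_unlock(timeline);
	return 0;
}

/* Outer step: takes the lock, then calls the inner step under it. */
static int migrate_copy(pthread_mutex_t *timeline)
{
	int err = pthread_mutex_lock(timeline); /* first acquisition */

	if (err)
		return err;

	err = create_request(timeline);

	pthread_mutex_unlock(timeline);
	return err;
}

/* Sets up an error-checking mutex and runs the nested-lock sequence. */
int relock_while_held(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t timeline;
	int err;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&timeline, &attr);
	pthread_mutexattr_destroy(&attr);

	err = migrate_copy(&timeline);

	pthread_mutex_destroy(&timeline);
	return err;
}
```

With an error-checking mutex the second pthread_mutex_lock() fails with EDEADLK, so relock_while_held() returns that error instead of hanging. A default (non-checking) mutex could simply block forever in the same sequence, which is the scenario the warning above describes.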

Br,

G.G.

On 2/28/23 4:11 AM, Andi Shyti wrote:
Hi,

This series of two patches fixes the issue introduced in
cf586021642d80 ("drm/i915/gt: Pipelined page migration") where,
as reported by Matt, in a chain of requests an error is reported
only if it happens in the last request.

However, Chris noticed that without ensuring exclusivity in the
locking we might end up in a deadlock. That's why patch 1
throttles for the ringspace in order to make sure that no one is
holding it.

Version 1 of this patch has been reviewed by Matt, and this
version adds Chris's exclusive locking.

Thanks Chris for this work.

Andi

Changelog
=========
v1 -> v2
  - Add patch 1 for ensuring exclusive locking of the timeline
  - Reword git commit of patch 2.

Andi Shyti (1):
   drm/i915/gt: Make sure that errors are propagated through request
     chains

Chris Wilson (1):
   drm/i915: Throttle for ringspace prior to taking the timeline mutex

  drivers/gpu/drm/i915/gt/intel_context.c | 41 +++++++++++++++++++++++++
  drivers/gpu/drm/i915/gt/intel_context.h |  2 ++
  drivers/gpu/drm/i915/gt/intel_migrate.c | 39 +++++++++++++++++------
  drivers/gpu/drm/i915/i915_request.c     |  3 ++
  4 files changed, 75 insertions(+), 10 deletions(-)



