An oversight in commit 87701b4b5593 ("drm/i915: Only free the oldest stale object before a fresh allocation") was that not only do we have to serialise concurrent users of llist_del_first(), but we also have to lock llist_del_first() vs llist_del_all(). >From llist.h, * This can be summarized as follows: * * | add | del_first | del_all * add | - | - | - * del_first | | L | L * del_all | | | - * * Where, a particular row's operation can happen concurrently with a column's * operation, with "-" being no lock needed, while "L" being lock is needed. This should hopefully explain: <4>[ 89.287106] general protection fault: 0000 [#1] PREEMPT SMP <4>[ 89.287126] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp i915 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 mii mei_me mei snd_pcm prime_numbers i2c_hid pinctrl_geminilake pinctrl_intel <4>[ 89.287226] CPU: 2 PID: 23 Comm: ksoftirqd/2 Tainted: G U 4.14.0-rc8-CI-CI_DRM_3315+ #1 <4>[ 89.287247] Hardware name: Intel Corp. Geminilake/GLK RVP2 LP4SD (07), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 89.287270] task: ffff88017ab34ec0 task.stack: ffffc90000128000 <4>[ 89.287290] RIP: 0010:llist_add_batch+0x4/0x20 <4>[ 89.287301] RSP: 0018:ffffc9000012bdb8 EFLAGS: 00010296 <4>[ 89.287314] RAX: ffffffff811017ad RBX: 6e468801a1560000 RCX: ef3e53fceecdeb81 <4>[ 89.287330] RDX: 6e468801a1566130 RSI: ffff880103d73d98 RDI: ffff880103d73d98 <4>[ 89.287346] RBP: ffffc9000012bdb8 R08: ffff88017ab35780 R09: 0000000000000000 <4>[ 89.287361] R10: ffffc9000012bd68 R11: 00000000abb18c3d R12: ffffffffa01369e0 <4>[ 89.287377] R13: ffff88017fd1b8f8 R14: ffff88017ab34ec0 R15: 000000000000000a <4>[ 89.287393] FS: 0000000000000000(0000) GS:ffff88017fd00000(0000) knlGS:0000000000000000 <4>[ 89.287411] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 89.287424] CR2: 00007ff0c0755018 CR3: 000000016df9b000 CR4: 00000000003406e0 <4>[ 89.287440] Call Trace: <4>[ 89.287511] __i915_gem_free_object_rcu+0x20/0x40 [i915] <4>[ 89.287527] rcu_process_callbacks+0x27a/0x730 <4>[ 89.287544] __do_softirq+0xc0/0x4ae <4>[ 89.287559] ? smpboot_thread_fn+0x2d/0x280 <4>[ 89.287571] run_ksoftirqd+0x1f/0x70 <4>[ 89.287582] smpboot_thread_fn+0x18a/0x280 <4>[ 89.287595] kthread+0x114/0x150 <4>[ 89.287605] ? sort_range+0x30/0x30 <4>[ 89.287615] ? kthread_create_on_node+0x40/0x40 <4>[ 89.287628] ret_from_fork+0x27/0x40 <4>[ 89.287641] Code: 0d 48 83 ea 01 4c 89 c1 48 83 fa ff 74 12 48 23 0c d7 74 ed 48 c1 e2 06 48 0f bd c9 48 8d 04 0a 5d c3 90 90 90 90 90 55 48 89 e5 <48> 8b 0a 48 89 0e 48 89 c8 f0 48 0f b1 3a 48 39 c1 75 ed 48 85 <1>[ 89.287774] RIP: llist_add_batch+0x4/0x20 RSP: ffffc9000012bdb8 <4>[ 89.287826] ---[ end trace e775d15174d8ae02 ]--- (Lockless lists are only easy (and lockless) when using llist_add/llist_del_all!) Fixes: 87701b4b5593 ("drm/i915: Only free the oldest stale object before a fresh allocation") Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/i915_gem.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 6a71805be389..889ae8810d5f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4636,11 +4636,17 @@ static void __i915_gem_free_work(struct work_struct *work) * unbound now. */ + spin_lock(&i915->mm.free_lock); while ((freed = llist_del_all(&i915->mm.free_list))) { + spin_unlock(&i915->mm.free_lock); + __i915_gem_free_objects(i915, freed); if (need_resched()) - break; + return; + + spin_lock(&i915->mm.free_lock); } + spin_unlock(&i915->mm.free_lock); } static void __i915_gem_free_object_rcu(struct rcu_head *head) -- 2.15.0 _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx