On Fri, 2021-12-10 at 11:05 +0000, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > This effectively removes writeback which was added in 2d6692e642e7 > ("drm/i915: Start writeback from the shrinker"). > > Digging through the history it seems we went back and forth on the > topic > of whether it would be safe a couple of times. See for instance > 5537252b6b6d ("drm/i915: Invalidate our pages under memory pressure") > where Hugh Dickins has advised against it. I do not have enough > expertise > in the memory management area so am hoping for expert input here. > > Reason for proposing removal is that there are reports from the field > which indicate a sysetm wide deadlock (of a sort) implicating i915 > doing > writeback at shrinking time. > > Signature is a hung task notifier kicking in and task traces such as: It would be interesting to see what exactly the find_get_entry is blocked on. The other two tasks are blocked on the shrinker_rwsem which is held by i915. If it's indeed a deadlock with either of those two, then the fix Chris is working on for an unrelated issue we discovered with shrinking would move out the writeback call from the shrinker_rwsem and resolve this, but if i915 is in turn deadlocking with another process and these two are just hanging waiting for the shrinker_rwsem, we would still have other issues. Do you by any chance have the list of the locks held by the system at this point? /Thomas > > [ 247.030274] minijail-init D 0 1773 1770 0x80004082 > [ 247.036419] Call Trace: > [ 247.039167] __schedule+0x57e/0x10d2 > [ 247.043175] ? __schedule+0x586/0x10d2 > [ 247.047381] ? _raw_spin_unlock+0xe/0x20 > [ 247.051779] ? __queue_work+0x316/0x371 > [ 247.056079] schedule+0x7c/0x9f > [ 247.059602] rwsem_down_write_slowpath+0x2ae/0x494 > [ 247.064971] unregister_shrinker+0x20/0x65 > [ 247.069562] deactivate_locked_super+0x38/0x88 > [ 247.074538] cleanup_mnt+0xcc/0x10e > [ 247.078447] task_work_run+0x7d/0xa6 > [ 247.082459] do_exit+0x23d/0x831 > [ 247.086079] ? syscall_trace_enter+0x207/0x20e > [ 247.091055] do_group_exit+0x8d/0x9d > [ 247.095062] __x64_sys_exit_group+0x17/0x17 > [ 247.099750] do_syscall_64+0x54/0x7e > [ 247.103758] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [ 246.876816] chrome D 0 1791 1785 0x00004080 > [ 246.882965] Call Trace: > [ 246.885713] __schedule+0x57e/0x10d2 > [ 246.889724] ? pcpu_alloc_area+0x25d/0x273 > [ 246.894314] schedule+0x7c/0x9f > [ 246.897836] rwsem_down_write_slowpath+0x2ae/0x494 > [ 246.903207] register_shrinker_prepared+0x19/0x48 > [ 246.908479] ? test_single_super+0x10/0x10 > [ 246.913071] sget_fc+0x1fc/0x20e > [ 246.916691] ? kill_litter_super+0x40/0x40 > [ 246.921334] ? proc_apply_options+0x42/0x42 > [ 246.926044] vfs_get_super+0x3a/0xdf > [ 246.930053] vfs_get_tree+0x2b/0xc3 > [ 246.933965] fc_mount+0x12/0x39 > [ 246.937492] pid_ns_prepare_proc+0x9d/0xc5 > [ 246.942085] alloc_pid+0x275/0x289 > [ 246.945900] copy_process+0x5e5/0xeea > [ 246.950006] _do_fork+0x95/0x303 > [ 246.953628] __se_sys_clone+0x65/0x7f > [ 246.957735] do_syscall_64+0x54/0x7e > [ 246.961743] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > And finally the smoking gun in: > > [ 247.383338] CPU: 3 PID: 88 Comm: kswapd0 Tainted: G > U 5.4.154 #36 > [ 247.383338] Hardware name: Google Delbin/Delbin, BIOS > Google_Delbin.13672.57.0 02/09/2021 > [ 247.383339] RIP: 0010:__rcu_read_lock+0x0/0x1a > [ 247.383339] Code: ff ff 0f 0b e9 61 fe ff ff 0f 0b e9 76 fe ff ff > 0f 0b 49 8b 44 24 20 e9 59 ff ff ff 0f 0b e9 67 ff ff ff 0f 0b e9 1b > ff ff ff <0f> 1f 44 00 00 55 48 89 e5 65 48 8b 04 25 80 5d 01 00 ff > 80 f8 03 > [ 247.383340] RSP: 0018:ffffb0aa0031b978 EFLAGS: 00000286 > [ 247.383340] RAX: 0000000000000000 RBX: fffff6b944ca8040 RCX: > fffff6b944ca8001 > [ 247.383341] RDX: 0000000000000028 RSI: 0000000000000001 RDI: > ffff8b52bc618c18 > [ 247.383341] RBP: ffffb0aa0031b9d0 R08: 0000000000000000 R09: > ffff8b52fb5f00d8 > [ 247.383341] R10: 0000000000000000 R11: 0000000000000000 R12: > 0000000000000000 > [ 247.383342] R13: 61c8864680b583eb R14: 0000000000000001 R15: > ffffb0aa0031b980 > [ 247.383342] FS: 0000000000000000(0000) GS:ffff8b52fbf80000(0000) > knlGS:0000000000000000 > [ 247.383343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 247.383343] CR2: 00007c78a400d680 CR3: 0000000120f46006 CR4: > 0000000000762ee0 > [ 247.383344] PKRU: 55555554 > [ 247.383344] Call Trace: > [ 247.383345] find_get_entry+0x4c/0x116 > [ 247.383345] find_lock_entry+0xc8/0xec > [ 247.383346] shmem_writeback+0x7b/0x163 > [ 247.383346] i915_gem_shrink+0x302/0x40b > [ 247.383347] ? __intel_runtime_pm_get+0x22/0x82 > [ 247.383347] i915_gem_shrinker_scan+0x86/0xa8 > [ 247.383348] shrink_slab+0x272/0x48b > [ 247.383348] shrink_node+0x784/0xbea > [ 247.383348] ? rcu_read_unlock_special+0x66/0x15f > [ 247.383349] ? update_batch_size+0x78/0x78 > [ 247.383349] kswapd+0x75c/0xa56 > [ 247.383350] kthread+0x147/0x156 > [ 247.383350] ? kswapd_run+0xb6/0xb6 > [ 247.383351] ? kthread_blkcg+0x2e/0x2e > [ 247.383351] ret_from_fork+0x1f/0x40 > > You will notice the trace is from an older kernel, the problem being > reproducing the issue on latest upstream base is proving to be tricky > due > other (unrelated) issues. > > It is even tricky to repro on an older kernel, with it seemingly > needing a > very specific game, transparent huge pages enabled and a specific > memory > configuration. > > However given the history on the topic I could find, assuming what I > found > is not incomplete, suspicion on writeback being not the right thing > to do > in general is still there. I would therefore like to have input from > the > experts here. > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Fixes: 2d6692e642e7 ("drm/i915: Start writeback from the shrinker") > References: 5537252b6b6d ("drm/i915: Invalidate our pages under > memory pressure") > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Cc: Matthew Auld <matthew.auld@xxxxxxxxx> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > Cc: Michal Hocko <mhocko@xxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> #v1 > Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> > Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > Cc: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@xxxxxxxxx> > Cc: Renato Pereyra <renatopereyra@xxxxxxxxxx> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: <stable@xxxxxxxxxxxxxxx> # v5.3+ > --- > drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 - > .../gpu/drm/i915/gem/i915_gem_object_types.h | 4 +- > drivers/gpu/drm/i915/gem/i915_gem_pages.c | 10 ---- > drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 49 ----------------- > -- > drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 18 +++---- > drivers/gpu/drm/i915/gem/i915_gem_shrinker.h | 1 - > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 6 +-- > .../gpu/drm/i915/gem/selftests/huge_pages.c | 3 +- > 8 files changed, 11 insertions(+), 82 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h > b/drivers/gpu/drm/i915/gem/i915_gem_object.h > index 66f20b803b01..352c7158a487 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h > @@ -455,7 +455,6 @@ i915_gem_object_unpin_pages(struct > drm_i915_gem_object *obj) > > int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj); > int i915_gem_object_truncate(struct drm_i915_gem_object *obj); > -void i915_gem_object_writeback(struct drm_i915_gem_object *obj); > > /** > * i915_gem_object_pin_map - return a contiguous mapping of the > entire object > @@ -621,7 +620,6 @@ int shmem_sg_alloc_table(struct drm_i915_private > *i915, struct sg_table *st, > unsigned int max_segment); > void shmem_sg_free_table(struct sg_table *st, struct address_space > *mapping, > bool dirty, bool backup); > -void __shmem_writeback(size_t size, struct address_space *mapping); > > #ifdef CONFIG_MMU_NOTIFIER > static inline bool > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > index f9f7e44099fe..e188d6137cc0 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > @@ -57,10 +57,8 @@ struct drm_i915_gem_object_ops { > void (*put_pages)(struct drm_i915_gem_object *obj, > struct sg_table *pages); > int (*truncate)(struct drm_i915_gem_object *obj); > - void (*writeback)(struct drm_i915_gem_object *obj); > int (*shrinker_release_pages)(struct drm_i915_gem_object > *obj, > - bool no_gpu_wait, > - bool should_writeback); > + bool no_gpu_wait); > > int (*pread)(struct drm_i915_gem_object *obj, > const struct drm_i915_gem_pread *arg); > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c > b/drivers/gpu/drm/i915/gem/i915_gem_pages.c > index 49c6e55c68ce..52e975f57956 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c > @@ -168,16 +168,6 @@ int i915_gem_object_truncate(struct > drm_i915_gem_object *obj) > return 0; > } > > -/* Try to discard unwanted pages */ > -void i915_gem_object_writeback(struct drm_i915_gem_object *obj) > -{ > - assert_object_held_shared(obj); > - GEM_BUG_ON(i915_gem_object_has_pages(obj)); > - > - if (obj->ops->writeback) > - obj->ops->writeback(obj); > -} > - > static void __i915_gem_object_reset_page_iter(struct > drm_i915_gem_object *obj) > { > struct radix_tree_iter iter; > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > index cc9fe258fba7..b4b8c921063e 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > @@ -283,54 +283,6 @@ shmem_truncate(struct drm_i915_gem_object *obj) > return 0; > } > > -void __shmem_writeback(size_t size, struct address_space *mapping) > -{ > - struct writeback_control wbc = { > - .sync_mode = WB_SYNC_NONE, > - .nr_to_write = SWAP_CLUSTER_MAX, > - .range_start = 0, > - .range_end = LLONG_MAX, > - .for_reclaim = 1, > - }; > - unsigned long i; > - > - /* > - * Leave mmapings intact (GTT will have been revoked on > unbinding, > - * leaving only CPU mmapings around) and add those pages to > the LRU > - * instead of invoking writeback so they are aged and paged > out > - * as normal. > - */ > - > - /* Begin writeback on each dirty page */ > - for (i = 0; i < size >> PAGE_SHIFT; i++) { > - struct page *page; > - > - page = find_lock_page(mapping, i); > - if (!page) > - continue; > - > - if (!page_mapped(page) && > clear_page_dirty_for_io(page)) { > - int ret; > - > - SetPageReclaim(page); > - ret = mapping->a_ops->writepage(page, &wbc); > - if (!PageWriteback(page)) > - ClearPageReclaim(page); > - if (!ret) > - goto put; > - } > - unlock_page(page); > -put: > - put_page(page); > - } > -} > - > -static void > -shmem_writeback(struct drm_i915_gem_object *obj) > -{ > - __shmem_writeback(obj->base.size, obj->base.filp->f_mapping); > -} > - > void > __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, > struct sg_table *pages, > @@ -503,7 +455,6 @@ const struct drm_i915_gem_object_ops > i915_gem_shmem_ops = { > .get_pages = shmem_get_pages, > .put_pages = shmem_put_pages, > .truncate = shmem_truncate, > - .writeback = shmem_writeback, > > .pwrite = shmem_pwrite, > .pread = shmem_pread, > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c > b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c > index 157a9765f483..99a38e016780 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c > @@ -55,12 +55,11 @@ static bool unsafe_drop_pages(struct > drm_i915_gem_object *obj, > return false; > } > > -static int try_to_writeback(struct drm_i915_gem_object *obj, > unsigned int flags) > +static int obj_invalidate(struct drm_i915_gem_object *obj, unsigned > int flags) > { > if (obj->ops->shrinker_release_pages) > return obj->ops->shrinker_release_pages(obj, > - !(flags & > I915_SHRINK_ACTIVE), > - flags & > I915_SHRINK_WRITEBACK); > + !(flags & > I915_SHRINK_ACTIVE)); > > switch (obj->mm.madv) { > case I915_MADV_DONTNEED: > @@ -70,8 +69,9 @@ static int try_to_writeback(struct > drm_i915_gem_object *obj, unsigned int flags) > return 0; > } > > - if (flags & I915_SHRINK_WRITEBACK) > - i915_gem_object_writeback(obj); > + if (obj->base.filp) > + invalidate_mapping_pages(file_inode(obj->base.filp)- > >i_mapping, > + 0, (loff_t)-1); > > return 0; > } > @@ -227,7 +227,7 @@ i915_gem_shrink(struct i915_gem_ww_ctx *ww, > } > > if > (!__i915_gem_object_put_pages(obj)) { > - if (!try_to_writeback(obj, > shrink)) > + if (!obj_invalidate(obj, > shrink)) > count += obj- > >base.size >> PAGE_SHIFT; > } > if (!ww) > @@ -339,8 +339,7 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, > struct shrink_control *sc) > &sc->nr_scanned, > I915_SHRINK_ACTIVE | > I915_SHRINK_BOUND | > - I915_SHRINK_UNBOUND > | > - > I915_SHRINK_WRITEBACK); > + > I915_SHRINK_UNBOUND); > } > } > > @@ -361,8 +360,7 @@ i915_gem_shrinker_oom(struct notifier_block *nb, > unsigned long event, void *ptr) > with_intel_runtime_pm(&i915->runtime_pm, wakeref) > freed_pages += i915_gem_shrink(NULL, i915, -1UL, > NULL, > I915_SHRINK_BOUND | > - I915_SHRINK_UNBOUND | > - > I915_SHRINK_WRITEBACK); > + I915_SHRINK_UNBOUND); > > /* Because we may be allocating inside our own driver, we > cannot > * assert that there are no objects with pinned pages that > are not > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h > b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h > index 8512470f6fd6..789d6947f9b9 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.h > @@ -22,7 +22,6 @@ unsigned long i915_gem_shrink(struct > i915_gem_ww_ctx *ww, > #define I915_SHRINK_BOUND BIT(1) > #define I915_SHRINK_ACTIVE BIT(2) > #define I915_SHRINK_VMAPS BIT(3) > -#define I915_SHRINK_WRITEBACK BIT(4) > > unsigned long i915_gem_shrink_all(struct drm_i915_private *i915); > void i915_gem_driver_register__shrinker(struct drm_i915_private > *i915); > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > index 218a9b3037c7..b7ca7b66afe7 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > @@ -425,8 +425,7 @@ int i915_ttm_purge(struct drm_i915_gem_object > *obj) > } > > static int i915_ttm_shrinker_release_pages(struct > drm_i915_gem_object *obj, > - bool no_wait_gpu, > - bool should_writeback) > + bool no_wait_gpu) > { > struct ttm_buffer_object *bo = i915_gem_to_ttm(obj); > struct i915_ttm_tt *i915_tt = > @@ -467,9 +466,6 @@ static int i915_ttm_shrinker_release_pages(struct > drm_i915_gem_object *obj, > return ret; > } > > - if (should_writeback) > - __shmem_writeback(obj->base.size, i915_tt->filp- > >f_mapping); > - > return 0; > } > > diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c > b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c > index c69c7d45aabc..24bbf4d6a63d 100644 > --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c > +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c > @@ -1647,8 +1647,7 @@ static int igt_shrink_thp(void *arg) > i915_gem_shrink(NULL, i915, -1UL, NULL, > I915_SHRINK_BOUND | > I915_SHRINK_UNBOUND | > - I915_SHRINK_ACTIVE | > - I915_SHRINK_WRITEBACK); > + I915_SHRINK_ACTIVE); > if (should_swap == i915_gem_object_has_pages(obj)) { > pr_err("unexpected pages mismatch, should_swap=%s\n", > yesno(should_swap));