On Fri, Nov 04, 2016 at 08:32:56PM +0530, akash.goel@xxxxxxxxx wrote: > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > On a long run of more than 2-3 days, physical memory tends to get > fragmented severely, which considerably slows down the system. In such a > scenario, the shrinker is also unable to help as lack of memory is not > the actual problem, since it has been observed that there are enough free > pages of 0 order. This also manifests itself when an indiviual zone in > the mm runs out of pages and if we cannot migrate pages between zones, > the kernel hits an out-of-memory even though there are free pages (and > often all of swap) available. > > To address the issue of external fragementation, kernel does a compaction > (which involves migration of pages) but it's efficacy depends upon how > many pages are marked as MOVABLE, as only those pages can be migrated. > > Currently the backing pages for GPU buffers are allocated from shmemfs > with GFP_RECLAIMABLE flag, in units of 4KB pages. In the case of limited > swap space, it may not be possible always to reclaim or swap-out pages of > all the inactive objects, to make way for free space allowing formation > of higher order groups of physically-contiguous pages on compaction. > > Just marking the GPU pages as MOVABLE will not suffice, as i915.ko has to > pin the pages if they are in use by GPU, which will prevent their > migration. So the migratepage callback in shmem is also hooked up to get > a notification when kernel initiates the page migration. On the > notification, i915.ko appropriately unpin the pages. With this we can > effectively mark the GPU pages as MOVABLE and hence mitigate the > fragmentation problem. > > v2: > - Rename the migration routine to gem_shrink_migratepage, move it to the > shrinker file, and use the existing constructs (Chris) > - To cleanup, add a new helper function to encapsulate all page migration > skip conditions (Chris) > - Add a new local helper function in shrinker file, for dropping the > backing pages, and call the same from gem_shrink() also (Chris) > > v3: > - Fix/invert the check on the return value of unsafe_drop_pages (Chris) > > v4: > - Minor tidy > > v5: > - Fix unsafe usage of unsafe_drop_pages() > - Rebase onto vmap-notifier > > v6: > - Remove i915_gem_object_get/put across unsafe_drop_pages() as with > struct_mutex protection object can't disappear. (Chris) > > Testcase: igt/gem_shrink > Bugzilla: (e.g.) https://bugs.freedesktop.org/show_bug.cgi?id=90254 > Cc: Hugh Dickins <hughd@xxxxxxxxxx> > Cc: linux-mm@xxxxxxxxx > Signed-off-by: Sourab Gupta <sourab.gupta@xxxxxxxxx> > Signed-off-by: Akash Goel <akash.goel@xxxxxxxxx> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Hi all -mm folks! Any feedback on these two? It's kinda an intermediate step towards a full-blown gemfs, and I think useful for that. Or do we need to go directly to our own backing storage thing? Aside from ack/nack from -mm I think this is ready for merging. Thanks, Daniel > --- > drivers/gpu/drm/i915/i915_drv.h | 2 + > drivers/gpu/drm/i915/i915_gem.c | 9 ++- > drivers/gpu/drm/i915/i915_gem_shrinker.c | 132 +++++++++++++++++++++++++++++++ > 3 files changed, 142 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 4735b417..7f2717b 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1357,6 +1357,8 @@ struct intel_l3_parity { > }; > > struct i915_gem_mm { > + struct shmem_dev_info shmem_info; > + > /** Memory allocator for GTT stolen memory */ > struct drm_mm stolen; > /** Protects the usage of the GTT stolen memory allocator. This is > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 1f995ce..f0d4ce7 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -2164,6 +2164,7 @@ void __i915_gem_object_invalidate(struct drm_i915_gem_object *obj) > if (obj->mm.madv == I915_MADV_WILLNEED) > mark_page_accessed(page); > > + set_page_private(page, 0); > put_page(page); > } > obj->mm.dirty = false; > @@ -2310,6 +2311,7 @@ static unsigned int swiotlb_max_size(void) > sg->length += PAGE_SIZE; > } > last_pfn = page_to_pfn(page); > + set_page_private(page, (unsigned long)obj); > > /* Check that the i965g/gm workaround works. */ > WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL)); > @@ -2334,8 +2336,10 @@ static unsigned int swiotlb_max_size(void) > > err_pages: > sg_mark_end(sg); > - for_each_sgt_page(page, sgt_iter, st) > + for_each_sgt_page(page, sgt_iter, st) { > + set_page_private(page, 0); > put_page(page); > + } > sg_free_table(st); > kfree(st); > > @@ -4185,6 +4189,8 @@ struct drm_i915_gem_object * > goto fail; > > mask = GFP_HIGHUSER | __GFP_RECLAIMABLE; > + if (IS_ENABLED(MIGRATION)) > + mask |= __GFP_MOVABLE; > if (IS_CRESTLINE(dev_priv) || IS_BROADWATER(dev_priv)) { > /* 965gm cannot relocate objects above 4GiB. */ > mask &= ~__GFP_HIGHMEM; > @@ -4193,6 +4199,7 @@ struct drm_i915_gem_object * > > mapping = obj->base.filp->f_mapping; > mapping_set_gfp_mask(mapping, mask); > + shmem_set_dev_info(mapping, &dev_priv->mm.shmem_info); > > i915_gem_object_init(obj, &i915_gem_object_ops); > > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c > index a6fc1bd..051135ac 100644 > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c > @@ -24,6 +24,7 @@ > > #include <linux/oom.h> > #include <linux/shmem_fs.h> > +#include <linux/migrate.h> > #include <linux/slab.h> > #include <linux/swap.h> > #include <linux/pci.h> > @@ -473,6 +474,132 @@ struct shrinker_lock_uninterruptible { > return NOTIFY_DONE; > } > > +#ifdef CONFIG_MIGRATION > +static bool can_migrate_page(struct drm_i915_gem_object *obj) > +{ > + /* Avoid the migration of page if being actively used by GPU */ > + if (i915_gem_object_is_active(obj)) > + return false; > + > + /* Skip the migration for purgeable objects otherwise there > + * will be a deadlock when shmem will try to lock the page for > + * truncation, which is already locked by the caller before > + * migration. > + */ > + if (obj->mm.madv == I915_MADV_DONTNEED) > + return false; > + > + /* Skip the migration for a pinned object */ > + if (atomic_read(&obj->mm.pages_pin_count) > obj->bind_count) > + return false; > + > + if (any_vma_pinned(obj)) > + return false; > + > + return true; > +} > + > +static int do_migrate_page(struct drm_i915_gem_object *obj) > +{ > + struct drm_i915_private *dev_priv = to_i915(obj->base.dev); > + int ret = 0; > + > + if (!can_migrate_page(obj)) > + return -EBUSY; > + > + /* HW access would be required for a GGTT bound object, for which > + * device has to be kept awake. But a deadlock scenario can arise if > + * the attempt is made to resume the device, when either a suspend > + * or a resume operation is already happening concurrently from some > + * other path and that only also triggers compaction. So only unbind > + * if the device is currently awake. > + */ > + if (!intel_runtime_pm_get_if_in_use(dev_priv)) > + return -EBUSY; > + > + if (!unsafe_drop_pages(obj)) > + ret = -EBUSY; > + > + intel_runtime_pm_put(dev_priv); > + return ret; > +} > + > +static int i915_gem_shrinker_migratepage(struct address_space *mapping, > + struct page *newpage, > + struct page *page, > + enum migrate_mode mode, > + void *dev_priv_data) > +{ > + struct drm_i915_private *dev_priv = dev_priv_data; > + struct shrinker_lock_uninterruptible slu; > + int ret; > + > + /* > + * Clear the private field of the new target page as it could have a > + * stale value in the private field. Otherwise later on if this page > + * itself gets migrated, without getting referred by the Driver > + * in between, the stale value would cause the i915_migratepage > + * function to go for a toss as object pointer is derived from it. > + * This should be safe since at the time of migration, private field > + * of the new page (which is actually an independent free 4KB page now) > + * should be like a don't care for the kernel. > + */ > + set_page_private(newpage, 0); > + > + if (!page_private(page)) > + goto migrate; > + > + /* > + * Check the page count, if Driver also has a reference then it should > + * be more than 2, as shmem will have one reference and one reference > + * would have been taken by the migration path itself. So if reference > + * is <=2, we can directly invoke the migration function. > + */ > + if (page_count(page) <= 2) > + goto migrate; > + > + /* > + * Use trylock here, with a timeout, for struct_mutex as > + * otherwise there is a possibility of deadlock due to lock > + * inversion. This path, which tries to migrate a particular > + * page after locking that page, can race with a path which > + * truncate/purge pages of the corresponding object (after > + * acquiring struct_mutex). Since page truncation will also > + * try to lock the page, a scenario of deadlock can arise. > + */ > + if (!i915_gem_shrinker_lock_uninterruptible(dev_priv, &slu, 10)) > + return -EBUSY; > + > + ret = 0; > + if (!PageSwapCache(page) && page_private(page)) { > + struct drm_i915_gem_object *obj = > + (struct drm_i915_gem_object *)page_private(page); > + > + ret = do_migrate_page(obj); > + } > + > + i915_gem_shrinker_unlock_uninterruptible(dev_priv, &slu); > + if (ret) > + return ret; > + > + /* > + * Ideally here we don't expect the page count to be > 2, as driver > + * would have dropped its reference, but occasionally it has been seen > + * coming as 3 & 4. This leads to a situation of unexpected page count, > + * causing migration failure, with -EGAIN error. This then leads to > + * multiple attempts by the kernel to migrate the same set of pages. > + * And sometimes the repeated attempts proves detrimental for stability. > + * Also since we don't know who is the other owner, and for how long its > + * gonna keep the reference, its better to return -EBUSY. > + */ > + if (page_count(page) > 2) > + return -EBUSY; > + > +migrate: > + return migrate_page(mapping, newpage, page, mode); > +} > +#endif > + > /** > * i915_gem_shrinker_init - Initialize i915 shrinker > * @dev_priv: i915 device > @@ -491,6 +618,11 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv) > > dev_priv->mm.vmap_notifier.notifier_call = i915_gem_shrinker_vmap; > WARN_ON(register_vmap_purge_notifier(&dev_priv->mm.vmap_notifier)); > + > + dev_priv->mm.shmem_info.private_data = dev_priv; > +#ifdef CONFIG_MIGRATION > + dev_priv->mm.shmem_info.migratepage = i915_gem_shrinker_migratepage; > +#endif > } > > /** > -- > 1.9.2 > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@xxxxxxxxx. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a> -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>