On Wed 22-11-17 19:53:59, Tetsuo Handa wrote: > Tetsuo Handa wrote: > > Andrew Morton wrote: > > > On Tue, 21 Nov 2017 21:02:37 +0900 Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > There are users not checking for register_shrinker() failure. > > > > Continuing with ignoring failure can lead to later oops at > > > > unregister_shrinker(). > > > > > > > > ... > > > > > > > > --- a/include/linux/shrinker.h > > > > +++ b/include/linux/shrinker.h > > > > @@ -75,6 +75,6 @@ struct shrinker { > > > > #define SHRINKER_NUMA_AWARE (1 << 0) > > > > #define SHRINKER_MEMCG_AWARE (1 << 1) > > > > > > > > -extern int register_shrinker(struct shrinker *); > > > > +extern __must_check int register_shrinker(struct shrinker *); > > > > extern void unregister_shrinker(struct shrinker *); > > > > #endif > > > > > > hm, well, OK, it's a small kmalloc(GFP_KERNEL). That won't be > > > failing. > > > > It failed by fault injection and resulted in a report at > > http://lkml.kernel.org/r/001a113f996099503a055e793dd3@xxxxxxxxxx . > > Since kzalloc() can become > 32KB allocation if CONFIG_NODES_SHIFT > 12 > (which might not be impossible in near future), register_shrinker() can > potentially become a costly allocation which might fail without invoking > the OOM killer. It is a good opportunity to think whether we should allow > register_shrinker() to fail. Is it really that hard to fix callers to handle the error? > > > Affected code seems to be fs/xfs, fs/super.c, fs/quota, > > > arch/x86/kvm/mmu, drivers/gpu/drm/ttm, drivers/md and a bunch of > > > staging stuff. > > > > > > I'm not sure this is worth bothering about? > > > > > > > Continuing with failed register_shrinker() is almost always wrong. > > Though I don't know whether mm/zsmalloc.c case can make sense. > > > > Thinking from the fact that register_shrinker() had been "void" until Linux 3.11 > and we did not take appropriate precautions when changing to "int" in Linux 3.12, > we need to consider making register_shrinker() "void" again. > > If we could agree with opening up the use of __GFP_NOFAIL for allocating a few > non-contiguous pages on large systems, we can make register_shrinker() "void" > again. (Draft patch is shown below. I choose array of kmalloc(PAGE_SIZE) > rather than kvmalloc() in order to use __GFP_NOFAIL.) I am not sure we want to overcomplicate the code too much. Most architectures do not have that many numa nodes to care. If we really need to care maybe we should rethink and get rid of the per numa deferred count altogether. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>