Re: unexpected -ENOMEM from percpu_counter_init()

Dennis Zhou <dennis@xxxxxxxxxx> · Wed, 7 Apr 2021 14:56:38 +0000

Hello,

On Wed, Apr 07, 2021 at 09:09:07PM +0800, Wang Yugui wrote:
> Hi,
> 
> > +CC btrfs
> > 
> > On 4/1/21 12:51 PM, Wang Yugui wrote:
> > > Hi,
> > > 
> > > an unexpected -ENOMEM from percpu_counter_init() occurred while
> > > running xfstests with kernels 5.11.10 and 5.10.27
> > 
> > Is there a dmesg log showing allocation failure or something?
> 
> When percpu_counter_init() returns the unexpected -ENOMEM, btrfs, as the
> upper caller, eventually prints something to dmesg.
> 
> We also added a trace to the btrfs source to confirm it:
> >     if (ret == -ENOMEM) printk("ENOMEM btrfs_drew_lock_init\n");
> 
> 
> With the following change, the reproduction rate drops from >50% to
> never or very rarely.
> 
> diff --git a/mm/percpu.c b/mm/percpu.c
> index 6596a0a..0127be1 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -104,8 +104,8 @@
>  /* chunks in slots below this are subject to being sidelined on failed alloc */
>  #define PCPU_SLOT_FAIL_THRESHOLD	3
>  
> -#define PCPU_EMPTY_POP_PAGES_LOW	2
> -#define PCPU_EMPTY_POP_PAGES_HIGH	4
> +#define PCPU_EMPTY_POP_PAGES_LOW	8
> +#define PCPU_EMPTY_POP_PAGES_HIGH	16
>  

These settings are from 2014 when Tejun initially implemented the atomic
allocation float. It is probably time to think about increasing the
number of pages. I'd prefer to do it in a dynamic way though (some X% of
a chunk instead of a fixed number increase).
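
To make the "X% of a chunk" idea concrete, here is a rough userspace
sketch. Nothing in it exists in mm/percpu.c: the struct, the function
name, and the pct parameter are hypothetical, and the percentage itself
is left open. The floor of 2 and the 1:2 low:high ratio are just the
current PCPU_EMPTY_POP_PAGES_{LOW,HIGH} values carried over.

```c
#include <stddef.h>

/* Hypothetical type: not part of mm/percpu.c. */
struct demo_pop_limits {
	size_t low;	/* refill the float when empty populated pages drop below this */
	size_t high;	/* stop populating once this many empty pages are available */
};

/*
 * Hypothetical helper: derive the watermarks from the chunk size instead
 * of using fixed constants. pct is the open "X%" and must be chosen by
 * the caller.
 */
static struct demo_pop_limits demo_compute_limits(size_t chunk_pages,
						  unsigned int pct)
{
	struct demo_pop_limits l;

	l.low = chunk_pages * pct / 100;
	if (l.low < 2)
		l.low = 2;	/* never go below today's fixed low watermark */
	l.high = 2 * l.low;	/* keep today's 1:2 low:high ratio */
	return l;
}
```

With small chunks this degenerates to the current 2/4 pair, so only
systems with large chunks would keep a bigger float.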

>  #ifdef CONFIG_SMP
>  /* default addr <-> pcpu_ptr mapping, override in asm/percpu.h if necessary */
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index 5e76af7..8cc091b 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -14,7 +14,7 @@
>  
>  /* enough to cover all DEFINE_PER_CPUs in modules */
>  #ifdef CONFIG_MODULES
> -#define PERCPU_MODULE_RESERVE		(8 << 10)
> +#define PERCPU_MODULE_RESERVE		(32 << 10)
>  #else
>  #define PERCPU_MODULE_RESERVE		0
>  #endif
> 

PERCPU_MODULE_RESERVE is a reserved region used purely for static per-CPU
variables in modules. btrfs_drew_lock_init() is a dynamic init, so it
does not allocate from this region.

> 
> Just some guesses:
> 1) It may be related to 'vm.dirty_bytes=10737418240' being triggered.
> 
> This problem happens on
> server/T7610 with E5-2660v2 *2 and SSD/SAS(6Gb/s) and 192G memory
> but does not happen on
> server/T620 with E5-2680v2 *2 and SSD/NVMe and 192G memory.
> 
> 2) It may be related to NUMA:
> 128G memory in node1(CPU1), and 64G in node2(CPU2)
> 
> Best Regards
> Wang Yugui (wangyugui@xxxxxxxxxxxx)
> 2021/04/07
> 
> 
> > > direct caller:
> > > int btrfs_drew_lock_init(struct btrfs_drew_lock *lock)
> > > {
> > >     int ret;
> > > 
> > >     ret = percpu_counter_init(&lock->writers, 0, GFP_KERNEL);
> > >     if (ret)
> > >         return ret;
> > > 
> > >     atomic_set(&lock->readers, 0);
> > >     init_waitqueue_head(&lock->pending_readers);
> > >     init_waitqueue_head(&lock->pending_writers);
> > > 
> > >     return 0;
> > > }
> > > 
> > > upper caller:
> > >     nofs_flag = memalloc_nofs_save();
> > >     ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > >     memalloc_nofs_restore(nofs_flag);

The issue is here. nofs is set, which means percpu attempts an atomic
allocation. If it cannot find anything already allocated, it fails with
-ENOMEM. This behavior was put in place before
memalloc_nofs_{save/restore}() were pervasive.

Percpu should probably try to allocate some pages if possible even if
nofs is set.
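
For illustration, a simplified userspace model of that decision. It
assumes, from my reading of pcpu_alloc() in these kernels, that the gfp
mask is first filtered through current_gfp_context() and that anything
short of full GFP_KERNEL is treated as atomic. All demo_* names and the
bit values are made up for the sketch, and the blocking path is assumed
to always succeed.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative bit values only; the real flags live in include/linux/gfp.h. */
#define DEMO_GFP_IO		0x1u
#define DEMO_GFP_FS		0x2u
#define DEMO_GFP_RECLAIM	0x4u
#define DEMO_GFP_KERNEL		(DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)

/* Stand-in for the per-task state set by memalloc_nofs_save(). */
static bool demo_nofs_scope;

/* Models current_gfp_context(): inside a nofs scope the FS bit is stripped. */
static unsigned int demo_gfp_context(unsigned int gfp)
{
	return demo_nofs_scope ? (gfp & ~DEMO_GFP_FS) : gfp;
}

/* Models the percpu check: anything short of full GFP_KERNEL is atomic. */
static bool demo_is_atomic(unsigned int gfp)
{
	return (demo_gfp_context(gfp) & DEMO_GFP_KERNEL) != DEMO_GFP_KERNEL;
}

/*
 * Simplified allocation: the atomic path may only use pages that are
 * already populated; the blocking path is assumed to populate on demand.
 */
static int demo_pcpu_alloc(unsigned int gfp, unsigned int empty_pop_pages)
{
	if (demo_is_atomic(gfp) && empty_pop_pages == 0)
		return -ENOMEM;
	return 0;
}
```

In this model the caller passes GFP_KERNEL in both cases; only the nofs
scope differs, which is why the -ENOMEM looks unexpected from the btrfs
side even with most memory free.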

> > >     if (ret == -ENOMEM) printk("ENOMEM btrfs_drew_lock_init\n");
> > >     if (ret)
> > >         goto fail;
> > > 
> > > The hardware of this server:
> > > CPU:  Xeon(R) CPU E5-2660 v2(10 core)  *2
> > > memory:  192G, no swap
> > > 
> > > Only one xfstests job is running in this server, and about 7% of memory
> > > is used.
> > > 
> > > Any advice please.
> > > 
> > > Best Regards
> > > Wang Yugui (wangyugui@xxxxxxxxxxxx)
> > > 2021/04/01
> > > 
> > > 
> 

Thanks,
Dennis