Re: unexpected -ENOMEM from percpu_counter_init()

Hi,

I have a question about the percpu workqueue.

> > > 
> > > And a question about this,
> > > > > > > upper caller:
> > > > > > >     nofs_flag = memalloc_nofs_save();
> > > > > > >     ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > > > > > >     memalloc_nofs_restore(nofs_flag);
> > > > 
> > > > The issue is here. nofs is set which means percpu attempts an atomic
> > > > allocation. If it cannot find anything already allocated it isn't happy.
> > > > This was done before memalloc_nofs_{save/restore}() were pervasive.
> > > > 
> > > > Percpu should probably try to allocate some pages if possible even if
> > > > nofs is set.
> > > 
> > > Should we check and pre-alloc memory inside memalloc_nofs_restore()?
> > > another memalloc_nofs_save() may come soon.
> > > 
> > > something like this in memalloc_nofs_save()?
> > > 	if (pcpu_nr_empty_pop_pages[type] < PCPU_EMPTY_POP_PAGES_LOW)
> > >  		pcpu_schedule_balance_work();
> > > 
> > 
> > Percpu does do this via a workqueue item. The issue is in v5.9 we
> > introduced 2 types of chunks. However, the free float page number was
> > for the total. So even if 1 chunk type dropped below, the other chunk
> > type might have enough pages. I'm queuing this for 5.12 and will send it
> > out assuming it does fix your problem.

Could the percpu workqueue be too weak (e.g. not scheduled in time) when
CPU load is high?

This is our application pipeline:
	file_pre_process |
	bwa.nipt xx |
	samtools.nipt sort xx |
	file_post_process

file_pre_process and file_post_process are fast, so they are often blocked
on pipe input/output.

'bwa.nipt xx' is CPU-heavy and keeps almost all CPU cores busy.

'samtools.nipt sort xx' is memory-heavy: it keeps its input in memory.
When memory runs short, it writes the whole buffer to a temp file, so it
is sometimes I/O-heavy too (writing 60G or more to the file).


xfstests (generic/476) is only I/O-heavy; its CPU and memory load is NOT
high. So generic/476 may be an easier case than our application pipeline.

Although there is not yet a simple reproducer for the other problem that
happened here, there is a fairly high chance that something is wrong
in btrfs/mm/fs-buffer.
> but another problem (the OS froze without a call trace, a PANIC without
> an OOPS?, the reason is still unknown) still happens.

Best Regards
Wang Yugui (wangyugui@xxxxxxxxxxxx)
2021/04/09