Hi,

A question about the workqueue for percpu.

> > > > > > And a question about this,
> > > > > > > upper caller:
> > > > > > > nofs_flag = memalloc_nofs_save();
> > > > > > > ret = btrfs_drew_lock_init(&root->snapshot_lock);
> > > > > > > memalloc_nofs_restore(nofs_flag);
> > > >
> > > > The issue is here. nofs is set which means percpu attempts an atomic
> > > > allocation. If it cannot find anything already allocated it isn't happy.
> > > > This was done before memalloc_nofs_{save/restore}() were pervasive.
> > > >
> > > > Percpu should probably try to allocate some pages if possible even if
> > > > nofs is set.
> > >
> > > Should we check and pre-alloc memory inside memalloc_nofs_restore()?
> > > another memalloc_nofs_save() may come soon.
> > >
> > > something like this in memalloc_nofs_save()?
> > > if (pcpu_nr_empty_pop_pages[type] < PCPU_EMPTY_POP_PAGES_LOW)
> > >     pcpu_schedule_balance_work();
> > >
> >
> > Percpu does do this via a workqueue item. The issue is in v5.9 we
> > introduced 2 types of chunks. However, the free float page number was
> > for the total. So even if 1 chunk type dropped below, the other chunk
> > type might have enough pages. I'm queuing this for 5.12 and will send it
> > out assuming it does fix your problem.

Could the workqueue item for percpu be too weak (for example, not
scheduled in time) under high CPU load?

This is our application pipeline:

    file_pre_process | bwa.nipt xx | samtools.nipt sort xx | file_post_process

file_pre_process and file_post_process are fast, so they are often
blocked on pipe input/output.

'bwa.nipt xx' is a high-CPU load; it uses almost all of the CPU cores.

'samtools.nipt sort xx' is a high-memory load; it keeps its input in
memory. If there is not enough memory, it saves the whole buffer to a
temp file, so it is sometimes a high-IO load too (it writes 60G or more
to file).

xfstests (generic/476) is just a high-IO load; its CPU and memory load is
NOT high. So xfstests (generic/476) may be an easier case than our
application pipeline.

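To make the accounting issue from the quoted reply concrete, here is a minimal user-space sketch. It is not kernel code. The names `chunk_type`, `pool_state`, `needs_refill_total()` and `needs_refill_type()` are hypothetical, and the threshold value is only illustrative; the thread itself only mentions `pcpu_nr_empty_pop_pages[type]` and `PCPU_EMPTY_POP_PAGES_LOW`. It assumes the low-watermark check was made against the sum over both chunk types, which is how I read "the free float page number was for the total".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two chunk types introduced in v5.9. */
enum chunk_type { TYPE_A, TYPE_B, NR_CHUNK_TYPES };

/* Illustrative low watermark; an assumption, not taken from the thread. */
#define EMPTY_POP_PAGES_LOW 2

/* Empty populated pages tracked per chunk type. */
struct pool_state {
    int nr_empty_pop_pages[NR_CHUNK_TYPES];
};

/* Check against the combined total: one starved type can be hidden
 * by the other type having plenty of pages. */
bool needs_refill_total(const struct pool_state *s)
{
    int total = 0;

    for (int t = 0; t < NR_CHUNK_TYPES; t++)
        total += s->nr_empty_pop_pages[t];
    return total < EMPTY_POP_PAGES_LOW;
}

/* Check per chunk type: a starved type is noticed on its own. */
bool needs_refill_type(const struct pool_state *s, enum chunk_type type)
{
    assert(type < NR_CHUNK_TYPES);
    return s->nr_empty_pop_pages[type] < EMPTY_POP_PAGES_LOW;
}
```

With 0 empty pages for one type and 8 for the other, the total-based check reports no need to refill, while the per-type check does report it for the starved type. In that state an atomic (nofs) allocation from the starved type could fail even though the balance work is never scheduled.
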
Although there is not yet a simple reproducer for the other problem that
happened here, there is a fairly high chance that something is wrong in
btrfs/mm/fs-buffer.

> but another problem(os freezed without call trace, PANIC without OOPS?,
> the reason is yet unkown) still happen.

Best Regards
Wang Yugui (wangyugui@xxxxxxxxxxxx)
2021/04/09