If stress-ng is run on a squashfs filesystem, the system appears to
hang: stress-ng never finishes and the console stops responding to
user input.

This happens on all the arm/arm64 platforms we are working on. Through
debugging, we found that the issue is caused by the oom handling in
the kernel.

The fs->readahead() callback is called between memalloc_nofs_save() and
memalloc_nofs_restore(), and squashfs_readahead() calls alloc_page().
In this case, if there is no memory left, out_of_memory() is called
without __GFP_FS, so the oom killer is not triggered and the process
loops endlessly, waiting for some other context to trigger the oom
killer and release memory. But on a system whose whole root filesystem
is squashfs, nearly all userspace processes call out_of_memory()
without __GFP_FS, so the system appears to hang when running stress-ng.

To fix it, schedule a work item that calls alloc_page() with __GFP_FS
set (GFP_KERNEL) before out_of_memory() returns early because
__GFP_FS is missing.

Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx>
Cc: Colin Ian King <colin.i.king@xxxxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
Signed-off-by: Hui Wang <hui.wang@xxxxxxxxxxxxx>
---
 mm/oom_kill.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 044e1eed720e..c9c38d6b8580 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1094,6 +1094,24 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+/*
+ * If an oom occurs without the __GFP_FS flag in the gfp_mask, the oom killer
+ * will not be triggered. In this case, we could call schedule_work to run
+ * trigger_oom_killer_work() to trigger an oom forcibly with __GFP_FS flag,
+ * this could make the oom killer run with a fair chance.
+ */
+static void trigger_oom_killer_work(struct work_struct *work)
+{
+	struct page *tmp_page;
+
+	/* This could trigger an oom forcibly with a chance */
+	tmp_page = alloc_page(GFP_KERNEL);
+	if (tmp_page)
+		__free_page(tmp_page);
+}
+
+static DECLARE_WORK(oom_trigger_work, trigger_oom_killer_work);
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
@@ -1135,8 +1153,10 @@ bool out_of_memory(struct oom_control *oc)
 	 * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
 	 * invoke the OOM killer even if it is a GFP_NOFS allocation.
 	 */
-	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
+	if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc)) {
+		schedule_work(&oom_trigger_work);
 		return true;
+	}
 
 	/*
 	 * Check if there were limitations on the allocation (only relevant for
-- 
2.34.1
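
For anyone following the gfp_mask check that the second hunk touches, the decision can be modelled in userspace roughly as below. This is only a sketch and not kernel code: MODEL_GFP_FS, MODEL_GFP_IO, struct model_oom_control, enum model_oom_action and model_oom_decision() are hypothetical names made up for illustration, and the bit values are placeholders rather than the real __GFP_* values.

```c
#include <stdbool.h>

/* Placeholder bits for illustration only; not the kernel's __GFP_* values. */
#define MODEL_GFP_FS 0x1u
#define MODEL_GFP_IO 0x2u

/* What the modelled out_of_memory() early check decides to do. */
enum model_oom_action {
	MODEL_OOM_SKIP,            /* return early, no victim selected */
	MODEL_OOM_SKIP_AND_KICK,   /* return early, but schedule the work item */
	MODEL_OOM_SELECT_VICTIM,   /* fall through to normal oom handling */
};

/* Hypothetical, reduced stand-in for the fields the check looks at. */
struct model_oom_control {
	unsigned int gfp_mask;
	bool memcg_oom;
};

/*
 * Mirrors the condition in the patch:
 *   gfp_mask != 0 && !(gfp_mask & __GFP_FS) && !is_memcg_oom()
 * 'patched' selects between the behaviour before and after this change.
 */
static enum model_oom_action
model_oom_decision(const struct model_oom_control *oc, bool patched)
{
	if (oc->gfp_mask && !(oc->gfp_mask & MODEL_GFP_FS) && !oc->memcg_oom)
		return patched ? MODEL_OOM_SKIP_AND_KICK : MODEL_OOM_SKIP;

	return MODEL_OOM_SELECT_VICTIM;
}
```

With a mask that lacks the FS bit (the squashfs readahead case described above), the unpatched model returns MODEL_OOM_SKIP and the patched one returns MODEL_OOM_SKIP_AND_KICK. Masks that include the FS bit, a zero mask, and memcg ooms all fall through to victim selection in both variants.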