On Tue 14-01-25 11:51:18, Rik van Riel wrote:
> On Tue, 2025-01-14 at 17:46 +0100, Michal Hocko wrote:
> > On Tue 14-01-25 11:09:55, Johannes Weiner wrote:
> > >
> > > We managed to extract a stack trace of the livelocked task:
> > >
> > > obj_cgroup_may_swap
> > > zswap_store
> > > swap_writepage
> > > shrink_folio_list
> > > shrink_lruvec
> > > shrink_node
> > > do_try_to_free_pages
> > > try_to_free_mem_cgroup_pages
> >
> > OK, so this is the reclaim path and it fails due to reasons you
> > mention below. This will retry several times until it hits
> > mem_cgroup_oom which will bail in mem_cgroup_out_of_memory because
> > of task_is_dying (returns true) and retry the charge + reclaim (as
> > the oom killer hasn't done anything) with passed_oom = true this
> > time and eventually got to nomem path and returns ENOMEM. This
> > should propagate -ENOMEM down the path
> >
> > > charge_memcg
> > > mem_cgroup_swapin_charge_folio
> > > __read_swap_cache_async
> > > swapin_readahead
> > > do_swap_page
> > > handle_mm_fault
> > > do_user_addr_fault
> > > exc_page_fault
> > > asm_exc_page_fault
> > > __get_user
> >
> > All the way here and return the failure to futex_cleanup which
> > doesn't retry __get_user on the failure AFAICS (exit_robust_list).
> > But I might be missing something, it's been quite some time since
> > I've looked into futex code.
>
> Can you explain how -ENOMEM would get propagated down
> past the page fault handler?
>
> This isn't get_user_pages(), which can just pass
> -ENOMEM on to the caller.
>
> If there is code to pass -ENOMEM on past the page
> fault exception handler, I have not been able to
> find it. How does this work?

This might be me misunderstanding the get_user machinery, but doesn't
it return a failure when the PF handler returns ENOMEM?
-- 
Michal Hocko
SUSE Labs
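
For anyone following along, the charge retry flow described in the quoted text can be modelled in user space roughly as below. This is only a sketch of that description, not the code in mm/memcontrol.c: model_charge(), model_reclaim(), model_oom(), model_task_is_dying() and the retry budget of 16 are all hypothetical stand-ins, and the model assumes reclaim never makes progress and the task is already exiting.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed retry budget for this model only. */
#define MAX_RECLAIM_RETRIES 16

/* Counts how often the model tried to reclaim. */
static int reclaim_attempts;

/* Assumption: reclaim frees nothing, as in the reported livelock. */
static bool model_reclaim(void)
{
	reclaim_attempts++;
	return false;
}

/* Assumption: the faulting task is already exiting. */
static bool model_task_is_dying(void)
{
	return true;
}

/*
 * Model of the OOM step: for a dying task it kills nothing but still
 * reports success, so the caller goes back to charge + reclaim.
 */
static bool model_oom(void)
{
	return model_task_is_dying();
}

/* Returns 0 if the charge succeeds, -ENOMEM otherwise. */
static int model_charge(void)
{
	bool passed_oom = false;
	int retries = MAX_RECLAIM_RETRIES;

	reclaim_attempts = 0;
	for (;;) {
		if (model_reclaim())
			return 0;		/* reclaim made room */
		if (retries-- > 0)
			continue;		/* retry budget not yet used up */

		/* Do not loop forever for a task the OOM killer skipped. */
		if (passed_oom && model_task_is_dying())
			return -ENOMEM;		/* the "nomem" path */

		if (model_oom()) {
			passed_oom = true;
			retries = MAX_RECLAIM_RETRIES;
			continue;		/* second charge + reclaim round */
		}
		return -ENOMEM;
	}
}
```

Under those assumptions model_charge() goes through two full rounds of reclaim and then returns -ENOMEM. What the fault handler and __get_user do with that failure afterwards is the open question in this thread and is deliberately not modelled here.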