Sorry, I have missed follow ups here.

On Fri 21-02-25 10:48:28, Vlastimil Babka wrote:
> On 2/21/25 03:36, Dennis Zhou wrote:
> > I've thought about this in the back of my head for the past few weeks. I
> > think I have 2 questions about this change.
> >
> > 1. Back to what TJ said earlier about probing. I feel like GFP_KERNEL
> > allocations should be okay because that more or less is control plane
> > time? I'm not sure dropping PR_SET_IO_FLUSHER is all that big of a
> > work around?
>
> This solves the iscsid case but not other cases, where GFP_KERNEL
> allocations are fundamentally impossible.

Agreed

>
> > 2. This change breaks the feedback loop as we discussed above.
> > Historically we've targeted 2-4 free pages worth of percpu memory.
> > This is done by kicking the percpu work off. That does GFP_KERNEL
> > allocations and if that requires reclaim then it goes and does it.
> > However, now we're saying kswapd is going to work in parallel while
> > we try to get pages in the worker thread.
> >
> > Given you're more versed in the reclaim side. I presume it must be
> > pretty bad if we're failing to get order-0 pages even if we have
> > NOFS/NOIO set?
>
> IMHO yes, so I don't think we need to pre-emptively fear that situation that
> much. OTOH in the current state, depleting pcpu's atomic reserves and
> failing pcpu_alloc due to not being allowed to take the mutex can happen
> easily and even if there's plenty of free memory.

Agreed

> > My feeling is that we should add back some knowledge of the
> > dependency so if the worker fails to get pages, it doesn't reschedule
> > immediately. Maybe it's as simple as adding a sleep in the worker or
> > playing with delayed work...
>
> I think if we wanted things to be more robust (and perhaps there's no need
> to, see above), the best way would be to make the worker preallocate with
> GFP_KERNEL outside of pcpu_alloc_mutex.

Yes this would work as it would break the lock chain dependency.
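For illustration only, here is a minimal userspace sketch of the "preallocate outside the mutex" pattern. It assumes a pthread mutex as a stand-in for pcpu_alloc_mutex and malloc() as a stand-in for a GFP_KERNEL page allocation that may block in reclaim. The names page_pool, pool_refill(), pool_take() and POOL_TARGET are hypothetical and are not mm/percpu.c code. The point it shows is that the refill worker never holds the lock across the allocation, so anything the allocator might end up waiting on cannot be stuck behind the pool lock. It covers only the pages themselves, not the page table allocations raised below.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_TARGET 4      /* hypothetical: spare pages to keep, cf. the 2-4 page target */
#define PAGE_BYTES  4096

/* Hypothetical reserve of spare pages; 'lock' stands in for pcpu_alloc_mutex. */
struct page_pool {
	pthread_mutex_t lock;
	void *pages[POOL_TARGET];
	int nr;
};

static struct page_pool pool = { .lock = PTHREAD_MUTEX_INITIALIZER, .nr = 0 };

/* Refill worker: allocate with no lock held, then install under the lock.
 * Returns the number of pages actually added to the pool. */
int pool_refill(struct page_pool *p)
{
	void *batch[POOL_TARGET];
	int want, got = 0, installed = 0, i;

	/* Only a hint; the pool may change once the lock is dropped. */
	pthread_mutex_lock(&p->lock);
	want = POOL_TARGET - p->nr;
	pthread_mutex_unlock(&p->lock);
	assert(want >= 0 && want <= POOL_TARGET);

	/* The potentially blocking allocation (GFP_KERNEL analogue) runs unlocked. */
	for (i = 0; i < want; i++) {
		batch[got] = malloc(PAGE_BYTES);
		if (!batch[got])
			break;
		got++;
	}

	/* Short critical section: pointer bookkeeping only, no allocation. */
	pthread_mutex_lock(&p->lock);
	for (i = 0; i < got && p->nr < POOL_TARGET; i++) {
		p->pages[p->nr++] = batch[i];
		installed++;
	}
	pthread_mutex_unlock(&p->lock);

	/* Another refill may have raced with us; free whatever did not fit. */
	for (; i < got; i++)
		free(batch[i]);

	return installed;
}

/* Allocation path: may take the lock, but never allocates while holding it. */
void *pool_take(struct page_pool *p)
{
	void *page = NULL;

	pthread_mutex_lock(&p->lock);
	if (p->nr > 0)
		page = p->pages[--p->nr];
	pthread_mutex_unlock(&p->lock);
	return page;	/* NULL: reserve depleted, caller would kick the worker */
}
```

In this sketch a failed malloc() simply leaves the pool short and the worker can be retried later (for example with a delay, as suggested above); the lock is never part of the allocation's dependency chain.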
> I assume it's probably not easy to
> implement as page table allocations are involved in the process and we don't
> have a way to supply preallocated memory for those.

Why would this be a concern if the allocation is done outside of the lock?

-- 
Michal Hocko
SUSE Labs