On Tue 11-02-25 10:55:20, Tejun Heo wrote:
> Hello, Michal.
>
> On Thu, Feb 06, 2025 at 01:26:33PM +0100, Michal Hocko wrote:
> ...
> > It has turned out that iscsid has worked around this by dropping
> > PR_SET_IO_FLUSHER (https://github.com/open-iscsi/open-iscsi/pull/382)
> > when scanning host. But we can do better in this case on the kernel side
>
> FWIW, requiring GFP_KERNEL context for probing doesn't sound too crazy to
> me.
>
> > @@ -2204,7 +2204,12 @@ static void pcpu_balance_workfn(struct work_struct *work)
> >  	 * to grow other chunks. This then gives pcpu_reclaim_populated() time
> >  	 * to move fully free chunks to the active list to be freed if
> >  	 * appropriate.
> > +	 *
> > +	 * Enforce GFP_NOIO allocations because we have pcpu_alloc users
> > +	 * constrained to GFP_NOIO/NOFS contexts and they could form lock
> > +	 * dependency through pcpu_alloc_mutex
> >  	 */
> > +	unsigned int flags = memalloc_noio_save();
>
> Just for context, the reason why the allocation mask support was limited to
> GFP_KERNEL or not rather than supporting full range of GFP flags is because
> percpu memory area expansion can involve page table allocations in the
> vmalloc area which always uses GFP_KERNEL. memalloc_noio_save() masks IO
> part out of that, right? It might be worthwhile to explain why we aren't
> passing down GPF flags throughout and instead depending on masking.

I have gone with masking because that seemed easier to review and a more
robust solution. vmalloc does support NOFS/NOIO contexts these days (it
will just use scoped masking in those cases). Propagating the gfp
throughout the worker code path is likely possible, but I haven't really
explored that in detail to be sure. Would that be preferable even if the
fix would be more involved?

> Also, doesn't the above always prevent percpu allocations from doing fs/io
> reclaims?

Yes it does. Probably worth mentioning in the changelog.
These allocations should be rare so having a constrained reclaim didn't
really seem problematic to me. There should be kswapd running in the
background with the full reclaim power.

> ie. Shouldn't the masking only be used if the passed in gfp
> doesn't allow fs/io?

This is a good question. I have to admit that my understanding might be
incorrect but wouldn't it be possible that we could get the lock
dependency chain if GFP_KERNEL and scoped NOFS alloc_pcp calls are
competing?

fs/io lock
pcpu_alloc_noprof(NOFS/NOIO)
                                pcpu_alloc_noprof(GFP_KERNEL)
                                pcpu_schedule_balance_work
                                  pcpu_alloc_mutex
  pcpu_alloc_mutex
                                    allocation_deadlock through fs/io lock

This is currently not possible because constrained allocations only do
trylock. Makes sense?
-- 
Michal Hocko
SUSE Labs