On 10/23/2019 02:11 AM, Michal Hocko wrote:
> On Wed 23-10-19 07:43:44, Dave Chinner wrote:
>> On Tue, Oct 22, 2019 at 06:33:10PM +0200, Michal Hocko wrote:
>
> Thanks for more clarification regarding PF_LESS_THROTTLE.
>
> [...]
>>> PF_IO_FLUSHER would mean that the user context is a part of the IO
>>> path and therefore there are certain reclaim recursion restrictions.
>>
>> If PF_IO_FLUSHER just maps to PF_LESS_THROTTLE|PF_MEMALLOC_NOIO,
>> then I'm not sure we need a new definition. Maybe that's the prctl
>> flag name, but in the kernel we don't need a PF_IO_FLUSHER process
>> flag...
>
> Yes, the internal implementation would do something like that. I was
> more interested in the user space visible API at this stage. Something
> generic enough, because exporting MEMALLOC flags is just a bad idea
> IMHO (especially PF_MEMALLOC).

Do you mean we would do something like:

	prctl()
	....
	case PF_SET_IO_FLUSHER:
		current->flags |= PF_MEMALLOC_NOIO;
	....

or are you saying we would add a new PF_IO_FLUSHER flag and then modify
the PF_MEMALLOC_NOIO uses, like in current_gfp_context:

	if (current->flags & (PF_MEMALLOC_NOIO | PF_IO_FLUSHER))
		flags &= ~(__GFP_IO | __GFP_FS);

?

>
>>>>>> This patch allows the userspace daemon to set the PF_MEMALLOC*
>>>>>> flags with prctl during their initialization so later allocations
>>>>>> cannot call back into them.
>>>>>
>>>>> TBH I am not really happy to export these to the userspace. They are
>>>>> an internal implementation detail and the userspace shouldn't really
>>>>
>>>> They care in these cases, because block/fs drivers must be able to
>>>> make forward progress during writes. To meet this guarantee kernel
>>>> block drivers use mempools and memalloc/GFP flags.
>>>>
>>>> For these userspace components of the block/fs drivers, they already
>>>> do things normal daemons do not to meet that guarantee, like mlocking
>>>> their memory, disabling the oom killer, and preallocating resources
>>>> they have control over. They have no control over reclaim like the
>>>> kernel drivers do, so it's easy for us to deadlock when memory gets
>>>> low.
>>>
>>> OK, fair enough. How much control do they really need though? Is a
>>> single PF_IO_FLUSHER as explained above (essentially implying a
>>> GFP_NOIO context) sufficient?
>>
>> I think some of these userspace processes work at the filesystem
>> level and so really only need GFP_NOFS allocation (fuse), while
>> others work at the block device level (iscsi, nbd) so need GFP_NOIO
>> allocation. So there's definitely an argument for providing both...
>
> The main question is whether giving more APIs is really necessary. Is
> there any real problem with giving them only PF_IO_FLUSHER and letting
> both groups use this one? It will imply more reclaim restrictions for
> the solely FS based ones, but is this a practical problem? If yes we
> can always add PF_FS_$FOO later on.

I am not sure. I will have to defer to general FS experts like Dave or
Martin and Damien for the specific fuse case. There do not seem to be a
lot of places where we check for __GFP_IO, so configs with fuse and
bcache, for example, are probably not a big deal. However, I am not very
familiar with some of the other code paths in the mm layer and how FSs
interact with them.
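
To make the question above a bit more concrete, here is a rough sketch
of the two options as I understand them. The option name
PF_SET_IO_FLUSHER and the exact prctl plumbing are only placeholders
for illustration, not a proposal for the final ABI:

	/*
	 * Option 1: the prctl handler just sets the existing internal
	 * flag, so no new PF_* bit is added and current_gfp_context()
	 * stays as it is today.
	 */
	case PF_SET_IO_FLUSHER:
		current->flags |= PF_MEMALLOC_NOIO;
		break;

	/*
	 * Option 2: a new PF_IO_FLUSHER bit that current_gfp_context()
	 * also honors when it masks off the reclaim flags.
	 */
	static inline gfp_t current_gfp_context(gfp_t flags)
	{
		if (unlikely(current->flags &
			     (PF_MEMALLOC_NOIO | PF_IO_FLUSHER)))
			flags &= ~(__GFP_IO | __GFP_FS);
		else if (unlikely(current->flags & PF_MEMALLOC_NOFS))
			flags &= ~__GFP_FS;
		return flags;
	}

Either way, userspace would only ever see the generic IO-flusher style
option, and the MEMALLOC flags themselves would stay internal, which I
think is the point you were making.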