On Mon, Jan 29, 2024 at 11:48:00AM +0100, Michal Hocko wrote: > On Sun 28-01-24 14:43:16, Kent Overstreet wrote: > > On Sun, Jan 28, 2024 at 04:45:32PM +0100, Michal Hocko wrote: > > > On Fri 26-01-24 17:07:56, Kent Overstreet wrote: > > > > If we're using PF_MEMALLOC, we might have a fallback and might not to > > > > warn about a failing allocation - thus we need a PF_* equivalent of > > > > __GFP_NOWARN. > > > > > > Could you be more specific about the user? Is this an allocation from > > > the reclaim path or an explicit PF_MEMALLOC one? It would be also really > > > helpful to explain why GFP_NOWARN cannot be used directly. > > > > Explicit PF_MEMALLOC. > > > > It's for a call to alloc_inode(), which doesn't take gfp flags, and > > plumbing it would require modifying a s_ops callback and would touch > > every filesystem - > > OK, I see. This should be part of the changelog. > > > but we want to get away from gfp flags anyways :) > > > > More specifically, the code where I'm using it is doing a "try > > GFP_NOWAIT first; if that fails drop locks and do GFP_KERNEL" dance; > > it's part of a cleanup for some weird lifetime stuff related to > > fs/inode.c. > > > > #define memalloc_flags_do(_flags, _do) \ > > ({ \ > > unsigned _saved_flags = memalloc_flags_save(_flags); \ > > typeof(_do) _ret = _do; \ > > memalloc_noreclaim_restore(_saved_flags); \ > > _ret; \ > > }) > > > > /* > > * Allocate a new inode, dropping/retaking btree locks if necessary: > > */ > > static struct bch_inode_info *bch2_new_inode(struct btree_trans *trans) > > { > > struct bch_fs *c = trans->c; > > > > struct bch_inode_info *inode = > > memalloc_flags_do(PF_MEMALLOC|PF_MEMALLOC_NOWARN, to_bch_ei(new_inode(c->vfs_sb))); > > Is this what you meant by GFP_NOWAIT allocation? You would be right > about this allocation not entering the direct reclaim but do you realize > this is not an ordinary NOWAIT request beause PF_MEMALLOC will allow to > dip into memory reserves without any limits? (Unless the specific > allocation down the road is explicitly GFP_NOMEMALLOC) > A failure here means that the system is really in a desparate state with all > the memory reserves gone which migt break reclaimers who rely on those > reserves. > > Are you sure it is a good idea? Unless I am missing something you are > just giving an ordinary user access to those reserves by creating inodes > without any bounds. Did not realize that; thank you. How about this version? >From 0e87e55058ccddde4b6bcc092f43e66a4e632575 Mon Sep 17 00:00:00 2001 From: Kent Overstreet <kent.overstreet@xxxxxxxxx> Date: Thu, 25 Jan 2024 19:00:24 -0500 Subject: [PATCH] mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN Introduce PF_MEMALLOC_* equivalents of some GFP_ flags: PF_MEMALLOC_NORECLAIM -> GFP_NOWAIT PF_MEMALLOC_NOWARN -> __GFP_NOWARN Cc: Vlastimil Babka <vbabka@xxxxxxx> Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx> Cc: Michal Hocko <mhocko@xxxxxxxxxx> Cc: Darrick J. Wong <djwong@xxxxxxxxxx> Signed-off-by: Kent Overstreet <kent.overstreet@xxxxxxxxx> diff --git a/include/linux/sched.h b/include/linux/sched.h index cdb8ea53c365..36de7a2343c3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1632,12 +1632,12 @@ extern struct pid *cad_pid; #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */ -#define PF_LOCAL_THROTTLE 0x00100000 /* Throttle writes only against the bdi I write to, +#define PF_MEMALLOC_NORECLAIM 0x00100000 /* All allocation requests will inherit __GFP_NOWARN */ +#define PF_MEMALLOC_NOWARN 0x00200000 /* All allocation requests will inherit __GFP_NOWARN */ +#define PF_LOCAL_THROTTLE 0x00400000 /* Throttle writes only against the bdi I write to, * I am cleaning dirty pages from some other bdi. */ -#define PF_KTHREAD 0x00200000 /* I am a kernel thread */ -#define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ -#define PF__HOLE__00800000 0x00800000 -#define PF__HOLE__01000000 0x01000000 +#define PF_KTHREAD 0x00800000 /* I am a kernel thread */ +#define PF_RANDOMIZE 0x01000000 /* Randomize virtual address space */ #define PF__HOLE__02000000 0x02000000 #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index f00d7ecc2adf..c29059a76052 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -236,16 +236,25 @@ static inline gfp_t current_gfp_context(gfp_t flags) { unsigned int pflags = READ_ONCE(current->flags); - if (unlikely(pflags & (PF_MEMALLOC_NOIO | PF_MEMALLOC_NOFS | PF_MEMALLOC_PIN))) { + if (unlikely(pflags & (PF_MEMALLOC_NOIO | + PF_MEMALLOC_NOFS | + PF_MEMALLOC_NORECLAIM | + PF_MEMALLOC_NOWARN | + PF_MEMALLOC_PIN))) { /* - * NOIO implies both NOIO and NOFS and it is a weaker context - * so always make sure it makes precedence + * Stronger flags before weaker flags: + * NORECLAIM implies NOIO, which in turn implies NOFS */ - if (pflags & PF_MEMALLOC_NOIO) + if (pflags & PF_MEMALLOC_NORECLAIM) + flags &= ~__GFP_DIRECT_RECLAIM; + else if (pflags & PF_MEMALLOC_NOIO) flags &= ~(__GFP_IO | __GFP_FS); else if (pflags & PF_MEMALLOC_NOFS) flags &= ~__GFP_FS; + if (pflags & PF_MEMALLOC_NOWARN) + flags |= __GFP_NOWARN; + if (pflags & PF_MEMALLOC_PIN) flags &= ~__GFP_MOVABLE; }