> -----Original Message-----
> From: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> Sent: Wednesday, May 22, 2024 7:25 PM
> To: linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx; rds-devel@xxxxxxxxxxxxxx
> Cc: Jason Gunthorpe <jgg@xxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>;
> Saeed Mahameed <saeedm@xxxxxxxxxx>; Tariq Toukan <tariqt@xxxxxxxxxx>;
> David S . Miller <davem@xxxxxxxxxxxxx>; Eric Dumazet
> <edumazet@xxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni
> <pabeni@xxxxxxxxxx>; Tejun Heo <tj@xxxxxxxxxx>; Lai Jiangshan
> <jiangshanlai@xxxxxxxxx>; Allison Henderson <allison.henderson@xxxxxxxxxx>;
> Manjunath Patil <manjunath.b.patil@xxxxxxxxxx>; Mark Zhang
> <markzhang@xxxxxxxxxx>; Håkon Bugge <haakon.bugge@xxxxxxxxxx>; Chuck
> Lever <chuck.lever@xxxxxxxxxx>; Shiraz Saleem <shiraz.saleem@xxxxxxxxx>;
> Yang Li <yang.lee@xxxxxxxxxxxxxxxxx>
> Subject: [PATCH v3 4/6] RDMA/cm: Brute force GFP_NOIO
>
> In ib_cm_init(), we call memalloc_noio_{save,restore} in a parenthetic fashion
> when enabled by the module parameter force_noio.
>
> This in order to conditionally enable ib_cm to work aligned with block I/O devices.
> Any work queued later on work-queues created during module initialization will
> inherit the PF_MEMALLOC_{NOIO,NOFS} flag(s), due to commit ("workqueue:
> Inherit NOIO and NOFS alloc flags").
>
> We do this in order to enable ULPs using the RDMA stack to be used as a
> network block I/O device. This to support a filesystem on top of a raw block
> device which uses said ULP(s) and the RDMA stack as the network transport
> layer.
>
> Under intense memory pressure, we get memory reclaims. Assume the filesystem
> reclaims memory, goes to the raw block device, which calls into the ULP in
> question, which calls the RDMA stack. Now, if regular GFP_KERNEL allocations
> in ULP or the RDMA stack require reclaims to be fulfilled, we end up in a circular
> dependency.
>
> We break this circular dependency by:
>
> 1. Force all allocations in the ULP and the relevant RDMA stack to use
>    GFP_NOIO, by means of a parenthetic use of
>    memalloc_noio_{save,restore} on all relevant entry points.
>
> 2. Make sure work-queues inherits current->flags
>    wrt. PF_MEMALLOC_{NOIO,NOFS}, such that work executed on the
>    work-queue inherits the same flag(s).
>
> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> ---
>  drivers/infiniband/core/cm.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
> index 07fb8d3c037f0..767eec38eb57d 100644
> --- a/drivers/infiniband/core/cm.c
> +++ b/drivers/infiniband/core/cm.c
> @@ -22,6 +22,7 @@
>  #include <linux/workqueue.h>
>  #include <linux/kdev_t.h>
>  #include <linux/etherdevice.h>
> +#include <linux/sched/mm.h>
>
>  #include <rdma/ib_cache.h>
>  #include <rdma/ib_cm.h>
> @@ -35,6 +36,11 @@ MODULE_DESCRIPTION("InfiniBand CM");
>  MODULE_LICENSE("Dual BSD/GPL");
>
>  #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
> +
> +static bool cm_force_noio;
> +module_param_named(force_noio, cm_force_noio, bool, 0444);
> +MODULE_PARM_DESC(force_noio, "Force the use of GFP_NOIO (Y/N)");
> +
>  static const char * const ibcm_rej_reason_strs[] = {
>  	[IB_CM_REJ_NO_QP] = "no QP",
>  	[IB_CM_REJ_NO_EEC] = "no EEC",
> @@ -4504,6 +4510,10 @@ static void cm_remove_one(struct ib_device *ib_device, void *client_data)
>  static int __init ib_cm_init(void)
>  {
>  	int ret;
> +	unsigned int noio_flags;

minor: please follow reverse xmas tree order

> +
> +	if (cm_force_noio)
> +		noio_flags = memalloc_noio_save();
>
>  	INIT_LIST_HEAD(&cm.device_list);
>  	rwlock_init(&cm.device_lock);
> @@ -4527,10 +4537,13 @@ static int __init ib_cm_init(void)
>  	if (ret)
>  		goto error3;
>
> -	return 0;
> +	goto error2;
>  error3:
>  	destroy_workqueue(cm.wq);
>  error2:
> +	if (cm_force_noio)
> +		memalloc_noio_restore(noio_flags);
> +
>  	return ret;
>  }
>
> --
> 2.31.1
>
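To illustrate the declaration ordering requested above together with the save/restore bracket the patch describes, here is a minimal userspace sketch. It is not the kernel code: task_flags stands in for current->flags, the two memalloc_noio_* helpers are local stand-ins that only model the documented save/restore semantics, the flag's bit value is picked for illustration, and demo_init() with its parameters is a hypothetical name. Initializing noio_flags to 0 is my own addition to keep the sketch free of maybe-uninitialized warnings.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit value; the kernel defines its own in <linux/sched.h>. */
#define PF_MEMALLOC_NOIO 0x00080000u

/* Userspace stand-in for current->flags (assumption for this sketch). */
static unsigned int task_flags;

/* Stand-in: set the NOIO bit and return its previous state. */
static unsigned int memalloc_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Stand-in: put the NOIO bit back to the state returned by save. */
static void memalloc_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/*
 * Hypothetical init routine. Declarations are in reverse xmas tree
 * order (longest line first). *noio_seen reports whether the NOIO
 * bit was set while the bracketed body ran.
 */
static int demo_init(bool force_noio, bool fail, bool *noio_seen)
{
	unsigned int noio_flags = 0;
	int ret = 0;

	if (force_noio)
		noio_flags = memalloc_noio_save();

	/* Body: anything allocating here would see the NOIO bit. */
	*noio_seen = (task_flags & PF_MEMALLOC_NOIO) != 0;
	if (fail)
		ret = -1;

	/* Single exit path, so success and failure both restore. */
	if (force_noio)
		memalloc_noio_restore(noio_flags);

	return ret;
}
```

Routing both the success and the error case through one exit mirrors the patch's `goto error2` on success, so the flag is restored on every return.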