On Wed, Jun 17, 2020 at 10:07:56PM +0300, Leon Romanovsky wrote:
> On Wed, Jun 17, 2020 at 03:20:46PM -0300, Jason Gunthorpe wrote:
> > On Wed, Jun 17, 2020 at 02:28:11PM +0300, Leon Romanovsky wrote:
> > > On Wed, Jun 17, 2020 at 04:07:32PM +0530, haris.iqbal@xxxxxxxxxxxxxxx wrote:
> > > > From: Md Haris Iqbal <haris.iqbal@xxxxxxxxxxxxxxx>
> > > >
> > > > Fixes: 2de6c8de192b ("block/rnbd: server: main functionality")
> > > > Reported-by: kernel test robot <rong.a.chen@xxxxxxxxx>
> > > > Signed-off-by: Md Haris Iqbal <haris.iqbal@xxxxxxxxxxxxxxx>
> > > >
> > > > The rnbd_server module's communication manager initialization depends on the
> > > > registration of the "network namespace subsystem" of the RDMA CM agent module.
> > > > As such, when the kernel is configured to load the rnbd_server and the RDMA
> > > > cma module during initialization; and if the rnbd_server module is initialized
> > > > before RDMA cma module, a null ptr dereference occurs during the RDMA bind
> > > > operation.
> > > > This patch delays the initialization of the rnbd_server module to the
> > > > late_initcall level, since RDMA cma module uses module_init which puts it into
> > > > the device_initcall level.
> > > > drivers/block/rnbd/rnbd-srv.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/rnbd/rnbd-srv.c b/drivers/block/rnbd/rnbd-srv.c
> > > > index 86e61523907b..213df05e5994 100644
> > > > +++ b/drivers/block/rnbd/rnbd-srv.c
> > > > @@ -840,5 +840,5 @@ static void __exit rnbd_srv_cleanup_module(void)
> > > >  	rnbd_srv_destroy_sysfs_files();
> > > >  }
> > > >
> > > > -module_init(rnbd_srv_init_module);
> > > > +late_initcall(rnbd_srv_init_module);
> > > >
> > > I don't think that this is correct change. Somehow nvme-rdma works:
> > > module_init(nvme_rdma_init_module);
> > >  -> nvme_rdma_init_module
> > >  -> nvmf_register_transport(&nvme_rdma_transport);
> > >  -> nvme_rdma_create_ctrl
> > >  -> nvme_rdma_setup_ctrl
> > >  -> nvme_rdma_configure_admin_queue
> > >  -> nvme_rdma_alloc_queue
> > >  -> rdma_create_id
> >
> > If it does work, it is by luck.
>
> I didn't check every ULP, but it seems that all ULPs use the same
> module_init.
> >
> > Keep in mind all this only matters for kernels without modules.
>
> Can it be related to the fact that other ULPs call to ib_register_client()
> before calling to rdma-cm? RNBD does not have such call.

If the rdma_create_id() is not on a callchain from module_init then
you don't have a problem.

nvme has a bug here, IIRC. It is not OK to create RDMA CM IDs outside
a client - CM IDs are supposed to be cleaned up when the client is
removed.

Similarly they are supposed to be created from the client attachment.

Though listening CM IDs unbound to any device may change that
slightly, I think it is probably best practice to create the listening
ID only if a client is bound.

Most probably that is the best way to fix rnbd

> I'm not proposing this, but just loudly wondering, do we really need rdma-cm
> as a separate module? Can we bring it to be part of ib_core?

No idea.. It doesn't help this situation at least

Jason