On Tue, Feb 18, 2020 at 09:04:36PM +0000, Jason Gunthorpe wrote: > The rdma_cm must be used single threaded. > > This appears to be a bug in the design, as it does have lots of locking > that seems like it should allow concurrency. However, when it is all said > and done every single place that uses the cma_exch() scheme is broken, and > all the unlocked reads from the ucma of the cm_id data are wrong too. > > syzkaller has been finding endless bugs related to this. > > Fixing this in any elegant way is some enormous amount of work. Take a > very big hammer and put a mutex around everything to do with the > ucma_context at the top of every syscall. > > Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace") > Reported-by: syzbot+adb15cf8c2798e4e0db4@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+e5579222b6a3edd96522@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+4b628fcc748474003457@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+29ee8f76017ce6cf03da@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+6956235342b7317ec564@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+b358909d8d01556b790b@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+6b46b135602a3f3ac99e@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+8458d13b13562abf6b77@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+bd034f3fdc0402e942ed@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+c92378b32760a4eef756@xxxxxxxxxxxxxxxxxxxxxxxxx > Reported-by: syzbot+68b44a1597636e0b342c@xxxxxxxxxxxxxxxxxxxxxxxxx > Cc: Eric Biggers <ebiggers@xxxxxxxxxx> > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > --- > drivers/infiniband/core/ucma.c | 50 ++++++++++++++++++++++++++++++++-- > 1 file changed, 48 insertions(+), 2 deletions(-) > > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-rc > > Lets see if I told syzkaller about this properly.. > > EricB: If there are other rdma_cm related hits in syzkaller besides > these 11 lets include them as well. I wasn't able to find a way to > search for things, this list is from your past email, thanks. > Unfortunately I haven't had time to work on syzkaller bugs lately, so I can't provide an updated list until I go through the long backlog of bugs. A comment on the patch below: > @@ -1112,13 +1134,17 @@ static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf, > if (cmd.conn_param.valid) { > ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param); > mutex_lock(&file->mut); > + mutex_lock(&ctx->mutex); > ret = __rdma_accept(ctx->cm_id, &conn_param, NULL); > + mutex_unlock(&ctx->mutex); > if (!ret) > ctx->uid = cmd.uid; > mutex_unlock(&file->mut); This is nesting the new ucma_context::mutex inside the existing ucma_file::mut. > @@ -1403,6 +1443,7 @@ static ssize_t ucma_process_join(struct ucma_file *file, > if (IS_ERR(ctx)) > return PTR_ERR(ctx); > > + mutex_lock(&ctx->mutex); > mutex_lock(&file->mut); > mc = ucma_alloc_multicast(ctx); > if (!mc) { ... but this is doing the opposite. So it can deadlock. What's the intended order? Also, are these two separate mutexes actually needed? I.e., did you consider using the existing ucma_file::mut, but it didn't work or it wasn't fine-grained enough? (It looks like one ucma_file can have multiple ucma_contexts.) - Eric