On 6/6/22 11:28, Linus Torvalds wrote: > On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: >> Has anyone looked into this lock ordering issues? > > The deadlock is > >>>> [78140.503821] CPU0 CPU1 >>>> [78140.503823] ---- ---- >>>> [78140.503824] lock(&newf->file_lock); >>>> [78140.503826] lock(&p->alloc_lock); >>>> [78140.503828] lock(&newf->file_lock); >>>> [78140.503830] lock(&ctx->lock); > > and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show() > in fs/proc/fd.c: > > task_lock(task); > files = task->files; > if (files) { > unsigned int fd = proc_fd(m->private); > > spin_lock(&files->file_lock); > > and that looks all normal. > > But the other chains look painful. > > I do see the IPC code doing ugly things, in particular I detest this code: > > task_lock(current); > list_add(&shp->shm_clist, ¤t->sysvshm.shm_clist); > task_unlock(current); > > where it is using the task lock to protect the shm_clist list. Nasty. > > And it's doing that inside the shm_ids.rwsem lock _and_ inside the > shp->shm_perm.lock. > > So the IPC code has newseg() doing > > shmget -> > ipcget(): > down_write(ids->rwsem) -> > newseg(): > ipc_addid gets perm->lock > task_lock(current) > > so you have > > ids->rwsem -> perm->lock -> alloc_lock > > there. > > So now we have that > > ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock > > when you put those sequences together. > > But I didn't figure out what the security subsystem angle is and how > that then apparently mixes things up with execve. > > Yes, newseg() is doing that > > error = security_shm_alloc(&shp->shm_perm); > > while holding rwsem, but I can't see how that matters. From the > lockdep output, rwsem doesn't actually seem to be part of the whole > sequence. > > It *looks* like we have > > apparmour ctx->lock --> > radix_tree_preloads.lock --> > ipcperm->lock > > and apparently that's called under the file_lock somewhere, completing > the circle. > > I guess the execve component is that > > begin_new_exec -> > security_bprm_committing_creds -> > apparmor_bprm_committing_creds -> > aa_inherit_files -> > iterate_fd -> *takes file_lock* > match_file -> > aa_file_perm -> > update_file_ctx *takes ctx->lock* > > so that's how you get file_lock -> ctx->lock. > yes > So you have: > > SHMGET: > ipcperm->lock -> alloc_lock > /proc: > alloc_lock -> file_lock > apparmor_bprm_committing_creds: > file_lock -> ctx->lock > > and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part. > yeah that is the part I got stuck on, before being pulled away from this > I suspect that part is that both Apparmor and IPC use the idr local lock. > bingo, apparmor moved its secids allocation from a custom radix tree to idr in 99cc45e48678 apparmor: Use an IDR to allocate apparmor secids and ipc is using the idr for its id allocation as well I can easily lift the secid() allocation out of the ctx->lock but that would still leave it happening under the file_lock and not fix the problem. I think the quick solution would be for apparmor to stop using idr, reverting back at least temporarily to the custom radix tree.