On Fri, May 13, 2022 at 09:53:05AM +0000, Tian, Kevin wrote: > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > Sent: Friday, May 6, 2022 8:25 AM > > > > This is necessary to avoid various user triggerable races, for instance > > racing SET_CONTAINER/UNSET_CONTAINER: > > > > ioctl(VFIO_GROUP_SET_CONTAINER) > > ioctl(VFIO_GROUP_UNSET_CONTAINER) > > vfio_group_unset_container > > int users = atomic_cmpxchg(&group->container_users, 1, 0); > > // users == 1 container_users == 0 > > __vfio_group_unset_container(group); > > > > vfio_group_set_container() > > if (atomic_read(&group->container_users)) > > if (!atomic_read(...)) > > > down_write(&container->group_lock); > > group->container = container; > > up_write(&container->group_lock); > > > > It'd be clearer to add below step here: > > container = group->container; > > otherwise if this assignment is done before above race then the > original container is not leaked. Not quite, it is the assignment to NULL that is trouble: > > down_write(&container->group_lock); > > group->container = NULL; > > up_write(&container->group_lock); Either we leaked the original container or we leaked the new container, and now the atomics are out of sync with the state of group->container. But sure, it is a touch clearer to focus only one scenario, I picked the one where group->container is read early and we leak the new container. Thanks, Jason