On Fri, Nov 11, 2016 at 7:44 PM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > On Fri, Nov 11, 2016 at 6:27 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: >> On Fri, Nov 11, 2016 at 06:17:07PM +0200, Amir Goldstein wrote: >> >>> I'm afraid so. >>> It seems to me that most of the time, lock_rename() is being used in >>> overlayfs for no better alternative to lock 2 directories (work+upper). >>> >>> My suggestion is a small modification to the overlayfs locking scheme. >>> ---Instead of this: >>> assert(lock_rename(workdir, upperdir) != NULL)); >>> copy_up(src, tmp); >>> vfs_rename(tmp, dst); >>> unlock_rename(workdir, upperdir); >>> >>> +++Use this: >>> assert(lock_rename(workdir, upperdir) != NULL)); >>> mutex_unlock(s_vfs_rename_mutex); >>> copy_up(src, tmp); >>> inode_unlock(upperdir); >>> inode_unlock(workdir); >>> assert(lock_rename(workdir, upperdir) != NULL)); >>> vfs_rename(tmp, dst); >>> unlock_rename(workdir, upperdir); >>> >>> Miklos, >>> >>> Do you see any problem with the proposed scheme? >>> Anything that can go wrong while releasing the workdir lock before vfs_rename()? >> >> Huh??? >> >> ->rename() definitely counts upon parents being locked; please, read the >> damn Documentation/filesystems/locking, it's there for a reason. >> >> The real question is why the fsck do you need to lock the workdir for the >> duration of copying at all. O_TMPFILE open + writes there doesn't need >> lock_rename() *or* parents being locked. You need the parent locked when >> you link the sucker in place, but that's it. IDGI... > > There's that. The other thing that lock_rename does is prevent > multiple copy ups on the same file. Arguably it's an overkill, but a > replacement needs to be added. > Miklos, FYI, I started to work on copy up using O_TMPFILE with i_mutex held only for upper dir. > There's that. The other thing that lock_rename does is prevent > multiple copy ups on the same file. Arguably it's an overkill, but a > replacement needs to be added. > I think you suggested here to use a waitqueue for pending copyups? Did you mean a waitqueue per overlay parent directory inode or did you mean something else? > Fact is, nobody ever reported this blatant performance bottleneck. > Probably because copying up gigabyte files is not the usual use case > for union filesystems... Still, it's a loophole for some serious DoS. An unprivileged user inside container can touch a file and block all copy up on all other containers on that host and all renames on that host fs as well! Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html