Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

> On Tue, Nov 26, 2013 at 4:17 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Gao feng <gaofeng@xxxxxxxxxxxxxx> reported that commit
>> e51db73532955dc5eaba4235e62b74b460709d5b
>> userns: Better restrictions on when proc and sysfs can be mounted
>> caused a regression on mounting a new instance of proc in a mount
>> namespace created with user namespace privileges, when binfmt_misc
>> is mounted on /proc/sys/fs/binfmt_misc.
>>
>> This is an unintended regression caused by the absolutely bogus empty
>> directory check in fs_fully_visible. The check fs_fully_visible replaced
>> didn't even bother to attempt to verify proc was fully visible and
>> hiding proc files with any kind of mount is rare. So for now fix
>> the userspace regression by allowing directory with nlink == 1
>> as /proc/sys/fs/binfmt_misc has.
>>
>> I will have a better patch but it is not stable material, or
>> last minute kernel material. So it will have to wait.
>
> Is the better fix to fix procfs to set nlink == 2?

The better fix should be to drop locks, read the directory
(f_op->iterate?) and ensure it is empty and then take locks again.
nlink is insufficient to check if a directory is empty and a mount is
covering a file with something interesting.

Only under /proc/sys/... do directories have nlink == 1 so the nlink
check continues to provide value for now.

The only real world reasonable cases are mounting over an empty
directory in /proc or /sys or mounting over the filesystem entirely and
the nlink check actually catches the latter because the nlink count is
correct on the root directories.

Eric
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
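
To illustrate the difference between the nlink heuristic and actually reading the directory, here is a rough userspace sketch in C. It is not the kernel's fs_fully_visible code and not the proposed patch; the function names nlink_suggests_no_subdirs and dir_is_empty are made up for this example, and the sketch only assumes POSIX stat(2), opendir(3) and readdir(3).

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: the cheap heuristic.
 * A directory with nlink <= 2 has no subdirectories on filesystems that
 * count "." and each child's "..". It says nothing about regular files,
 * and nlink == 1 shows up on filesystems that do not maintain the count
 * (the thread mentions directories under /proc/sys).
 * Returns 1 or 0, or -1 if path cannot be stat'ed or is not a directory.
 */
static int nlink_suggests_no_subdirs(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0 || !S_ISDIR(st.st_mode))
		return -1;
	return st.st_nlink <= 2;
}

/*
 * Hypothetical helper: the reliable check.
 * Read the directory and see whether anything besides "." and ".."
 * is there. Returns 1 if empty, 0 if not, -1 if it cannot be opened.
 */
static int dir_is_empty(const char *path)
{
	DIR *d;
	struct dirent *de;
	int empty = 1;

	assert(path != NULL);
	d = opendir(path);
	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		empty = 0;	/* found a real entry */
		break;
	}
	closedir(d);
	return empty;
}
```

A directory holding only regular files can pass the first check while failing the second, which is the gap described above: nlink alone cannot show that a mount is not covering something interesting.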