Re: Failure to execute file on overlayfs during switch_root/chroot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 14, 2019 at 3:09 AM aszlig <aszlig@nix.build> wrote:
>
> On Sun, Feb 03, 2019 at 03:51:53PM +0200, Amir Goldstein wrote:
> > OK, what I don't understand and requires debugging is that the print of
> > (realfile, IS_ERR(realfile) ? 0 : realfile->f_flags) suggests that realfile
> > is not an error value and realfile->f_flags are 0.
>
> Just got back to debugging this properly.
>
> I think you're confusing the same thing as I ded when first looking at the
> code, because realfile actually _is_ an error in this case, so the output is
> correct (I personally probably also got confused because of the
> realinode/realfile variable names).
>
> So after debugging this further (and totally digging in wrong places at first)
> I found that the actual problem here is the O_NOATIME flag that is passed to
> the underlying file system. If you look in fs/namei.c in function may_open(),
> there is a check for inode_owner_or_capable().
>
> Being able to read a file despite being the owner but having read permissions
> is pefectly fine, but due to the fact that O_NOATIME is passed, the open()
> fails.
>
> Now in normal situations where the overlayfs is mounted as root, this shouldn't
> be a problem, but as soon as you have a networked file system, things go bad.
>
> That's what happened in our case, where we have a 9p file system mounted in a
> guest VM and a lowerdir of overlayfs on top of that. If the file owner on the
> host is the same as the current uid of qemu process, the open() works
> correctly. However if it's not the case, it will fail with EPERM on the host
> side (even though you have read access).
>
> The attached patch simply removes the O_NOATIME flag, which fixes the issue.
>
> I originally thought about adding a condition on whether to add the flag, but I
> only see two options here, which IMHO are bad in their own rights:
>
>   * Using inode_owner_or_capable() to check whether to add O_NOATIME, which has
>     the downside that it will not work with networked file systems where you
>     map different users (I've tested this already with a different patch[1]).
>   * Check for failure of open_with_fake_path() and retry without O_NOATIME,
>     which *could* be an option, but I think that might come with a performance
>     penalty.
>
> Actually, a third option would be to just ignore O_NOATIME in fs/namei.c
> instead of returning -EPERM, but I think that could open up a whole range of
> other bugs.
>
> In summary, I think just removing O_NOATIME IMHO is the most sensible option,
> because it doesn't cause problems with network filesystems and also leaves the
> atime/noatime decision to the administrator of the corresponding system.
>
> Or is there something that I've missed where one is in dire need of O_NOATIME?
>

Overlayfs is not expected to modify the lower layer.

OTOH, I can't really think anything that should break horribly if
we allow overlayfs to update atime on a writable lower layer??

Thanks,
Amir.



[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux