Re: Failure to execute file on overlayfs during switch_root/chroot

On Sun, Feb 03, 2019 at 03:51:53PM +0200, Amir Goldstein wrote:
> OK, what I don't understand and requires debugging is that the print of
> (realfile, IS_ERR(realfile) ? 0 : realfile->f_flags) suggests that realfile
> is not an error value and realfile->f_flags are 0.

Just got back to debugging this properly.

I think you got confused by the same thing I did when first looking at the
code: realfile actually _is_ an error in this case, so the output is
correct (I probably also got confused by the similar realinode/realfile
variable names).

So after debugging this further (and totally digging in the wrong places at
first) I found that the actual problem here is the O_NOATIME flag that is
passed to the underlying file system. In fs/namei.c, may_open() checks
inode_owner_or_capable() when O_NOATIME is set and returns -EPERM if that
check fails.

Being able to read a file without being its owner, as long as you have read
permission, is perfectly fine, but because O_NOATIME is passed, the open()
fails.
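
For anyone who wants to see the same rule from userspace, here is a minimal
sketch. probe_noatime() is a name made up for this mail, not an existing
API. It assumes Linux with glibc, where O_NOATIME needs _GNU_SOURCE. Run it
against a file you can read but don't own (and without CAP_FOWNER) and it
should report EPERM, while the same file opens fine with plain O_RDONLY.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* O_NOATIME is only exposed with _GNU_SOURCE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: attempt a read-only open with O_NOATIME.
 * Returns 0 if the open succeeded, otherwise the errno set by open().
 */
static int probe_noatime(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY | O_NOATIME);
	if (fd < 0)
		return errno;

	close(fd);
	return 0;
}
```

On a file you own this returns 0, which is why the problem doesn't show up
in the usual local setups.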

Now in normal situations where the overlayfs is mounted as root, this shouldn't
be a problem, but as soon as you have a networked file system, things go bad.

That's what happened in our case, where we have a 9p file system mounted in a
guest VM and use it as a lowerdir of an overlayfs. If the file owner on the
host is the same as the uid of the qemu process, the open() works
correctly. If it isn't, the open() fails with EPERM on the host side (even
though you have read access).

The attached patch simply removes the O_NOATIME flag, which fixes the issue.

I originally thought about adding a condition on whether to add the flag, but I
only see two options here, each of which IMHO is bad in its own right:

  * Using inode_owner_or_capable() to check whether to add O_NOATIME, which has
    the downside that it will not work with networked file systems where you
    map different users (I've tested this already with a different patch[1]).
  * Check for failure of open_with_fake_path() and retry without O_NOATIME,
    which *could* be an option, but I think that might come with a performance
    penalty.
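
To make the two options concrete, here is a rough userspace analogue in C.
This is only a sketch and not the kernel code: open_noatime_if_owner() and
open_noatime_retry() are hypothetical names, and the ownership test uses
geteuid() as a stand-in for what inode_owner_or_capable() does in the kernel
(which also honours CAP_FOWNER).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* O_NOATIME is only exposed with _GNU_SOURCE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Analogue of the first option: only ask for O_NOATIME if we appear to own
 * the file. The check is purely local, so it says nothing about how a
 * remote side (e.g. a 9p host) maps users.
 */
static int open_noatime_if_owner(const char *path, int flags)
{
	struct stat st;

	assert(path != NULL);

	if (stat(path, &st) == 0 && st.st_uid == geteuid())
		flags |= O_NOATIME;

	return open(path, flags);
}

/*
 * Analogue of the second option: try with O_NOATIME first and retry
 * without it if the open is rejected with EPERM. The failure case costs
 * a second open().
 */
static int open_noatime_retry(const char *path, int flags)
{
	int fd;

	assert(path != NULL);

	fd = open(path, flags | O_NOATIME);
	if (fd < 0 && errno == EPERM)
		fd = open(path, flags);

	return fd;
}
```

Both return a file descriptor or -1 with errno set, like open() itself. The
second open() in the retry variant only happens in the EPERM case.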

Actually, a third option would be to just ignore O_NOATIME in fs/namei.c
instead of returning -EPERM, but I think that could open up a whole range of
other bugs.

In summary, I think just removing O_NOATIME is the most sensible option,
because it doesn't cause problems with network filesystems and also leaves the
atime/noatime decision to the administrator of the corresponding system.

Or is there something that I've missed where one is in dire need of O_NOATIME?

a!
-- 
aszlig
Universal dilettante
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 84dd957efa24..3f9b9275267b 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -31,7 +31,7 @@ static struct file *ovl_open_realfile(const struct file *file,
 	const struct cred *old_cred;
 
 	old_cred = ovl_override_creds(inode->i_sb);
-	realfile = open_with_fake_path(&file->f_path, file->f_flags | O_NOATIME,
+	realfile = open_with_fake_path(&file->f_path, file->f_flags,
 				       realinode, current_cred());
 	revert_creds(old_cred);
 


