On Friday March 21, miklos@xxxxxxxxxx wrote: > > Nobody wants to send vfsmounts to the filesystem. But vfs_...() are > still part of the "upper layer", not the filesystem, so I'm not > convinced yet. For example: The layers within the VFS are probably not very clearly defined, but I think one can fairly clearly see a difference between a "filesystem" layer and a "namespace" layer. The vfs_XX function are (in my mind) the top of the filesystem layer. They primarily call the relevant xxx_operation and just add minimal support code to enforce standard policy, callouts, etc. vfsmnts are very much part of the "namespace" layer. The heart of this layer is probably link_path_walk. It allows mounts to tie filesystems together in all sorts of interesting knots :-) The readonly-bind-mount concept adds new functionality in the namespace layer. The filesystem layer already has a concept of readonly mounts, and this doesn't change (I hope). The superblock still has a readonly flag and the vfs_XXX operations must (possibly indirectly) still test this flag. readonly-bind-mounts adds a new 'readonly' flag in the namespace layer. So if you want to centralise the code for implementing this functionality (which seems like a good idea), it should probably go in link_path_walk, or one of it's friends, rather than in vfs_XXX. Maybe a new LOOKUP_WRITEACCESS flag which arranges that mnt_want_write gets called as appropriate, and that path_put will call mnt_drop_write if needed. Maybe some enhancement to the 'intent' structure with a similar effect could be done instead. Then you could, presumably, put a security hook somewhere in link_path_walk for those modules (like AppArmor) which want to do checks based on the namespace. ========= I just had a look at the places where mnt_want_write is currently called. There are quite a few of them. Two that surprised me were touch_atime and file_update_time. It is not clear to me that we should avoid updating 'atime' if the bindmount is marked readonly. The file is still being accessed, so updating the atime could still be appropriate. Possibly a "noatime" per-vfsmnt flag would be appropriate. But I'm not convinced that interpreting per-vfsmnt 'readonly' as "don't update atime" is correct. And the mnt_want_write call in file_update_time seems superfluous. This only gets called on files that were already opened for write ..... except for named pipes. We don't call mnt_want_write when we open those, but we do call file_update_time whenever we write to them. That is an awkward asymmetry. I suspect that mnt_want_write *Should* be called when a named pipe is opened for write. I think all other calls to mnt_want_write could be avoided by passing an appropriate flag to the relevant lookup routine, but I haven't checked thoroughly. NeilBrown (for those who are interested, the places I found that call mnt_want_write are: update atime update mtime create file mknod/mkdir/rmdir/unlink/symlink/link/rename truncate/fchmod/chown get_file_write_access utimes setxattr 'init_file' (which is only used for sockets and shared mem segments. As the comment near init_file says, it doesn't really need mnt_want_write, it is just there for balance). ) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html