On Sat, 2007-06-23 at 09:52 -0700, Andrew Morton wrote:
> > On Fri, 22 Jun 2007 13:03:03 -0700 Dave Hansen <haveblue@xxxxxxxxxx> wrote:
> > Why do we need r/o bind mounts?
> >
> > This feature allows a read-only view into a read-write filesystem.
> > In the process of doing that, it also provides infrastructure for
> > keeping track of the number of writers to any given mount.
> >
> > This has a number of uses. It allows chroots to have parts of
> > filesystems writable. It will be useful for containers in the future
> > because users may have root inside a container, but should not
> > be allowed to write to some filesystems. This also replaces
> > patches that vserver has had out of the tree for several years.
> >
> > It allows security enhancement by making sure that parts of
> > your filesystem are read-only (such as when you don't trust your
> > FTP server), when you don't want to have entire new filesystems
> > mounted, or when you want atime selectively updated.
> > I've been using the following script to test that the feature is
> > working as desired. It takes a directory and makes a regular
> > bind and a r/o bind mount of it. It then performs some normal
> > filesystem operations on the three directories, including ones
> > that are expected to fail, like creating a file on the r/o
> > mount.
>
> Doesn't selinux do some of this?
>
> My overall reaction: owch. There's a ton of tricksy code here and great
> potential for us to accidentally break it in the future by forgetting a
> mnt_may_write() as the kernel evolves.

This is definitely a tricky thing. It takes a static, single check and
replaces it with a matched set of operations. But it's not much
different than adding a mutex to something. People can always miss one
side of the lock pair.

People won't miss the mnt_may_write() because it will become the only
valid way to check whether a mounted fs can be written to. IS_RDONLY()
will not be available for these kinds of checks.

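To make the "matched set of operations" a bit more concrete, here is a
minimal userspace sketch of the idea in C. It is not the patch code: the
struct and every model_* name are hypothetical, the locking a real
implementation would need is left out, and the -EROFS/-EBUSY return
values are my assumption for illustration. It only shows a per-mount
writer count that each write path takes and drops in pairs, much like
the two sides of a lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model of a mount; not the kernel's struct vfsmount. */
struct model_mount {
    bool readonly;  /* logical "users may not write through this mount" */
    int  writers;   /* write accesses currently held on this mount */
};

/* Take write access. Fails on a read-only mount (assumed -EROFS). */
int model_get_write(struct model_mount *m)
{
    if (m->readonly)
        return -EROFS;
    m->writers++;
    return 0;
}

/* Drop write access. Must pair with a successful model_get_write(). */
void model_put_write(struct model_mount *m)
{
    assert(m->writers > 0);  /* catches an unbalanced put */
    m->writers--;
}

/* Go read-only only when no writer is outstanding (assumed -EBUSY otherwise). */
int model_make_readonly(struct model_mount *m)
{
    if (m->writers > 0)
        return -EBUSY;
    m->readonly = true;
    return 0;
}

/* Example write path: the modification is bracketed by get/put. */
int model_create_file(struct model_mount *m)
{
    int err = model_get_write(m);

    if (err)
        return err;
    /* ... the actual modification would happen here ... */
    model_put_write(m);
    return 0;
}
```

The point of the sketch is that a caller has no separate "is this
read-only?" test to reach for: the only way to find out whether it may
write is to take the count, which is also what lets the mount know when
writers are finished.
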
> And then there's the added complexity and the added runtime overhead.
>
> Balance that against some pretty obscure-looking benefits and I'm
> struggling to see how a merge is justifiable?

One reason Al had me go through using these paired operations instead
of just passing the mount all over the vfs is that this fixes some
existing, fundamental problems: we do not properly track when writers
are _finished_ with our filesystems, and may allow a remount-r/o
operation to succeed while writes are still occurring.

We needed to separate out the logical "users can write to this fs" from
the physical "this fs is on r/o media" or "this fs is dying and writes
will only kill it more". That's what these patches do in the end.

One set of things that I'm going to tack on here once these go in is
the ability to increment the writer count when i_nlink is decremented
to zero. We'll drop the write count when the file is actually
truncated. As it stands right now, since there is never an open filp on
those files, you might unlink a file, do a r/o mount of the fs, then
still write to it when the truncate occurs. I think fixing that was one
of Al's long-term goals with this strategy.

-- Dave