On Tue, Apr 27, 2010 at 07:51:59AM -0400, Jeff Layton wrote:
> On Mon, 26 Apr 2010 19:58:33 -0400
> Valerie Aurora <vaurora@xxxxxxxxxx> wrote:
>
> > I want to restart the discussion we had last July (!) about an NFS
> > hard read-only mount option.  A common use case of union mounts is
> > a cluster with NFS-mounted read-only root file systems, with a
> > local fs union mounted on top.  Here's the last discussion we had:
> >
> > http://kerneltrap.org/mailarchive/linux-fsdevel/2009/7/16/6211043/thread
> >
> > We can assume a local mechanism that lets the server enforce the
> > read-only-ness of the file system on the local machine (the server
> > can increment sb->s_hard_readonly_users on the local fs and the VFS
> > will take care of the rest).
> >
> > The main question is what to do on the client side when the server
> > changes its mind and wants to write to that file system.  On the
> > server side, there's a clear synchronization point:
> > sb->s_hard_readonly_users needs to be decremented, so we don't have
> > to worry about a hard read-only exported file system going
> > read-write willy-nilly.
> >
> > But the client has to cope with the sudden withdrawal of the
> > read-only guarantee.  A lowest-common-denominator starting point is
> > to treat it as though the mount went away entirely, and force the
> > client to remount and/or reboot.  I also have vague ideas about
> > doing something smart with stale file handles and generation
> > numbers to avoid a remount.  This also looks a little like the
> > forced umount patches, where we could EIO any open file descriptors
> > on the old file system.
> >
> > How long would it take to implement the dumb "NFS server not
> > responding" version?
> >
> > -VAL
>
> Ok, so the problem is this:
>
> You have a client with the aforementioned union mount (r/o NFS layer
> with a local r/w layer on top).  "Something" changes on the server
> and you need a way to cope with the change?
>
> What happens if you do nothing here and just expect the client to
> deal with it?  Obviously you have the potential for inconsistent data
> on the clients until they remount, along with problems like -ESTALE
> errors, etc.
>
> For the use case you describe, however, an admin would have to be
> insane to think that they could safely change the filesystem while it
> was online and serving out data to clients.  If I had a cluster like
> you describe, my upgrade plan would look like this:
>
> 1) update a copy of the master r/o filesystem offline
> 2) test it, test it, test it
> 3) shut down the clients
> 4) unexport the old filesystem, export the new one
> 5) bring the clients back up
>
> ...anything else would be playing with fire.

Yes, you are totally correct; that's the only scenario that would
actually work.  This feature is just about detecting when someone
tries to do this without step 3).

> Unfortunately I haven't been keeping up with your patchset as well as
> I probably should.  What happens to the r/w layer when the r/o layer
> changes?  Does it become completely invalid and you have to rebuild
> it?  Or can it cope with a situation where the r/o filesystem is
> changed while the r/w layer isn't mounted on top of it?

The short version is that we can't cope with the r/o file system being
changed while it's mounted as the bottom layer of a union mount on a
client.  This assumption is what makes a non-panicking union mount
implementation possible.
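To make the server-side half concrete, here is a rough userspace model
of the semantics I have in mind.  This is a sketch, not the actual
patchset: aside from s_hard_readonly_users itself, every name and type
below is a hypothetical stand-in.

#include <errno.h>
#include <stdio.h>

struct superblock {
	int s_readonly;            /* mounted read-only? */
	int s_hard_readonly_users; /* hard r/o claims, e.g. NFS exports */
};

/* Taken when the fs is exported hard read-only. */
static int claim_hard_readonly(struct superblock *sb)
{
	if (!sb->s_readonly)
		return -EBUSY;	/* can't guarantee r/o on a r/w mount */
	sb->s_hard_readonly_users++;
	return 0;
}

/* The clear synchronization point: the server changes its mind. */
static void drop_hard_readonly(struct superblock *sb)
{
	sb->s_hard_readonly_users--;
}

/*
 * "The VFS will take care of the rest": remounting read-write must
 * fail while any hard read-only user remains.
 */
static int remount_rw(struct superblock *sb)
{
	if (sb->s_hard_readonly_users > 0)
		return -EBUSY;
	sb->s_readonly = 0;
	return 0;
}

int main(void)
{
	struct superblock sb = { .s_readonly = 1 };

	claim_hard_readonly(&sb);
	printf("remount r/w while exported: %d\n", remount_rw(&sb)); /* -EBUSY */
	drop_hard_readonly(&sb);
	printf("remount r/w after unexport: %d\n", remount_rw(&sb)); /* 0 */
	return 0;
}

The point of the counter is that going read-write requires an explicit,
synchronous step on the server, so the client never has to guess
whether the guarantee still holds; it only has to cope when the
guarantee is revoked.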
What I need can be summarized in the distinction between the following
scenarios:

Scenario A: The NFS server reboots while a client has the file system
mounted as the r/o layer of a union mount.  The server does not change
the exported file system at all and re-exports it as hard read-only.
This should work.

Scenario B: The NFS server reboots as in the above scenario, but
performs "touch /exports/client_root/a_file" before re-exporting the
file system as hard read-only.  This is _not_ okay, and in some form
it will cause a panic on the client if the client doesn't detect it
and stop accessing the mount.

How do we tell the difference between scenarios A and B?  (A rough
sketch of one detection scheme follows below.)

Thanks,

-VAL
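Here is the "something smart with stale file handles and generation
numbers" idea from earlier in the thread made slightly more concrete,
again as a userspace sketch under assumptions: all names are
hypothetical, and a real client would do this in its dentry/inode
revalidation path.

#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct cached_fh {
	uint64_t fileid;      /* fileid the server handed out */
	uint64_t change_attr; /* NFSv4 change attr (or ctime) when cached */
};

/*
 * After the server comes back, re-fetch the attribute and compare.
 * Scenario A: nothing changed, carry on.  Scenario B: the "hard
 * read-only" fs changed behind our back, so stop using the mount and
 * EIO open file descriptors instead of letting the client panic.
 */
static int revalidate(const struct cached_fh *fh, uint64_t change_now)
{
	if (change_now != fh->change_attr)
		return -EIO;	/* scenario B: poison the mount */
	return 0;		/* scenario A: keep going */
}

int main(void)
{
	struct cached_fh fh = { .fileid = 42, .change_attr = 1000 };

	printf("reboot, unchanged: %d\n", revalidate(&fh, 1000)); /* 0 */
	printf("reboot, touched:   %d\n", revalidate(&fh, 1001)); /* -EIO */
	return 0;
}

The obvious hole is that this only catches changes to objects the
client has already cached; a "touch" on a file the client never looked
at slips through.  Closing that would take something like a whole-fs
generation number that the server bumps whenever
sb->s_hard_readonly_users drops to zero.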