I vaguely remember that the same problem can occur with nfsd on a local
file system and that nfs clients have to be able to recover from ESTALE
(e.g. lookup/delete/create/lookup would fail with ESTALE otherwise).
Don't clients handle ESTALE by looking the file up again, a la
http://lwn.net/Articles/272684/ ?

On Tue, Mar 1, 2011 at 2:09 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Tue, 1 Mar 2011 14:21:03 -0500
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
>
>> On Tue, Mar 01, 2011 at 12:30:12PM -0600, Shirish Pargaonkar wrote:
>> > On Tue, Mar 1, 2011 at 12:07 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> > > On Mon, Feb 28, 2011 at 10:28:08PM -0500, J. Bruce Fields wrote:
>> > >> On Mon, Feb 28, 2011 at 08:33:08PM -0600, Steve French wrote:
>> > >> > On Mon, Feb 28, 2011 at 5:59 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> > >> > > On Mon, Feb 28, 2011 at 05:52:09PM -0600, Steve French wrote:
>> > >> > >> On Mon, Feb 28, 2011 at 5:42 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>> > >> > >> > OK. Then as things stand we're stuck returning ESTALE to the client
>> > >> > >> > unless we happen to have the inode they're looking for in our cache?
>> > >> > >>
>> > >> > >> Yes - that seems right and consistent with what I remember other file
>> > >> > >> systems doing.
>> > >> > >
>> > >> > > No, other filesystems are able to look up the file on disk by inode
>> > >> > > number (or by whatever identifier they use in the filehandle). They
>> > >> > > don't depend on already having the inode in core.
>> > >> >
>> > >> > Grepping for ESTALE looks like there are dozens of places in various
>> > >> > fs where ESTALE can get returned ...
>> > >>
>> > >> Certainly true.
>> > >>
>> > >> But they do have to be able to look up any inode, regardless of whether
>> > >> it is currently in cache.
>> > >>
>> > >> Otherwise applications on the client would see ESTALE after any server
>> > >> reboot, or any time an inode was forced out of cache (for whatever
>> > >> reason).
>> > >>
>> > >> That would be quite painful.
>> > >
>> > > So if my understanding of the cifs behavior here is correct, then I
>> > > don't believe nfs exports of cifs will be usable.
>> > >
>> > > In the long term perhaps it could be possible with changes to one or the
>> > > other of the protocols: for example, perhaps future versions of the nfs
>> > > protocol could be made less reliant on long-lived filehandles. But that
>> > > would be a major change.
>> > >
>> > > --b.
>> > >
>> >
>> > Bruce, I am not getting a picture of how NFS server would return ESTALE
>> > error to nfs client and then on to the app for a filehandle fragment
>> > that happens to be coded as uniqueid by cifs during encode_fh if inode
>> > with that uniqueid does not happen to exist in vfs cache on the server
>> > box.
>> >
>> > When nfs passes down filehandle fragment (uniqueid) to cifs in fh_to_dentry,
>> > if the inode does not exist in cache, cifs will pick a new inode and copy this
>> > uniqueid as inode number.
>>
>> Oh, OK, that's not what I'd imagined.
>>
>
> Egads. Bruce is quite right and this is just plain not going to work.
> Here's one way that this will go wrong and can never be fixed. I'm
> fairly certain that there are others:
>
> Suppose that we set up a exported cifs mount. The NFS client mounts it,
> opens a file and starts writing data to the inode. So far so good --
> the client did a LOOKUP operation at some point, which got translated
> to a QPathInfo by cifs and a positive dentry ended up in cache.
>
> The client then buffers up some writes. Now, suppose that the server
> then reboots (or that the inode got pushed out of the cache since it
> wasn't in use). The server redoes the cifs mount and reexports it. The
> client then tries to flush the buffered writes.
> The server gets a write
> request but now the inode isn't in cache and we can't match up the
> filehandle to anything. Server returns -ESTALE.
>
> At that point, you're screwed. There's virtually no way for the client
> to recover. We can't guarantee that the NFS client will have a valid
> path to the inode even if it did want to retry the lookup, which it
> really shouldn't need to do anyway.
>
> The fact that the CIFS protocol has no way to look up inodes by
> anything like an inode number really makes this whole idea dead on
> arrival.
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>

--
Thanks,

Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-cifs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
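
P.S. To make the failure Jeff describes in the quoted text concrete, here
is a toy Python model. It is only a sketch, not kernel code and not how
cifs or nfsd are actually written. The class names, the dict standing in
for the share, and the uniqueid value 1001 are all made up for
illustration. The second class assumes the server has some way to find a
file from the identifier stored in the filehandle, which is what Bruce
says other filesystems can do.

```python
import errno


class CacheOnlyServer:
    """Toy server that resolves a filehandle only via its in-memory cache."""

    def __init__(self, backing):
        self.backing = backing  # path -> (uniqueid, data); stands in for the share
        self.cache = {}         # uniqueid -> path; filled in by lookup() only

    def lookup(self, path):
        # Path-based lookup: the only way an entry gets into the cache.
        uniqueid, _ = self.backing[path]
        self.cache[uniqueid] = path
        return uniqueid         # the "filehandle" handed back to the client

    def reboot(self):
        # A reboot (or cache pressure) drops everything held in core.
        self.cache.clear()

    def write(self, handle, data):
        path = self.cache.get(handle)
        if path is None:
            # Nothing in cache matches the handle, so the request fails.
            raise OSError(errno.ESTALE, "Stale file handle")
        uniqueid, _ = self.backing[path]
        self.backing[path] = (uniqueid, data)


class LookupByIdServer(CacheOnlyServer):
    """Toy server that can also find a file by the id stored in the handle."""

    def write(self, handle, data):
        if handle not in self.cache:
            # Assumed capability: search the backing store by identifier.
            for path, (uniqueid, _) in self.backing.items():
                if uniqueid == handle:
                    self.cache[handle] = path
                    break
        super().write(handle, data)


def flush_after_reboot(server_cls):
    """Look up a file, reboot the server, then flush a buffered write."""
    server = server_cls({"/share/a.txt": (1001, b"")})
    handle = server.lookup("/share/a.txt")
    server.reboot()
    try:
        server.write(handle, b"buffered data")
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]
```

In this model flush_after_reboot(CacheOnlyServer) reports ESTALE, because
the client holds only the handle and the server has nothing left to match
it against. flush_after_reboot(LookupByIdServer) succeeds, because the
handle alone is enough to find the file again.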