On Wed, 9 Feb 2011, Brian Chrisman wrote:
> On Mon, Feb 7, 2011 at 9:02 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Mon, 7 Feb 2011, Brian Chrisman wrote:
> >> My goal is merely to push this problem back on the NFSv4 client. If I
> >> can tell it "your filehandle is expired", it should request a full
> >> path lookup to re-establish the filehandle, as far as I can tell from
> >> the specs.
> >
> > Oh I see...
> >
> >> I converted the ESTALE returns to NFS4ERR_FHEXPIRED in export.c to
> >> take a stab at accomplishing this, but there are still ESTALEs coming
> >> across the wire to the client. That's why I was looking to see where
> >> other things could be going wrong.
> >
> > Hmm. Generally speaking, the inode is in the ceph client's cache _only_
> > if it has an MDS capability, and as long as it holds the capability it has
> > a stateful handle on it and will never get ESTALE. So my guess is it's
> > coming from somewhere else in export.c. Can you crank up client debugging
> > (see below) and reproduce?
> >
> > sage
> >
> > Crank up debugging on just about everything. Or you can just turn up
> > export.c...
> >
> > $ cat /home/sage/ceph/src/script/kcon_most.sh
> > #!/bin/sh -x
>
> Cranking up only export.c, I see mds communication troubles when I get
> the NFS stale fh.
> I also used your full-debug script and ran the same test (copying over
> my build environment, building, deleting), but the copy operation
> hosed the cluster with osds going down and all kinds of other stuff
> (this is probably an artifact of the kernel client being on the same
> node as osds etc).
> The previous messages (just export.c debugging):
>
> ceph: export.c:171 : __cfh_to_dentry 10000005a56 ffff880004176850 dentry ffff88003b788a80
> ceph: export.c:133 : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph: export.c:171 : __cfh_to_dentry 10000005a56 ffff880004176850 dentry ffff88003b788a80
> ceph: mds0 reconnect start
> ceph: mds0 reconnect success
> ceph: export.c:133 : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph: export.c:171 : __cfh_to_dentry 10000005a56 ffff880004176850 dentry ffff88003b788a80
> ceph: mds0 recovery completed
> libceph: mds0 10.200.98.105:6800 socket closed
> libceph: mds0 10.200.98.105:6800 connection failed

Can you look at the mds logs to see why cmds is crashing? (Or going into
an infinite loop, or whatever it is it's doing?)
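Something like the sketch below would pull the interesting part out of the
MDS log. It is only a rough sketch: the log path and mds id are assumptions
(they depend on your ceph.conf; the defaults usually end up under
/var/log/ceph), so adjust as needed:

#!/bin/sh
# Hypothetical helper: look for a crash in the active MDS log.
# MDS_LOG is an assumed default path -- change it to match your setup.
MDS_LOG=${1:-/var/log/ceph/mds.0.log}

# Failed asserts and the signal-handler banner are the usual suspects.
grep -n -E 'FAILED assert|Caught signal|terminate' "$MDS_LOG" | tail -n 20

# And the last chunk of the log for context around the restart.
tail -n 300 "$MDS_LOG"

If nothing obvious shows up, setting 'debug mds = 20' and 'debug ms = 1'
under [mds] in ceph.conf and restarting cmds will make the log much more
verbose.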
sage

> ceph: mds0 reconnect start
> ceph: mds0 reconnect success
> ceph: mds0 recovery completed
> libceph: mds0 10.200.98.111:6813 socket closed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> ceph: export.c:133 : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph: export.c:171 : __cfh_to_dentry 10000005a56 ffff880004176850 dentry ffff88003b788a80
> libceph: mds0 10.200.98.111:6813 connection failed
> ceph: mds0 caps stale
> ceph: mds0 caps stale
> libceph: mds0 10.200.98.111:6813 connection failed
> ceph: export.c:133 : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph: export.c:171 : __cfh_to_dentry 10000005a56 ffff880004176850 dentry ffff88003b788a80
> libceph: mds0 10.200.98.111:6813 connection failed
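The kcon_most.sh quoted above is cut off after its shebang line. A minimal
sketch of what that kind of script does follows; it assumes a kernel built
with CONFIG_DYNAMIC_DEBUG and debugfs mountable at /sys/kernel/debug, and
the files/modules the real script toggles may well differ:

#!/bin/sh -x
# Enable the ceph kernel client's pr_debug output via dynamic debug.
mount -t debugfs none /sys/kernel/debug 2>/dev/null
ctl=/sys/kernel/debug/dynamic_debug/control

# Just the NFS re-export code, to keep the noise down:
echo 'file fs/ceph/export.c +p' > $ctl

# Or uncomment to log everything from the client modules:
# echo 'module ceph +p' > $ctl
# echo 'module libceph +p' > $ctl

The output lands in the kernel log (dmesg), which is where the
"ceph: export.c:..." lines above came from, so it can be lined up against
the MDS log to see which side dropped the session.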