Re: Fixing NFS

On Wed, 9 Feb 2011, Brian Chrisman wrote:
> On Mon, Feb 7, 2011 at 9:02 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Mon, 7 Feb 2011, Brian Chrisman wrote:
> >> My goal is merely to push this problem back on the NFSv4 client.  If I
> >> can tell it "your filehandle is expired", it should request a full
> >> path lookup to re-establish the filehandle, as far as I can tell from
> >> the specs.
> >
> > Oh I see...
> >
> >> I converted the ESTALE returns to NFS4ERR_FHEXPIRED in export.c to
> >> take a stab at accomplishing this, but there are still ESTALEs coming
> >> across the wire to the client.  That's why I was looking to see where
> >> other things could be going wrong.
> >
> > Hmm.  Generally speaking, the inode is in the ceph client's cache _only_
> > if it has an MDS capability, and as long as it holds the capability it has
> > a stateful handle on it and will never get ESTALE.  So my guess is it's
> > coming from somewhere else in export.c.  Can you crank up client debugging
> > (see below) and reproduce?
> >
> > sage
> >
> >
> >
> > Crank up debugging on just about everything.  Or you can just turn up
> > export.c...
> >
> > $ cat /home/sage/ceph/src/script/kcon_most.sh
> > #!/bin/sh -x
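
(The script itself isn't reproduced here. Roughly speaking, a minimal sketch of what a script like that does, assuming the kernel was built with CONFIG_DYNAMIC_DEBUG and debugfs is mounted at /sys/kernel/debug, looks like this:)

#!/bin/sh -x
# Sketch only, not the real kcon_most.sh: enable the ceph kernel
# client's pr_debug output via the dynamic debug control file.
ctl=/sys/kernel/debug/dynamic_debug/control

# everything from the ceph and libceph modules
echo 'module ceph +p'    > $ctl
echo 'module libceph +p' > $ctl

# ...or only the export.c paths that can hand back ESTALE
# echo 'file export.c +p' > $ctl
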
> 
> Cranking up only export.c, I see mds communication trouble when I get
> the NFS stale fh.
> I also ran the same test with your full-debug script (copying over my
> build environment, building, deleting), but the copy operation hosed
> the cluster, with osds going down and all kinds of other problems
> (probably an artifact of the kernel client running on the same node
> as the osds, etc.).
> The messages from the export.c-only run:
> 
> ceph:         export.c:171  : __cfh_to_dentry 10000005a56
> ffff880004176850 dentry ffff88003b788a80
> ceph:         export.c:133  : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph:         export.c:171  : __cfh_to_dentry 10000005a56
> ffff880004176850 dentry ffff88003b788a80
> ceph: mds0 reconnect start
> ceph: mds0 reconnect success
> ceph:         export.c:133  : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph:         export.c:171  : __cfh_to_dentry 10000005a56
> ffff880004176850 dentry ffff88003b788a80
> ceph: mds0 recovery completed
> libceph: mds0 10.200.98.105:6800 socket closed
> libceph: mds0 10.200.98.105:6800 connection failed

Can you look at the mds logs to see why cmds is crashing?  (Or going into 
an infinite loop, or whatever else it's doing?)
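
A sketch of one way to dig that out, assuming logs land in the default 
/var/log/ceph location (adjust to whatever 'log file' is set to) and that 
it is actually crashing rather than spinning:

# Sketch only: raise mds verbosity in ceph.conf before restarting cmds:
#   [mds]
#       debug mds = 20
#       debug ms = 1
# then look for a backtrace or failed assertion in the mds log
grep -B2 -A20 -E 'Caught signal|FAILED assert' /var/log/ceph/mds*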

sage



> ceph: mds0 reconnect start
> ceph: mds0 reconnect success
> ceph: mds0 recovery completed
> libceph: mds0 10.200.98.111:6813 socket closed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> libceph: mds0 10.200.98.111:6813 connection failed
> ceph:         export.c:133  : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph:         export.c:171  : __cfh_to_dentry 10000005a56
> ffff880004176850 dentry ffff88003b788a80
> libceph: mds0 10.200.98.111:6813 connection failed
> ceph: mds0 caps stale
> ceph: mds0 caps stale
> libceph: mds0 10.200.98.111:6813 connection failed
> ceph:         export.c:133  : __cfh_to_dentry 10000005a56 (1000000145a/1c60f9f)
> ceph:         export.c:171  : __cfh_to_dentry 10000005a56
> ffff880004176850 dentry ffff88003b788a80
> libceph: mds0 10.200.98.111:6813 connection failed
