On Tue, Mar 14, 2017 at 5:55 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Tue, Mar 14, 2017 at 2:10 PM, Andras Pataki
> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi John,
>>
>> I've checked the MDS session list, and the fuse client does appear on that
>> with 'state' as 'open'. So both the fuse client and the MDS agree on an
>> open connection.
>>
>> Attached is the log of the ceph fuse client at debug level 20. The MDS got
>> restarted at 9:44:20, and it went through its startup, and was in an
>> 'active' state in ceph -s by 9:45:20. As for the IP addresses in the logs,
>> 10.128.128.110 is the MDS IP, the 10.128.128.1xy addresses are OSDs,
>> 10.128.129.63 is the IP of the client the log is from.
>
> So it looks like the client is getting stuck waiting for some
> capabilities (the 7fff9c3f7700 thread in that log, which eventually
> completes a ll_write on inode 100024ebea8 after the MDS restart).
> Hard to say whether the MDS failed to send it the proper messages, or
> if the client somehow missed it.
>
> It would be useful to have equally verbose logs from the MDS side from
> earlier on, at the point that the client started trying to do the
> write. I wonder if you could see if your MDS+client can handle both
> being run at "debug mds = 20", "debug client = 20" respectively for a
> while, then when a client gets stuck, do the MDS restart, and follow
> back in the client log to work out which inode it was stuck on, then
> find log areas on the MDS side relating to that inode number.
>
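For anyone else following John's recipe, the mechanics are roughly the sketch
below. It is only a sketch: the mds name, the log paths, and the inode number
(just the one from John's example above) are placeholders to adapt to your own
setup.

    # in ceph.conf, per John's suggestion (picked up on mds restart / client remount):
    [mds]
        debug mds = 20
    [client]
        debug client = 20

    # or raise the mds verbosity on the fly, without a restart:
    ceph tell mds.0 injectargs '--debug_mds 20'

    # once a client hangs, find the request it is blocked on in the client log,
    # note the inode number, and pull the matching lines out of the mds log:
    grep 'awaiting reply' /var/log/ceph/ceph-client.admin.log
    grep '100024ebea8' /var/log/ceph/ceph-mds.0.log
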
Yesterday we had a network outage and afterwards had similarly stuck ceph-fuse
clients (v10.2.6), which were fixed by an mds flip (failing the active MDS over
to its standby). Here is the debug_client=20 output when we try to ls /cephfs
(it hangs forever):

2017-03-16 09:15:36.164051 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:37.164169 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:38.164258 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:38.744447 7f9e515d8700 3 client.9341013 ll_getattr 1.head
2017-03-16 09:15:38.744491 7f9e515d8700 10 client.9341013 _getattr mask pAsLsXsFs issued=0
2017-03-16 09:15:38.744533 7f9e515d8700 15 inode.get on 0x7f9e8ea7c300 1.head now 56
2017-03-16 09:15:38.744558 7f9e515d8700 20 client.9341013 choose_target_mds starting with req->inode 1.head(faked_ino=0 ref=56 ll_ref=143935987 cap_refs={} open={} mode=40755 size=0/0 mtime=2017-03-15 05:50:02.430699 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f9e8ea7c300)
2017-03-16 09:15:38.744584 7f9e515d8700 20 client.9341013 choose_target_mds 1.head(faked_ino=0 ref=56 ll_ref=143935987 cap_refs={} open={} mode=40755 size=0/0 mtime=2017-03-15 05:50:02.430699 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f9e8ea7c300) is_hash=0 hash=0
2017-03-16 09:15:38.744592 7f9e515d8700 10 client.9341013 choose_target_mds from caps on inode 1.head(faked_ino=0 ref=56 ll_ref=143935987 cap_refs={} open={} mode=40755 size=0/0 mtime=2017-03-15 05:50:02.430699 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f9e8ea7c300)
2017-03-16 09:15:38.744601 7f9e515d8700 20 client.9341013 mds is 0
2017-03-16 09:15:38.744608 7f9e515d8700 10 client.9341013 send_request rebuilding request 18992614 for mds.0
2017-03-16 09:15:38.744624 7f9e515d8700 20 client.9341013 encode_cap_releases enter (req: 0x7f9e8ef0a280, mds: 0)
2017-03-16 09:15:38.744627 7f9e515d8700 20 client.9341013 send_request set sent_stamp to 2017-03-16 09:15:38.744626
2017-03-16 09:15:38.744632 7f9e515d8700 10 client.9341013 send_request client_request(unknown.0:18992614 getattr pAsLsXsFs #1 2017-03-16 09:15:38.744538) v3 to mds.0
2017-03-16 09:15:38.744691 7f9e515d8700 20 client.9341013 awaiting reply|forward|kick on 0x7f9e515d6fa0
2017-03-16 09:15:39.164365 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:40.164470 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384

And the full log when we failover the mds at 2017-03-16 09:20:47.799250 is here:
https://cernbox.cern.ch/index.php/s/sCYdvb9furqS64y

I also have the ceph-fuse core dump if it would be useful.

--
Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com