Re: CephFS fuse client users stuck

On Tue, Mar 14, 2017 at 5:55 PM, John Spray <jspray@xxxxxxxxxx> wrote:
> On Tue, Mar 14, 2017 at 2:10 PM, Andras Pataki
> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi John,
>>
>> I've checked the MDS session list, and the fuse client does appear on that
>> with 'state' as 'open'.  So both the fuse client and the MDS agree on an
>> open connection.
>>
>> Attached is the log of the ceph fuse client at debug level 20.  The MDS got
>> restarted at 9:44:20, and it went through its startup, and was in an
>> 'active' state in ceph -s by 9:45:20.  As for the IP addresses in the logs,
>> 10.128.128.110 is the MDS IP, the 10.128.128.1xy addresses are OSDs,
>> 10.128.129.63 is the IP of the client the log is from.
>
> So it looks like the client is getting stuck waiting for some
> capabilities (the 7fff9c3f7700 thread in that log, which eventually
> completes a ll_write on inode 100024ebea8 after the MDS restart).
> Hard to say whether the MDS failed to send it the proper messages, or
> if the client somehow missed it.
>
> It would be useful to have equally verbose logs from the MDS side from
> earlier on, at the point that the client started trying to do the
> write.  I wonder if you could see if your MDS+client can handle both
> being run at "debug mds = 20", "debug client = 20" respectively for a
> while, then when a client gets stuck, do the MDS restart, and follow
> back in the client log to work out which inode it was stuck on, then
> find log areas on the MDS side relating to that inode number.
>
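In case it helps others reproduce this: the way we raise those debug levels at runtime, without restarting the daemons, is via injectargs on the MDS and the admin socket on the client (the asok path below is the ceph-fuse default and may differ on your setup):

  # on an admin node: raise MDS logging to 20
  ceph tell mds.0 injectargs '--debug_mds 20'

  # on the client host, through the ceph-fuse admin socket
  ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_client 20

Putting "debug mds = 20" under [mds] and "debug client = 20" under [client] in ceph.conf instead makes the levels persist across restarts.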

Yesterday we had a network outage and afterwards saw similarly stuck
ceph-fuse clients (v10.2.6), which were fixed by failing over the MDS.

Here is the debug_client=20 output when we try to ls /cephfs (it hangs forever):

2017-03-16 09:15:36.164051 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:37.164169 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:38.164258 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:38.744447 7f9e515d8700  3 client.9341013 ll_getattr 1.head
2017-03-16 09:15:38.744491 7f9e515d8700 10 client.9341013 _getattr mask pAsLsXsFs issued=0
2017-03-16 09:15:38.744533 7f9e515d8700 15 inode.get on 0x7f9e8ea7c300 1.head now 56
2017-03-16 09:15:38.744558 7f9e515d8700 20 client.9341013 choose_target_mds starting with req->inode 1.head(faked_ino=0 ref=56 ll_ref=143935987 cap_refs={} open={} mode=40755 size=0/0 mtime=2017-03-15 05:50:02.430699 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f9e8ea7c300)
2017-03-16 09:15:38.744584 7f9e515d8700 20 client.9341013 choose_target_mds 1.head(faked_ino=0 ref=56 ll_ref=143935987 cap_refs={} open={} mode=40755 size=0/0 mtime=2017-03-15 05:50:02.430699 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f9e8ea7c300) is_hash=0 hash=0
2017-03-16 09:15:38.744592 7f9e515d8700 10 client.9341013 choose_target_mds from caps on inode 1.head(faked_ino=0 ref=56 ll_ref=143935987 cap_refs={} open={} mode=40755 size=0/0 mtime=2017-03-15 05:50:02.430699 caps=-(0=pAsLsXsFs) has_dir_layout 0x7f9e8ea7c300)
2017-03-16 09:15:38.744601 7f9e515d8700 20 client.9341013 mds is 0
2017-03-16 09:15:38.744608 7f9e515d8700 10 client.9341013 send_request rebuilding request 18992614 for mds.0
2017-03-16 09:15:38.744624 7f9e515d8700 20 client.9341013 encode_cap_releases enter (req: 0x7f9e8ef0a280, mds: 0)
2017-03-16 09:15:38.744627 7f9e515d8700 20 client.9341013 send_request set sent_stamp to 2017-03-16 09:15:38.744626
2017-03-16 09:15:38.744632 7f9e515d8700 10 client.9341013 send_request client_request(unknown.0:18992614 getattr pAsLsXsFs #1 2017-03-16 09:15:38.744538) v3 to mds.0
2017-03-16 09:15:38.744691 7f9e515d8700 20 client.9341013 awaiting reply|forward|kick on 0x7f9e515d6fa0
2017-03-16 09:15:39.164365 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384
2017-03-16 09:15:40.164470 7f9e7cf61700 20 client.9341013 trim_cache size 33 max 16384


And the full log from when we fail over the MDS at 2017-03-16
09:20:47.799250 is here:
https://cernbox.cern.ch/index.php/s/sCYdvb9furqS64y
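For completeness, the "MDS flip" above is just failing the active MDS rank (rank 0 here, assuming a single-active setup) so that a standby takes over:

  ceph mds fail 0

Once the standby went active, the stuck clients replayed their pending requests and recovered.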

I also have the ceph-fuse core dump if it would be useful.

-- Dan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


