kernel client osdc ops stuck and mds slow reqs

Hi all,

We are quite regularly (a couple of times per week) seeing:

HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests
MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release
    mdshpc-be143(mds.0): Client hpc-be028.cern.ch: failing to respond to capability release client_id: 52919162
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdshpc-be143(mds.0): 1 slow requests are blocked > 30 secs

This is caused by osdc ops stuck in a kernel client, e.g.:

10:57:18 root hpc-be028 /root
→ cat /sys/kernel/debug/ceph/4da6fd06-b069-49af-901f-c9513baabdbd.client52919162/osdc
REQUESTS 9 homeless 0
46559317    osd243    3.ee6ffcdb    3.cdb    [243,501,92]/243    [243,501,92]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057    0x400014    1    read
46559322    osd243    3.ee6ffcdb    3.cdb    [243,501,92]/243    [243,501,92]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057    0x400014    1    read
46559323    osd243    3.969cc573    3.573    [243,330,226]/243    [243,330,226]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056    0x400014    1    read
46559341    osd243    3.969cc573    3.573    [243,330,226]/243    [243,330,226]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056    0x400014    1    read
46559342    osd243    3.969cc573    3.573    [243,330,226]/243    [243,330,226]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056    0x400014    1    read
46559345    osd243    3.969cc573    3.573    [243,330,226]/243    [243,330,226]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056    0x400014    1    read
46559621    osd243    3.6313e8ef    3.8ef    [243,330,521]/243    [243,330,521]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a45.0000007a    0x400014    1    read
46559629    osd243    3.b280c852    3.852    [243,113,539]/243    [243,113,539]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a3a.0000007f    0x400014    1    read
46559928    osd243    3.1ee7bab4    3.ab4    [243,332,94]/243    [243,332,94]/243    e678697    fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f099ff.0000073f    0x400024    1    write
LINGER REQUESTS
BACKOFFS


We can unblock those requests by doing `ceph osd down osd.243` (or
restarting osd.243).
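
For spotting this pattern faster, here is a minimal sketch of how the osdc file could be summarised. It assumes each request entry starts with a numeric tid followed by "osdN", as in the output above. The function names are made up for illustration and are not part of any Ceph tool. A single snapshot only shows in-flight requests, so the sketch treats a request as stuck only if the same tid appears in two snapshots taken some time apart (the interval is up to you).

```python
import re
from collections import Counter

# Start of a request entry: "<tid><spaces or tabs>osd<N>" at the beginning of a line.
# Header lines (REQUESTS, LINGER REQUESTS, BACKOFFS) do not match.
REQUEST_START = re.compile(r"^(\d+)[ \t]+osd(\d+)[ \t]", re.MULTILINE)


def request_tids(osdc_text):
    """Map each request tid to the OSD it is addressed to."""
    return {int(m.group(1)): int(m.group(2))
            for m in REQUEST_START.finditer(osdc_text)}


def stuck_requests(earlier_text, later_text):
    """Return {tid: osd} for requests present in both snapshots."""
    earlier = request_tids(earlier_text)
    later = request_tids(later_text)
    return {tid: osd for tid, osd in later.items() if tid in earlier}


def count_per_osd(tid_to_osd):
    """Count requests per OSD id, e.g. to see whether one OSD holds them all."""
    return Counter(tid_to_osd.values())
```

Reading the osdc file twice (as root, since it lives in debugfs), passing both contents to stuck_requests() and then to count_per_osd() would show whether all lingering requests point at one OSD, as they do at osd243 here.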

This is ceph v14.2.6 and the client kernel is el7 3.10.0-957.27.2.el7.x86_64.

Is there a better way to debug this?

Best Regards,

Dan



