Re: Faulting MDS clients, HEALTH_OK

Perhaps related: I was watching the active MDS with debug_mds set to 5/5 when I saw this in the log:

2016-09-21 15:13:26.067698 7fbaec248700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.238:0/3488321578 pipe(0x55db000 sd=49 :6802 s=2 pgs=2 cs=1 l=0 c=0x5631ce0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067717 7fbaf64ea700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.214:0/3252234463 pipe(0x54d1000 sd=76 :6802 s=2 pgs=2 cs=1 l=0 c=0x237e8420).fault with nothing to send, going to standby
2016-09-21 15:13:26.067725 7fbb0098e700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.204:0/2963585795 pipe(0x3bf1000 sd=55 :6802 s=2 pgs=2 cs=1 l=0 c=0x15c29020).fault with nothing to send, going to standby
2016-09-21 15:13:26.067743 7fbb026ab700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.192:0/4235516229 pipe(0x562b000 sd=83 :6802 s=2 pgs=2 cs=1 l=0 c=0x237e91e0).fault, server, going to standby
2016-09-21 15:13:26.067749 7fbae840a700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.214:0/3290400005 pipe(0x2a38a000 sd=74 :6802 s=2 pgs=2 cs=1 l=0 c=0x13b6c160).fault with nothing to send, going to standby
2016-09-21 15:13:26.067783 7fbadb239700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.141:0/229472938 pipe(0x268d2000 sd=87 :6802 s=2 pgs=2 cs=1 l=0 c=0x28e24f20).fault with nothing to send, going to standby
2016-09-21 15:13:26.067803 7fbafe66b700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.193:0/2637929639 pipe(0x29582000 sd=80 :6802 s=2 pgs=2 cs=1 l=0 c=0x237e9760).fault with nothing to send, going to standby
2016-09-21 15:13:26.067876 7fbb01a9f700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.228:0/581679898 pipe(0x2384f000 sd=103 :6802 s=2 pgs=2 cs=1 l=0 c=0x2f92f5a0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067886 7fbb01ca1700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.145:0/586636299 pipe(0x25806000 sd=101 :6802 s=2 pgs=2 cs=1 l=0 c=0x2f92cc60).fault with nothing to send, going to standby
2016-09-21 15:13:26.067865 7fbaf43c9700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.234:0/3131612847 pipe(0x2fbe5000 sd=120 :6802 s=2 pgs=2 cs=1 l=0 c=0x37c902c0).fault with nothing to send, going to standby
2016-09-21 15:13:26.067910 7fbaf4ed4700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.236:0/650394434 pipe(0x2fbe0000 sd=116 :6802 s=2 pgs=2 cs=1 l=0 c=0x56a5440).fault with nothing to send, going to standby
2016-09-21 15:13:26.067911 7fbb01196700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.149:0/821983967 pipe(0x1420b000 sd=104 :6802 s=2 pgs=2 cs=1 l=0 c=0x2f92cf20).fault with nothing to send, going to standby
2016-09-21 15:13:26.068076 7fbafc64b700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.190:0/1817596579 pipe(0x36829000 sd=124 :6802 s=2 pgs=2 cs=1 l=0 c=0x31f7a100).fault with nothing to send, going to standby
2016-09-21 15:13:26.068095 7fbafff84700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.140:0/1112150414 pipe(0x5679000 sd=125 :6802 s=2 pgs=2 cs=1 l=0 c=0x41bc7e0).fault with nothing to send, going to standby
2016-09-21 15:13:26.068108 7fbb0de0e700  5 mds.0.953 handle_mds_map epoch 8471 from mon.3
2016-09-21 15:13:26.068114 7fbaf890e700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.238:0/1422203298 pipe(0x29630000 sd=44 :6802 s=2 pgs=2 cs=1 l=0 c=0x3a740dc0).fault with nothing to send, going to standby
2016-09-21 15:13:26.068143 7fbae860c700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.217:0/1120082018 pipe(0x2a724000 sd=121 :6802 s=2 pgs=2 cs=1 l=0 c=0x31f79e40).fault with nothing to send, going to standby
2016-09-21 15:13:26.068190 7fbb040c5700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.218:0/3945638891 pipe(0x50c0000 sd=53 :6802 s=2 pgs=2 cs=1 l=0 c=0x56f4420).fault with nothing to send, going to standby
2016-09-21 15:13:26.068200 7fbaf961b700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.144:0/2952053583 pipe(0x318dc000 sd=81 :6802 s=2 pgs=2 cs=1 l=0 c=0x286fa840).fault with nothing to send, going to standby
2016-09-21 15:13:26.068232 7fbaf981d700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.159:0/1872775873 pipe(0x268d7000 sd=38 :6802 s=2 pgs=2 cs=1 l=0 c=0x56f6940).fault with nothing to send, going to standby
2016-09-21 15:13:26.068253 7fbaeac32700  0 -- 192.168.1.196:6802/13581 >> 192.168.1.186:0/4141441999 pipe(0x54e7000 sd=86 :6802 s=2 pgs=2 cs=1 l=0 c=0x286fb760).fault with nothing to send, going to standby
2016-09-21 15:13:26.068275 7fbb0de0e700  1 mds.-1.-1 handle_mds_map i (192.168.1.196:6802/13581) dne in the mdsmap, respawning myself
2016-09-21 15:13:26.068289 7fbb0de0e700  1 mds.-1.-1 respawn
2016-09-21 15:13:26.068294 7fbb0de0e700  1 mds.-1.-1  e: 'ceph-mds'
2016-09-21 15:13:26.173095 7f689baa8780  0 ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432), process ceph-mds, pid 13581
2016-09-21 15:13:26.175664 7f689baa8780 -1 mds.-1.0 log_to_monitors {default=true}
2016-09-21 15:13:27.329181 7f68969e9700  1 mds.-1.0 handle_mds_map standby
2016-09-21 15:13:28.484148 7f68969e9700  1 mds.-1.0 handle_mds_map standby
2016-09-21 15:13:33.280376 7f68969e9700  1 mds.-1.0 handle_mds_map standby
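
For reference, a sketch of how the debug level was raised at runtime; the mds name "a" is a placeholder, substitute your own:

    # via the monitors:
    ceph tell mds.a injectargs '--debug_mds 5/5'
    # or locally on the MDS host, through the admin socket:
    ceph daemon mds.a config set debug_mds 5/5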

On 9/21/16, 10:48 AM, "Heller, Chris" <cheller@xxxxxxxxxx> wrote:

    I’ll see if I can capture the output the next time this issue arises, but in general it looks as if nothing is wrong: no OSDs are down, ‘ceph health detail’ reports HEALTH_OK, and the MDS is in the up:active state; server side it’s as if nothing is wrong (at least from the summary).
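    
    For completeness, the server-side checks here are roughly the usual status commands (a sketch, nothing exotic):
    
        ceph -s
        ceph health detail
        ceph mds stat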
    
    -Chris
    
    On 9/21/16, 10:46 AM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:
    
        On Wed, Sep 21, 2016 at 6:30 AM, Heller, Chris <cheller@xxxxxxxxxx> wrote:
        > I’m running a production 0.94.7 Ceph cluster, and have been seeing a
        > periodic issue arise wherein all my MDS clients will become stuck, and the
        > fix so far has been to restart the active MDS (sometimes I need to restart
        > the subsequent active MDS as well).
        >
        >
        >
        > These clients are using the cephfs-hadoop API, so there is no kernel client
        > or FUSE client involved. When I see clients get stuck, there are messages
        > printed to stderr like the following:
        >
        >
        >
        > 2016-09-21 10:31:12.285030 7fea4c7fb700  0 -- 192.168.1.241:0/1606648601 >> 192.168.1.195:6801/1674 pipe(0x7feaa0a1e0f0 sd=206 :0 s=1 pgs=0 cs=0 l=0 c=0x7feaa0a0c500).fault
        >
        >
        >
        > I’m at somewhat of a loss on where to begin debugging this issue, and wanted
        > to ping the list for ideas.
        
        What's the full output of "ceph -s" when this happens? Have you looked
        at the MDS' admin socket's ops-in-flight, and that of the clients?
        
        http://docs.ceph.com/docs/master/cephfs/troubleshooting/ may help some as well.
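        
        A rough sketch of what I mean; the mds name "a" and the socket paths are assumptions, substitute your own:
        
            # on the MDS host, via the admin socket:
            ceph daemon mds.a dump_ops_in_flight
            # equivalently, with an explicit socket path:
            ceph --admin-daemon /var/run/ceph/ceph-mds.a.asok dump_ops_in_flight
            # on a client, if you've configured an admin socket for it:
            ceph --admin-daemon /path/to/client.asok objecter_requests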
        
        >
        >
        >
        > I managed to dump the mds cache during one of the stalled moments, which
        > hopefully is a useful starting point:
        >
        >
        >
        > e51bed37327a676e9974d740a13e173f11d1a11fdba5fbcf963b62023b06d7e8
        > mdscachedump.txt.gz (https://filetea.me/t1sz3XPHxEVThOk8tvVTK5Bsg)
        >
        >
        >
        >
        >
        > -Chris
        >
        >
        >
        >
        
    
    

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



