MDS crash when client goes to sleep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



When CephFS is mounted on a client and when client decides to go to sleep, MDS segfaults.  Has anyone seen this?  Below is a part of MDS log.  This happened in emperor and recent 0.77 release.  I am running Debian Wheezy with testing kernels 3.13.  What can I do to not crash the whole system if a client goes to sleep (and looks like disconnect may do the same)? Let me know if you need any more info.

Regards,
Hong

   -43> 2014-03-20 20:08:42.463357 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 --> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 -- ?+0 0x1ee9f080 con 0x2e56580
   -42> 2014-03-20 20:08:42.463787 7fee411d4700  1 -- 192.168.1.20:6801/17079 <== mon.0 192.168.1.20:6789/0 21764 ==== mdsbeacon(6798/MDS1.2 up:active seq 21120 v6970) v2 ==== 108+0+0 (266728949 0 0) 0x1ee88dc0 con 0x2e56580
   -41> 2014-03-20 20:08:43.373099 7fee3f0cf700  2 mds.0.cache check_memory_usage total 665384, rss 503156, heap 24656, malloc 463874 mmap 0, baseline 16464, buffers 0, max 1048576, 0 / 62380 inodes have caps, 0 caps, 0 caps per inode
   -40> 2014-03-20 20:08:44.494963 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 >> :/0 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept sd=18 192.168.1.101:52026/0
   -39> 2014-03-20 20:08:44.495033 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 192.168.1.101:52026/0)
   -38> 2014-03-20 20:08:44.495565 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e2160).accept we reset (peer sent cseq 2), sending RESETSESSION
   -37> 2014-03-20 20:08:44.496015 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3f03b80 sd=18 :6801 s=4 pgs=0 cs=0 l=0 c=0x1f0e2160).fault 0: Success
   -36> 2014-03-20 20:08:44.496099 7fee411d4700  5 mds.0.35 ms_handle_reset on 192.168.1.101:0/2113152127
   -35> 2014-03-20 20:08:44.496120 7fee411d4700  3 mds.0.35 ms_handle_reset closing connection for session client.6019 192.168.1.101:0/2113152127
   -34> 2014-03-20 20:08:44.496207 7fee411d4700  1 -- 192.168.1.20:6801/17079 mark_down 0x1f0e2160 -- pipe dne
   -33> 2014-03-20 20:08:44.653628 7fee3d7c4700  1 -- 192.168.1.20:6801/17079 >> :/0 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept sd=18 192.168.1.101:52027/0
   -32> 2014-03-20 20:08:44.653677 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e22c0).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 192.168.1.101:52027/0)
   -31> 2014-03-20 20:08:44.925618 7fee411d4700  1 -- 192.168.1.20:6801/17079 <== client.6019 192.168.1.101:0/2113152127 1 ==== client_reconnect(77349 caps) v2 ==== 0+0+11032578 (0 0 3293767716) 0x2e92780 con 0x1f0e22c0
   -30> 2014-03-20 20:08:44.925682 7fee411d4700  1 mds.0.server  no longer in reconnect state, ignoring reconnect, sending close
   -29> 2014-03-20 20:08:44.925735 7fee411d4700  0 log [INF] : denied reconnect attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 2014-03-20 20:08:44.925679 (allowed interval 45)
   -28> 2014-03-20 20:08:44.925748 7fee411d4700  1 -- 192.168.1.20:6801/17079 --> 192.168.1.101:0/2113152127 -- client_session(close) v1 -- ?+0 0x3ea6540 con 0x1f0e22c0
   -27> 2014-03-20 20:08:44.927727 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 c=0x1f0e22c0).reader couldn't read tag, Success
   -26> 2014-03-20 20:08:44.927797 7fee3d7c4700  2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 c=0x1f0e22c0).fault 0: Success
   -25> 2014-03-20 20:08:44.927849 7fee3d7c4700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3d8e000 sd=18 :6801 s=2 pgs=135 cs=1 l=0 c=0x1f0e22c0).fault, server, going to standby
   -24> 2014-03-20 20:08:46.372279 7fee401d2700 10 monclient: tick
   -23> 2014-03-20 20:08:46.372339 7fee401d2700 10 monclient: _check_auth_rotating have uptodate secrets (they expire after 2014-03-20 20:08:16.372333)
   -22> 2014-03-20 20:08:46.372373 7fee401d2700 10 monclient: renew subs? (now: 2014-03-20 20:08:46.372372; renew after: 2014-03-20 20:09:56.370811) -- no
   -21> 2014-03-20 20:08:46.372403 7fee401d2700 10  log_queue is 1 last_log 2 sent 1 num 1 unsent 1 sending 1
   -20> 2014-03-20 20:08:46.372421 7fee401d2700 10  will send 2014-03-20 20:08:44.925741 mds.0 192.168.1.20:6801/17079 2 : [INF] denied reconnect attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 2014-03-20 20:08:44.925679 (allowed interval 45)
   -19> 2014-03-20 20:08:46.372466 7fee401d2700 10 monclient: _send_mon_message to mon.MDS1 at 192.168.1.20:6789/0
   -18> 2014-03-20 20:08:46.372483 7fee401d2700  1 -- 192.168.1.20:6801/17079 --> 192.168.1.20:6789/0 -- log(1 entries) v1 -- ?+0 0x71d8d80 con 0x2e56580
   -17> 2014-03-20 20:08:46.463496 7fee3f0cf700 10 monclient: _send_mon_message to mon.MDS1 at 192.168.1.20:6789/0
   -16> 2014-03-20 20:08:46.463524 7fee3f0cf700  1 -- 192.168.1.20:6801/17079 --> 192.168.1.20:6789/0 -- mdsbeacon(6798/MDS1.2 up:active seq 21121 v6970) v2 -- ?+0 0x1ee91340 con 0x2e56580
   -15> 2014-03-20 20:08:46.499688 7fee411d4700  1 -- 192.168.1.20:6801/17079 <== mon.0 192.168.1.20:6789/0 21765 ==== log(last 2) v1 ==== 24+0+0 (1174756693 0 0) 0x3ea7a40 con 0x2e56580
   -14> 2014-03-20 20:08:46.499751 7fee411d4700 10 handle_log_ack log(last 2) v1
   -13> 2014-03-20 20:08:46.499757 7fee411d4700 10  logged 2014-03-20 20:08:44.925741 mds.0 192.168.1.20:6801/17079 2 : [INF] denied reconnect attempt (mds is up:active) from client.6019 192.168.1.101:0/2113152127 after 2014-03-20 20:08:44.925679 (allowed interval 45)
   -12> 2014-03-20 20:08:46.499825 7fee411d4700  1 -- 192.168.1.20:6801/17079 <== mon.0 192.168.1.20:6789/0 21766 ==== mdsbeacon(6798/MDS1.2 up:active seq 21121 v6970) v2 ==== 108+0+0 (51773011 0 0) 0x1ee25600 con 0x2e56580
   -11> 2014-03-20 20:08:47.303323 7fee3ddca700  1 -- 192.168.1.20:6801/17079 >> :/0 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e3080).accept sd=21 192.168.1.101:52029/0
   -10> 2014-03-20 20:08:47.308830 7fee3ddca700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e3080).accept peer addr is really 192.168.1.101:0/2113152127 (socket is 192.168.1.101:52029/0)
    -9> 2014-03-20 20:08:47.309221 7fee3ddca700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e3080).accept connect_seq 0 vs existing 1 state standby
    -8> 2014-03-20 20:08:47.309251 7fee3ddca700  0 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=0 pgs=0 cs=0 l=0 c=0x1f0e3080).accept peer reset, then tried to connect to us, replacing
    -7> 2014-03-20 20:08:47.309337 7fee411d4700  5 mds.0.35 ms_handle_remote_reset on 192.168.1.101:0/2113152127
    -6> 2014-03-20 20:08:47.309357 7fee411d4700  3 mds.0.35 ms_handle_remote_reset closing connection for session client.6019 192.168.1.101:0/2113152127
    -5> 2014-03-20 20:08:47.309385 7fee411d4700  1 -- 192.168.1.20:6801/17079 mark_down 0x1f0e22c0 -- 0x3e92280
    -4> 2014-03-20 20:08:47.309478 7fee411d4700  5 mds.0.35 ms_handle_reset on 192.168.1.101:0/2113152127
    -3> 2014-03-20 20:08:47.309490 7fee411d4700  3 mds.0.35 ms_handle_reset closing connection for session client.6019 192.168.1.101:0/2113152127
    -2> 2014-03-20 20:08:47.309501 7fee411d4700  1 -- 192.168.1.20:6801/17079 mark_down 0x1f0e3080 -- 0x3e92280
    -1> 2014-03-20 20:08:47.309485 7fee3ddca700  2 -- 192.168.1.20:6801/17079 >> 192.168.1.101:0/2113152127 pipe(0x3e92280 sd=21 :6801 s=4 pgs=136 cs=1 l=0 c=0x1f0e22c0).accept read error on newly_acked_seq
     0> 2014-03-20 20:08:47.310724 7fee411d4700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fee411d4700

 ceph version 0.77 (1bca9c5c412b3af722d5250f07fd562a23cf35ff)
 1: /usr/bin/ceph-mds() [0x8baf22]
 2: (()+0xf880) [0x7fee458ef880]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-mds.MDS1.2.log
--- end dump of recent events ---


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux