Today I found that the clients (1 local, 2 remote) can't access ceph.

root@ceph01:/ # ceph -s
10.09.22_20:05:48.344485 pg v24138: 1320 pgs: 1320 active+clean; 111 GB data, 286 GB used, 924 GB / 1210 GB avail
10.09.22_20:05:48.352327 mds e28: 1/1/1 up {0=up:active(laggy or crashed)}
10.09.22_20:05:48.375330 osd e255: 5 osds: 5 up, 5 in
10.09.22_20:05:48.388638 log 10.09.21_10:08:33.761798 mds0 ***.***.248.176:6800/3780 11 : [INF] closing stale session client4370 1**.***.229.105:0/3599534167 after 300.097931
10.09.22_20:05:48.419286 mon e1: 1 mons at ***.***.248.177:6789/0

root@ceph01/# ceph mds dump -o -
10.09.22_20:32:13.770455 mon <- [mds,dump]
10.09.22_20:32:13.788468 mon0 -> 'dumped mdsmap epoch 28' (0)
epoch 28
client_epoch 0
created 10.08.26_03:27:01.753124
modified 10.09.21_23:40:05.176168
tableserver 0
root 0
session_timeout 60
session_autoclose 300
compat compat={},rocompat={},incompat={1=base v0.20}
max_mds 1
in 0
up {0=4298}
failed
stopped
4298: ???.???.248.176:6800/3780 'ceph02' mds0.6 up:active seq 260551 laggy since 10.09.21_23:40:05.160654
10.09.22_20:32:13.788619 wrote 358 byte payload to -

The core dump of the mds on ceph02 (???.???.248.176) is as follows:

…
Core was generated by `/usr/bin/cmds -i cep02 -c /tmp/ceph.conf.19923'.
Program terminated with signal 6, Aborted.
#0 0x004ca422 in __kernel_vsyscall ()
(gdb) bt
#0 0x004ca422 in __kernel_vsyscall ()
#1 0x007c9651 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0x007cca82 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0x00cab52f in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/libstdc++.so.6
#4 0x00ca9465 in ?? () from /usr/lib/libstdc++.so.6
#5 0x00ca94a2 in std::terminate() () from /usr/lib/libstdc++.so.6
#6 0x00ca95e1 in __cxa_throw () from /usr/lib/libstdc++.so.6
#7 0x08312a7b in ceph::__ceph_assert_fail(char const*, char const*, int, char const*) ()
#8 0x080c3eeb in SimpleMessenger::Pipe::accept() ()
#9 0x080c4ba0 in SimpleMessenger::Pipe::reader() ()
#10 0x080b7d14 in SimpleMessenger::Pipe::Reader::entry() ()
#11 0x080ca7c1 in Thread::_entry_func(void*) ()
#12 0x004a896e in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#13 0x0086ca4e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

The tail of the mds log on ceph02 is as follows:

10.09.21_23:37:20.331651 b4d43b70 -- ???.???.248.176:6800/3780 --> mon0 ???.???.248.177:6789/0 -- mdsbeacon(4298/lz05 up:active seq 497179 v27) v1 -- ?+0 0x8be0de0
10.09.21_23:37:20.332446 b6347b70 -- ???.???.248.176:6800/3780 <== mon0 ???.???.248.177:6789/0 511955 ==== mdsbeacon(4298/ceph02 up:active seq 497179 v27) v1 ==== 70+0+0 (116627713 0 0) 0xb2c967a8
10.09.21_23:37:22.602613 af9ffb70 -- ???.???.248.176:6800/3780 >> ???.??.229.124:0/2562359250 pipe(0x8c26b80 sd=23 pgs=0 cs=0 l=0).accept peer addr is really ???.???.229.124:0/2562359250 (socket is ???.???.229.124:52272/0)
10.09.21_23:37:22.602813 af9ffb70 -- ???.???.248.176:6800/3780 >> ???.???.229.124:0/2562359250 pipe(0x8c26b80 sd=23 pgs=0 cs=0 l=0).accept connect_seq 1 vs existing 1 state 2
msg/SimpleMessenger.cc: In function 'int SimpleMessenger::Pipe::accept()':
msg/SimpleMessenger.cc:740: FAILED assert(existing->state == STATE_CONNECTING || existing->state == STATE_STANDBY || existing->state == STATE_WAIT)
 1: (SimpleMessenger::Pipe::reader()+0x830) [0x80c4ba0]
 2: (SimpleMessenger::Pipe::Reader::entry()+0x14) [0x80b7d14]
 3: (Thread::_entry_func(void*)+0x11) [0x80ca7c1]
 4: (()+0x596e) [0x4a896e]
 5: (clone()+0x5e) [0x86ca4e]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
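
For anyone triaging a similar log, here is a minimal sketch of pulling the failed assertion (source file, line number, expression) out of an mds log. It assumes the "<file>:<line>: FAILED assert(<expression>)" layout seen in the log tail above, with one log entry per line. The function name find_failed_asserts and the regular expression are assumptions made for this example and are not part of Ceph.

```python
import re

# Assumed layout, taken from the log tail above:
#   <file>:<line>: FAILED assert(<expression>)
# The expression is taken up to the last ')' on the same line.
ASSERT_RE = re.compile(
    r"(?P<file>\S+):(?P<line>\d+): FAILED assert\((?P<expr>.*)\)"
)


def find_failed_asserts(log_text):
    """Return a list of (file, line, expression) for each failed assertion."""
    return [
        (m.group("file"), int(m.group("line")), m.group("expr"))
        for m in ASSERT_RE.finditer(log_text)
    ]


# Sample line copied from the log tail in this report.
sample = (
    "msg/SimpleMessenger.cc:740: FAILED assert(existing->state == "
    "STATE_CONNECTING || existing->state == STATE_STANDBY || "
    "existing->state == STATE_WAIT)"
)

for src_file, src_line, expr in find_failed_asserts(sample):
    print(src_file, src_line, expr)
```

Run against a whole log file, this gives the source location to look up alongside the gdb backtrace.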