Hi all,

I upgraded my cluster from 0.80.11 to 0.94.6. Everything is OK except that the rbd command core dumps on one host, while it succeeds on the others. I have disabled auth in ceph.conf:

auth_cluster_required = none
auth_service_required = none
auth_client_required = none

Here is the core dump message:

$ sudo rbd ls
2016-03-25 16:00:43.043000 7f3ae6c13780 1 -- :/0 messenger.start
2016-03-25 16:00:43.043329 7f3ae6c13780 1 -- :/1008171 --> 10.180.0.46:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x434a330 con 0x4349fc0
2016-03-25 16:00:43.043377 7f3ae6c13780 0 -- :/1008171 submit_message auth(proto 0 30 bytes epoch 0) v1
0000 : 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 : ................
0010 : 00 00 00 00 00 00 1e 00 00 00 01 01 00 00 00 01 : ................
0020 : 00 00 00 08 00 00 00 05 00 00 00 61 64 6d 69 6e : ...........admin
0030 : 00 00 00 00 00 00 00 00 00 00 00 00 : ............
2016-03-25 16:00:43.043450 7f3adb7fe700 1 monclient(hunting): continuing hunt
2016-03-25 16:00:43.043489 7f3adb7fe700 1 -- :/1008171 mark_down 0x4349fc0 -- 0x4349d30
2016-03-25 16:00:43.043614 7f3adb7fe700 1 -- :/1008171 --> 10.180.0.31:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f39cc001060 con 0x7f39cc000cf0
2016-03-25 16:00:43.043648 7f3adb7fe700 0 -- :/1008171 submit_message auth(proto 0 30 bytes epoch 0) v1
0000 : 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 : ................
0010 : 00 00 00 00 00 00 1e 00 00 00 01 01 00 00 00 01 : ................
0020 : 00 00 00 08 00 00 00 05 00 00 00 61 64 6d 69 6e : ...........admin
0030 : 00 00 00 00 00 00 00 00 00 00 00 00 : ............
2016-03-25 16:00:43.043694 7f3ae6c13780 0 monclient(hunting): authenticate timed out after 2.47033e-321
*** Caught signal (Segmentation fault) **
 in thread 7f3adbfff700
2016-03-25 16:00:43.043756 7f3adb7fe700 1 monclient(hunting): continuing hunt
2016-03-25 16:00:43.043749 7f3ae6c13780 0 librados: client.admin authentication error (110) Connection timed out
 ceph version 0.94.6-2-gbb98b8f (bb98b8fcb0bb0bd3688310f6a1688736ef422b25)
 1: rbd() [0x60408c]
 2: (()+0xf8d0) [0x7f3ae4ea88d0]
 3: rbd() [0x52b841]
 4: (Mutex::~Mutex()+0x9b) [0x562a6b]
 5: (Connection::~Connection()+0x6e) [0x7f3ae5550fce]
 6: (Connection::~Connection()+0x9) [0x7f3ae5551049]
 7: (Pipe::~Pipe()+0x90) [0x7f3ae553f330]
 8: (Pipe::~Pipe()+0x9) [0x7f3ae553f4e9]
 9: (SimpleMessenger::reaper()+0x8a9) [0x7f3ae5555bf9]
 10: (SimpleMessenger::reaper_entry()+0x88) [0x7f3ae5556b38]
 11: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f3ae555ba8d]
 12: (()+0x80a4) [0x7f3ae4ea10a4]
 13: (clone()+0x6d) [0x7f3ae3a2d04d]
2016-03-25 16:00:43.045278 7f3adbfff700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f3adbfff700

 ceph version 0.94.6-2-gbb98b8f (bb98b8fcb0bb0bd3688310f6a1688736ef422b25)
 1: rbd() [0x60408c]
 2: (()+0xf8d0) [0x7f3ae4ea88d0]
 3: rbd() [0x52b841]
 4: (Mutex::~Mutex()+0x9b) [0x562a6b]
 5: (Connection::~Connection()+0x6e) [0x7f3ae5550fce]
 6: (Connection::~Connection()+0x9) [0x7f3ae5551049]
 7: (Pipe::~Pipe()+0x90) [0x7f3ae553f330]
 8: (Pipe::~Pipe()+0x9) [0x7f3ae553f4e9]
 9: (SimpleMessenger::reaper()+0x8a9) [0x7f3ae5555bf9]
 10: (SimpleMessenger::reaper_entry()+0x88) [0x7f3ae5556b38]
 11: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f3ae555ba8d]
 12: (()+0x80a4) [0x7f3ae4ea10a4]
 13: (clone()+0x6d) [0x7f3ae3a2d04d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-39> 2016-03-25 16:00:43.036565 7f3ae6c13780 5 asok(0x42f1830) register_command perfcounters_dump hook 0x42f5000
-38> 2016-03-25 16:00:43.036596 7f3ae6c13780 5 asok(0x42f1830) register_command 1 hook 0x42f5000
-37> 2016-03-25 16:00:43.036608 7f3ae6c13780 5 asok(0x42f1830) register_command perf dump hook 0x42f5000
-36> 2016-03-25 16:00:43.036621 7f3ae6c13780 5 asok(0x42f1830) register_command perfcounters_schema hook 0x42f5000
-35> 2016-03-25 16:00:43.036630 7f3ae6c13780 5 asok(0x42f1830) register_command 2 hook 0x42f5000
-34> 2016-03-25 16:00:43.036634 7f3ae6c13780 5 asok(0x42f1830) register_command perf schema hook 0x42f5000
-33> 2016-03-25 16:00:43.036639 7f3ae6c13780 5 asok(0x42f1830) register_command perf reset hook 0x42f5000
-32> 2016-03-25 16:00:43.036643 7f3ae6c13780 5 asok(0x42f1830) register_command config show hook 0x42f5000
-31> 2016-03-25 16:00:43.036651 7f3ae6c13780 5 asok(0x42f1830) register_command config set hook 0x42f5000
-30> 2016-03-25 16:00:43.036654 7f3ae6c13780 5 asok(0x42f1830) register_command config get hook 0x42f5000
-29> 2016-03-25 16:00:43.036659 7f3ae6c13780 5 asok(0x42f1830) register_command config diff hook 0x42f5000
-28> 2016-03-25 16:00:43.036662 7f3ae6c13780 5 asok(0x42f1830) register_command log flush hook 0x42f5000
-27> 2016-03-25 16:00:43.036667 7f3ae6c13780 5 asok(0x42f1830) register_command log dump hook 0x42f5000
-26> 2016-03-25 16:00:43.036670 7f3ae6c13780 5 asok(0x42f1830) register_command log reopen hook 0x42f5000
-25> 2016-03-25 16:00:43.042648 7f3ae6c13780 5 asok(0x42f1830) init /var/run/ceph/guests/ceph-client.admin.8171.70195824.asok
-24> 2016-03-25 16:00:43.042662 7f3ae6c13780 5 asok(0x42f1830) bind_and_listen /var/run/ceph/guests/ceph-client.admin.8171.70195824.asok
-23> 2016-03-25 16:00:43.042708 7f3ae6c13780 5 asok(0x42f1830) register_command 0 hook 0x4345200
-22> 2016-03-25 16:00:43.042715 7f3ae6c13780 5 asok(0x42f1830) register_command version hook 0x4345200
-21> 2016-03-25 16:00:43.042718 7f3ae6c13780 5 asok(0x42f1830) register_command git_version hook 0x4345200
-20> 2016-03-25 16:00:43.042728 7f3ae6c13780 5 asok(0x42f1830) register_command help hook 0x42f7230
-19> 2016-03-25 16:00:43.042731 7f3ae6c13780 5 asok(0x42f1830) register_command get_command_descriptions hook 0x4345ee0
-18> 2016-03-25 16:00:43.042766 7f3ae131d700 5 asok(0x42f1830) entry start
-17> 2016-03-25 16:00:43.042931 7f3ae6c13780 1 librados: starting msgr at :/0
-16> 2016-03-25 16:00:43.042948 7f3ae6c13780 1 librados: starting objecter
-15> 2016-03-25 16:00:43.043000 7f3ae6c13780 1 -- :/0 messenger.start
-14> 2016-03-25 16:00:43.043085 7f3ae6c13780 1 librados: setting wanted keys
-13> 2016-03-25 16:00:43.043088 7f3ae6c13780 1 librados: calling monclient init
-12> 2016-03-25 16:00:43.043106 7f3ae6c13780 5 adding auth protocol: none
-11> 2016-03-25 16:00:43.043329 7f3ae6c13780 1 -- :/1008171 --> 10.180.0.46:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x434a330 con 0x4349fc0
-10> 2016-03-25 16:00:43.043377 7f3ae6c13780 0 -- :/1008171 submit_message auth(proto 0 30 bytes epoch 0) v1
0000 : 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 : ................
0010 : 00 00 00 00 00 00 1e 00 00 00 01 01 00 00 00 01 : ................
0020 : 00 00 00 08 00 00 00 05 00 00 00 61 64 6d 69 6e : ...........admin
0030 : 00 00 00 00 00 00 00 00 00 00 00 00 : ............
-9> 2016-03-25 16:00:43.043450 7f3adb7fe700 1 monclient(hunting): continuing hunt
-8> 2016-03-25 16:00:43.043489 7f3adb7fe700 1 -- :/1008171 mark_down 0x4349fc0 -- 0x4349d30
-7> 2016-03-25 16:00:43.043516 7f3ad3fff700 2 -- :/1008171 >> 10.180.0.46:6789/0 pipe(0x4349d30 sd=7 :0 s=4 pgs=0 cs=0 l=1 c=0x4349fc0).connect couldn't read banner, (0) Success
-6> 2016-03-25 16:00:43.043614 7f3adb7fe700 1 -- :/1008171 --> 10.180.0.31:6789/0 -- auth(proto 0 30 bytes epoch 0) v1 -- ?+0 0x7f39cc001060 con 0x7f39cc000cf0
-5> 2016-03-25 16:00:43.043637 7f3ad3fff700 3 -- :/1008171 >> 10.180.0.46:6789/0 pipe(0x4349d30 sd=7 :0 s=4 pgs=0 cs=0 l=1 c=0x4349fc0).connect fault, but state = closed != connecting, stopping
-4> 2016-03-25 16:00:43.043648 7f3adb7fe700 0 -- :/1008171 submit_message auth(proto 0 30 bytes epoch 0) v1
0000 : 00 00 00 00 00 00 00 00 ff ff 00 00 00 00 00 00 : ................
0010 : 00 00 00 00 00 00 1e 00 00 00 01 01 00 00 00 01 : ................
0020 : 00 00 00 08 00 00 00 05 00 00 00 61 64 6d 69 6e : ...........admin
0030 : 00 00 00 00 00 00 00 00 00 00 00 00 : ............
-3> 2016-03-25 16:00:43.043694 7f3ae6c13780 0 monclient(hunting): authenticate timed out after 2.47033e-321
-2> 2016-03-25 16:00:43.043756 7f3adb7fe700 1 monclient(hunting): continuing hunt
-1> 2016-03-25 16:00:43.043749 7f3ae6c13780 0 librados: client.admin authentication error (110) Connection timed out
0> 2016-03-25 16:00:43.045278 7f3adbfff700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f3adbfff700

 ceph version 0.94.6-2-gbb98b8f (bb98b8fcb0bb0bd3688310f6a1688736ef422b25)
 1: rbd() [0x60408c]
 2: (()+0xf8d0) [0x7f3ae4ea88d0]
 3: rbd() [0x52b841]
 4: (Mutex::~Mutex()+0x9b) [0x562a6b]
 5: (Connection::~Connection()+0x6e) [0x7f3ae5550fce]
 6: (Connection::~Connection()+0x9) [0x7f3ae5551049]
 7: (Pipe::~Pipe()+0x90) [0x7f3ae553f330]
 8: (Pipe::~Pipe()+0x9) [0x7f3ae553f4e9]
 9: (SimpleMessenger::reaper()+0x8a9) [0x7f3ae5555bf9]
 10: (SimpleMessenger::reaper_entry()+0x88) [0x7f3ae5556b38]
 11: (SimpleMessenger::ReaperThread::entry()+0xd) [0x7f3ae555ba8d]
 12: (()+0x80a4) [0x7f3ae4ea10a4]
 13: (clone()+0x6d) [0x7f3ae3a2d04d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
 0/ 5 none
 0/ 1 lockdep
 0/ 1 context
 1/ 1 crush
 1/ 5 mds
 1/ 5 mds_balancer
 1/ 5 mds_locker
 1/ 5 mds_log
 1/ 5 mds_log_expire
 1/ 5 mds_migrator
 0/ 1 buffer
 0/ 1 timer
 0/ 1 filer
 0/ 1 striper
 0/ 1 objecter
 0/ 5 rados
 0/ 5 rbd
 0/ 5 rbd_replay
 0/ 5 journaler
 0/ 5 objectcacher
 0/ 5 client
 0/ 5 osd
 0/ 5 optracker
 0/ 5 objclass
 1/ 3 filestore
 1/ 3 keyvaluestore
 1/ 3 journal
 0/ 5 ms
 1/ 5 mon
 0/10 monc
 1/ 5 paxos
 0/ 5 tp
 1/ 5 auth
 1/ 5 crypto
 1/ 1 finisher
 1/ 5 heartbeatmap
 1/ 5 perfcounter
 1/ 5 rgw
 1/10 civetweb
 1/ 5 javaclient
 1/ 5 asok
 1/ 1 throttle
 0/ 0 refs
 1/ 5 xio
 -2/-2 (syslog threshold)
 99/99 (stderr threshold)
 max_recent 500
 max_new 1000
 log_file /var/log/qemu/qemu-guest-8171.log
--- end dump of recent events ---
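
A side note on the odd value in "authenticate timed out after 2.47033e-321": that number is exactly what the 64-bit integer 500 prints as when its bits are read as an IEEE 754 double. I have not confirmed that this is what happens inside rbd; it is only an observation, and it might be worth checking whether the rbd binary and the Ceph libraries on this host come from the same build. The small Python sketch below (not Ceph code; the variable names are my own) reproduces the value:

```python
import struct

# Pack the integer 500 as a little-endian signed 64-bit value, then
# reinterpret the same 8 bytes as an IEEE 754 double. The result is a
# subnormal number: 500 * 2**-1074.
raw = struct.pack("<q", 500)
as_double = struct.unpack("<d", raw)[0]

# Format with 6 significant digits, the default precision of C++ streams.
formatted = "%g" % as_double
print(formatted)
```

The integer 500 here is just the value that happens to match the log line; I do not know which variable in the client, if any, holds it.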