When I add a new host (with OSDs) to my existing cluster, one or two of the existing OSDs go down for about two minutes and then come back up.

[root@h1ct ~]# ceph osd tree
# id    weight  type name       up/down reweight
-1      3       root default
-3      3               rack unknownrack
-2      3                       host h1
0       1                               osd.0   up      1
1       1                               osd.1   up      1
2       1                               osd.2   up      1

For example, after adding host h2 (with 3 new OSDs) to the above cluster and running "ceph osd tree", I see this:

[root@h1 ~]# ceph osd tree
# id    weight  type name       up/down reweight
-1      6       root default
-3      6               rack unknownrack
-2      3                       host h1
0       1                               osd.0   up      1
1       1                               osd.1   down    1
2       1                               osd.2   up      1
-4      3                       host h2
3       1                               osd.3   up      1
4       1                               osd.4   up      1
5       1                               osd.5   up      1

The down OSDs always come back up within two minutes or less, and I see the following messages in the corresponding OSD log file:

2013-01-07 04:40:17.613028 7fec7f092760  1 journal _open /ceph_journal/journals/journal_2 fd 26: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-07 04:40:17.613122 7fec7f092760  1 journal _open /ceph_journal/journals/journal_2 fd 26: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
2013-01-07 04:42:10.006533 7fec746f7710  0 -- 192.168.0.124:6808/19449 >> 192.168.1.123:6800/18287 pipe(0x7fec20000e10 sd=31 :6808 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state connecting
2013-01-07 04:45:29.834341 7fec743f4710  0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45438 pgs=7 cs=1 l=0).fault, initiating reconnect
2013-01-07 04:45:29.835748 7fec743f4710  0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45439 pgs=15 cs=3 l=0).fault, initiating reconnect
2013-01-07 04:45:30.835219 7fec743f4710  0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45894 pgs=482 cs=903 l=0).fault, initiating reconnect
2013-01-07 04:45:30.837318 7fec743f4710  0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45895 pgs=483 cs=905 l=0).fault, initiating reconnect
2013-01-07 04:45:30.851984 7fec637fe710  0 log [ERR] : map e27 had wrong cluster addr (192.168.0.124:6808/19449 != my 192.168.1.124:6808/19449)

Also, this happens only when the cluster IP address and the public IP address are different, for example:

....
....
....
[osd.0]
        host = g8ct
        public address = 192.168.0.124
        cluster address = 192.168.1.124
        btrfs devs = /dev/sdb
....
....

It does not happen when they are the same. Any idea what the issue may be?

Isaac
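For context, a minimal sketch of how a split public/cluster setup is typically declared in ceph.conf, using the global "public network" and "cluster network" options alongside per-daemon addresses; the subnet ranges below are illustrative assumptions, not taken from the report above:

```ini
; Illustrative ceph.conf fragment (subnets are assumed, not from the report)
[global]
        ; OSDs pick their client-facing address from the public network
        public network = 192.168.0.0/24
        ; and their replication/heartbeat address from the cluster network
        cluster network = 192.168.1.0/24

[osd.0]
        host = g8ct
        ; explicit per-daemon addresses override network-based selection
        public address = 192.168.0.124
        cluster address = 192.168.1.124
```

The "wrong cluster addr" error above suggests the OSD registered its public-network address (192.168.0.124) in the map where its cluster address (192.168.1.124) was expected, so checking that every daemon's addresses fall on the intended networks may help narrow this down.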