Hi,

I'm having some trouble with our cluster. Seemingly random OSDs suddenly get marked out. After restarting the affected OSD on its node, it works again. This usually happens while scrubbing/deep scrubbing is active.

In the logs I can see:

2016-02-29 06:08:58.130376 7fd5dae75700 0 -- 10.0.1.2:0/36459 >> 10.0.0.4:6807/9051245 pipe(0x27488000 sd=58 :60473 s=1 pgs=0 cs=0 l=1 c=0x28b39440).connect claims to be 10.0.0.4:6807/12051245 not 10.0.0.4:6807/9051245 - wrong node!
2016-02-29 06:08:58.130417 7fd5d9961700 0 -- 10.0.1.2:0/36459 >> 10.0.1.4:6803/6002429 pipe(0x2a6c9000 sd=75 :37736 s=1 pgs=0 cs=0 l=1 c=0x2420be40).connect claims to be 10.0.1.4:6803/10002429 not 10.0.1.4:6803/6002429 - wrong node!
2016-02-29 06:08:58.130918 7fd5b1c17700 0 -- 10.0.1.2:0/36459 >> 10.0.0.1:6800/8050402 pipe(0x26834000 sd=74 :37605 s=1 pgs=0 cs=0 l=1 c=0x1f7a9020).connect claims to be 10.0.0.1:6800/9050770 not 10.0.0.1:6800/8050402 - wrong node!
2016-02-29 06:08:58.131266 7fd5be141700 0 -- 10.0.1.2:0/36459 >> 10.0.0.3:6806/9059302 pipe(0x27f07000 sd=76 :48347 s=1 pgs=0 cs=0 l=1 c=0x2371adc0).connect claims to be 10.0.0.3:6806/11059302 not 10.0.0.3:6806/9059302 - wrong node!
2016-02-29 06:08:58.131299 7fd5c1914700 0 -- 10.0.1.2:0/36459 >> 10.0.1.4:6801/9051245 pipe(0x2d288000 sd=100 :33848 s=1 pgs=0 cs=0 l=1 c=0x28b37760).connect claims to be 10.0.1.4:6801/12051245 not 10.0.1.4:6801/9051245 - wrong node!
and

2016-02-29 06:08:59.230754 7fd5c5425700 -1 osd.3 14877 heartbeat_check: no reply from osd.0 since back 2016-02-29 05:55:26.351951 front 2016-02-29 05:55:26.351951 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230761 7fd5c5425700 -1 osd.3 14877 heartbeat_check: no reply from osd.1 since back 2016-02-29 05:41:59.191341 front 2016-02-29 05:41:59.191341 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230765 7fd5c5425700 -1 osd.3 14877 heartbeat_check: no reply from osd.2 since back 2016-02-29 05:41:59.191341 front 2016-02-29 05:41:59.191341 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230769 7fd5c5425700 -1 osd.3 14877 heartbeat_check: no reply from osd.4 since back 2016-02-29 05:55:30.452505 front 2016-02-29 05:55:30.452505 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230773 7fd5c5425700 -1 osd.3 14877 heartbeat_check: no reply from osd.7 since back 2016-02-29 05:41:52.790422 front 2016-02-29 05:41:52.790422 (cutoff 2016-02-29 06:08:39.230753)

Any idea what could be causing this?

Thank you!

--
Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, registered at Amtsgericht Hanau (Hanau District Court)
Managing Director: Oliver Dzombic

Tax No.: 35 236 3622 1
VAT ID: DE274086107
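
For anyone who wants to see at a glance how stale each peer's heartbeat is, the gap between the log timestamp and the last "back" reply in the heartbeat_check lines above can be computed with a short script. This is only a minimal sketch: the regular expression is written against the line layout shown in this post, and the names LINE_RE, heartbeat_gaps and sample are made up for illustration, not part of Ceph.

```python
import re
from datetime import datetime

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
FMT = "%Y-%m-%d %H:%M:%S.%f"

# Assumption: lines look exactly like the heartbeat_check lines quoted above.
LINE_RE = re.compile(
    rf"(?P<logged>{TS}) .*?heartbeat_check: no reply from (?P<peer>osd\.\d+) "
    rf"since back (?P<back>{TS}) front (?P<front>{TS}) \(cutoff (?P<cutoff>{TS})\)"
)


def heartbeat_gaps(lines):
    """Map each peer OSD to the seconds elapsed since its last 'back' reply."""
    gaps = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a heartbeat_check line in the expected layout
        logged = datetime.strptime(m.group("logged"), FMT)
        back = datetime.strptime(m.group("back"), FMT)
        gaps[m.group("peer")] = (logged - back).total_seconds()
    return gaps


# Two lines copied from the log excerpt above.
sample = [
    "2016-02-29 06:08:59.230754 7fd5c5425700 -1 osd.3 14877 heartbeat_check: "
    "no reply from osd.0 since back 2016-02-29 05:55:26.351951 "
    "front 2016-02-29 05:55:26.351951 (cutoff 2016-02-29 06:08:39.230753)",
    "2016-02-29 06:08:59.230761 7fd5c5425700 -1 osd.3 14877 heartbeat_check: "
    "no reply from osd.1 since back 2016-02-29 05:41:59.191341 "
    "front 2016-02-29 05:41:59.191341 (cutoff 2016-02-29 06:08:39.230753)",
]

if __name__ == "__main__":
    for peer, seconds in sorted(heartbeat_gaps(sample).items()):
        print(f"{peer}: last reply {seconds / 60:.1f} min before the log entry")
```

On the two sample lines this gives roughly 13.5 minutes for osd.0 and 27 minutes for osd.1, far beyond the cutoff, which in these lines sits about 20 seconds before the log timestamp.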