osd suddenly down / connect claims to be / heartbeat_check: no reply


 



Hi,

I am having some trouble with our cluster.

Seemingly random OSDs are suddenly being marked out.

After restarting the OSD on the affected node, it works again.

This usually happens while scrubbing or deep scrubbing is active.

In the logs I can see:

2016-02-29 06:08:58.130376 7fd5dae75700  0 -- 10.0.1.2:0/36459 >>
10.0.0.4:6807/9051245 pipe(0x27488000 sd=58 :60473 s=1 pgs=0 cs=0 l=1
c=0x28b39440).connect claims to be 10.0.0.4:6807/12051245 not
10.0.0.4:6807/9051245 - wrong node!
2016-02-29 06:08:58.130417 7fd5d9961700  0 -- 10.0.1.2:0/36459 >>
10.0.1.4:6803/6002429 pipe(0x2a6c9000 sd=75 :37736 s=1 pgs=0 cs=0 l=1
c=0x2420be40).connect claims to be 10.0.1.4:6803/10002429 not
10.0.1.4:6803/6002429 - wrong node!
2016-02-29 06:08:58.130918 7fd5b1c17700  0 -- 10.0.1.2:0/36459 >>
10.0.0.1:6800/8050402 pipe(0x26834000 sd=74 :37605 s=1 pgs=0 cs=0 l=1
c=0x1f7a9020).connect claims to be 10.0.0.1:6800/9050770 not
10.0.0.1:6800/8050402 - wrong node!
2016-02-29 06:08:58.131266 7fd5be141700  0 -- 10.0.1.2:0/36459 >>
10.0.0.3:6806/9059302 pipe(0x27f07000 sd=76 :48347 s=1 pgs=0 cs=0 l=1
c=0x2371adc0).connect claims to be 10.0.0.3:6806/11059302 not
10.0.0.3:6806/9059302 - wrong node!
2016-02-29 06:08:58.131299 7fd5c1914700  0 -- 10.0.1.2:0/36459 >>
10.0.1.4:6801/9051245 pipe(0x2d288000 sd=100 :33848 s=1 pgs=0 cs=0 l=1
c=0x28b37760).connect claims to be 10.0.1.4:6801/12051245 not
10.0.1.4:6801/9051245 - wrong node!
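
To make the mismatch in these lines easier to see, here is a minimal Python sketch that extracts the expected and reported values from each "wrong node" message. It assumes the address format is ip:port/nonce and that the number after the slash is a per-process nonce, so a changed nonce on the same ip:port would suggest that the peer OSD process was restarted while this OSD still held the old address. That interpretation is my assumption, not something the log states. The function name find_nonce_mismatches is made up for this example.

```python
import re

# Matches the tail of a "wrong node" message. Assumed format: ip:port/nonce.
WRONG_NODE = re.compile(
    r"connect claims to be (?P<got_addr>[\d.]+:\d+)/(?P<got_nonce>\d+) not "
    r"(?P<want_addr>[\d.]+:\d+)/(?P<want_nonce>\d+) - wrong node!"
)


def find_nonce_mismatches(log_text):
    """Return (address, expected_nonce, reported_nonce) for each message
    where ip:port is identical and only the nonce differs."""
    # The mail client wrapped the log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    results = []
    for m in WRONG_NODE.finditer(flat):
        same_addr = m["got_addr"] == m["want_addr"]
        if same_addr and m["got_nonce"] != m["want_nonce"]:
            results.append(
                (m["want_addr"], int(m["want_nonce"]), int(m["got_nonce"]))
            )
    return results


# First entry from the log above, wrapped exactly as it appears in this mail.
sample = """2016-02-29 06:08:58.130376 7fd5dae75700  0 -- 10.0.1.2:0/36459 >>
10.0.0.4:6807/9051245 pipe(0x27488000 sd=58 :60473 s=1 pgs=0 cs=0 l=1
c=0x28b39440).connect claims to be 10.0.0.4:6807/12051245 not
10.0.0.4:6807/9051245 - wrong node!"""

for addr, expected, reported in find_nonce_mismatches(sample):
    print(f"{addr}: expected nonce {expected}, peer reported {reported}")
```

In all five entries above the ip:port is the same on both sides and only the number after the slash differs.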

The same log also shows heartbeat failures:

2016-02-29 06:08:59.230754 7fd5c5425700 -1 osd.3 14877 heartbeat_check:
no reply from osd.0 since back 2016-02-29 05:55:26.351951 front
2016-02-29 05:55:26.351951 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230761 7fd5c5425700 -1 osd.3 14877 heartbeat_check:
no reply from osd.1 since back 2016-02-29 05:41:59.191341 front
2016-02-29 05:41:59.191341 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230765 7fd5c5425700 -1 osd.3 14877 heartbeat_check:
no reply from osd.2 since back 2016-02-29 05:41:59.191341 front
2016-02-29 05:41:59.191341 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230769 7fd5c5425700 -1 osd.3 14877 heartbeat_check:
no reply from osd.4 since back 2016-02-29 05:55:30.452505 front
2016-02-29 05:55:30.452505 (cutoff 2016-02-29 06:08:39.230753)
2016-02-29 06:08:59.230773 7fd5c5425700 -1 osd.3 14877 heartbeat_check:
no reply from osd.7 since back 2016-02-29 05:41:52.790422 front
2016-02-29 05:41:52.790422 (cutoff 2016-02-29 06:08:39.230753)
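
To put numbers on these heartbeat messages, the following Python sketch parses each heartbeat_check line and computes how long the reporting OSD had gone without a reply from its peer. The regular expression is written against the lines exactly as they appear above and may not match other Ceph versions. The helper name heartbeat_gaps is made up for this example.

```python
import re
from datetime import datetime

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+"
HEARTBEAT = re.compile(
    rf"(?P<logged>{TS}) \S+ -1 (?P<reporter>osd\.\d+) \d+ heartbeat_check: "
    rf"no reply from (?P<peer>osd\.\d+) since back (?P<back>{TS}) "
    rf"front (?P<front>{TS}) \(cutoff (?P<cutoff>{TS})\)"
)


def parse_ts(text):
    """Parse a timestamp in the format used by the log lines above."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def heartbeat_gaps(log_text):
    """Return (reporter, peer, seconds_without_reply) per heartbeat_check line."""
    # The mail client wrapped the log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    gaps = []
    for m in HEARTBEAT.finditer(flat):
        silence = parse_ts(m["logged"]) - parse_ts(m["back"])
        gaps.append((m["reporter"], m["peer"], silence.total_seconds()))
    return gaps


# First entry from the log above, wrapped exactly as it appears in this mail.
sample = """2016-02-29 06:08:59.230754 7fd5c5425700 -1 osd.3 14877 heartbeat_check:
no reply from osd.0 since back 2016-02-29 05:55:26.351951 front
2016-02-29 05:55:26.351951 (cutoff 2016-02-29 06:08:39.230753)"""

for reporter, peer, seconds in heartbeat_gaps(sample):
    print(f"{reporter} has not heard from {peer} for {seconds / 60:.1f} minutes")
```

For the entries above, osd.3 had not heard from osd.0 and osd.4 for roughly 13 minutes, and from osd.1, osd.2 and osd.7 for roughly 27 minutes, while the cutoff in each line is only about 20 seconds before the log timestamp.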


Any idea what could be causing this trouble in the cluster?

Thank you!

-- 
Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107



