OSD heartbeat_check failure while using 10Gb/s

Hi,

I have a 6-host, 16-OSD cluster here, all SATA SSDs. All Ceph daemons are version 18.2.2. Host OS is Ubuntu 24.04. The cluster network uses Intel X540 10Gb/s interfaces. All is fine while using a 1Gb/s switch. When moved to a 10Gb/s switch (Netgear XS712T), OSDs start failing heartbeat checks one by one and are marked as 'down' until only 3 or 4 OSDs remain up. By then the cluster is unusable (slow ops, PGs inactive).

Here is a sample sequence from the log of one of the OSDs:

ceph-osd[23402]: osd.3 77434 heartbeat_check: no reply from 129.170.x.x:6802 osd.13 ever on either front or back

ceph-osd[23402]: log_channel(cluster) log [WRN] : 101 slow requests (by type [ 'delayed' : 101 ] most affected pool [ 'default.rgw.log' : 96 ])

ceph-osd[23402]: log_channel(cluster) log [WRN] : Monitor daemon marked osd.3 down, but it is still running

ceph-osd[23402]: log_channel(cluster) log [DBG] : map e77442 wrongly marked me down at e77441

ceph-osd[23402]: osd.3 77442 start_waiting_for_healthy

ceph-osd[23402]: osd.3 77434 is_healthy false -- only 0/10 up peers (less than 33%)

ceph-osd[23402]: osd.3 77434 not healthy; waiting to boot

The OSD service container keeps running, but the OSD is not booted.
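
In case it helps anyone compare logs across hosts, here is a minimal sketch that tallies which peers an OSD reports as not replying. It only assumes the heartbeat_check line format shown in the sample above; the names `HEARTBEAT_RE` and `tally_missed_heartbeats` are made up for illustration and are not part of Ceph.

```python
import re
from collections import Counter

# Assumed line format, taken from the sample log above:
#   osd.3 77434 heartbeat_check: no reply from 129.170.x.x:6802 osd.13 ever on ...
HEARTBEAT_RE = re.compile(
    r"(?P<reporter>osd\.\d+) \d+ heartbeat_check: "
    r"no reply from \S+ (?P<peer>osd\.\d+)"
)


def tally_missed_heartbeats(lines):
    """Count 'no reply' reports per (reporting OSD, silent peer) pair."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            counts[(match.group("reporter"), match.group("peer"))] += 1
    return counts


# Two lines copied from the sample sequence above; only the first one matches.
sample = [
    "ceph-osd[23402]: osd.3 77434 heartbeat_check: no reply from "
    "129.170.x.x:6802 osd.13 ever on either front or back",
    "ceph-osd[23402]: osd.3 77442 start_waiting_for_healthy",
]

for (reporter, peer), n in tally_missed_heartbeats(sample).items():
    print(f"{reporter} -> {peer}: {n}")  # prints "osd.3 -> osd.13: 1"
```

Fed the full OSD log from each host, a tally like this would show whether every OSD loses every peer or only particular host pairs stop replying.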

Has anyone experienced this? Any ideas on what should be fixed? Please let me know what other info would be useful.

Best regards,
--
Sarunas Burdulis
Dartmouth Mathematics
math.dartmouth.edu/~sarunas

· https://useplaintext.email ·
