OSD's addrvec, not getting msgr v2 address, PGs stuck unknown or peering

We are running 14.2.4 (the same issue was observed on 14.2.2) and have a problem on what is, thankfully, a testing cluster: all PGs are failing to peer and are stuck in peering, unknown, stale, or similar states.

My working theory is that this is because the OSDs don't seem to be using msgr v2: "ceph osd find osd.NN" lists only the v1 address in the addrvec. This is in contrast to our working 14.2.4 clusters, where both v1 and v2 are listed.
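To make the comparison between clusters repeatable, the check could be scripted roughly as follows. This is only a sketch: it assumes the JSON printed by "ceph osd find" nests the addresses under "addrs" and "addrvec" with a "type" field per entry, and the sample document, addresses, and function names are made up for illustration.

```python
import json

# Hypothetical sample shaped like the assumed `ceph osd find` JSON output;
# the OSD id, address, and nonce are invented for illustration.
SAMPLE_V1_ONLY = """
{
  "osd": 12,
  "addrs": {
    "addrvec": [
      {"type": "v1", "addr": "10.0.0.12:6801", "nonce": 4242}
    ]
  }
}
"""


def addr_types(osd_find_json: str) -> set:
    """Return the set of messenger types (e.g. "v1", "v2") in the addrvec."""
    doc = json.loads(osd_find_json)
    addrvec = doc.get("addrs", {}).get("addrvec", [])
    return {entry.get("type") for entry in addrvec}


def has_msgr_v2(osd_find_json: str) -> bool:
    """True if the OSD advertises at least one v2 address."""
    return "v2" in addr_types(osd_find_json)


if __name__ == "__main__":
    # An OSD like the ones on the broken cluster: v1 only.
    print(has_msgr_v2(SAMPLE_V1_ONLY))
```

In practice the input string would come from running the command with JSON output for each OSD id rather than from a hard-coded sample.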

Our monitors, per `ceph mon dump`, are each running v1 and v2 on the default ports (v2 on 3300, v1 on 6789), and I am able to reach each of those ports on all the mons from a few test OSD nodes.
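The reachability test from an OSD node can be approximated with a plain TCP connect, sketched below. The function names are hypothetical, and a successful connect only shows that something is listening on the port; it does not prove that a msgr v1 or v2 handshake would succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_mons(mon_ips, ports=(3300, 6789)):
    """Map each (ip, port) pair to whether a TCP connect succeeded."""
    return {(ip, port): port_open(ip, port) for ip in mon_ips for port in ports}
```

Calling check_mons with the list of monitor IPs from each test OSD node gives one table per node that can be compared across nodes.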

The OSD logs are filled with "heartbeat_check: no reply from <IP> <OSD.XY> ever on either front or back".

I have attempted to change mon_host in ceph.conf on the OSD nodes to use both the standard comma-separated IP list and the new bracketed format, restarting the OSD daemons on a number of OSDs each time, but it doesn't seem to affect the addrvec.
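For clarity, the two mon_host forms I mean look like the strings produced by the sketch below. The helper names and example IPs are hypothetical, and the bracketed layout assumes the default ports mentioned above (3300 for v2, 6789 for v1).

```python
def legacy_mon_host(ips):
    """Standard form: a plain comma-separated list of monitor IPs."""
    return ",".join(ips)


def bracketed_mon_host(ips, v2_port=3300, v1_port=6789):
    """Bracketed form: one [v2:...,v1:...] address vector per monitor."""
    return ",".join(
        f"[v2:{ip}:{v2_port},v1:{ip}:{v1_port}]" for ip in ips
    )


if __name__ == "__main__":
    mons = ["10.0.0.1", "10.0.0.2"]  # made-up monitor addresses
    print("mon_host =", legacy_mon_host(mons))
    print("mon_host =", bracketed_mon_host(mons))
```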

My goal is to get the OSDs working on v2 and see whether they are then able to begin peering. How can I force the addrvec to update? Thanks.

Respectfully,

Wes Dillingham
