Hi,

Last night I spent a couple of hours debugging an issue where OSDs would be marked 'up', but PGs stayed in the 'peering' state. Looking through the admin socket I saw these OSDs were in the 'booting' state.

Looking at the OSDMap I saw this:

osd.3 up in weight 1 up_from 26 up_thru 700 down_at 0 last_clean_interval [0,0) [v2:[2a05:xx0:700:2::7]:6816/7923,v1:[2a05:xx:700:2::7]:6817/7923,v2:0.0.0.0:6818/7923,v1:0.0.0.0:6819/7923] [v2:[2a05:xx:700:2::7]:6820/7923,v1:[2a05:1500:700:2::7]:6821/7923,v2:0.0.0.0:6822/7923,v1:0.0.0.0:6823/7923] exists,up 786d3e9d-047f-4b09-b368-db9e8dc0805d

In ceph.conf this was set:

ms_bind_ipv6 = true
public_addr = 2a05:xx:700:2::6

On true IPv6-only nodes this works fine, but on nodes where IPv4 is also present this can (and will?) cause problems. I did not use tcpdump/wireshark to investigate, but it seems the OSDs tried to contact each other using the 0.0.0.0 IPv4 address.

After adding these settings the problems were resolved:

ms_bind_msgr1 = false
ms_bind_ipv4 = false

This also disables msgr v1, which we didn't need here; the cluster and all clients are running Octopus.

The OSDMap now showed:

osd.3 up in weight 1 up_from 704 up_thru 712 down_at 702 last_clean_interval [26,701) v2:[2a05:xx:700:2::7]:6804/791503 v2:[2a05:xx:700:2::7]:6805/791503 exists,up 786d3e9d-047f-4b09-b368-db9e8dc0805d

The OSDs came back right away, the PGs peered and the problems were resolved.

Wido
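
P.S. For anyone hitting the same thing, here is a minimal sketch of what the relevant ceph.conf settings and the verification steps look like. Treat it as illustrative: the [global] placement and the example address are assumptions on my part, public_addr is of course different on every node, and the admin socket command has to be run on the node that hosts the OSD.

    [global]
    # Bind the messengers to IPv6 only. Without ms_bind_ipv4 = false the
    # OSDs on dual-stack nodes still advertise 0.0.0.0 IPv4 addresses.
    ms_bind_ipv6 = true
    ms_bind_ipv4 = false
    # msgr v1 was not needed here, so it is disabled as well.
    ms_bind_msgr1 = false
    # Per-node public address (example value).
    public_addr = 2a05:xx:700:2::7

    # Verify after restarting the OSDs: the OSDMap entry should now list
    # only v2 IPv6 addresses, and the admin socket should report the OSD
    # as active instead of booting.
    ceph osd dump | grep osd.3
    ceph daemon osd.3 status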