Thanks, Dan. I thought so but wanted to verify. I'll see if I can work up a doc PR to clarify this.

>> The documentation here:
>>
>> http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/
>>
>> says
>>
>> "Each Ceph OSD Daemon checks the heartbeat of other Ceph OSD Daemons every 6 seconds"
>>
>> and
>>
>> "If a neighboring Ceph OSD Daemon doesn't show a heartbeat within a 20 second grace period, the Ceph OSD Daemon may consider the neighboring Ceph OSD Daemon down and report it back to a Ceph Monitor."
>>
>> I've always thought that each OSD heartbeats with *every* other OSD, which of course means that total heartbeat traffic grows roughly quadratically with cluster size. However, in extended testing we've observed that the number of other OSDs that a given OSD heartbeats with is < N, which has us wondering if perhaps only OSDs with which a given OSD shares PGs are contacted -- or some other subset.
>>
>
> OSDs heartbeat with their peers: the set of OSDs with which they share
> at least one PG.
>
> You can see the heartbeat peers (HB_PEERS) in ceph pg dump -- after
> the header "OSD_STAT USED AVAIL TOTAL HB_PEERS..."
>
> This is one of the nice features of the placement group concept --
> per-OSD heartbeat and peering load is bounded by the number of PGs per
> OSD, which stays roughly constant, rather than scaling up with the
> total number of OSDs in the cluster.
>
> Cheers, Dan
>
>> I plan to submit a doc fix for mon_osd_min_down_reporters and wanted to resolve this FUD first.
>>
>> -- aad
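
P.S. For anyone who wants to check the peer counts on their own cluster, below is a minimal sketch that pulls HB_PEERS out of the JSON form of ceph pg dump. The field names ("osd_stats", "hb_peers") and their nesting are assumptions that vary across Ceph releases (some nest the stats under "pg_map"), so treat it as a starting point, not gospel.

    #!/usr/bin/env python3
    # Sketch: list each OSD's heartbeat peers from `ceph pg dump`.
    # Assumption: the JSON output carries an "osd_stats" list whose
    # entries have "osd" and "hb_peers" fields; the exact nesting
    # varies by Ceph release, so adjust the lookup for your version.
    import json
    import subprocess

    raw = subprocess.check_output(["ceph", "pg", "dump", "--format", "json"])
    dump = json.loads(raw)

    # Tolerate both the flat and the "pg_map"-nested layouts.
    osd_stats = dump.get("osd_stats") or dump.get("pg_map", {}).get("osd_stats", [])

    for stat in osd_stats:
        peers = sorted(stat.get("hb_peers", []))
        print("osd.{}: {} heartbeat peers: {}".format(stat["osd"], len(peers), peers))

If Dan's description holds, each OSD's peer count should track its PG count rather than the total number of OSDs in the cluster.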