On Thu, Jul 28, 2022 at 5:32 AM Johannes Liebl <johannes.liebl@xxxxxxxx> wrote:
>
> Hi Ceph Users,
>
> I am currently evaluating different cluster layouts, and as a test I stopped two of my three monitors while client traffic was running on the nodes.
>
> Only when I restarted an OSD did the PGs related to that OSD go down; the rest were still active and serving requests.
>
> A second try ran for 5:30 hours without a hitch, after which I aborted the test since nothing was happening.
>
> Now I want to know: is this behavior by design?
>
> It strikes me as odd that this more or less undefined state is still operational.

Yep, it's on purpose! I would not count on this behavior, because a lot of routine operations can disturb it [1], but Ceph does its best to continue operating by not relying on the other daemons whenever possible. Monitors are required for updates to the cluster maps, but as long as the cluster is stable and no new maps need to be generated, things will keep operating until something requires an update and that update gets blocked. As you saw, when an OSD got restarted, that changed the cluster state and required updates which couldn't be processed, so the affected PGs couldn't go active.
-Greg

[1]: RBD snapshots go through the monitors; MDSes send beacons to the monitors and will shut down if those don't get acknowledged, so I don't think CephFS will keep running in this case; CephX does key rotations which will eventually block access to the OSDs as keys time out; any kind of PG peering or recovery needs the monitors to update values; etc.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx