On Thu, Jul 28, 2022 at 5:32 AM Johannes Liebl <johannes.liebl@xxxxxxxx> wrote:
>
> Hi Ceph Users,
>
> I am currently evaluating different cluster layouts, and as a test I stopped two of my three monitors while client traffic was running on the nodes.
>
> Only when I restarted an OSD did the PGs related to that OSD go down; the rest were still active and serving requests.
>
> A second try ran for 5:30 hours without a hitch, after which I aborted the test since nothing was happening.
>
> Now I want to know: is this behavior by design?
>
> It strikes me as odd that this more or less undefined state is still operational.

Yep, it's on purpose! I would not count on this behavior, because a lot of routine operations can disturb it [1], but Ceph does its best to continue operating by not relying on the other daemons whenever possible. Monitors are required for updates to the cluster maps, but as long as the cluster is stable and no new maps need to be generated, things will keep operating until something requires an update and that update gets blocked. As you saw, when an OSD got restarted, that changed the cluster state and required updates which couldn't be processed, so the affected PGs couldn't go active.
-Greg

[1]: RBD snapshots go through the monitors; MDSes send beacons to the monitors and will shut down if those don't get acknowledged, so I don't think CephFS will keep running in this case; CephX does key rotations which will eventually block access to the OSDs as keys time out; any kind of PG peering or recovery needs the monitors to update values; etc.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx