Thank you Eugen! After finding what the target name actually was, it all
worked like a charm.

Best regards, Mikael

On Wed, Jun 21, 2023 at 11:05 AM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> > Will that try to be smart and just restart a few at a time to keep
> > things up and available. Or will it just trigger a restart everywhere
> > simultaneously.
>
> Basically, that's what happens for example during an upgrade if
> services are restarted. It's designed to be a rolling upgrade
> procedure, since restarting all daemons of a specific service at the
> same time would cause an interruption. So the daemons are scheduled to
> restart and the mgr decides when it's safe to restart the next one
> (this is a test cluster started on Nautilus, but it's on Quincy now):
>
> nautilus:~ # ceph orch restart osd.osd-hdd-ssd
> Scheduled to restart osd.5 on host 'nautilus'
> Scheduled to restart osd.0 on host 'nautilus'
> Scheduled to restart osd.2 on host 'nautilus'
> Scheduled to restart osd.1 on host 'nautilus2'
> Scheduled to restart osd.4 on host 'nautilus2'
> Scheduled to restart osd.7 on host 'nautilus2'
> Scheduled to restart osd.3 on host 'nautilus3'
> Scheduled to restart osd.8 on host 'nautilus3'
> Scheduled to restart osd.6 on host 'nautilus3'
>
> When it comes to OSDs, it's possible (or even likely) that multiple
> OSDs are restarted at the same time, depending on the pools (and their
> replication size) they are part of. But Ceph tries to avoid "inactive
> PGs", which would be critical, of course. An edge case would be a pool
> with size 1, where restarting an OSD would cause an inactive PG until
> the OSD is up again. But since size 1 would be a bad idea anyway
> (except for testing purposes), you'd have to live with that.
> If you have the option, I'd recommend creating a test cluster and
> playing around with these things to get a better understanding,
> especially when it comes to upgrade tests etc.
>
> > I guess in my current scenario, restarting one host at a time makes
> > the most sense, with a
> > systemctl restart ceph-{fsid}.target
> > and then checking that "ceph -s" says OK before proceeding to the next
>
> Yes, if your crush-failure-domain is host, that should be safe, too.
>
> Regards,
> Eugen
>
> Zitat von Mikael Öhman <micketeer@xxxxxxxxx>:
>
> > The documentation very briefly explains a few core commands for
> > restarting things:
> > https://docs.ceph.com/en/quincy/cephadm/operations/#starting-and-stopping-daemons
> > but I feel I'm lacking quite some details of what is safe to do.
> >
> > I have a system in production, clusters connected via CephFS and some
> > shared block devices. We would like to restart some things due to some
> > new network configurations. Going daemon by daemon would take forever,
> > so I'm curious as to what happens if one tries the command:
> >
> > ceph orch restart osd
> >
> > Will that try to be smart and just restart a few at a time to keep
> > things up and available. Or will it just trigger a restart everywhere
> > simultaneously.
> >
> > I guess in my current scenario, restarting one host at a time makes
> > the most sense, with a
> > systemctl restart ceph-{fsid}.target
> > and then checking that "ceph -s" says OK before proceeding to the next
> > host, but I'm still curious as to what the "ceph orch restart xxx"
> > command would do (but not enough to try it out in production)
> >
> > Best regards, Mikael
> > Chalmers University of Technology
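
A minimal sketch of the orchestrator path discussed in the thread: the
name passed to "ceph orch restart" must be a service name as reported by
"ceph orch ls" (the "target name" Mikael mentions), not an individual
daemon. The service name osd.osd-hdd-ssd below is the one from Eugen's
example output; substitute whatever "ceph orch ls" reports on your own
cluster:

    # List the orchestrator-managed services; the service names shown
    # here are what "ceph orch restart <service>" expects.
    ceph orch ls

    # Schedule a rolling restart of all daemons belonging to that service
    # (service name taken from Eugen's example above; yours will differ).
    ceph orch restart osd.osd-hdd-ssd

    # Watch the individual daemons as the mgr restarts them one by one.
    ceph orch ps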
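
And a minimal sketch of the per-host approach (restart the cephadm
systemd target on one host, wait for the cluster to report healthy, then
move on), assuming a host-level crush failure domain and ssh access from
an admin node; the host names are placeholders, not taken from the
thread:

    #!/bin/bash
    # Rolling per-host restart of all cephadm-managed Ceph daemons.
    FSID=$(ceph fsid)

    for host in ceph-host1 ceph-host2 ceph-host3; do  # placeholder names
        # Restart every Ceph daemon on this host via its systemd target.
        ssh "$host" "systemctl restart ceph-${FSID}.target"

        # Give the daemons a moment to actually go down before polling.
        sleep 30

        # Proceed to the next host only once "ceph health" is HEALTH_OK
        # again (the "ceph -s says OK" check from the thread).
        until ceph health | grep -q HEALTH_OK; do
            sleep 10
        done
    done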