> On 11 August 2016 at 13:04, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
> > -----Original Message-----
> > From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Wido den Hollander
> > Sent: 11 August 2016 11:54
> > To: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> > Cc: ceph-devel <ceph-devel@xxxxxxxxxxxxxxx>
> > Subject: Re: Delayed start of OSDs with systemd to prevent slowdowns
> >
> > > On 11 August 2016 at 1:05, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi Wido,
> > >
> > > I can't help you with the systemd foo, but I have two quick remarks ...
> >
> > Nobody seems to be able to :) systemd is new to everybody.
> >
> > > 1. I've always thought that this was a nice feature of systemd... To
> > > get the OSDs booted and online as quickly as possible. And we've not
> > > seen the peering issues you described.
> >
> > Yes, it's a nice feature. But these machines have 24 OSDs on a dual Xeon E5 and they have a difficult time coming up.
>
> Would something like this do what you want?
>
> https://www.freedesktop.org/software/systemd/man/systemd.timer.html#RandomizedDelaySec=
>
> You have to set up a systemd timer for the service; a bit of research on that is probably needed.

Good pointer! I'll test with that a bit and report back. Might take a while as I'm traveling a lot these weeks.

Wido

> I've found the delay seems to grow with the size of the OSD as well, i.e. 6TB disks take a lot longer to start up than 2TB ones.
>
> > Have to note that it's a mixed 0.94.7 <> 10.2.2 cluster, so it might be something there.
> >
> > > 2. The crush weight change shouldn't cause problems, if the weight
> > > isn't actually changing. IIRC it shouldn't even change the osdmap if
> > > the value doesn't change.
> >
> > True, my bad indeed. That doesn't trigger a change, but it does keep the MONs busy.
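For reference, the RandomizedDelaySec= approach Nick points at could look roughly like the timer unit below. This is only a sketch: the `ceph-osd-delayed@.timer` name, the 30-second boot anchor, and the 120-second random window are assumptions for illustration, not units that ship with Ceph.

```ini
# /etc/systemd/system/ceph-osd-delayed@.timer  (hypothetical unit name)
[Unit]
Description=Delayed start of Ceph OSD %i

[Timer]
# Fire once shortly after boot, plus a random extra delay of up to
# 120 seconds, so the OSDs do not all start peering at the same moment.
OnBootSec=30
RandomizedDelaySec=120
# Start the matching service when the timer elapses.
Unit=ceph-osd@%i.service

[Install]
WantedBy=timers.target
```

For this to have any effect, the `ceph-osd@.service` instances would have to be taken out of the normal boot path (disabled) so that only the timers start them.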
> >
> > Wido
> >
> > > Cheers, Dan
> > >
> > > > On 10 Aug 2016 11:01 a.m., "Wido den Hollander" <wido@xxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Currently with the systemd design a booting system will start all
> > > > OSDs at the same time. This means that a cluster suddenly gets a bunch of
> > > > CRUSH updates (if update on start is enabled), booting OSDs and PGs
> > > > which go into peering state.
> > > >
> > > > My systemd foo isn't that good, but I was wondering if there is a
> > > > way to modify ceph-osd.target in such a way that it doesn't start all the
> > > > OSDs in parallel?
> > > >
> > > > I would like 1 or maybe 2 OSDs to start at the same time with a
> > > > delay of 120 seconds in between. This way the boot will take longer, but the
> > > > impact on the cluster will be less.
> > > >
> > > > Any ideas on how we might achieve this with systemd?
> > > >
> > > > Wido
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel"
> > > > in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
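The staggered start Wido asks for (one or two OSDs at a time, 120 seconds apart) can also be scripted outside of systemd entirely, as a stopgap. A minimal sketch only: the OSD id list is hypothetical, and the real `systemctl start` call is left commented out so the loop just prints the schedule it would follow.

```shell
#!/bin/sh
# Sketch: start OSDs one at a time with a fixed pause between them,
# instead of letting ceph-osd.target launch them all in parallel.
# DELAY and the OSD id list are assumptions about the local host.

DELAY=120

for id in 0 1 2 3; do
    # systemctl start "ceph-osd@${id}.service"   # real start goes here
    echo "started osd.${id}, sleeping ${DELAY}s before the next one"
    # sleep "$DELAY"                             # real pause goes here
done
```

With the two commented lines enabled, boot takes roughly `DELAY` seconds per OSD longer, which is exactly the trade-off described in the question above.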