This exact problem with the OS disk, and the difficulty of deploying lots of servers efficiently, was the main motivator for developing our croit orchestration product: https://croit.io

I've talked about this at a few Ceph Days, but the short summary is: We started with Ceph in 2013 and decided to use SATA DOMs with a custom installer for Ubuntu (derived from an existing internal tool, but it's basically just a live image that calls debootstrap and creates a few config files). That worked reasonably well for some time. But a year or so later, more and more of the SATA DOMs started to fail, and servers failed in the most annoying way possible: Byzantine failures -- locking up some random CPU cores and no longer replying to some services/requests while appearing perfectly healthy to others... We thought we weren't writing anything to the sticks (logs were sent via rsyslogd), but I guess we missed something; I suspect the ntp drift file might have been one of the problems. We had several nodes fail within a week when we decided that this approach clearly wasn't working out for us. Maybe SATA DOMs are better today? This was in 2013. Maybe we should have tracked down what was writing to these disks.

Anyways, there is no point in installing an operating system on something like a Ceph OSD server. You've got lots of them and they all look the same. So I wrote a quick and dirty PXE live boot system based on Debian. It was really just a collection of shell scripts that creates the image and the DHCP server configuration. Debian (and Ubuntu) make that *really easy*: you basically just run debootstrap, add the live-boot initramfs bits, customize the chroot and put the result in a squashfs image, that's it (see the sketch appended after the quoted thread below). (CentOS/RHEL is significantly more complicated because of dracut. I do like dracut, but the way it does live boot is unnecessarily complicated.) The initial prototype running on one of the OSD servers took a few hours to create. It then grew into an unmaintainable mess of bash scripts over the coming years...

We started croit based on this idea in early 2017. It's built on the same concept, but the whole implementation behind it is completely new: Dockerized deployment, a fully featured REST API on a Kotlin/Java stack for management, a vue.js HTML5 UI, ... We are also still planning to open source it later this year (we're working on separating some components to release it as 'open core').

What I'm saying is: there are only very few circumstances under which I would consider installing an operating system on a server that is used as a "cattle server". It makes no sense in most setups: you just add a point of failure and management overhead, and you waste time when deploying a server. Adding a new OSD server to the Ceph deployments that we manage is this simple: put the server into the rack, plug it in, boot it. That's it. No need to install or configure *anything* on the server.

I also really like the "immutable infrastructure" aspect of these deployments. I can easily get back to a clean slate by rebooting servers, and I can upgrade lots of servers by running a rolling reboot task.

Paul

2018-08-17 11:01 GMT+02:00 Daznis <daznis@xxxxxxxxx>:
> Hi,
>
> We used PXE boot with an NFS server, but had some issues if the NFS server
> crapped out and dropped connections or needed a reboot for
> maintenance. If I remember correctly, it sometimes took out some of
> the rebooted servers. So we switched to PXE with livecd-based images.
> You basically create a livecd image, then boot it with a specially
> prepared initramfs image, and it uses a copy-on-write disk for basic
> storage.
> With mimic, OSDs are started automatically; you just need to feed it
> some basic settings for that server.
>
> On Fri, Aug 17, 2018 at 11:31 AM Florian Florensa <florian@xxxxxxxxxxx> wrote:
>>
>> What about PXE booting the OSD servers? I am considering doing this
>> sort of thing as it doesn't seem that complicated.
>> A simple script could easily bring the OSDs back online using some LVM
>> commands to bring the volumes back online and then some ceph-volume lvm
>> activate command to fire the OSDs back up.
>>
>>
>> 2018-08-15 16:09 GMT+02:00 Götz Reinicke <goetz.reinicke@xxxxxxxxxxxxxxx>:
>> > Hi,
>> >
>> >> On 15.08.2018 at 15:11, Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
>> >>
>> >> Thank you all
>> >>
>> >> Since all concerns were about reliability, I am assuming the performance
>> >> impact of having the OS running on an SD card is minimal / negligible.
>> >
>> > Some time ago we had some Cisco blades booting VMware ESXi from SD cards
>> > and had no issues for months … till after an update a blade was rebooted
>> > and the SD failed … and then another one on another server … From my POV
>> > at that time, the "server" SDs were not nearly as reliable as SSDs or
>> > rotating disks. My experience from some years ago.
>> >
>> >>
>> >> In other words, an OSD server is not writing/reading from the Linux OS
>> >> partitions too much (especially with logging at a minimum), so its
>> >> performance is not dependent on what type of disk the OS resides on.
>> >
>> > Regarding performance: What kind of SDs are supported? You can get
>> > "SDXC | UHS-II | U3 | Class 10 | V90" cards which can handle up to
>> > 260 MBytes/sec, like the "Angelbird Matchpack EVA1". OK, they are
>> > Panasonic 4K camera certified (and we currently use them to record 4K video).
>> >
>> > https://www.angelbird.com/prod/match-pack-for-panasonic-eva1-1836/
>> >
>> > My 2 cents. Götz

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
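
A minimal sketch of the kind of Debian live-image build described above (debootstrap, the live-boot initramfs hooks, then squashfs). The suite, package list and paths are illustrative assumptions, not croit's actual scripts:

  #!/bin/bash
  # Sketch: build a minimal Debian live image for PXE boot.
  # Assumptions: Debian stretch, the live-boot package for the live initramfs,
  # /srv/pxe and /srv/tftp as working directories -- all illustrative.
  # (Bind-mounting /proc, /sys and /dev into the chroot is omitted for brevity.)
  set -euo pipefail

  TARGET=/srv/pxe/rootfs
  TFTP=/srv/tftp
  MIRROR=http://deb.debian.org/debian

  # 1) Bootstrap a minimal system including a kernel and the live-boot hooks.
  debootstrap --include=linux-image-amd64,live-boot,openssh-server \
      stretch "$TARGET" "$MIRROR"

  # 2) Customize the chroot: apt sources, extra packages, config files.
  #    (A real OSD image would also add a Ceph repository and ceph-osd here.)
  echo "deb $MIRROR stretch main" > "$TARGET/etc/apt/sources.list"
  cp /etc/resolv.conf "$TARGET/etc/resolv.conf"
  chroot "$TARGET" apt-get update
  chroot "$TARGET" env DEBIAN_FRONTEND=noninteractive apt-get install -y lvm2
  echo "osd-template" > "$TARGET/etc/hostname"

  # 3) Rebuild the initramfs so the live-boot hooks end up in initrd.img.
  chroot "$TARGET" update-initramfs -u

  # 4) Squash the root filesystem and publish kernel + initrd for the PXE loader.
  mksquashfs "$TARGET" "$TFTP/filesystem.squashfs" -noappend
  cp "$TARGET"/boot/vmlinuz-*    "$TFTP/vmlinuz"
  cp "$TARGET"/boot/initrd.img-* "$TFTP/initrd.img"

The PXE configuration would then load vmlinuz and initrd.img with a kernel command line along the lines of "boot=live fetch=http://<image server>/filesystem.squashfs", so live-boot pulls the image into RAM and runs it with a copy-on-write overlay on top.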
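
The point Florian and Daznis make about bringing OSDs back up after a stateless reboot amounts to very little code once the data disks are visible again. A hedged sketch of a boot-time hook, assuming a valid /etc/ceph/ceph.conf has already been pushed to the node:

  #!/bin/bash
  # Sketch: re-activate all local OSDs after a PXE/live boot.
  # Assumes /etc/ceph/ceph.conf is already in place on the node.
  set -euo pipefail

  # Make the LVM volume groups on the local data disks visible again.
  vgscan
  vgchange -ay

  # ceph-volume reads the LVM tags written when the OSDs were created,
  # recreates the OSD directories in tmpfs and starts the ceph-osd units.
  ceph-volume lvm activate --all

This is roughly what the ceph-volume systemd units do on a regular install; on a stateless image it has to be triggered explicitly (or those units have to be baked into the image).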