> On 25 October 2017 at 5:58, Christian Sarrasin <c.nntp@xxxxxxxxxxxxxxxxxx> wrote:
>
>
> I'm planning to migrate an existing Filestore cluster with (SATA)
> SSD-based journals fronting multiple HDD-hosted OSDs - should be a
> common enough setup. So I've been trying to parse various contributions
> here and Ceph devs' blog posts (for which, thanks!)
>
> Seems the best way to repurpose that hardware would basically be to use
> those SSDs as DB partitions for Bluestore.
>
> The one thing I'm still wondering about is failure domains. With
> Filestore and SSD-backed journals, an SSD failure would kill writes but
> OSDs were otherwise still whole. Replacing the failed SSD quickly would
> get you back on your feet with relatively little data movement.
>

Not true. If you lose an OSD's journal with FileStore without a clean
shutdown of the OSD, you lose the OSD. You'd have to rebuild the
complete OSD.

> Hence the question: what happens if an SSD that contains several
> partitions hosting DBs for multiple OSDs fails? Is the OSDs' data still
> recoverable upon replacing the SSD or is the entire lot basically toast?
>

It's lost. You need both the WAL and the DB for a BlueStore OSD. So if
the SSD they reside on dies, you have to wipe the OSDs and rebuild them.

> If so, might this warrant revisiting the old debate about RAID-1'ing
> SSDs in such a setup? Or I suppose at least not being too ambitious
> with the number of DBs hosted on a single SSD?
>

I would not use RAID-1. Let's say you have 8 OSDs in a machine. Put 4
OSDs on each SSD.

If you lose the SSD, you lose 4 OSDs. Don't make too big a deal of this.
Make sure your failure domains are small enough that your system can
handle losing an OSD. If your system can't handle an OSD rebuild, you
already have a problem in your Ceph cluster.

Instead of using 8TB drives, consider using 4TB or even 2TB drives but
with more spindles. That way the impact of a single disk rebuild is
smaller.

Wido

> Thoughts much appreciated!
>
> PS: It's not fully clear whether a separate WAL partition is useful in
> that setup? Sage posted about a month back: "[WAL] will always just
> spill over onto the next fastest device (wal -> db -> main)". I'll take
> that as meaning that a separate WAL partition would be
> counter-productive if hosted on the same SSD. Please correct me if I'm
> wrong?
>
> Cheers
> Christian
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
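
The sizing advice in the reply (8 OSDs per machine, 4 per SSD, smaller drives with more spindles) can be put into rough numbers. What follows is only a back-of-the-envelope sketch in Python, not anything Ceph itself computes: the function name `rebuild_tb` and the 70% fill ratio are hypothetical, and it ignores replication factor, PG distribution and recovery throttling.

```python
def rebuild_tb(osds_lost: int, drive_tb: float, fill_ratio: float = 0.7) -> float:
    """Rough amount of data (TB) the cluster has to re-create elsewhere
    when `osds_lost` OSDs of size `drive_tb` disappear at once.

    fill_ratio is an assumed average utilisation, not a Ceph default.
    """
    return osds_lost * drive_tb * fill_ratio


# One SSD carrying the DB/WAL for 4 OSDs fails (the 8-OSD, 2-SSD layout),
# compared with a single HDD failing, for the drive sizes mentioned.
for drive_tb in (8, 4, 2):
    ssd_loss = rebuild_tb(osds_lost=4, drive_tb=drive_tb)
    disk_loss = rebuild_tb(osds_lost=1, drive_tb=drive_tb)
    print(f"{drive_tb} TB drives: SSD failure ~{ssd_loss:.1f} TB, "
          f"single disk failure ~{disk_loss:.1f} TB to rebuild")
```

With these assumptions the rebuild volume scales linearly with drive size, which is the point of preferring more, smaller spindles: an SSD failure with 2TB drives costs about as much data movement as a single 8TB disk failure.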