Re: Dear Abby: Why Is Architecting CEPH So Hard?

Brian Topping <brian.topping@xxxxxxxxx> · Wed, 22 Apr 2020 16:32:33 -0600

Great set of suggestions, thanks! One to consider:

> On Apr 22, 2020, at 4:14 PM, Jack <ceph@xxxxxxxxxxxxxx> wrote:
> 
> I use 32GB flash-based satadom devices for root device
> They are basically SSD, and do not take front slots
> As they are never burning up, we never replace them
> Ergo, the need to "open" the server is not an issue

This is probably the wrong forum to understand how you are not burning them out. Any kind of logging or monitor database on a small SATADOM will cook it quickly, especially an MLC one. There is no extra space for wear leveling and the like. I tried to make it work with fancy systemd logging to memory and a log scraper that pulled those logs and stored them on the actual data drives, but that still left no place for the monitor DB. No monitor DB means Ceph doesn’t load, and if a monitor DB gets corrupted, it’s perilous for the cluster and instant death if the monitors aren’t replicated.
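To put rough numbers on "cook them quickly", here is a back-of-the-envelope sketch using the common approximation that a flash device can absorb about capacity × rated P/E cycles of NAND writes, consumed at host writes × write amplification per day. The function name and every input below (3000 P/E cycles, 20 GB/day of log and monitor DB writes, write amplification of 10 on a small nearly-full device versus 2 on larger drives) are hypothetical illustrations, not measurements from my nodes.

```python
def endurance_days(capacity_gb: float, pe_cycles: float,
                   host_gb_per_day: float, write_amp: float) -> float:
    """Rough days until a flash device's rated P/E cycles are used up.

    Total NAND writes the device can absorb ~ capacity * P/E cycles.
    NAND writes per day ~ host writes * write amplification.
    All inputs are assumptions supplied by the caller.
    """
    if host_gb_per_day <= 0 or write_amp <= 0:
        raise ValueError("write rate and write amplification must be positive")
    return (capacity_gb * pe_cycles) / (host_gb_per_day * write_amp)


if __name__ == "__main__":
    # Hypothetical: all 20 GB/day land on one 32 GB SATADOM with little spare area.
    satadom = endurance_days(32, 3000, 20, 10)
    # Hypothetical: the same 20 GB/day spread over four 480 GB SSDs (5 GB/day each).
    per_ssd = endurance_days(480, 3000, 5, 2)
    print(f"SATADOM: {satadom:.0f} days, each merged SSD: {per_ssd:.0f} days")
```

The point is the shape of the formula rather than the specific figures: a small device with little free space gets hit twice, once by its small capacity and once by higher write amplification.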

My node chassis have two motherboards, and each board is hard-limited to four SSDs. On each node, `/boot` is mirrored (RAID1) on partition 1, `/` is striped and mirrored (RAID10) on partition 2, and whatever is left goes to Ceph data on partition 3 of each disk. This way any one disk can fail and I can still boot. By merging the volumes (i.e., no SATADOM), wear leveling is statistically more effective. And I don’t have to get into crazy system configurations that nobody would want to maintain or document.
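For anyone weighing this layout, a small sketch can enumerate which disk-failure combinations still leave both `/boot` and `/` assembled. It assumes Linux md RAID10 in the default "near=2" layout, which on four devices mirrors disks (0, 1) and (2, 3), and it assumes a bootloader is installed on every disk; the Ceph partitions are left out because Ceph handles their redundancy itself. All names here are hypothetical.

```python
from itertools import combinations

# Hypothetical model of the four-SSD layout:
#   partition 1 on every disk -> RAID1 (/boot), all four members mirror each other
#   partition 2 on every disk -> RAID10 (/), assumed md "near=2" layout
DISKS = range(4)
RAID10_PAIRS = [(0, 1), (2, 3)]  # assumption: md RAID10 near-2 pairing on 4 disks


def raid1_ok(failed: set) -> bool:
    """A RAID1 array stays readable while at least one member survives."""
    return any(d not in failed for d in DISKS)


def raid10_ok(failed: set) -> bool:
    """A RAID10 array stays readable while every mirror pair keeps a member."""
    return all(any(d not in failed for d in pair) for pair in RAID10_PAIRS)


def system_arrays_ok(failed: set) -> bool:
    """Both /boot and / assemble (assumes a bootloader on every disk)."""
    return raid1_ok(failed) and raid10_ok(failed)


def survivable(n_failures: int) -> list:
    """Every combination of n failed disks that still leaves the node bootable."""
    return [c for c in combinations(DISKS, n_failures)
            if system_arrays_ok(set(c))]


if __name__ == "__main__":
    for n in range(5):
        total = len(list(combinations(DISKS, n)))
        print(f"{n} failed disk(s): {len(survivable(n))}/{total} combinations bootable")
```

Under these assumptions every single-disk failure is survivable, some two-disk failures are (one from each mirror pair), and no three-disk failure is, because RAID10 is the limiting array rather than the four-way RAID1 `/boot`.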

$0.02…

Brian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx