Re: SSD recommendations for RBD and VM's

"Anthony D'Atri" <anthony.datri@xxxxxxxxx> · Sat, 5 Jun 2021 17:10:51 -0700

>> I wonder: when an OSD comes back after a power loss, all its data gets
>> scrubbed, and there are 2 other copies.
>> PLP is important mostly on block storage; Ceph should easily recover
>> from that situation.
>> That's why I don't understand why I should pay more for PLP and other
>> protections.
> 
> I'm no expert (or power user) at all, but my reasoning is: if something power-related can take down one of my servers, it can just as easily take down *all* my Ceph servers at once.
> 
> And that could just as easily render all three copies inaccessible.

Or even two.  I’ve been through a protracted outage (not power related) that involved widespread OSD flapping.  Although we lost no OSDs in the end, somehow a single RADOS object ended up lost, in an RBD head.  Very much a corner case, but if we’d been using 2R (two replicas) it would have been gruesome.

On another occasion I saw a power inductor / PSU failure take down power in an entire DC row.  Fortunately we were using redundant PSUs on different circuits.  One node went down nonetheless: the PSU on its surviving power feed had a pre-existing fault that went unnoticed because PSUs weren’t monitored.  As with active/passive network bonds, this showed the importance of monitoring and addressing latent faults so you don’t find them at exactly the wrong time.
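
To make the bond example concrete, here is a rough sketch of the kind of latent-fault check I mean. It is not something we ran in production. It assumes a Linux host using the kernel bonding driver, which reports per-slave state in /proc/net/bonding/<bond>; the function name, the sample text, and the bond name "bond0" are all made up for illustration.

```python
import re

def degraded_slaves(bonding_text):
    """Return names of bond slaves whose MII status is not 'up'."""
    bad = []
    current = None  # name of the slave section being parsed, if any
    for line in bonding_text.splitlines():
        line = line.strip()
        m = re.match(r"Slave Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"MII Status:\s*(\S+)", line)
        # An "MII Status" line before the first slave section describes
        # the bond as a whole, so it is skipped here.
        if m and current is not None and m.group(1) != "up":
            bad.append(current)
    return bad

# Abbreviated, illustrative sample in the style of /proc/net/bonding/bond0.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

if __name__ == "__main__":
    # On a real host you would read the file instead, e.g.
    # open("/proc/net/bonding/bond0").read()  (bond name is an assumption)
    print(degraded_slaves(SAMPLE))
```

The bond as a whole still reports "up" in this sample, which is exactly why the dead backup link goes unnoticed unless something inspects each slave.  The same idea applies to PSUs, with the BMC as the data source instead of /proc.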

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx