Re: BlueStore not surviving power outage

>> If the raid card was good, it would change caching strategy when/if
>> the BBU has no power left, but if it didn't and "it was good when we
>> last booted up", then it is possible that it 'promised' that writing
>> to BBU-backed RAM was ok for ack'ing the writes even if they are not
>> on disk yet, and when the BBU failed (for whatever reason), then this
>> promise was not honored and lots of writes were lost.
> 
> In addition: In 2020 I have seen two cases (as a Ceph consultant) of severe data corruption with BlueStore after a power failure.
> 
> In both cases this happened on systems where an HBA was involved. In the end we blamed the HBAs, which were in RAID mode.
> 
> I have done extensive power failure testing afterwards on NVMe-only and on systems with HBAs in JBOD mode and I was never able to reproduce the data corruption after a power failure.
> 
> My suspicion is still that the HBAs were caching some data and it was not written to the medium before the power failed although BlueStore was told it was.
> 
> My bet: This is the HBA, not BlueStore's fault.
> 
> Wido
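The kind of power-failure testing Wido describes can be approximated with a small fsync-and-verify harness. This is a hedged sketch (record format, file names, and helpers are all illustrative, not anything BlueStore actually does): run the writer, cut power mid-run, then run the verifier after reboot. Every record that was acked (fsync returned) must read back intact; a torn record at the tail is acceptable, silent corruption in the middle is not.

```python
import os
import struct
import zlib

REC = 64  # fixed record size in bytes (illustrative)

def write_records(path, n):
    """Append n checksummed records, fsync after each so an ack implies durability."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for seq in range(n):
            # payload: 8-byte sequence number + random filler, then 4-byte CRC32
            payload = struct.pack("<Q", seq) + os.urandom(REC - 12)
            rec = payload + struct.pack("<I", zlib.crc32(payload))
            os.write(fd, rec)
            os.fsync(fd)  # the "ack": data must now be on stable media
    finally:
        os.close(fd)

def verify(path):
    """After the power cycle: count intact records; flag corruption before the tail."""
    ok = 0
    with open(path, "rb") as f:
        while True:
            rec = f.read(REC)
            if len(rec) < REC:
                break  # torn tail record: allowed, it may never have been acked
            payload, crc = rec[:-4], struct.unpack("<I", rec[-4:])[0]
            if zlib.crc32(payload) != crc:
                # corruption in the middle means something in the stack
                # acked a write it had not actually persisted
                return ok, False
            ok += 1
    return ok, True
```

If the controller honors flushes, the verifier should always report a clean prefix; a False result after a power cut is exactly the "fsync lied" symptom discussed above.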


I fully agree; cf. a post I made … nearly two years ago:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036237.html

I had all manner of similar experiences with RoC HBAs:

- cache retention and flushing bugs
- hardware defects (affecting 300+ cards in my department alone) that required rework and, you guessed it, prevented cache preservation from working properly
- firmware and management utilities that silently enabled the drives' volatile write cache, and lied about it
- flaky BBU/supercap modules with rather finicky connectors

I would not be surprised if the HBAs in question are a few years behind on firmware updates.  The wrap-every-drive-in-a-VD strategy is all too familiar.  Depending on the model in question, I suggest setting the HBA to its JBOD personality, enabling passthrough, or flashing it with IT-mode firmware.
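For controllers that cannot be reflashed, a configuration sketch along these lines can at least expose what the cache stack is doing. Treat it as a hedged fragment: /c0 and /dev/sda are placeholders, and storcli/sdparm token spellings vary by controller generation and tool version, so check your vendor's reference before relying on any of it.

```shell
# Is the drive's own volatile write cache enabled?
smartctl -g wcache /dev/sda         # SATA/SAS, via smartmontools
sdparm --get WCE /dev/sda           # SCSI caching mode page (WCE bit)
hdparm -W /dev/sda                  # ATA view; 'hdparm -W 0' disables it

# Broadcom/LSI RoC HBAs via storcli (MegaCli has equivalents):
storcli64 /c0 show                  # controller, cache, and BBU status
storcli64 /c0 set jbod=on           # expose drives directly, if supported
storcli64 /c0/vall set wrcache=wt   # force write-through on remaining VDs
storcli64 /c0/vall set pdcache=off  # keep drive-level cache off behind VDs
```

Even in JBOD mode it is worth re-checking the drive cache settings after every firmware update, given the silently-re-enabled-cache behavior described above.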



_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx



