>> If the RAID card was good, it would change its caching strategy when/if
>> the BBU has no power left. But if it didn't, and "it was good when we
>> last booted up", then it is possible that it 'promised' that writing
>> to BBU-backed RAM was OK for ack'ing the writes even if they were not
>> on disk yet, and when the BBU failed (for whatever reason), that
>> promise was not honored and lots of writes were lost.
>
> In addition: in 2020 I saw two cases (as a Ceph consultant) of severe data corruption with BlueStore after a power failure.
>
> In both cases this happened on systems where an HBA was involved. In the end we blamed the HBAs, which were in RAID mode.
>
> Afterwards I did extensive power-failure testing on NVMe-only systems and on systems with HBAs in JBOD mode, and I was never able to reproduce the data corruption after a power failure.
>
> My suspicion is still that the HBAs were caching some data that was not written to the medium before the power failed, although BlueStore was told it was.
>
> My bet: this is the HBA's fault, not BlueStore's.
>
> Wido

I fully agree. Cf. a post I made … nearly two years ago:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-July/036237.html

I had all manner of similar experiences with RoC HBAs:

- cache retention and flushing bugs
- hardware issues that required cards (300+ just in my dept) to be reworked and that, you guessed it, prevented cache preservation from working properly
- firmware and a management utility that silently enabled the drives' volatile cache, and lied about it
- flaky BBU/supercap modules with rather finicky connectors

I would not be surprised if the HBAs in question are a few years behind on firmware updates. The wrap-every-drive-in-a-VD strategy is all too familiar. Depending on the model in question, I suggest setting the HBA to the JBOD personality, enabling passthrough, or flashing it with IT firmware.
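Since a silently enabled volatile drive cache is one of the failure modes above, it can help to check what the Linux kernel believes about each block device. The sketch below is only an illustration and rests on several assumptions: it assumes a Linux host, and it reads the standard sysfs attribute `/sys/block/<dev>/queue/write_cache`, which reports "write back" when the kernel thinks the device has a volatile write cache and "write through" otherwise. This value is only what the device advertises. Behind a RAID virtual disk it reflects the controller's claim rather than the physical drives, and a controller that lies about its cache (as described above) will not be caught by this check. The function names are hypothetical helpers, not part of any Ceph tool.

```python
from pathlib import Path


def block_write_cache_modes(sys_block: str = "/sys/block") -> dict[str, str]:
    """Return {device: mode} for each block device exposing queue/write_cache.

    Assumption: Linux sysfs layout. The kernel reports "write back" when it
    believes the device caches writes in volatile memory, "write through" otherwise.
    """
    modes: dict[str, str] = {}
    root = Path(sys_block)
    if not root.is_dir():
        # Not Linux, or sysfs not mounted: nothing to report.
        return modes
    for dev in sorted(root.iterdir()):
        attr = dev / "queue" / "write_cache"
        try:
            modes[dev.name] = attr.read_text().strip()
        except OSError:
            # Device has no queue/write_cache attribute (or it is unreadable): skip it.
            continue
    return modes


def volatile_cache_devices(sys_block: str = "/sys/block") -> list[str]:
    """Devices the kernel believes are caching writes in volatile memory."""
    return [
        dev
        for dev, mode in block_write_cache_modes(sys_block).items()
        if mode == "write back"
    ]


if __name__ == "__main__":
    for dev, mode in block_write_cache_modes().items():
        print(f"{dev}: {mode}")
```

With the HBA in JBOD or passthrough mode the kernel talks to the drives directly, so a "write back" entry for an OSD data device is worth cross-checking against the drive's own cache setting and the power-loss protection it actually has.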
_______________________________________________
Dev mailing list -- dev@xxxxxxx