Re: BlueStore not surviving power outage

Wido den Hollander <wido@xxxxxxxx> · Wed, 28 Apr 2021 09:34:30 +0200

On 28/04/2021 09:13, Janne Johansson wrote:
> On Wed, 28 Apr 2021 at 04:25, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
>> There is a RAID HBA in each of the machines in our clusters, to which
>> all SATA disks are attached. We configured the RAID HBA cache mode to
>> "write through", but, as I checked yesterday, the BBUs of the RAID HBAs
>> are not charged. I'm not quite sure whether the BBU has something to
>> do with the data loss. As far as I know, all data should be persisted
>> to the underlying disk before acknowledging upper layer systems when
>> cache mode is "write through". Am I missing anything? Thanks :-)
>
> If the RAID card was good, it would change caching strategy when/if
> the BBU has no power left, but if it didn't and "it was good when we
> last booted up", then it is possible that it 'promised' that writing
> to BBU-backed RAM was ok for ack'ing the writes even if they are not
> on disk yet, and when the BBU failed (for whatever reason), then this
> promise was not honored and lots of writes were lost.

In addition: in 2020 I saw two cases (as a Ceph consultant) of
severe data corruption with BlueStore after a power failure.

In both cases this happened on systems where an HBA was involved. In the
end we blamed the HBAs, which were in RAID mode.

I have done extensive power failure testing afterwards on NVMe-only and 
on systems with HBAs in JBOD mode and I was never able to reproduce the 
data corruption after a power failure.

My suspicion is still that the HBAs were caching some data and it was 
not written to the medium before the power failed although BlueStore was 
told it was.
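
For anyone who wants to check a controller for this behaviour, here is a
minimal sketch in Python of an "acknowledged write" probe. It is only an
illustration and not the procedure used in the tests mentioned above. The
function names and the record layout are made up for this example. The idea:
append sequence numbers, call fsync after each one, and note the last number
that was acknowledged. In a real test that number has to be recorded on a
different machine before the power is cut.

```python
import os
import struct

# Hypothetical record layout: one little-endian 64-bit sequence number.
RECORD = struct.Struct("<Q")


def write_acked(path, count):
    """Append `count` sequence numbers, fsync after each one.

    Returns the last sequence number for which fsync returned, or -1.
    """
    last_acked = -1
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            os.write(fd, RECORD.pack(seq))
            os.fsync(fd)  # the storage stack claims durability once this returns
            last_acked = seq
    finally:
        os.close(fd)
    return last_acked


def last_on_disk(path):
    """Return the highest contiguous sequence number in the file, or -1."""
    expected = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn record
            (seq,) = RECORD.unpack(chunk)
            if seq != expected:
                break  # gap or garbage: stop at the last good record
            expected += 1
    return expected - 1
```

After the machine comes back up, last_on_disk() should be at least as high
as the last acknowledged number. If it is lower, something below the
filesystem acknowledged writes that never reached the medium.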

My bet: This is the HBA, not BlueStore's fault.

Wido