Re: BlueStore not surviving power outage

Thanks, everyone.

There is a RAID HBA in each of the machines in our clusters, and all
SATA disks are attached to it. We configured the RAID HBA cache mode to
"write through", but, as I checked yesterday, the BBUs of the RAID HBAs
are not charged. I'm not sure whether the BBU has anything to do with
the data loss: as far as I know, when the cache mode is "write through",
all data should be persisted to the underlying disk before the write is
acknowledged to the upper layers. Am I missing anything? Thanks :-)

On Tue, 27 Apr 2021 at 23:43, Martin Verges <martin.verges@xxxxxxxx> wrote:
>
> What drives do you use? Do they have PLP (power loss protection)? Is
> there any form of raid controller involved?
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.verges@xxxxxxxx
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
> On Tue, 27 Apr 2021 at 10:54, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> >
> > Hi, everyone.
> >
> > Recently, one of our online clusters experienced a whole-cluster
> > power outage, and after power was restored, many OSDs started to
> > log the following errors:
> >
> > 2021-04-27 15:38:05.503 2b372b957700 -1
> > bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> > device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> > object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> > 2021-04-27 15:38:05.504 2b372b957700 -1
> > bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> > device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> > object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> > 2021-04-27 15:38:05.505 2b372b957700 -1
> > bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> > device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> > object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> > 2021-04-27 15:38:05.506 2b372b957700 -1
> > bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> > device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> > object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> > 2021-04-27 15:38:28.379 2b372c158700 -1
> > bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> > checksum at blob offset 0x40000, got 0xce935e16, expected 0x9b502da7,
> > device location [0xa9a80000~1000], logical extent 0x80000~1000, object
> > #9:c2a6d9ae:::rbd_data.3b35df93038d.0000000000000696:head#
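> >
> > Each of those lines means BlueStore recomputed a crc32c over a
> > 0x1000-byte checksum chunk and it did not match the checksum stored
> > for that blob. As a rough first look at the on-disk damage for one
> > OSD (with the OSD daemon stopped), something like the following
> > could be run, using the path from the log above:
> >
> >   ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-3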
> >
> > We are using Nautilus 14.2.10, with RocksDB on SSDs and the
> > BlueStore data on SATA disks. It seems that BlueStore didn't survive
> > the power outage. Is it supposed to behave this way? Is there any
> > way to prevent it?
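> >
> > For completeness, the DB/data device split of an affected OSD could
> > be double-checked with something like the command below (OSD id 3 is
> > just taken from the log path above; field names can vary a little
> > between releases):
> >
> >   ceph osd metadata 3 | grep -E 'devices|rotational'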
> >
> > Thanks:-)
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


