Re: BlueStore not surviving power outage

Thanks everyone.

We ran extensive power-failure tests on our virtualization platform and are now confident that this is not BlueStore's fault. I think the best choice now is to switch to a JBOD configuration :)

Thanks again:)

On Wed, Apr 28, 2021, 20:51 Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:

On 27/04/2021 10:54, Xuehan Xu wrote:
> Hi, everyone.
>
> Recently, one of our online cluster experienced a whole cluster power
> outage, and after the power recovered, many osd started to log the
> following error:
>
> 2021-04-27 15:38:05.503 2b372b957700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> 2021-04-27 15:38:05.504 2b372b957700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> 2021-04-27 15:38:05.505 2b372b957700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> 2021-04-27 15:38:05.506 2b372b957700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
> device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
> object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
> 2021-04-27 15:38:28.379 2b372c158700 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x40000, got 0xce935e16, expected 0x9b502da7,
> device location [0xa9a80000~1000], logical extent 0x80000~1000, object
> #9:c2a6d9ae:::rbd_data.3b35df93038d.0000000000000696:head#
>
> We are using Nautilus 14.2.10, with RocksDB on SSDs and the BlueStore
> data on SATA disks. It seems that BlueStore didn't survive the power
> outage; is it supposed to behave this way? Is there any way to prevent
> it?
>
> Thanks:-)
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
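
For context on the quoted _verify_csum lines: BlueStore stores a crc32c
checksum per 0x1000 = 4 KiB block and recomputes it on every read, so a
mismatch means the bytes read back no longer match the checksum recorded
when they were written. A minimal illustrative sketch of that kind of
per-block check in Python follows; it is not BlueStore's actual code, just
an illustration of what the log lines report.

CSUM_BLOCK = 0x1000  # 4 KiB, matching "crc32c/0x1000" in the log lines

def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli polynomial), the checksum named in the log."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def verify_blob(data: bytes, stored_csums: list, blob_offset: int = 0) -> bool:
    """Recompute each 4 KiB block's checksum and compare with the stored one."""
    ok = True
    for i in range(0, len(data), CSUM_BLOCK):
        got = crc32c(data[i:i + CSUM_BLOCK])
        expected = stored_csums[i // CSUM_BLOCK]
        if got != expected:
            ok = False
            # Mirrors the shape of the quoted log line.
            print(f"bad crc32c/{CSUM_BLOCK:#x} checksum at blob offset "
                  f"{blob_offset + i:#x}, got {got:#x}, expected {expected:#x}")
    return ok
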

It could also be caused by low-cost consumer SSDs, the majority of which 
do not support Power Loss Protection (PLP). PLP is different from 
sync/flush support, which nearly all SSDs provide. As I understand it, 
SSDs do a read/modify/erase/write cycle on larger blocks of 64-256 KB, and 
without PLP they can lose data during the erase phase if power is lost. 
Always use enterprise-grade SSDs, for this reason as well as others such 
as DWPD, etc.

A while ago we spent about a month running power-failure tests on the 
durability of dm-writecache under high write throughput and IOPS. With 
cheap SSDs we would get one or two failures out of every ten power cycles, 
ranging from inconsistent PGs to unfound objects; with PLP drives we saw 
no problems.
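
As a rough illustration of the verification half of such a test (the power 
cycling itself has to be driven out of band, e.g. via a switched PDU or 
IPMI): write sequence-numbered 4 KiB records, each carrying a checksum of 
its payload and fsync'd before the cut, then after the reboot re-read and 
count the records that no longer verify. This is not our actual harness; 
the device path and record count below are made up for the example.

import os
import struct
import zlib

DEV = "/dev/mapper/test-writecache"   # illustrative target device
RECORD = 4096                         # one 4 KiB record per slot
COUNT = 1024                          # illustrative number of records

def write_records(path: str = DEV, count: int = COUNT) -> None:
    """Write sequence-numbered records; each is fsync'd, so it must survive a cut."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for seq in range(count):
            payload = os.urandom(RECORD - 8)
            # 8-byte header: sequence number + CRC-32 of the payload.
            rec = struct.pack("<II", seq, zlib.crc32(payload)) + payload
            os.pwrite(fd, rec, seq * RECORD)
            os.fsync(fd)  # only records acknowledged here count as "must survive"
    finally:
        os.close(fd)

def check_records(path: str = DEV, count: int = COUNT) -> int:
    """After the power cycle, count records whose header or checksum is wrong."""
    bad = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        for seq in range(count):
            rec = os.pread(fd, RECORD, seq * RECORD)
            stored_seq, stored_crc = struct.unpack("<II", rec[:8])
            if stored_seq != seq or zlib.crc32(rec[8:]) != stored_crc:
                bad += 1
    finally:
        os.close(fd)
    return bad

Data lost at this level is what eventually surfaces in Ceph as the 
_verify_csum errors, inconsistent PGs and unfound objects mentioned above.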

/Maged


_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
