Re: BlueStore not surviving power outage

On 27/04/2021 10:54, Xuehan Xu wrote:
Hi, everyone.

Recently, one of our online clusters experienced a whole-cluster power
outage, and after power was restored, many OSDs started to log the
following error:

2021-04-27 15:38:05.503 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.504 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.505 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:05.506 2b372b957700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x36000, got 0x41fe1397, expected 0x8d7f5975,
device location [0xa7e76000~1000], logical extent 0x1b6000~1000,
object #9:45a4e02a:::rbd_data.3b35df93038d.0000000000000095:head#
2021-04-27 15:38:28.379 2b372c158700 -1
bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x40000, got 0xce935e16, expected 0x9b502da7,
device location [0xa9a80000~1000], logical extent 0x80000~1000, object
#9:c2a6d9ae:::rbd_data.3b35df93038d.0000000000000696:head#

We are running Nautilus 14.2.10, with RocksDB on SSDs and the BlueStore
data on SATA disks. It seems that BlueStore didn't survive the power
outage. Is it supposed to behave this way? Is there any way to prevent
it?

Thanks:-)
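For reference, the _verify_csum lines above mean that, on read, BlueStore recomputed a CRC-32C over a 0x1000-byte (4 KiB) checksum block and the result did not match the value stored in its metadata, i.e. what came back from the disk is not what was committed. A minimal, generic sketch of that kind of per-block check (plain Python for illustration, not Ceph's code; the seeding conventions may differ from Ceph's crc32c, and verify_blob/stored_csums are made-up names):

CSUM_BLOCK = 0x1000  # 4 KiB, matching "crc32c/0x1000" in the log above

def crc32c(data: bytes) -> int:
    # Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def verify_blob(blob: bytes, stored_csums: list) -> None:
    # Compare each 4 KiB chunk of the blob against its stored checksum and
    # report mismatches in the same shape as the OSD log lines.
    for i, expected in enumerate(stored_csums):
        chunk = blob[i * CSUM_BLOCK:(i + 1) * CSUM_BLOCK]
        got = crc32c(chunk)
        if got != expected:
            print("bad crc32c/0x%x checksum at blob offset 0x%x, "
                  "got 0x%x, expected 0x%x"
                  % (CSUM_BLOCK, i * CSUM_BLOCK, got, expected))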

This can also happen with low-cost consumer SSD drives, the majority of which do not support Power Loss Protection (PLP). PLP is different from support for sync/flush, which nearly all SSDs provide. As I understand it, SSDs do a read/modify/erase/write cycle on larger blocks of 64-256 KB, and without PLP they can lose data during the erase phase in case of power loss. Always use enterprise-grade SSDs, for this as well as for other reasons like DWPD, etc.
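To make the distinction concrete: flush support is what the host exercises when an application calls fsync/fdatasync (or writes with O_DSYNC), telling the drive to empty its volatile write cache, while PLP is capacitor backing inside the drive so that an in-flight internal program/erase is not lost when power drops. A small host-side sketch, assuming a Linux sysfs layout with the queue/write_cache attribute and a hypothetical device name:

import os

def report_write_cache(dev: str) -> None:
    # Print the kernel's view of the device's volatile write cache
    # ("write back" vs "write through"); path assumed present on
    # reasonably recent Linux kernels.
    with open("/sys/block/%s/queue/write_cache" % dev) as f:
        print(dev, "write cache:", f.read().strip())

def durable_write(path: str, offset: int, data: bytes) -> None:
    # pwrite + fdatasync: the kernel flushes the drive's volatile cache,
    # which is the sync/flush support nearly all SSDs have. It does NOT
    # protect an erase/program the drive is doing internally at the moment
    # power is cut -- that is what PLP (capacitor-backed) drives cover.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
        os.fdatasync(fd)
    finally:
        os.close(fd)

report_write_cache("sda")  # hypothetical device name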

A while ago we spent about a month doing power-failure tests of dm-writecache durability under high-throughput, high-IOPS writes. With cheap SSDs we would get 1 or 2 failures out of 10 power cycles, ranging from inconsistent PGs to unfound objects; with drives that have PLP we saw no problems.

/Maged

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx


