Hi Igor,

Many thanks for the quick reply. Your advice concurs with my own thoughts:
given the damage, it is probably safest to wipe the OSDs and start over.

thanks again,

Jake

On 05/07/18 14:28, Igor Fedotov wrote:
> Hi Jake,
>
> IMO it doesn't make sense to recover from this drive/data, as the damage
> coverage looks pretty wide.
>
> By modifying the BlueFS code you can bypass that specific assertion, but
> BlueFS and the rest of the BlueStore metadata are most probably inconsistent
> and unrecoverable at this point. Given that you have valid replicated data,
> it's much simpler just to start these OSDs over.
>
> Thanks,
>
> Igor
>
> On 7/5/2018 3:58 PM, Jake Grimmett wrote:
>> Dear All,
>>
>> I have a Mimic (13.2.0) cluster which, due to a bad disk controller, has
>> corrupted three BlueStore OSDs on one node.
>>
>> Unfortunately these three OSDs crash when they try to start:
>>
>> systemctl start ceph-osd@193
>> (snip)
>> /BlueFS.cc: 828: FAILED assert(r != q->second->file_map.end())
>>
>> Full log here: http://p.ip.fi/yFYn
>>
>> "ceph-bluestore-tool repair" also crashes, with a similar error in
>> BlueFS.cc:
>>
>> # ceph-bluestore-tool repair --dev /dev/sdc2 --path /var/lib/ceph/osd/ceph-193
>> (snip)
>> /BlueFS.cc: 828: FAILED assert(r != q->second->file_map.end())
>>
>> Full log here: http://p.ip.fi/l_Q_
>>
>> This command works OK:
>>
>> # ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-193
>> inferring bluefs devices from bluestore path
>> {
>>     "/var/lib/ceph/osd/ceph-193/block": {
>>         "osd_uuid": "90b25336-9932-4e0b-a16b-51159568c398",
>>         "size": 8001457295360,
>>         "btime": "2017-12-08 15:46:40.034495",
>>         "description": "main",
>>         "bluefs": "1",
>>         "ceph_fsid": "f035ee98-abfd-4496-b903-a403b29c828f",
>>         "kv_backend": "rocksdb",
>>         "magic": "ceph osd volume v026",
>>         "mkfs_done": "yes",
>>         "ready": "ready",
>>         "whoami": "193"
>>     }
>> }
>>
>> # lsblk | grep sdc
>> sdc      8:32   0  7.3T  0 disk
>> ├─sdc1   8:33   0  100M  0 part /var/lib/ceph/osd/ceph-193
>> └─sdc2   8:34   0  7.3T  0 part
>>
>> Since the OSDs failed, the cluster has rebalanced, though I still have
>> ceph HEALTH_ERR:
>>
>> 95 scrub errors; Possible data damage: 11 pgs inconsistent
>>
>> Manual scrubs are not started by the OSD daemons (reported elsewhere, see
>> the thread '"ceph pg scrub" does not start').
>>
>> Looking at the old logs of the bad OSDs, I see ~3500 entries similar to:
>>
>>     -9> 2018-07-04 14:42:34.744 7f9ef0bbb1c0  2 rocksdb:
>> [/root/ceph-build/ceph-13.2.0/src/rocksdb/db/version_set.cc:1330] Unable
>> to load table properties for file 43530 --- Corruption: bad block
>> contents
>>
>> There is a much smaller number of crc errors, similar to:
>>
>>     -2> 2018-07-02 12:58:07.702 7fd3649eb1c0 -1
>> bluestore(/var/lib/ceph/osd/ceph-425) _verify_csum bad crc32c/0x1000
>> checksum at blob offset 0x0, got 0xff625379, expected 0x75b558bc, device
>> location [0xf5a66e0000~1000], logical extent 0x0~1000, object
>> #-1:2c691ffb:::osdmap.176500:0#
>>
>> I'm inclined to wipe these three OSDs and start again, but am happy to
>> try suggestions to repair.
>>
>> thanks for any suggestions,
>>
>> Jake
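
For reference, the assertion Igor suggests bypassing fires during BlueFS
journal replay, and the failed expression matches the OP_DIR_UNLINK branch of
BlueFS::_replay(). A rough, untested sketch of that kind of modification
(the exact surrounding code in 13.2.0 may differ) would be to turn the abort
into a skip:

    // src/os/bluestore/BlueFS.cc, BlueFS::_replay(), OP_DIR_UNLINK branch
    // (sketch only -- dirname/filename are decoded from the journal entry
    //  just above this point in the existing code)
    map<string,DirRef>::iterator q = dir_map.find(dirname);
    assert(q != dir_map.end());
    map<string,FileRef>::iterator r = q->second->file_map.find(filename);
    if (r == q->second->file_map.end()) {
      // upstream code asserts here (the BlueFS.cc:828 failure above);
      // skipping the op lets journal replay continue, but the metadata
      // inconsistency that triggered it remains
      derr << __func__ << " op_dir_unlink " << dirname << "/" << filename
           << " not found in dir, skipping" << dendl;
      break;
    }
    assert(r->second->refs > 0);
    --r->second->refs;
    q->second->file_map.erase(r);

Even if replay then completes, the rest of the BlueFS/BlueStore metadata is,
as Igor says, most probably still inconsistent, so a patch like this is only
useful for inspecting the OSD, not for putting it back into service;
redeploying the OSDs and letting the replicated data backfill remains the
simpler and safer route.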