Hi Sebastian,
first of all, I'm not sure this issue has the same root cause as Francois'
one. Most likely it's just another BlueFS/RocksDB data corruption
that manifests in the same way.
In this respect I would rather point to this one, reported just yesterday:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/M2ZRZD4725SRPFE5MMZPI7JBNO23FNU6/
So, similarly, I'd like to ask some questions and collect more data. Please
find the list below:
1) Is this a bare metal or containerized deployment?
2) What's the output of "hdparm -W <dev>" for the devices in question? Is
write caching enabled at the disk controller? (See the sketch below.)
3) Could you please share the startup log of a broken OSD with debug-bluefs
set to 20? (An example command is included below.)
4) Could you please export the BlueFS files via ceph-bluestore-tool (this
might need some extra space to hold all the BlueFS data on the target
filesystem) and share the content of the db/002182.sst file? The first 4 MB
would generally be sufficient if it's huge. (A command sketch is included
below.)
5) Have you seen RocksDB data corruption on this cluster before?
6) What's the disk hardware for these OSDs - which disk drives and controllers?
7) Did you reboot the nodes or just restart the OSDs? Did all the
issues happen on the same node or on different nodes? How many OSDs were
restarted in total?
8) Is it correct that this is an HDD-only setup, i.e. there is no standalone
SSD/NVMe for WAL/DB?
9) Would you be able to run some long-running (and potentially
data-corrupting) experiments on this cluster in an attempt to pinpoint the
issue? I'm thinking of periodically shutting down an OSD under load, with a
raised debug level for that specific OSD, to catch the corrupting event. (A
rough sketch of such a loop is included below.)
The major problem with debugging this bug is that we can see its
consequences, but we have no clue about what was happening when the actual
corruption occurred. Hence we need to reproduce it somehow. So please
let me know if we can use your cluster/your help for that...
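For item 2, a minimal check would look like this (the device name is just a
placeholder for your actual OSD disks):

  # show whether the volatile write cache is enabled (replace /dev/sdX)
  hdparm -W /dev/sdX
  # if needed for testing, disable it (assumes a SATA/SAS drive that honors hdparm)
  hdparm -W 0 /dev/sdX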
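For item 3, one way to capture such a log - a sketch only, adjust the OSD id
and log path to your environment:

  # run the failing OSD in the foreground with verbose BlueFS logging
  ceph-osd -f -i 7 --debug-bluefs 20 --log-file /var/log/ceph/ceph-osd.7.debug.log

Alternatively, set "debug bluefs = 20" for that OSD in ceph.conf, try to start
it again and share the resulting log.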
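For item 4, something along these lines (the output directory is arbitrary,
it just needs enough free space for the whole BlueFS contents):

  # export the BlueFS file system of the broken OSD into a regular directory
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-7 --out-dir /mnt/osd7-bluefs
  # take only the first 4 MB of the suspicious SST if the file is large
  dd if=/mnt/osd7-bluefs/db/002182.sst of=/tmp/002182.sst.head bs=1M count=4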
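And for item 9, just to illustrate what I mean - a rough, hypothetical loop
that keeps restarting one OSD while the cluster is under client write load
(it assumes a bare-metal systemd deployment; the OSD id and the timings would
of course need tuning):

  while true; do
      systemctl stop ceph-osd@7      # stop the OSD under load (bare-metal unit name)
      sleep 30
      systemctl start ceph-osd@7     # bring it back and let it recover
      sleep 600                      # keep it under load before the next stop
  done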
Thanks in advance,
Igor
On 12/21/2021 7:47 PM, Sebastian Mazza wrote:
Hi all,
after a reboot of the cluster, 3 OSDs cannot be started. The OSDs exit with the following error message:
2021-12-21T01:01:02.209+0100 7fd368cebf00 4 rocksdb: [db_impl/db_impl.cc:396] Shutdown: canceling all background work
2021-12-21T01:01:02.209+0100 7fd368cebf00 4 rocksdb: [db_impl/db_impl.cc:573] Shutdown complete
2021-12-21T01:01:02.209+0100 7fd368cebf00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-21T01:01:02.213+0100 7fd368cebf00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:
2021-12-21T01:01:02.213+0100 7fd368cebf00 1 bluefs umount
2021-12-21T01:01:02.213+0100 7fd368cebf00 1 bdev(0x559bbe0ea800 /var/lib/ceph/osd/ceph-7/block) close
2021-12-21T01:01:02.293+0100 7fd368cebf00 1 bdev(0x559bbe0ea400 /var/lib/ceph/osd/ceph-7/block) close
2021-12-21T01:01:02.537+0100 7fd368cebf00 -1 osd.7 0 OSD:init: unable to mount object store
2021-12-21T01:01:02.537+0100 7fd368cebf00 -1 ** ERROR: osd init failed: (5) Input/output error
I found a similar problem in this Mailing list: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/MJLVS7UPJ5AZKOYN3K2VQW7WIOEQGC5V/#MABLFA4FHG6SX7YN4S6BGSCP6DOAX6UE
In this thread, Francois was able to successfully repair his OSD data with `ceph-bluestore-tool fsck`. I tried to run:
`ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7 -l /var/log/ceph/bluestore-tool-fsck-osd-7.log --log-level 20 > /var/log/ceph/bluestore-tool-fsck-osd-7.out 2>&1`
But that results in:
2021-12-21T16:44:18.455+0100 7fc54ef7a240 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-21T16:44:18.455+0100 7fc54ef7a240 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:
fsck failed: (5) Input/output error
I also tried to run `ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-7 repair`. But that also fails with:
2021-12-21T17:34:06.780+0100 7f35765f7240 0 bluestore(/var/lib/ceph/osd/ceph-7) _open_db_and_around read-only:0 repair:0
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bdev(0x55fce5a1a800 /var/lib/ceph/osd/ceph-7/block) open path /var/lib/ceph/osd/ceph-7/block
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bdev(0x55fce5a1a800 /var/lib/ceph/osd/ceph-7/block) open size 12000134430720 (0xae9ffc00000, 11 TiB)
block_size 4096 (4 KiB) rotational discard not supported
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bluestore(/var/lib/ceph/osd/ceph-7) _set_cache_sizes cache_size 1073741824 meta 0.45 kv 0.45 data 0.06
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bdev(0x55fce5a1ac00 /var/lib/ceph/osd/ceph-7/block) open path /var/lib/ceph/osd/ceph-7/block
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bdev(0x55fce5a1ac00 /var/lib/ceph/osd/ceph-7/block) open size 12000134430720 (0xae9ffc00000, 11 TiB)
block_size 4096 (4 KiB) rotational discard not supported
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-7/block size 11 TiB
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bluefs mount
2021-12-21T17:34:06.780+0100 7f35765f7240 1 bluefs _init_alloc shared, id 1, capacity 0xae9ffc00000, block size 0x10000
2021-12-21T17:34:06.904+0100 7f35765f7240 1 bluefs mount shared_bdev_used = 0
2021-12-21T17:34:06.904+0100 7f35765f7240 1 bluestore(/var/lib/ceph/osd/ceph-7) _prepare_db_environment set db_paths to db,11400127709184 db.slow,11400127709184
2021-12-21T17:34:06.908+0100 7f35765f7240 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-21T17:34:06.908+0100 7f35765f7240 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:
2021-12-21T17:34:06.908+0100 7f35765f7240 1 bluefs umount
2021-12-21T17:34:06.908+0100 7f35765f7240 1 bdev(0x55fce5a1ac00 /var/lib/ceph/osd/ceph-7/block) close
2021-12-21T17:34:07.072+0100 7f35765f7240 1 bdev(0x55fce5a1a800 /var/lib/ceph/osd/ceph-7/block) close
The cluster is not in production; therefore, I can remove all corrupted pools and delete the OSDs. However, I would like to understand what was going on, in order to avoid such a situation in the future.
I will provide the OSD logs from around the time of the server reboot at the following link: https://we.tl/t-fArHXTmSM7
Ceph version: 16.2.6
Thanks,
Sebastian
--
Igor Fedotov
Ceph Lead Developer
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx