On 12/20/21 10:09, Igor Fedotov wrote:
Hi Andrej,
first of all, I'd like to mention that this issue is not new to
16.2.7. There is a ticket, https://tracker.ceph.com/issues/47330,
which mentions a similar case on Mimic. The ticket is erroneously
tagged as resolved: the proposed fix just introduces a BlueFS file
import option in ceph-bluestore-tool, which permits manual recovery.
Please be aware that this import support is present in master only
and has a still-open bug of its own: https://github.com/ceph/ceph/pull/44317
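Roughly, the manual path would look like the sketch below. This is a
sketch only: bluefs-export has been in ceph-bluestore-tool for a long
time, but the import side exists in master only, and the exact command
name and flags shown here are an assumption, so check
ceph-bluestore-tool --help on your build first.

# Stop the OSD, then pull the BlueFS tree (including db/CURRENT)
# out to plain files:
systemctl stop ceph-osd@611
ceph-bluestore-tool bluefs-export \
    --path /var/lib/ceph/osd/ceph-611 \
    --out-dir /tmp/osd611-bluefs
# ...repair /tmp/osd611-bluefs/db/CURRENT on disk (see later in
# the thread), then write it back with the master-only import option:
ceph-bluestore-tool bluefs-import \
    --path /var/lib/ceph/osd/ceph-611 \
    --input-file /tmp/osd611-bluefs/db/CURRENT \
    --dest-file db/CURRENT    # hypothetical flag names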
So, here are some more questions which might help with troubleshooting:
1) Did the error pop up immediately after the upgrade or some
successful starts happened on 16.2.7 before the failure?
No, there were no successful starts on 16.2.7.
2) Could you please share an OSD log from prior to the shutdown which
triggered the corruption? And the one for the first start afterwards.
Here:
http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz
3) Please set debug-bluefs to 20, retry the OSD start and share the log.
http://www-f9.ijs.si/~andrej/ceph-osd.611.log-20211220-short.gz
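For reference, one way to raise the log level, assuming
systemd-managed OSDs and the default log path:

# Raise BlueFS logging for this OSD, retry the start, then collect
# /var/log/ceph/ceph-osd.611.log:
cat >> /etc/ceph/ceph.conf <<'EOF'
[osd.611]
debug_bluefs = 20/20
EOF
systemctl restart ceph-osd@611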
4) Please share the content of the broken CURRENT file
[root@lcst0032 db]# hexdump CURRENT
0000000 909f d59e f778 4f50 acb0 b1ea 59a2 9e90
0000010
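For comparison, an intact CURRENT file holds only the name of the
active manifest plus a trailing newline, so a healthy dump would look
roughly like this (the manifest number here is hypothetical):

$ hexdump -C CURRENT
00000000  4d 41 4e 49 46 45 53 54  2d 30 30 32 37 31 32 0a  |MANIFEST-002712.|
00000010

The 16 bytes above, by contrast, decode to no such ASCII string.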
Thanks,
Andrej
Thanks,
Igor
On 12/20/2021 11:17 AM, Andrej Filipcic wrote:
Hi,
When upgrading from 16.2.6 to 16.2.7, 8 out of ~1600 OSDs failed to
start. The first 16.2.7 startup crashes here:
2021-12-19T09:52:34.128+0100 7ff7104c0080 1 bluefs mount
2021-12-19T09:52:34.129+0100 7ff7104c0080 1 bluefs _init_alloc
shared, id 1, capacity 0xe8d7fc00000, block size 0x10000
2021-12-19T09:52:34.238+0100 7ff7104c0080 1 bluefs mount
shared_bdev_used = 0
2021-12-19T09:52:34.238+0100 7ff7104c0080 1
bluestore(/var/lib/ceph/osd/ceph-611) _prepare_db_environment set
db_paths to db,15200851643596 db.slow,15200851643596
2021-12-19T09:52:34.257+0100 7ff7104c0080 -1 rocksdb: verify_sharding
unable to list column families: Corruption: CURRENT file does not end
with newline
2021-12-19T09:52:34.257+0100 7ff7104c0080 -1
bluestore(/var/lib/ceph/osd/ceph-611) _open_db erroring opening db:
2021-12-19T09:52:34.257+0100 7ff7104c0080 1 bluefs umount
I was able to export the RocksDB, and the contents of the CURRENT file
are corrupted; I understand it should contain the MANIFEST-* info.
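If it comes to a manual fix, the exported copy can presumably be
repaired by hand: point CURRENT at the newest MANIFEST-* present in
the exported db directory, trailing newline included. A sketch (the
manifest number is hypothetical, and the repaired file would still
have to be written back into BlueFS for the OSD to see it):

cd /tmp/osd611-bluefs/db      # wherever the export landed
ls MANIFEST-*                 # e.g. MANIFEST-002712
printf 'MANIFEST-002712\n' > CURRENT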
I have attached the full OSD log of one failure; the other failed
OSDs all fail for the same reason.
Any hints? For now, I am keeping those OSDs off in case they can be
further debugged.
(resending with shortened log)
Best regards,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic@xxxxxx
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-477-3166
-------------------------------------------------------------