Dear experts,
Recently, a switch hardware failure caused a network outage that took
a large number of our OSDs offline.
However, after the network recovered, several OSDs have been unable to
rejoin the cluster, leaving some PGs down or unknown. Restarting the
OSDs doesn't help: they get stuck during the boot process, specifically
after the "log_to_monitors {default=true}" step.
I have no idea how to debug this issue. Setting debug_osd to 20 only
shows the following:
2022-12-01T03:32:13.065+0000 7f37ed19e700 5 osd.13 1045417 heartbeat
osd_stat(store_statfs(0x198fa330000/0xd13420000/0x1b4a39191000, data
0x1941ad7e6764/0x19a42ba30000, compress 0x0/0x0/0x0, omap 0x5e4f268f,
meta 0xcb4f2d971), peers [] op hist [])
2022-12-01T03:32:13.066+0000 7f37ed19e700 20 osd.13 1045417
check_full_status cur ratio 0.941459, physical ratio 0.941459, new state
nearfull
2022-12-01T03:32:13.068+0000 7f3808a53700 10 osd.13 1045417 tick
2022-12-01T03:32:13.068+0000 7f3808a53700 10 osd.13 1045417 do_waiters
-- start
2022-12-01T03:32:13.068+0000 7f3808a53700 10 osd.13 1045417 do_waiters
-- finish
2022-12-01T03:32:13.068+0000 7f3808a53700 20 osd.13 1045417 tick
last_purged_snaps_scrub 2022-11-30T10:11:21.878157+0000 next
2022-12-01T10:11:21.878157+0000
2022-12-01T03:32:13.342+0000 7f38071d2700 10 osd.13 1045417
tick_without_osd_lock
2022-12-01T03:32:14.065+0000 7f3808a53700 10 osd.13 1045417 tick
2022-12-01T03:32:14.065+0000 7f3808a53700 10 osd.13 1045417 do_waiters
-- start
2022-12-01T03:32:14.065+0000 7f3808a53700 10 osd.13 1045417 do_waiters
-- finish
2022-12-01T03:32:14.065+0000 7f3808a53700 20 osd.13 1045417 tick
last_purged_snaps_scrub 2022-11-30T10:11:21.878157+0000 next
2022-12-01T10:11:21.878157+0000
2022-12-01T03:32:14.353+0000 7f38071d2700 10 osd.13 1045417
tick_without_osd_lock
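As a sanity check on the nearfull state reported above, the ratio can be
recomputed from the store_statfs numbers in the heartbeat line. This is
only a minimal sketch: it assumes the first three hex fields are
available / internally reserved / total bytes, and that the ratio is
(total - available) / total. The function name used_ratio is just for
illustration.

```python
import re

# Copied from the heartbeat line in the log excerpt above.
LOG = (
    "osd_stat(store_statfs(0x198fa330000/0xd13420000/0x1b4a39191000, data "
    "0x1941ad7e6764/0x19a42ba30000, compress 0x0/0x0/0x0, omap 0x5e4f268f, "
    "meta 0xcb4f2d971), peers [] op hist [])"
)

# Assumption: the three leading fields are available / internally
# reserved / total, all in bytes and printed in hex.
STATFS_RE = re.compile(
    r"store_statfs\(0x([0-9a-f]+)/0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def used_ratio(line: str) -> float:
    """Return (total - available) / total for a store_statfs log line."""
    match = STATFS_RE.search(line)
    if match is None:
        raise ValueError("no store_statfs(...) found in line")
    available, _reserved, total = (int(g, 16) for g in match.groups())
    return (total - available) / total


ratio = used_ratio(LOG)
print(f"total bytes: {int('1b4a39191000', 16)}")
print(f"used ratio:  {ratio:.6f}")
```

Under those assumptions the result agrees with the "cur ratio 0.941459"
reported by check_full_status, so the OSD really is about 94% full and
not just misreporting.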
Ceph version: Octopus 15.2.17
OSD storage backend: BlueStore
OS: CentOS 7, 64-bit
Any ideas?
Thanks
&
Best regards,
Felix Lee ~
--
Felix H.T Lee Academia Sinica Grid & Cloud.
Tel: +886-2-27898308
Office: Room P111, Institute of Physics, 128 Academia Road, Section 2,
Nankang, Taipei 115, Taiwan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx