Hi,

Just to reiterate, I'm referring to an OSD crash loop caused by the following error:

"2023-12-03T04:00:36.686+0000 7f08520e2700 -1 bdev(0x55f02a28a400 /var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not permitted)"

More relevant log entries: https://pastebin.com/gDat6rfk

The crash log suggested a possible hardware issue, but there was none: I was able to access the block device for testing purposes without any problems, and the problem went away after I rebooted the host. The OSD is currently operating under load without any issues.

Any ideas?

/Z

On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Thanks! The bug I referenced is the reason for the 1st OSD crash, but not
> for the subsequent crashes. The reason for those is described where you
> <snip />. I'm asking for help with that one.
>
> /Z
>
> On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad <ceph+list@xxxxxxxxxx>
> wrote:
>
>> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
>> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
>> >https://tracker.ceph.com/issues/53906 bug.
>>
>> <snip />
>>
>> >It would be good to understand what triggered this condition and how it
>> >can be resolved without rebooting the whole host. I would very much
>> >appreciate any suggestions.
>>
>> If you look closely at 53906 you'll see it's a duplicate of
>> https://tracker.ceph.com/issues/53907
>>
>> In there you have the fix and a workaround until the next minor release.
>>
>> --
>> Kai Stian Olstad

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx