Any input from anyone?

/Z

On Mon, 4 Dec 2023 at 12:52, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
> Hi,
>
> Just to reiterate, I'm referring to an OSD crash loop caused by the
> following error:
>
> "2023-12-03T04:00:36.686+0000 7f08520e2700 -1 bdev(0x55f02a28a400
> /var/lib/ceph/osd/ceph-56/block) _aio_thread got r=-1 ((1) Operation not
> permitted)". More relevant log entries: https://pastebin.com/gDat6rfk
>
> The crash log suggested that there could be a hardware issue, but there was
> none: I was able to access the block device for testing purposes without
> any issues, and the problem went away after I rebooted the host. This OSD
> is currently operating without any issues under load.
>
> Any ideas?
>
> /Z
>
> On Sun, 3 Dec 2023 at 16:09, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>
>> Thanks! The bug I referenced is the reason for the first OSD crash, but not
>> for the subsequent crashes. The reason for those is described where you
>> put <snip />. I'm asking for help with that one.
>>
>> /Z
>>
>> On Sun, 3 Dec 2023 at 15:31, Kai Stian Olstad <ceph+list@xxxxxxxxxx>
>> wrote:
>>
>>> On Sun, Dec 03, 2023 at 06:53:08AM +0200, Zakhar Kirpichenko wrote:
>>> >One of our 16.2.14 cluster OSDs crashed again because of the dreaded
>>> >https://tracker.ceph.com/issues/53906 bug.
>>>
>>> <snip />
>>>
>>> >It would be good to understand what has triggered this condition and
>>> >how it can be resolved without rebooting the whole host. I would very
>>> >much appreciate any suggestions.
>>>
>>> If you look closely at 53906, you'll see it's a duplicate of
>>> https://tracker.ceph.com/issues/53907
>>>
>>> In there you have the fix and a workaround until the next minor version
>>> is released.
>>>
>>> --
>>> Kai Stian Olstad
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx