Hey Ilya,

Thanks so much for the patches, we are planning to test them either this afternoon or tomorrow at the latest, I will let you know the results.

Regards,

Robin Geuze

From: Ilya Dryomov <idryomov@xxxxxxxxx>
Sent: 06 July 2021 19:21
To: Robin Geuze
Cc: Ceph Development
Subject: Re: All RBD IO stuck after flapping OSD's

On Tue, Jun 29, 2021 at 12:07 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Tue, Jun 29, 2021 at 10:39 AM Robin Geuze <robin.geuze@xxxxxxxxxxxx> wrote:
> >
> > Hey Ilya,
> >
> > Do you have any idea on the cause of this bug yet? I tried to dig around a bit myself in the source, but the logic around this locking is very complex, so I couldn't figure out where the problem is.
>
> I do. The proper fix would indeed be large and not backportable but
> I have a workaround in mind that should be simple enough to backport
> all the way to 5.4. The trick is making sure that the workaround is
> fine from the exclusive lock protocol POV.
>
> I'll try to flesh it out by the end of this week and report back
> early next week.

Hi Robin,

I CCed you on the patches. They should apply to 5.4 cleanly. You
mentioned you have a build farm set up, please take them for a spin.

Thanks,

Ilya