On 03/28/17 17:28, Brian Andrus wrote:
> Just adding some anecdotal input. It likely won't be ultimately
> helpful other than a +1..
>
> Seemingly, we also have the same issue since enabling exclusive-lock
> on images. We experienced these messages at a large scale when making
> a CRUSH map change a few weeks ago that resulted in many many VMs
> experiencing the blocked task kernel messages, requiring reboots.
>
> We've since disabled on all images we can, but there are still
> jewel-era instances that cannot have the feature disabled. Since
> disabling the feature, I have not observed any cases of blocked tasks,
> but so far given the limited timeframe I'd consider that anecdotal.
>

Why do you need it enabled in jewel-era instances? With jewel you can change image features on the fly, then live migrate the VM so the client picks up the change.

The only difference I could find is that removing big images is faster with object-map (which depends on exclusive-lock). So I can't imagine why it would be required.

And how long did you test it? I tested it a few weeks ago for about a week, with no hangs. Normally there are hangs after a few days. I have had it permanently disabled since the 20th, without any hangs since then, and I'm gradually adding back the VMs that died while they were there, starting with the worst offenders. Even with that short a test period, I'm still very convinced.

And did you test other features? I suspected exclusive-lock, so I only tested removing that one. That required removing object-map and fast-diff too, so I didn't test those two separately.
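
In case it helps anyone following along, here is a minimal Python sketch of the disable order implied above: fast-diff depends on object-map, which depends on exclusive-lock, so the dependents have to go first. It only builds the `rbd feature disable` command line and does not run it. The function names and the image spec `rbd/vm-disk-1` are made up for illustration, and the dependency table covers only the features mentioned in this thread.

```python
# Feature -> the feature it depends on, as described in this thread:
# object-map needs exclusive-lock, and fast-diff needs object-map.
REQUIRES = {
    "object-map": "exclusive-lock",
    "fast-diff": "object-map",
}


def disable_order(target, enabled):
    """Return the enabled features to disable, dependents first, ending with target."""
    order = []

    def visit(feature):
        # Disable everything that depends on this feature before the feature itself.
        for dependent, requirement in REQUIRES.items():
            if requirement == feature and dependent in enabled:
                visit(dependent)
        if feature in enabled and feature not in order:
            order.append(feature)

    visit(target)
    return order


def disable_command(image_spec, target, enabled):
    """Build (but do not run) an `rbd feature disable` argv for image_spec."""
    return ["rbd", "feature", "disable", image_spec, *disable_order(target, enabled)]


if __name__ == "__main__":
    # "rbd/vm-disk-1" is a hypothetical pool/image name.
    enabled = {"layering", "exclusive-lock", "object-map", "fast-diff"}
    print(" ".join(disable_command("rbd/vm-disk-1", "exclusive-lock", enabled)))
```

For an image with all three features enabled, this yields fast-diff, object-map, exclusive-lock in that order. The running client still has to pick up the change, for example via the live migration mentioned above.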