On 20/06/16 19:51, Gregory Farnum wrote:
> On Mon, Jun 20, 2016 at 8:33 AM, Daniel Swarbrick
>>
>> At this stage, I have a strong suspicion that it is the introduction of
>> "require_feature_tunables5 = 1" in the tunables. This seems to require
>> all RADOS connections to be re-established.
>
> Do you have any evidence of that besides the one restart?
>
> I guess it's possible that we aren't kicking requests if the crush map
> but not the rest of the osdmap changes, but I'd be surprised.
> -Greg

I think the key fact to take note of is that we had long-running Qemu
processes that had been started a few months ago, using Infernalis
librbd shared libs.

If Infernalis had no concept of require_feature_tunables5, then it seems
logical that these clients would block if the cluster were upgraded to
Jewel and this tunable became mandatory.

I have just upgraded our fourth and final cluster to Jewel. Prior to
applying optimal tunables, we upgraded our hypervisor nodes' librbd
also, and migrated all VMs at least once, to start a fresh Qemu process
for each (using the updated librbd).

We're seeing ~65% data movement due to chooseleaf_stable 0 => 1, but
other than that, so far so good. No clients are blocking indefinitely.
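
For anyone wanting to check whether a cluster already demands the tunables5 feature before deciding which clients need restarting, a minimal sketch follows. It assumes the JSON printed by `ceph osd crush show-tunables` contains the keys `require_feature_tunables5` and `chooseleaf_stable`; the helper names and the trimmed sample document are hypothetical and only for illustration, not taken from the thread.

```python
import json
import subprocess


def fetch_tunables_json() -> str:
    """Return the cluster's CRUSH tunables as a JSON string.

    Assumes a working `ceph` CLI and admin keyring on this host.
    Not called below, so the example runs without a cluster.
    """
    result = subprocess.run(
        ["ceph", "osd", "crush", "show-tunables", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def tunables5_required(show_tunables_json: str) -> bool:
    """True if the tunables report says clients must support tunables5."""
    tunables = json.loads(show_tunables_json)
    # A missing key is treated as "not required" (assumption for older releases).
    return bool(tunables.get("require_feature_tunables5", 0))


def chooseleaf_stable_enabled(show_tunables_json: str) -> bool:
    """True if chooseleaf_stable is set to 1 in the tunables report."""
    tunables = json.loads(show_tunables_json)
    return tunables.get("chooseleaf_stable", 0) == 1


# Hypothetical, trimmed sample of the report after applying optimal tunables.
SAMPLE = '{"chooseleaf_stable": 1, "profile": "jewel", "require_feature_tunables5": 1}'

if __name__ == "__main__":
    if tunables5_required(SAMPLE):
        print("tunables5 required: restart or migrate pre-Jewel librbd clients")
    else:
        print("tunables5 not required")
```

On a live cluster, `tunables5_required(fetch_tunables_json())` would replace the sample string. The check only reports what the cluster requires; it does not tell you which running Qemu processes still have an old librbd mapped.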