Hi Daniel,

Many thanks for your useful tests and results. How much IO wait do you see on your client VMs? Has it increased significantly, or not?

Many thanks
Andrei

----- Original Message -----
> From: "Daniel Swarbrick" <daniel.swarbrick@xxxxxxxxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Cc: "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Wednesday, 22 June, 2016 13:43:37
> Subject: Re: cluster down during backfilling, Jewel tunables and client IO optimisations
>
> On 20/06/16 19:51, Gregory Farnum wrote:
>> On Mon, Jun 20, 2016 at 8:33 AM, Daniel Swarbrick
>>>
>>> At this stage, I have a strong suspicion that it is the introduction of
>>> "require_feature_tunables5 = 1" in the tunables. This seems to require
>>> all RADOS connections to be re-established.
>>
>> Do you have any evidence of that besides the one restart?
>>
>> I guess it's possible that we aren't kicking requests if the crush map
>> but not the rest of the osdmap changes, but I'd be surprised.
>> -Greg
>
> I think the key fact to take note of is that we had long-running Qemu
> processes that had been started a few months ago, using Infernalis
> librbd shared libs.
>
> If Infernalis had no concept of require_feature_tunables5, then it seems
> logical that these clients would block if the cluster were upgraded to
> Jewel and this tunable became mandatory.
>
> I have just upgraded our fourth and final cluster to Jewel. Prior to
> applying optimal tunables, we upgraded our hypervisor nodes' librbd
> also, and migrated all VMs at least once, to start a fresh Qemu process
> for each (using the updated librbd).
>
> We're seeing ~65% data movement due to chooseleaf_stable 0 => 1, but
> other than that, so far so good. No clients are blocking indefinitely.
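For what it's worth, a quick way to answer the iowait question in a comparable way across guests is to sample /proc/stat twice inside a Linux VM. A minimal sketch follows; the 5-second window is an arbitrary choice and only the standard library is used.

import time

def cpu_times():
    """Aggregate 'cpu' counters from /proc/stat, in USER_HZ ticks."""
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:]]

before = cpu_times()
time.sleep(5)                      # arbitrary sample window
after = cpu_times()

delta = [b - a for a, b in zip(before, after)]
total = sum(delta) or 1
# /proc/stat cpu fields: user nice system idle iowait irq softirq steal ...
print("iowait over the last 5s: %.1f%%" % (100.0 * delta[4] / total))

Running this on a few guests before and after applying the new tunables should show whether client IO wait actually moves.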
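On the tunables side, it can help to confirm what the cluster actually reports before and after switching to optimal. A rough sketch, assuming the ceph CLI and an admin keyring are available on the node, and that the JSON field names match what Jewel's `ceph osd crush show-tunables` prints:

import json
import subprocess

# Ask the cluster for its current CRUSH tunables profile.
raw = subprocess.check_output(
    ["ceph", "osd", "crush", "show-tunables", "--format", "json"])
tunables = json.loads(raw.decode("utf-8"))

# Field names assumed to match the Jewel output discussed in this thread.
for key in ("profile", "chooseleaf_stable", "require_feature_tunables5"):
    print("%s = %s" % (key, tunables.get(key)))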
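And for the long-running Qemu guests still holding an old Infernalis librbd, a sketch along these lines (assuming Linux hypervisor nodes and process names beginning with "qemu") scans /proc/<pid>/maps to show which librbd each guest actually has mapped; a "(deleted)" suffix means the package was upgraded on disk but that guest has not yet been restarted or migrated.

import glob

for maps_path in glob.glob("/proc/[0-9]*/maps"):
    pid = maps_path.split("/")[2]
    try:
        with open("/proc/%s/comm" % pid) as f:
            comm = f.read().strip()
        if not comm.startswith("qemu"):        # assumption: guest processes are qemu-*
            continue
        with open(maps_path) as f:
            libs = set()
            for line in f:
                if "librbd" in line:
                    fields = line.split()
                    # pathname starts at field 6; keep "(deleted)" if present
                    libs.add(" ".join(fields[5:]))
        for lib in sorted(libs):
            print(pid, comm, lib)
    except (IOError, OSError):
        continue                               # process exited or permission denied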