We have just updated our third cluster from Infernalis to Jewel and are experiencing similar issues. We run a number of KVM virtual machines (qemu 2.5) with RBD images, and have seen a lot of D-state processes and even jbd2 timeouts and kernel stack traces inside the guests.

At first I thought the VMs were being starved of IO, but this is still happening after throttling back recovery with:

  osd_max_backfills = 1
  osd_recovery_max_active = 1
  osd_recovery_op_priority = 1

After upgrading the cluster to Jewel, I changed our crushmap to use the newer straw2 algorithm, which resulted in a little data movement but no problems at that stage. Once the cluster had settled down again, I set the tunables to optimal (hammer profile -> jewel profile), which triggered between 50% and 70% misplaced PGs on our clusters. This is when the trouble started each time, and when we had cascading failures of VMs. However, after performing hard shutdowns on the VMs and restarting them, they seemed to be OK.

At this stage, I have a strong suspicion that the cause is the introduction of "require_feature_tunables5 = 1" in the tunables, which seems to require all RADOS connections to be re-established.

On 20/06/16 13:54, Andrei Mikhailovsky wrote:
> Hi Oliver,
>
> I am also seeing this as strange behaviour indeed! I was going through the logs and I was not able to find any errors or issues. There were also no slow/blocked requests that I could see during the recovery process.
>
> Does anyone have an idea what could be the issue here? I don't want to shut down all VMs every time there is a new release with updated tunable values.
>
> Andrei
>
> ----- Original Message -----
>> From: "Oliver Dzombic" <info@xxxxxxxxxxxxxxxxx>
>> To: "andrei" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Sunday, 19 June, 2016 10:14:35
>> Subject: Re: cluster down during backfilling, Jewel tunables and client IO optimisations
>>
>> Hi,
>>
>> so far the key values for that are:
>>
>> osd_client_op_priority = 63 (the default anyway, but I set it to remember it)
>> osd_recovery_op_priority = 1
>>
>> In addition I set:
>>
>> osd_max_backfills = 1
>> osd_recovery_max_active = 1
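For reference, the steps described above (throttling recovery, checking and changing the tunables profile, and moving buckets to straw2) can all be done from the admin CLI without restarting OSDs. This is only a rough sketch of the Jewel-era commands involved; exact syntax and file names are from memory and may differ on your cluster:

  # Throttle backfill/recovery on all running OSDs at runtime
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

  # Show which tunables are currently in effect (including require_feature_tunables5)
  ceph osd crush show-tunables

  # Switch to the optimal (jewel) tunables profile; on an upgraded cluster this
  # is the step that triggers the large amount of misplaced PGs
  ceph osd crush tunables optimal

  # Convert buckets from straw to straw2 by editing a decompiled crushmap
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  #   ... change "alg straw" to "alg straw2" in crushmap.txt ...
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new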