Re: [ceph-users] stalls caused by scrub on jewel

Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> · Thu, 1 Dec 2016 21:38:00 +0100

Hi Yoann,

Thank you for your input. I was just told by RH support that the fix will make it into RHCS 2.0 (10.2.3). Thank you guys for the fix!

We thought about increasing the number of PGs right after changing the merge/split threshold values, but this would have led to a _lot_ of data movement (1.2 billion XFS files) over several weeks, with no way to scrub or deep-scrub to ensure data consistency. Still, as soon as we get the fix, we will increase the number of PGs.
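
For illustration only, here is a rough sketch of the arithmetic behind that trade-off. It assumes FileStore's documented rule that a subdirectory is split once it holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16 files, and it treats the 1.2 billion figure as a plain object count. The pg_num values and the helper names are hypothetical, not taken from our cluster.

```python
# Rough arithmetic only. The pg_num values below are hypothetical,
# and the helper names are made up for this sketch.


def split_threshold(split_multiple: int, merge_threshold: int) -> int:
    """Files a FileStore subdirectory may hold before it is split.

    Follows the documented rule:
    filestore_split_multiple * abs(filestore_merge_threshold) * 16
    """
    return split_multiple * abs(merge_threshold) * 16


def objects_per_pg(total_objects: int, pg_num: int) -> float:
    """Average number of objects per PG for a pool with pg_num PGs."""
    return total_objects / pg_num


if __name__ == "__main__":
    # Assumption: the 1.2 billion XFS files are counted as objects.
    total = 1_200_000_000

    # Doubling pg_num halves the average PG size.
    for pg_num in (2048, 4096, 8192):  # hypothetical values
        print(pg_num, round(objects_per_pg(total, pg_num)))

    # Split point with the stock settings (split multiple 2, merge threshold 10).
    print(split_threshold(2, 10))
```

Each doubling of pg_num halves the average PG size, which is why it helps with big PGs, but every such step also means moving data around.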

Regards,

Frederic.

> On 1 Dec 2016, at 16:47, Yoann Moulin <yoann.moulin@xxxxxxx> wrote:
> 
> Hello,
> 
>> We're impacted by this bug (case 01725311). Our cluster is running RHCS 2.0 and is no longer able to scrub or deep-scrub.
>> 
>> [1] http://tracker.ceph.com/issues/17859
>> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1394007
>> [3] https://github.com/ceph/ceph/pull/11898
>> 
>> I'm worried we'll have to live with a cluster that can't scrub/deep-scrub until March 2017 (ETA for RHCS 2.2 running Jewel 10.2.4).
>> 
>> Can we have this fix any sooner?
> 
> As far as I know, this bug appears when you have big PGs. A workaround could be to increase the pg_num of the pool that has the biggest PGs.
> 
> -- 
> Yoann Moulin
> EPFL IC-IT
