Re: Bad performances in recovery

Are you sure it was because of configuration changes?
Maybe it was restarting the OSDs that fixed it?
We often hit an issue with backfill_toofull where the recovery/backfill processes get stuck until we restart the daemons (sometimes setting osd_recovery_max_active helps as well). The cluster still shows recovery of a few objects now and then (a few KB/s) and then stops completely.
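
A rough example of what that looks like for us (osd.12 is just a placeholder id, and the restart command depends on the init system; on sysvinit/upstart deployments it would be "service ceph restart osd.12" instead):

    # sometimes bumping this at runtime is enough to unstick recovery
    ceph tell osd.12 injectargs '--osd-recovery-max-active 3'
    # otherwise, restarting the stuck OSD daemon gets it moving again
    systemctl restart ceph-osd@12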

Jan

> On 20 Aug 2015, at 17:43, Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx> wrote:
> 
>> 
>> Just to update the mailing list, we ended up going back to default
>> ceph.conf without any additional settings than what is mandatory. We are
>> now reaching speeds we never reached before, both in recovery and in
>> regular usage. There was definitely something we set in the ceph.conf
>> bogging everything down.
> 
> Could you please share the old and new ceph.conf, or the section that
> was removed?
> 
> Best regards,
> Alex
> 
>> 
>> 
>> On 2015-08-20 4:06 AM, Christian Balzer wrote:
>>> 
>>> Hello,
>>> 
>>> Of all the pertinent points made by Somnath, the one about pre-conditioning
>>> would be pretty high on my list, especially if this slowness persists and
>>> nothing else (scrub) is going on.
>>> 
>>> This might be "fixed" by doing a fstrim.
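>>> 
>>> For example (assuming the OSD data partitions are mounted at the usual /var/lib/ceph/osd/ceph-<id> locations and that the drives and controller pass TRIM through), something along these lines per OSD mount:
>>> 
>>>     fstrim -v /var/lib/ceph/osd/ceph-0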
>>> 
>>> Additionally, the LevelDB instance in each OSD is of course syncing heavily
>>> during reconstruction, so that might not be the favorite thing for your type
>>> of SSDs.
>>> 
>>> But ultimately situational awareness is very important, as in "what" is
>>> actually going on and slowing things down.
>>> As usual my recommendations would be to use atop, iostat or similar on all
>>> your nodes and see if your OSD SSDs are indeed the bottleneck or if it is
>>> maybe just one of them or something else entirely.
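>>> 
>>> A rough sketch of that (interval and tools to taste):
>>> 
>>>     iostat -x 5    # per-device %util, await and queue depth on each OSD node
>>>     atop 5         # per-process and per-disk view on the same node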
>>> 
>>> Christian
>>> 
>>> On Wed, 19 Aug 2015 20:54:11 +0000 Somnath Roy wrote:
>>> 
>>>> Also, check whether scrubbing has started in the cluster or not. That may
>>>> considerably slow down the cluster.
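>>>> 
>>>> For example, "ceph -s" shows active scrubs in the PG states; to rule scrubbing out for a while you can set (and later unset) the scrub flags:
>>>> 
>>>>     ceph osd set noscrub
>>>>     ceph osd set nodeep-scrub
>>>>     # once recovery looks sane again:
>>>>     ceph osd unset noscrub
>>>>     ceph osd unset nodeep-scrub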
>>>> 
>>>> -----Original Message-----
>>>> From: Somnath Roy
>>>> Sent: Wednesday, August 19, 2015 1:35 PM
>>>> To: 'J-P Methot'; ceph-users@xxxxxxxx
>>>> Subject: RE:  Bad performances in recovery
>>>> 
>>>> All the writes will go through the journal.
>>>> It may be that your SSDs are not well preconditioned, and after the heavy
>>>> writes during recovery their IO has stabilized at a lower number. This is
>>>> quite common for SSDs if that is the case.
>>>> 
>>>> Thanks & Regards
>>>> Somnath
>>>> 
>>>> -----Original Message-----
>>>> From: J-P Methot [mailto:jpmethot@xxxxxxxxxx]
>>>> Sent: Wednesday, August 19, 2015 1:03 PM
>>>> To: Somnath Roy; ceph-users@xxxxxxxx
>>>> Subject: Re:  Bad performances in recovery
>>>> 
>>>> Hi,
>>>> 
>>>> Thank you for the quick reply. However, we do have those exact settings
>>>> for recovery and it still strongly affects client IO. I have looked at
>>>> various Ceph logs and OSD logs and nothing is out of the ordinary.
>>>> Here's an idea though; please tell me if I am wrong.
>>>> 
>>>> We use Intel SSDs for journaling and Samsung SSDs as the actual OSDs. As was
>>>> explained several times on this mailing list, Samsung SSDs suck in Ceph.
>>>> They have horrible O_DSYNC speed and die easily when used as journals.
>>>> That's why we're using Intel SSDs for journaling, so that we didn't end
>>>> up putting 96 Samsung SSDs in the trash.
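>>>> 
>>>> (As an aside, a quick way to sanity-check a drive's O_DSYNC write speed is a small direct+dsync dd run against a scratch file on that SSD; the path below is just a placeholder:)
>>>> 
>>>>     dd if=/dev/zero of=/path/on/the/ssd/dsync-test bs=4k count=10000 oflag=direct,dsync
>>>> 
>>>> Journal-worthy SSDs sustain this at a far higher rate than the drives that struggle with it.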
>>>> 
>>>> In recovery though, what is Ceph's behaviour? What kind of writes does
>>>> it do on the OSD SSDs? Does it write directly to the SSDs or through the
>>>> journal?
>>>> 
>>>> Additionally, something else we noticed: the Ceph cluster is MUCH slower
>>>> after recovery than before. Clearly there is a bottleneck somewhere, and
>>>> that bottleneck does not get cleared up after the recovery is done.
>>>> 
>>>> 
>>>> On 2015-08-19 3:32 PM, Somnath Roy wrote:
>>>>> If you are concerned about *client io performance* during recovery,
>>>>> use these settings..
>>>>> 
>>>>> osd recovery max active = 1
>>>>> osd max backfills = 1
>>>>> osd recovery threads = 1
>>>>> osd recovery op priority = 1
>>>>> 
>>>>> If you are concerned about *recovery performance*, you may want to
>>>>> bump these up, but I doubt it will help much over the default settings..
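>>>>> 
>>>>> (A sketch of applying these on a live cluster without restarting OSDs, assuming an admin keyring on the node you run this from; note that not every option is necessarily picked up at runtime:)
>>>>> 
>>>>>     ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'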
>>>>> 
>>>>> Thanks & Regards
>>>>> Somnath
>>>>> 
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>>>> Of J-P Methot
>>>>> Sent: Wednesday, August 19, 2015 12:17 PM
>>>>> To: ceph-users@xxxxxxxx
>>>>> Subject:  Bad performances in recovery
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> Our setup currently consists of 5 OSD nodes with 12 OSDs each, for
>>>>> a total of 60 OSDs. All of these are SSDs, with 4 SSD journals on each node.
>>>>> The Ceph version is hammer v0.94.1. There is a noticeable performance
>>>>> overhead since we're on SSDs (I've heard it gets better in Infernalis, but
>>>>> we're not upgrading just yet), but we can reach numbers that I would
>>>>> consider "alright".
>>>>> 
>>>>> Now, the issue is, when the cluster goes into recovery it's very fast
>>>>> at first, but then slows down to ridiculous levels as it moves
>>>>> forward. It can go from 7% left to recover down to 2% in ten minutes, but it
>>>>> may take 2 hours to recover the last 2%. While this happens, the
>>>>> attached OpenStack setup becomes incredibly slow, even though there is
>>>>> only a small fraction of objects still recovering (less than 1%). The
>>>>> settings that may affect recovery speed are very low, as they are by
>>>>> default, yet they still affect client IO speed way more than they should.
>>>>> 
>>>>> Why would Ceph recovery become so slow as it progresses and affect
>>>>> client IO even though it's recovering at a snail's pace? And by a
>>>>> snail's pace, I mean a few KB/second on 10 Gbps uplinks.
>>>>> --
>>>>> ======================
>>>>> Jean-Philippe Méthot
>>>>> Administrateur système / System administrator
>>>>> GloboTech Communications
>>>>> Phone: 1-514-907-0050
>>>>> Toll Free: 1-(888)-GTCOMM1
>>>>> Fax: 1-(514)-907-0750
>>>>> jpmethot@xxxxxxxxxx
>>>>> http://www.gtcomm.net
>>>> 
>>>> 
>>>> --
>>>> ======================
>>>> Jean-Philippe Méthot
>>>> Administrateur système / System administrator
>>>> GloboTech Communications
>>>> Phone: 1-514-907-0050
>>>> Toll Free: 1-(888)-GTCOMM1
>>>> Fax: 1-(514)-907-0750
>>>> jpmethot@xxxxxxxxxx
>>>> http://www.gtcomm.net
>>> 
>>> 
>> 
>> 
>> --
>> ======================
>> Jean-Philippe Méthot
>> Administrateur système / System administrator
>> GloboTech Communications
>> Phone: 1-514-907-0050
>> Toll Free: 1-(888)-GTCOMM1
>> Fax: 1-(514)-907-0750
>> jpmethot@xxxxxxxxxx
>> http://www.gtcomm.net

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



