>
> Just to update the mailing list: we ended up going back to the default
> ceph.conf, without any additional settings beyond what is mandatory. We are
> now reaching speeds we never reached before, both in recovery and in
> regular usage. There was definitely something we had set in ceph.conf that
> was bogging everything down.

Could you please share the old and new ceph.conf, or the section that was
removed?

Best regards,
Alex

>
>
> On 2015-08-20 4:06 AM, Christian Balzer wrote:
>>
>> Hello,
>>
>> from all the pertinent points by Somnath, the one about pre-conditioning
>> would be pretty high on my list, especially if this slowness persists and
>> nothing else (scrub) is going on.
>>
>> This might be "fixed" by doing a fstrim.
>>
>> Additionally, the levelDBs per OSD are of course sync'ing heavily during
>> reconstruction, so that might not be the favorite thing for your type of
>> SSDs.
>>
>> But ultimately situational awareness is very important, as in "what" is
>> actually going on and slowing things down.
>> As usual, my recommendation would be to use atop, iostat or similar on all
>> your nodes and see if your OSD SSDs are indeed the bottleneck, or if it is
>> maybe just one of them, or something else entirely.
>>
>> Christian
>>
>> On Wed, 19 Aug 2015 20:54:11 +0000 Somnath Roy wrote:
>>
>>> Also, check whether scrubbing has started in the cluster or not. That may
>>> considerably slow down the cluster.
>>>
>>> -----Original Message-----
>>> From: Somnath Roy
>>> Sent: Wednesday, August 19, 2015 1:35 PM
>>> To: 'J-P Methot'; ceph-users@xxxxxxxx
>>> Subject: RE: Bad performances in recovery
>>>
>>> All the writes will go through the journal.
>>> It may be that your SSDs are not preconditioned well, and after a lot of
>>> writes during recovery the IOs stabilize at a lower number. This is quite
>>> common for SSDs, if that is the case.
>>>
>>> Thanks & Regards
>>> Somnath
>>>
>>> -----Original Message-----
>>> From: J-P Methot [mailto:jpmethot@xxxxxxxxxx]
>>> Sent: Wednesday, August 19, 2015 1:03 PM
>>> To: Somnath Roy; ceph-users@xxxxxxxx
>>> Subject: Re: Bad performances in recovery
>>>
>>> Hi,
>>>
>>> Thank you for the quick reply. However, we do have those exact settings
>>> for recovery and it still strongly affects client IO. I have looked at
>>> various ceph logs and OSD logs and nothing is out of the ordinary.
>>> Here's an idea though, please tell me if I am wrong.
>>>
>>> We use Intel SSDs for journaling and Samsung SSDs as the actual OSD
>>> drives. As was explained several times on this mailing list, Samsung
>>> SSDs suck in ceph. They have horrible O_DSYNC speed and die easily when
>>> used as journals. That's why we're using Intel SSDs for journaling, so
>>> that we wouldn't end up putting 96 Samsung SSDs in the trash.
>>>
>>> In recovery though, what is the ceph behaviour? What kind of writes does
>>> it do on the OSD SSDs? Does it write directly to the SSDs or through the
>>> journal?
>>>
>>> Additionally, something else we notice: the ceph cluster is MUCH slower
>>> after recovery than before. Clearly there is a bottleneck somewhere, and
>>> that bottleneck does not get cleared up after the recovery is done.
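
As an aside, a rough way to see this O_DSYNC behaviour, assuming GNU dd and a
scratch mount point that is only an example (never aim this at a raw device or
a live journal partition), is a small write test with per-write sync, which
approximates the synchronous pattern of filestore journal writes:

  # 4 KB writes, each forced out with O_DIRECT|O_DSYNC; a drive that handles
  # synchronous writes poorly will report very low throughput here.
  dd if=/dev/zero of=/mnt/test-ssd/dsync-test bs=4k count=10000 oflag=direct,dsync
  rm -f /mnt/test-ssd/dsync-test
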
>>>
>>> On 2015-08-19 3:32 PM, Somnath Roy wrote:
>>>> If you are concerned about *client io performance* during recovery,
>>>> use these settings:
>>>>
>>>> osd recovery max active = 1
>>>> osd max backfills = 1
>>>> osd recovery threads = 1
>>>> osd recovery op priority = 1
>>>>
>>>> If you are concerned about *recovery performance*, you may want to
>>>> bump these up, but I doubt it will help much over the defaults.
>>>>
>>>> Thanks & Regards
>>>> Somnath
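
A hedged aside on applying these: the same throttles can also be changed at
runtime with injectargs, which avoids restarting OSDs mid-recovery. The values
below simply mirror the ceph.conf lines above; osd.0 is only an example, and
the config show command has to be run on the node hosting that OSD:

  # Push the recovery/backfill throttles to all OSDs at runtime.
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'

  # Confirm the values an OSD is actually running with, via its admin socket.
  ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority'
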
>>>> >>> >>> >>> -- >>> ====================== >>> Jean-Philippe Méthot >>> Administrateur système / System administrator GloboTech Communications >>> Phone: 1-514-907-0050 >>> Toll Free: 1-(888)-GTCOMM1 >>> Fax: 1-(514)-907-0750 >>> jpmethot@xxxxxxxxxx >>> http://www.gtcomm.net >>> _______________________________________________ >>> ceph-users mailing list >>> ceph-users@xxxxxxxxxxxxxx >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> >> > > > -- > ====================== > Jean-Philippe Méthot > Administrateur système / System administrator > GloboTech Communications > Phone: 1-514-907-0050 > Toll Free: 1-(888)-GTCOMM1 > Fax: 1-(514)-907-0750 > jpmethot@xxxxxxxxxx > http://www.gtcomm.net > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com