Hi Andrei,

are you using Jumbo Frames? In my experience, I had a driver issue where one NIC wouldn't accept the MTU set for the interface, and the cluster ran into behaviour very similar to what you are describing. After I set the MTU on all NICs and servers to the value that worked for the troubling NIC, everything went back to normal.

Regards,
Alwin

On 04/26/2016 05:18 PM, Wido den Hollander wrote:
>
>> Op 26 april 2016 om 22:31 schreef Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>
>> Hi Wido,
>>
>> Thanks for your reply. We have a very simple Ceph network: a single 40 Gbit/s InfiniBand switch to which the osd servers and hosts are connected. There are no default gateways on the storage network. The IB is used only for Ceph; everything else goes over Ethernet.
>>
>> I've checked the stats on the IB interfaces of the osd servers and there are no errors. The IPoIB interface has a very small number of dropped packets (0.0003%).
>>
>> What kind of network tests would you suggest I run? And what do you mean by "I would suggest that you check if the network towards clients is also OK."? By clients, do you mean the host servers?
>>
>
> With clients I mean that you verify whether the hosts talking to the Ceph cluster can reach each machine running OSDs.
>
> In my case there was packet loss from certain clients, which caused the issues to occur.
>
> Wido
>
>> Many thanks
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Wido den Hollander" <wido@xxxxxxxx>
>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>> Sent: Tuesday, 26 April, 2016 21:17:59
>>> Subject: Re: Hammer broke after adding 3rd osd server
>>
>>>> Op 26 april 2016 om 17:52 schreef Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>>>
>>>> Hello everyone,
>>>>
>>>> I've recently performed a hardware upgrade on our small two-osd-server Ceph cluster, which seems to have broken the cluster. We are using Ceph for CloudStack RBD images for VMs. All of our servers are Ubuntu 14.04 LTS with the latest updates and kernel 4.4.6 from the Ubuntu repo.
>>>>
>>>> Previous hardware:
>>>>
>>>> 2 x osd servers, each with 9 SAS osds, 32 GB RAM, a 12-core Intel 2620 CPU @ 2 GHz and 2 consumer SSDs for journals. 40 Gbit/s InfiniBand networking using IPoIB.
>>>>
>>>> The following things were upgraded:
>>>>
>>>> 1. Journal SSDs were upgraded from consumer SSDs to Intel 3710 200 GB. We now have 5 osds per SSD.
>>>> 2. Added an additional osd server with 64 GB RAM, 10 osds and an Intel 2670 CPU @ 2.6 GHz.
>>>> 3. Upgraded the RAM on the existing osd servers to 64 GB.
>>>> 4. Installed additional osd disks so each server has 10 osds.
>>>>
>>>> After adding the third osd server and finishing the initial sync, the cluster worked okay for 1-2 days and no issues were noticed. On the third day my monitoring system started reporting a bunch of issues from the Ceph cluster as well as from our virtual machines. This tends to happen between 7:20am and 7:40am and lasts for about 2-3 hours before things become normal again. I've checked the osd servers and there is nothing I could find in cron or otherwise that starts around 7:20am.
>>>>
>>>> The problem is as follows: the new osd server's load goes to 400+ with ceph-osd processes consuming all CPU resources. ceph -w shows a high number of slow requests which relate to osds belonging to the new osd server.
>>>> The log files show the following:
>>>>
>>>> 2016-04-20 07:39:04.346459 osd.7 192.168.168.200:6813/2650 2 : cluster [WRN] slow request 30.032033 seconds old, received at 2016-04-20 07:38:34.314014: osd_op(client.140476549.0:13203438 rbd_data.2c9de71520eedd1.0000000000000621 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2572288~4096] 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for subops from 22
>>>> 2016-04-20 07:39:04.346465 osd.7 192.168.168.200:6813/2650 3 : cluster [WRN] slow request 30.031878 seconds old, received at 2016-04-20 07:38:34.314169: osd_op(client.140476549.0:13203439 rbd_data.2c9de71520eedd1.0000000000000621 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1101824~8192] 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for rw locks
>>>>
>>>> Practically every osd is involved in the slow requests, and they tend to be between the two old osd servers and the new one. As far as I can see, there were no issues between the two old servers.
>>>>
>>>> The first thing I checked was the networking. No issue was identified from running ping -i .1 <servername> as well as using hping3 for TCP connection checks. The network tests ran for over a week and not a single packet was lost. The slow requests took place while the network tests were running.
>>>>
>>>> I've also checked the osd and SSD disks and was not able to identify anything problematic.
>>>>
>>>> Stopping all osds on the new server causes no issues between the two old osd servers. I've left the new server disconnected for a few days and had no issues with the cluster.
>>>>
>>>> I am a bit lost on what else to try and how to debug the issue. Could someone please help me?
>>>>
>>>
>>> I would still say this is a network issue.
>>>
>>> "currently waiting for rw locks" is usually a network problem.
>>>
>>> I found this out myself a few weeks ago:
>>> http://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
>>>
>>> The problem there was a wrong gateway on some machines.
>>>
>>> In that situation the OSDs could talk to each other just fine, but they had problems sending traffic back to the clients, which led to buffers filling up.
>>>
>>> I would suggest that you check if the network towards the clients is also OK.
>>>
>>> Wido
>>>
>>>> Many thanks
>>>>
>>>> Andrei
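
To act on Alwin's jumbo-frame suggestion, below is a minimal sketch (bash) for checking that every node on the storage network agrees on the configured MTU and that full-size, unfragmented packets actually pass end to end. The hostnames, the interface name ib0 and the expected MTU value are placeholders; substitute your own.

#!/usr/bin/env bash
# Check that all storage-network interfaces report the same MTU,
# then verify that packets of that size pass without fragmentation.
EXPECTED_MTU=65520       # placeholder: your configured IPoIB/jumbo-frame MTU
IFACE=ib0                # placeholder: storage-network interface name
HOSTS="osd1 osd2 osd3"   # placeholder: osd servers and hypervisors

for h in $HOSTS; do
    mtu=$(ssh "$h" cat /sys/class/net/"$IFACE"/mtu)
    echo "$h: $IFACE mtu=$mtu"
    [ "$mtu" = "$EXPECTED_MTU" ] || echo "  WARNING: $h differs from expected MTU $EXPECTED_MTU"
done

# ICMP payload = MTU - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation.
PAYLOAD=$((EXPECTED_MTU - 28))
for h in $HOSTS; do
    if ping -c 3 -M do -s "$PAYLOAD" -q "$h" >/dev/null; then
        echo "$h: $EXPECTED_MTU-byte path OK"
    else
        echo "$h: large packets not passing (possible driver/switch MTU mismatch)"
    fi
done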
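
To follow Wido's advice about the path back to the clients, a sketch along these lines can help show where the slow requests are stuck and whether the hypervisors can reach every osd host without loss. osd.7 and 192.168.168.200 come from the log excerpt above; the other addresses are placeholders. dump_ops_in_flight and dump_historic_ops are admin-socket commands and have to be run on the host where the osd runs.

# On the osd server hosting osd.7: see which flag point the slow ops are stuck at.
ceph daemon osd.7 dump_ops_in_flight
ceph daemon osd.7 dump_historic_ops | grep -c 'waiting for rw locks'

# From each hypervisor/client: confirm loss-free reachability to every osd host
# on the storage network (only 192.168.168.200 is taken from the logs above).
for h in 192.168.168.200 192.168.168.201 192.168.168.202; do
    ping -c 200 -i 0.2 -q "$h" | tail -n 2
done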