Hi Wido,

Thanks for your reply.

We have a very simple ceph network: a single 40gbit/s infiniband switch that the
osd servers and hosts are connected to. There are no default gateways on the
storage network. The IB is used only for ceph; everything else goes over the
ethernet.

I've checked the stats on the IB interfaces of the osd servers and there are no
errors. The ipoib interface has a very small number of dropped packets (0.0003%).

What kind of network tests would you suggest that I run? And what do you mean by
"I would suggest that you check if the network towards clients is also OK."? By
clients do you mean the host servers?

Many thanks

Andrei
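For reference, a minimal sketch of the kind of checks being discussed, assuming
the IPoIB interface is named ib0 and that iperf3 and the infiniband-diags
package are installed; the host name below is a placeholder:

  # per-interface error and drop counters on each osd server and hypervisor
  ip -s link show ib0

  # low-level InfiniBand port error counters (requires infiniband-diags)
  perfquery -x

  # sustained throughput test between an osd server and a hypervisor:
  # run "iperf3 -s" on one side, then push 4 parallel streams for 60
  # seconds from the other side
  iperf3 -c osd-server1 -P 4 -t 60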
----- Original Message -----
> From: "Wido den Hollander" <wido@xxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Sent: Tuesday, 26 April, 2016 21:17:59
> Subject: Re: Hammer broke after adding 3rd osd server
>
>> Op 26 april 2016 om 17:52 schreef Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>
>> Hello everyone,
>>
>> I've recently performed a hardware upgrade on our small two osd server ceph
>> cluster, which seems to have broken the ceph cluster. We are using ceph for
>> cloudstack rbd images for vms. All of our servers are Ubuntu 14.04 LTS with
>> the latest updates and kernel 4.4.6 from the ubuntu repo.
>>
>> Previous hardware:
>>
>> 2 x osd servers with 9 sas osds, 32gb ram and a 12-core Intel 2620 cpu @ 2GHz
>> each, plus 2 consumer SSDs for journal. Infiniband 40gbit/s networking using
>> IPoIB.
>>
>> The following things were upgraded:
>>
>> 1. Journal ssds were upgraded from consumer ssds to Intel 3710 200gb. We now
>> have 5 osds per single ssd.
>> 2. Added an additional osd server with 64gb ram, 10 osds and an Intel 2670 cpu
>> @ 2.6GHz.
>> 3. Upgraded ram on the osd servers to 64gb.
>> 4. Installed additional osd disks to have 10 osds per server.
>>
>> After adding the third osd server and finishing the initial sync, the cluster
>> worked okay for 1-2 days. No issues were noticed. On the third day my monitoring
>> system started reporting a bunch of issues from the ceph cluster as well as
>> from our virtual machines. This tends to happen between 7:20am and 7:40am and
>> lasts for about 2-3 hours before things become normal again. I've checked the
>> osd servers and there is nothing that I could find in cron or otherwise that
>> starts around 7:20am.
>>
>> The problem is as follows: the new osd server's load goes to 400+ with ceph-osd
>> processes consuming all cpu resources. The ceph -w shows a high number of slow
>> requests which relate to osds belonging to the new osd server. The log files
>> show the following:
>>
>> 2016-04-20 07:39:04.346459 osd.7 192.168.168.200:6813/2650 2 : cluster [WRN]
>> slow request 30.032033 seconds old, received at 2016-04-20 07:38:34.314014:
>> osd_op(client.140476549.0:13203438 rbd_data.2c9de71520eedd1.0000000000000621
>> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2572288~4096]
>> 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for
>> subops from 22
>> 2016-04-20 07:39:04.346465 osd.7 192.168.168.200:6813/2650 3 : cluster [WRN]
>> slow request 30.031878 seconds old, received at 2016-04-20 07:38:34.314169:
>> osd_op(client.140476549.0:13203439 rbd_data.2c9de71520eedd1.0000000000000621
>> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1101824~8192]
>> 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for
>> rw locks
>>
>> Practically every osd is involved in the slow requests, and they tend to be
>> between the two old osd servers and the new one. There were no issues as far
>> as I can see between the two old servers.
>>
>> The first thing I checked was the networking. No issue was identified from
>> running ping -i .1 <servername> as well as using hping3 for the tcp connection
>> checks. The network tests were running for over a week and not a single packet
>> was lost. The slow requests took place while the network tests were running.
>>
>> I've also checked the osd and ssd disks and I was not able to identify anything
>> problematic.
>>
>> Stopping all osds on the new server causes no issues between the two old osd
>> servers. I've left the new server disconnected for a few days and had no issues
>> with the cluster.
>>
>> I am a bit lost on what else to try and how to debug the issue. Could someone
>> please help me?
>>
>
> I would still say this is a network issue.
>
> "currently waiting for rw locks" is usually a network problem.
>
> I found this out myself a few weeks ago:
> http://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
>
> The problem there was a wrong gateway on some machines.
>
> In that situation the OSDs could talk just fine, but they had problems with
> sending traffic back to the clients, which led to buffers filling up.
>
> I would suggest that you check if the network towards clients is also OK.
>
> Wido
>
>> Many thanks
>>
>> Andrei
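A minimal sketch of how Wido's suggestion to verify the path towards the clients
might be carried out on each osd server and hypervisor; this assumes the storage
network is 192.168.168.0/24 (as the osd address in the log suggests), that the
IPoIB interface is ib0, and uses a hypothetical client address 192.168.168.50:

  # make sure no stray default route or wrong gateway catches storage traffic
  ip route show

  # confirm which interface and source address are used to reach a client
  ip route get 192.168.168.50

  # watch for loss on the return path over the IPoIB interface
  ping -I ib0 -i 0.1 192.168.168.50

  # look for TCP retransmissions towards a client, which would point to the
  # kind of buffer build-up described above
  ss -ti dst 192.168.168.50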