Hi Wido,

Thanks for your reply.

We have a very simple ceph network: a single 40gbit/s infiniband switch that the
osd servers and hosts are connected to. There are no default gateways on the
storage network. The IB is used only for ceph; everything else goes over the
ethernet.

I've checked the stats on the IB interfaces of the osd servers and there are no
errors. The ipoib interface has a very small number of dropped packets (0.0003%).

What kind of network tests would you suggest that I run? And what do you mean by
"I would suggest that you check if the network towards clients is also OK."? By
clients do you mean the host servers?

Many thanks

Andrei
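For reference, a minimal sketch of the kind of checks being discussed, assuming
the IPoIB interface is named ib0 and that iperf3 and the infiniband-diags
package are installed; the host name below is a placeholder:

  # per-interface error and drop counters on each osd server and hypervisor
  ip -s link show ib0

  # low-level InfiniBand port error counters (requires infiniband-diags)
  perfquery -x

  # sustained throughput test between an osd server and a hypervisor:
  # run "iperf3 -s" on one side, then push 4 parallel streams for 60
  # seconds from the other side
  iperf3 -c osd-server1 -P 4 -t 60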
----- Original Message -----
> From: "Wido den Hollander" <wido@xxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Sent: Tuesday, 26 April, 2016 21:17:59
> Subject: Re: Hammer broke after adding 3rd osd server
>
>> Op 26 april 2016 om 17:52 schreef Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>
>> Hello everyone,
>>
>> I've recently performed a hardware upgrade on our small two osd server ceph
>> cluster, which seems to have broken the ceph cluster. We are using ceph for
>> cloudstack rbd images for vms. All of our servers are Ubuntu 14.04 LTS with
>> the latest updates and kernel 4.4.6 from the ubuntu repo.
>>
>> Previous hardware:
>>
>> 2 x osd servers with 9 sas osds, 32gb ram and a 12-core Intel 2620 cpu @ 2GHz
>> each, plus 2 consumer SSDs for journal. Infiniband 40gbit/s networking using
>> IPoIB.
>>
>> The following things were upgraded:
>>
>> 1. Journal ssds were upgraded from consumer ssds to Intel 3710 200gb. We now
>> have 5 osds per single ssd.
>> 2. Added an additional osd server with 64gb ram, 10 osds and an Intel 2670 cpu
>> @ 2.6GHz.
>> 3. Upgraded ram on the osd servers to 64gb.
>> 4. Installed additional osd disks to have 10 osds per server.
>>
>> After adding the third osd server and finishing the initial sync, the cluster
>> worked okay for 1-2 days. No issues were noticed. On the third day my monitoring
>> system started reporting a bunch of issues from the ceph cluster as well as
>> from our virtual machines. This tends to happen between 7:20am and 7:40am and
>> lasts for about 2-3 hours before things become normal again. I've checked the
>> osd servers and there is nothing that I could find in cron or otherwise that
>> starts around 7:20am.
>>
>> The problem is as follows: the new osd server's load goes to 400+ with ceph-osd
>> processes consuming all cpu resources. The ceph -w shows a high number of slow
>> requests which relate to osds belonging to the new osd server. The log files
>> show the following:
>>
>> 2016-04-20 07:39:04.346459 osd.7 192.168.168.200:6813/2650 2 : cluster [WRN]
>> slow request 30.032033 seconds old, received at 2016-04-20 07:38:34.314014:
>> osd_op(client.140476549.0:13203438 rbd_data.2c9de71520eedd1.0000000000000621
>> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2572288~4096]
>> 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for
>> subops from 22
>> 2016-04-20 07:39:04.346465 osd.7 192.168.168.200:6813/2650 3 : cluster [WRN]
>> slow request 30.031878 seconds old, received at 2016-04-20 07:38:34.314169:
>> osd_op(client.140476549.0:13203439 rbd_data.2c9de71520eedd1.0000000000000621
>> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1101824~8192]
>> 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for
>> rw locks
>>
>> Practically every osd is involved in the slow requests, and they tend to be
>> between the two old osd servers and the new one. There were no issues as far
>> as I can see between the two old servers.
>>
>> The first thing I checked was the networking. No issue was identified from
>> running ping -i .1 <servername> as well as using hping3 for the tcp connection
>> checks. The network tests were running for over a week and not a single packet
>> was lost. The slow requests took place while the network tests were running.
>>
>> I've also checked the osd and ssd disks and I was not able to identify anything
>> problematic.
>>
>> Stopping all osds on the new server causes no issues between the two old osd
>> servers. I've left the new server disconnected for a few days and had no issues
>> with the cluster.
>>
>> I am a bit lost on what else to try and how to debug the issue. Could someone
>> please help me?
>>
>
> I would still say this is a network issue.
>
> "currently waiting for rw locks" is usually a network problem.
>
> I found this out myself a few weeks ago:
> http://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
>
> The problem there was a wrong gateway on some machines.
>
> In that situation the OSDs could talk just fine, but they had problems with
> sending traffic back to the clients, which led to buffers filling up.
>
> I would suggest that you check if the network towards clients is also OK.
>
> Wido
>
>> Many thanks
>>
>> Andrei
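A minimal sketch of how Wido's suggestion to verify the path towards the clients
might be carried out on each osd server and hypervisor; this assumes the storage
network is 192.168.168.0/24 (as the osd address in the log suggests), that the
IPoIB interface is ib0, and uses a hypothetical client address 192.168.168.50:

  # make sure no stray default route or wrong gateway catches storage traffic
  ip route show

  # confirm which interface and source address are used to reach a client
  ip route get 192.168.168.50

  # watch for loss on the return path over the IPoIB interface
  ping -I ib0 -i 0.1 192.168.168.50

  # look for TCP retransmissions towards a client, which would point to the
  # kind of buffer build-up described above
  ss -ti dst 192.168.168.50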