Hi Andrei,

are you using Jumbo Frames? In my experience, I had a driver issue where one NIC wouldn't accept the MTU set for the interface, and the cluster ran into behaviour very similar to what you are describing. After I set the MTU on all NICs and servers to the value that worked for the troubling NIC, everything went back to normal.

Regards,
Alwin

On 04/26/2016 05:18 PM, Wido den Hollander wrote:
>
>> Op 26 april 2016 om 22:31 schreef Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>
>> Hi Wido,
>>
>> Thanks for your reply. We have a very simple Ceph network: a single 40 Gbit/s InfiniBand switch to which the osd servers and hosts are connected. There are no default gateways on the storage network. The IB is used only for Ceph; everything else goes over Ethernet.
>>
>> I've checked the stats on the IB interfaces of the osd servers and there are no errors. The IPoIB interface has a very small number of dropped packets (0.0003%).
>>
>> What kind of network tests would you suggest I run? And what do you mean by "I would suggest that you check if the network towards clients is also OK."? By clients, do you mean the host servers?
>>
>
> With clients I mean that you verify whether the hosts talking to the Ceph cluster can reach each machine running OSDs.
>
> In my case there was packet loss from certain clients, which caused the issues to occur.
>
> Wido
>
>> Many thanks
>>
>> Andrei
>>
>> ----- Original Message -----
>>> From: "Wido den Hollander" <wido@xxxxxxxx>
>>> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
>>> Sent: Tuesday, 26 April, 2016 21:17:59
>>> Subject: Re: Hammer broke after adding 3rd osd server
>>
>>>> Op 26 april 2016 om 17:52 schreef Andrei Mikhailovsky <andrei@xxxxxxxxxx>:
>>>>
>>>> Hello everyone,
>>>>
>>>> I've recently performed a hardware upgrade on our small two-osd-server Ceph cluster, which seems to have broken the cluster. We are using Ceph for CloudStack RBD images for VMs. All of our servers are Ubuntu 14.04 LTS with the latest updates and kernel 4.4.6 from the Ubuntu repo.
>>>>
>>>> Previous hardware:
>>>>
>>>> 2 x osd servers, each with 9 SAS osds, 32 GB RAM, a 12-core Intel 2620 CPU @ 2 GHz and 2 consumer SSDs for journals. 40 Gbit/s InfiniBand networking using IPoIB.
>>>>
>>>> The following things were upgraded:
>>>>
>>>> 1. Journal SSDs were upgraded from consumer SSDs to Intel 3710 200 GB. We now have 5 osds per SSD.
>>>> 2. Added an additional osd server with 64 GB RAM, 10 osds and an Intel 2670 CPU @ 2.6 GHz.
>>>> 3. Upgraded the RAM on the existing osd servers to 64 GB.
>>>> 4. Installed additional osd disks so each server has 10 osds.
>>>>
>>>> After adding the third osd server and finishing the initial sync, the cluster worked okay for 1-2 days and no issues were noticed. On the third day my monitoring system started reporting a bunch of issues from the Ceph cluster as well as from our virtual machines. This tends to happen between 7:20am and 7:40am and lasts for about 2-3 hours before things become normal again. I've checked the osd servers and there is nothing I could find in cron or otherwise that starts around 7:20am.
>>>>
>>>> The problem is as follows: the new osd server's load goes to 400+ with ceph-osd processes consuming all CPU resources. ceph -w shows a high number of slow requests which relate to osds belonging to the new osd server.
>>>> The log files show the following:
>>>>
>>>> 2016-04-20 07:39:04.346459 osd.7 192.168.168.200:6813/2650 2 : cluster [WRN] slow request 30.032033 seconds old, received at 2016-04-20 07:38:34.314014: osd_op(client.140476549.0:13203438 rbd_data.2c9de71520eedd1.0000000000000621 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2572288~4096] 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for subops from 22
>>>> 2016-04-20 07:39:04.346465 osd.7 192.168.168.200:6813/2650 3 : cluster [WRN] slow request 30.031878 seconds old, received at 2016-04-20 07:38:34.314169: osd_op(client.140476549.0:13203439 rbd_data.2c9de71520eedd1.0000000000000621 [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1101824~8192] 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for rw locks
>>>>
>>>> Practically every osd is involved in the slow requests, and they tend to be between the two old osd servers and the new one. As far as I can see, there were no issues between the two old servers.
>>>>
>>>> The first thing I checked was the networking. No issue was identified from running ping -i .1 <servername> as well as using hping3 for TCP connection checks. The network tests ran for over a week and not a single packet was lost. The slow requests took place while the network tests were running.
>>>>
>>>> I've also checked the osd and SSD disks and was not able to identify anything problematic.
>>>>
>>>> Stopping all osds on the new server causes no issues between the two old osd servers. I've left the new server disconnected for a few days and had no issues with the cluster.
>>>>
>>>> I am a bit lost on what else to try and how to debug the issue. Could someone please help me?
>>>>
>>>
>>> I would still say this is a network issue.
>>>
>>> "currently waiting for rw locks" is usually a network problem.
>>>
>>> I found this out myself a few weeks ago:
>>> http://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
>>>
>>> The problem there was a wrong gateway on some machines.
>>>
>>> In that situation the OSDs could talk to each other just fine, but they had problems sending traffic back to the clients, which led to buffers filling up.
>>>
>>> I would suggest that you check if the network towards the clients is also OK.
>>>
>>> Wido
>>>
>>>> Many thanks
>>>>
>>>> Andrei
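
To act on Alwin's jumbo-frame suggestion, below is a minimal sketch (bash) for checking that every node on the storage network agrees on the configured MTU and that full-size, unfragmented packets actually pass end to end. The hostnames, the interface name ib0 and the expected MTU value are placeholders; substitute your own.

#!/usr/bin/env bash
# Check that all storage-network interfaces report the same MTU,
# then verify that packets of that size pass without fragmentation.
EXPECTED_MTU=65520       # placeholder: your configured IPoIB/jumbo-frame MTU
IFACE=ib0                # placeholder: storage-network interface name
HOSTS="osd1 osd2 osd3"   # placeholder: osd servers and hypervisors

for h in $HOSTS; do
    mtu=$(ssh "$h" cat /sys/class/net/"$IFACE"/mtu)
    echo "$h: $IFACE mtu=$mtu"
    [ "$mtu" = "$EXPECTED_MTU" ] || echo "  WARNING: $h differs from expected MTU $EXPECTED_MTU"
done

# ICMP payload = MTU - 20 (IP header) - 8 (ICMP header); -M do forbids fragmentation.
PAYLOAD=$((EXPECTED_MTU - 28))
for h in $HOSTS; do
    if ping -c 3 -M do -s "$PAYLOAD" -q "$h" >/dev/null; then
        echo "$h: $EXPECTED_MTU-byte path OK"
    else
        echo "$h: large packets not passing (possible driver/switch MTU mismatch)"
    fi
done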
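
To follow Wido's advice about the path back to the clients, a sketch along these lines can help show where the slow requests are stuck and whether the hypervisors can reach every osd host without loss. osd.7 and 192.168.168.200 come from the log excerpt above; the other addresses are placeholders. dump_ops_in_flight and dump_historic_ops are admin-socket commands and have to be run on the host where the osd runs.

# On the osd server hosting osd.7: see which flag point the slow ops are stuck at.
ceph daemon osd.7 dump_ops_in_flight
ceph daemon osd.7 dump_historic_ops | grep -c 'waiting for rw locks'

# From each hypervisor/client: confirm loss-free reachability to every osd host
# on the storage network (only 192.168.168.200 is taken from the logs above).
for h in 192.168.168.200 192.168.168.201 192.168.168.202; do
    ping -c 200 -i 0.2 -q "$h" | tail -n 2
done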