Re: Hammer broke after adding 3rd osd server

A quick update on the case.

I think I've isolated the problem. I've spent a while checking the osd servers for differences in configuration and noticed two of them. The first is the sysctl.conf tuning options for IPoIB, which were not present on the new server. The second is the cron.daily schedule for mlocate.

The mlocate job was starting at 6:25 in the morning, which is why it was initially overlooked, as the problems tend to start after 7:20am. The old osd servers had the mlocate job disabled completely.

Last night I adjusted the sysctl.conf options and disabled mlocate, and so far so good. The cluster did not fall to bits this morning, which is a great sign. My guess is that the mlocate scan was hammering the osd disk(s), which was causing the slow requests.
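
For anyone hitting something similar, the changes were roughly as follows. The sysctl values below are just the usual IPoIB tuning suggestions rather than the exact numbers from our old servers, so treat this as a sketch:

    # disable the daily mlocate scan (Ubuntu ships it as /etc/cron.daily/mlocate)
    chmod -x /etc/cron.daily/mlocate

    # illustrative IPoIB-style tuning in /etc/sysctl.conf
    net.core.rmem_max = 4194304
    net.core.wmem_max = 4194304
    net.core.rmem_default = 4194304
    net.core.wmem_default = 4194304
    net.core.netdev_max_backlog = 250000
    net.ipv4.tcp_rmem = 4096 87380 4194304
    net.ipv4.tcp_wmem = 4096 65536 4194304

    # apply without a reboot
    sysctl -p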


Many thanks for all your help

Andrei


----- Original Message -----
> From: "andrei" <andrei@xxxxxxxxxx>
> To: "Wido den Hollander" <wido@xxxxxxxx>
> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Sent: Thursday, 28 April, 2016 15:23:06
> Subject: Re:  Hammer broke after adding 3rd osd server

> Hello guys,
> 
> I've done a bit of digging over the last few days and collected a bunch of logs
> from the osd servers, including the ops in flight. I will be going through the
> data later on today.
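> 
> For reference, the ops in flight were captured through the osd admin sockets,
> with something along these lines (osd.22 is just an example id here; I looped
> over the osds on each server):
> 
>     ceph daemon osd.22 dump_ops_in_flight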
> 
> I've also done a bunch of network connectivity tests and did not find any
> evidence of network issues. The ping and hping (tcp) tests have been running
> over the past few days and did not show any errors, packet drops or similar
> issues. The network interface stats reflect that as well. I ran the network
> tests between all osd servers and cloud host servers. The network interfaces
> are all configured with the same mtu of 65520 (ipoib interface).
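> 
> The tests themselves were nothing fancy; roughly the following (options from
> memory, so treat it as a sketch rather than the literal command history):
> 
>     ping -i 0.1 <osd-server>                  # continuous latency/loss check
>     ping -s 65000 -M do <osd-server>          # large, unfragmented packets to exercise the 65520 mtu
>     hping3 -S -p 6800 -i u100000 <osd-server> # tcp syn probes against an osd port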
> 
> What I did notice today is that, once again, the problem tends to start between
> 7:20 and 7:40 in the morning and produces a ton of slow requests between the
> two old osd servers and the new one. The slow requests go away for about 20-30
> minutes, then return, and keep coming back every 20-30 minutes or so. During
> the slow requests the ceph-osd processes go nuts on the new osd server only:
> they consume all the cpu and the server load goes to 300+. A few hours into the
> slow request cycle I stopped the ceph-osd processes on one of the old osd
> servers, but the problem did not go away. The only thing that helps the cluster
> is a full reboot of the new osd server. After the reboot the slow requests do
> not come back until the next morning.
> 
> If anyone has an idea what else I could try, please let me know.
> 
> Andrei
> 
> ----- Original Message -----
>> From: "Wido den Hollander" <wido@xxxxxxxx>
>> To: "andrei" <andrei@xxxxxxxxxx>
>> Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
>> Sent: Tuesday, 26 April, 2016 22:18:37
>> Subject: Re:  Hammer broke after adding 3rd osd server
> 
>>> On 26 April 2016 at 22:31, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> Hi Wido,
>>> 
>>> Thanks for your reply. We have a very simple ceph network: a single 40gbit/s
>>> infiniband switch to which the osd servers and hosts are connected. There are
>>> no default gateways on the storage network. The IB is used only for ceph;
>>> everything else goes over ethernet.
>>> 
>>> I've checked the stats on the IB interfaces of the osd servers and there are no
>>> errors. The ipoib interfaces show only a very small number of dropped packets
>>> (0.0003%).
>>> 
>>> What kind of network tests would you suggest I run? What do you mean by "I
>>> would suggest that you check if the network towards clients is also OK"? By
>>> clients, do you mean the host servers?
>>> 
>> 
>> By clients I mean that you should verify that the hosts talking to the Ceph
>> cluster can reach each machine running OSDs.
>> 
>> In my case there was packet loss from certain clients, which caused the issues
>> to occur.
>> 
>> Wido
>> 
>>> Many thanks
>>> 
>>> Andrei
>>> 
>>> ----- Original Message -----
>>> > From: "Wido den Hollander" <wido@xxxxxxxx>
>>> > To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>, "Andrei Mikhailovsky"
>>> > <andrei@xxxxxxxxxx>
>>> > Sent: Tuesday, 26 April, 2016 21:17:59
>>> > Subject: Re:  Hammer broke after adding 3rd osd server
>>> 
>>> >> On 26 April 2016 at 17:52, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
>>> >> 
>>> >> 
>>> >> Hello everyone,
>>> >> 
>>> >> I've recently performed a hardware upgrade on our small two-osd-server ceph
>>> >> cluster, which seems to have broken the cluster. We are using ceph for
>>> >> cloudstack rbd images for vms. All of our servers are Ubuntu 14.04 LTS with the
>>> >> latest updates and kernel 4.4.6 from the ubuntu repo.
>>> >> 
>>> >> Previous hardware:
>>> >> 
>>> >> 2 x osd servers, each with 9 sas osds, 32gb ram, a 12-core Intel 2620 cpu @ 2GHz
>>> >> and 2 consumer SSDs for journals. Infiniband 40gbit/s networking using IPoIB.
>>> >> 
>>> >> The following things were upgraded:
>>> >> 
>>> >> 1. Journal ssds were upgraded from consumer ssds to Intel 3710 200gb. We now
>>> >> have 5 osds per ssd.
>>> >> 2. Added an additional osd server with 64gb ram, 10 osds and an Intel 2670 cpu @
>>> >> 2.6GHz.
>>> >> 3. Upgraded the ram on the existing osd servers to 64gb.
>>> >> 4. Installed an additional osd disk so each server has 10 osds.
>>> >> 
>>> >> After adding the third osd server and finishing the initial sync, the cluster
>>> >> worked okay for 1-2 days and no issues were noticed. On the third day my
>>> >> monitoring system started reporting a bunch of issues from the ceph cluster as
>>> >> well as from our virtual machines. This tends to happen between 7:20am and
>>> >> 7:40am and lasts for about 2-3 hours before things return to normal. I've
>>> >> checked the osd servers and there is nothing I could find in cron or elsewhere
>>> >> that starts around 7:20am.
>>> >> 
>>> >> The problem is as follows: the new osd server's load goes to 400+ with the
>>> >> ceph-osd processes consuming all cpu resources. ceph -w shows a high number of
>>> >> slow requests relating to osds belonging to the new osd server. The log files
>>> >> show the following:
>>> >> 
>>> >> 2016-04-20 07:39:04.346459 osd.7 192.168.168.200:6813/2650 2 : cluster [WRN]
>>> >> slow request 30.032033 seconds old, received at 2016-04-20 07:38:34.314014:
>>> >> osd_op(client.140476549.0:13203438 rbd_data.2c9de71520eedd1.0000000000000621
>>> >> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 2572288~4096]
>>> >> 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for
>>> >> subops from 22
>>> >> 2016-04-20 07:39:04.346465 osd.7 192.168.168.200:6813/2650 3 : cluster [WRN]
>>> >> slow request 30.031878 seconds old, received at 2016-04-20 07:38:34.314169:
>>> >> osd_op(client.140476549.0:13203439 rbd_data.2c9de71520eedd1.0000000000000621
>>> >> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1101824~8192]
>>> >> 5.6c3bece2 ack+ondisk+write+known_if_redirected e83912) currently waiting for
>>> >> rw locks
>>> >> 
>>> >> 
>>> >> 
>>> >> Practically every osd is involved in the slow requests, and they tend to be
>>> >> between the two old osd servers and the new one. As far as I can see there were
>>> >> no issues between the two old servers.
>>> >> 
>>> >> The first thing I checked is the networking. No issue was identified from
>>> >> running ping -i .1 <servername> or from using hping3 for the tcp connection
>>> >> checks. The network tests ran for over a week and not a single packet was lost.
>>> >> The slow requests took place while the network tests were running.
>>> >> 
>>> >> I've also checked the osd and ssd disks and was not able to identify anything
>>> >> problematic.
>>> >> 
>>> >> Stopping all osds on the new server causes no issues between the two old osd
>>> >> servers. I've left the new server disconnected for a few days and had no issues
>>> >> with the cluster.
>>> >> 
>>> >> I am a bit lost on what else to try and how to debug the issue. Could someone
>>> >> please help me?
>>> >> 
>>> > 
>>> > I would still say this is a network issue.
>>> > 
>>> > "currently waiting for rw locks" is usually a network problem.
>>> > 
>>> > I found this out myself a few weeks ago:
>>> > http://blog.widodh.nl/2016/01/slow-requests-with-ceph-waiting-for-rw-locks/
>>> > 
>>> > The problem there was a wrong gateway on some machines.
>>> > 
>>> > In that situation the OSDs could talk to each other just fine, but they had
>>> > problems sending traffic back to the clients, which led to buffers filling up.
>>> > 
>>> > I would suggest that you check if the network towards clients is also OK.
>>> > 
>>> > Wido
>>> > 
>>> >> Many thanks
>>> >> 
>>> >> Andrei
>>> >> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


