We were experiencing something similar in our setup (rados bench does some work, then comes to a screeching halt). No pattern to which OSDs were causing the problem, though. Sounds like similar hardware (this was on Dell R720xd, and yeah, that controller is suuuper frustrating).
For us, setting tcp_moderate_rcvbuf to 0 on all nodes solved the issue.
echo 0 > /proc/sys/net/ipv4/tcp_moderate_rcvbuf
Or set it in /etc/sysctl.conf:
net.ipv4.tcp_moderate_rcvbuf = 0
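If it helps, this is roughly how we pushed it out and made it persistent everywhere -- just a sketch, the hostnames node1..node3 are placeholders and it assumes root ssh to each node:
for host in node1 node2 node3; do    # placeholder hostnames
    ssh root@$host '
        sysctl -w net.ipv4.tcp_moderate_rcvbuf=0
        grep -q "net.ipv4.tcp_moderate_rcvbuf" /etc/sysctl.conf ||
            echo "net.ipv4.tcp_moderate_rcvbuf = 0" >> /etc/sysctl.conf
    '
done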
We figured this out independently after I posted this thread, "Slow/Hung IOs":
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045674.html
Hope this helps
Bill Sanders
On Wed, Sep 9, 2015 at 11:09 AM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
Hi Jan,
I’ll take a look at all of those things and report back (hopefully :))
I did try setting all of my OSDs to writethrough instead of writeback on the controller, which gave significantly more consistent performance (down from 1100MB/s to 300MB/s, but still occasionally dropping to 0MB/s). Still plenty of blocked ops.
I was wondering if not-so-nicely failing OSD(s) might be the cause. My controller (PERC H730 Mini) seems frustratingly terse with SMART information, but at least one disk has a “Non-medium error count” of over 20,000...
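In case it's useful: on MegaRAID-based controllers like these, smartctl can sometimes pull the full SMART data through the controller with its megaraid passthrough. A rough sketch only -- the device node and disk IDs below are examples and will differ per box:
# disk IDs 0-11 and /dev/sda are examples; adjust for your controller layout
for i in $(seq 0 11); do
    echo "=== disk $i ==="
    smartctl -a -d megaraid,$i /dev/sda | grep -i -e health -e "error count"
done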
I’ll try disabling offloads as well.
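For the record, this is the sort of thing I have in mind per Jan's suggestion (the interface name eth0 is just an example):
# show the current offload settings
ethtool -k eth0
# turn off the usual suspects: TSO, GSO, GRO (and LRO if the NIC supports it)
ethtool -K eth0 tso off gso off gro off
ethtool -K eth0 lro off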
Thanks much for the suggestions!
Cheers,
Lincoln
> On Sep 9, 2015, at 3:59 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>
> Just to recapitulate - the nodes are doing "nothing" when it drops to zero? Not flushing something to drives (iostat)? Not cleaning pagecache (kswapd and similar)? Not out of any type of memory (slab, min_free_kbytes)? No network link errors, no bad checksums (those are hard to spot, though)?
>
> Unless you find something I suggest you try disabling offloads on the NICs and see if the problem goes away.
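> Roughly what I mean, as a sketch (eth0 is just an example interface name):
>
> iostat -x 1                        # do the disks go 100% busy when throughput drops?
> grep -i slab /proc/meminfo         # slab usage
> cat /proc/sys/vm/min_free_kbytes   # low-memory watermark
> ip -s link show eth0               # rx/tx errors and drops
> ethtool -S eth0 | grep -i -e err -e drop
> netstat -s | grep -i -e retrans -e "bad segments"   # retransmits / bad checksums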
>
> Jan
>
>> On 08 Sep 2015, at 18:26, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
>>
>> For whatever it’s worth, my problem has returned and is very similar to yours. Still trying to figure out what’s going on over here.
>>
>> Performance is nice for a few seconds, then goes to 0. This is a similar setup to yours (12 OSDs per box, Scientific Linux 6, Ceph 0.94.3, etc)
>>
>> 384 16 29520 29504 307.287 1188 0.0492006 0.208259
>> 385 16 29813 29797 309.532 1172 0.0469708 0.206731
>> 386 16 30105 30089 311.756 1168 0.0375764 0.205189
>> 387 16 30401 30385 314.009 1184 0.036142 0.203791
>> 388 16 30695 30679 316.231 1176 0.0372316 0.202355
>> 389 16 30987 30971 318.42 1168 0.0660476 0.200962
>> 390 16 31282 31266 320.628 1180 0.0358611 0.199548
>> 391 16 31568 31552 322.734 1144 0.0405166 0.198132
>> 392 16 31857 31841 324.859 1156 0.0360826 0.196679
>> 393 16 32090 32074 326.404 932 0.0416869 0.19549
>> 394 16 32205 32189 326.743 460 0.0251877 0.194896
>> 395 16 32302 32286 326.897 388 0.0280574 0.194395
>> 396 16 32348 32332 326.537 184 0.0256821 0.194157
>> 397 16 32385 32369 326.087 148 0.0254342 0.193965
>> 398 16 32424 32408 325.659 156 0.0263006 0.193763
>> 399 16 32445 32429 325.054 84 0.0233839 0.193655
>> 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>> 400 16 32445 32429 324.241 0 - 0.193655
>> 401 16 32445 32429 323.433 0 - 0.193655
>> 402 16 32445 32429 322.628 0 - 0.193655
>> 403 16 32445 32429 321.828 0 - 0.193655
>> 404 16 32445 32429 321.031 0 - 0.193655
>> 405 16 32445 32429 320.238 0 - 0.193655
>> 406 16 32445 32429 319.45 0 - 0.193655
>> 407 16 32445 32429 318.665 0 - 0.193655
>>
>> needless to say, very strange.
>>
>> —Lincoln
>>
>>
>>> On Sep 7, 2015, at 3:35 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
>>>
>>> Adding ceph-users.
>>>
>>> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
>>>
>>>
>>> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
>>> Hi Vickey,
>>> Thanks for your time in replying to my problem.
>>>
>>> I had the same rados bench output after changing the motherboard of the monitor node with the lowest IP...
>>> Due to the new mainboard, I assume the hw clock was wrong during startup. Ceph health showed no errors, but none of the VMs were able to do IO (very high load on the VMs, but no traffic).
>>> I stopped the mon, but this didn't change anything. I had to restart all the other mons to get IO again. After that I started the first mon again (with the right time now) and everything worked fine again...
>>>
>>> Thanks, I will try restarting all OSDs / MONs and report back if it solves my problem.
>>>
>>> Another possibility:
>>> Do you use journals on SSDs? Perhaps the SSDs stall on writes because of garbage collection?
>>>
>>> No, I don't have journals on SSDs; they are on the same disks as the OSDs.
>>>
>>>
>>>
>>> Udo
>>>
>>>
>>> On 07.09.2015 16:36, Vickey Singh wrote:
>>>> Dear Experts
>>>>
>>>> Can someone please help me figure out why my cluster is not able to write data?
>>>>
>>>> See the output below: cur MB/s is 0 and avg MB/s keeps decreasing.
>>>>
>>>>
>>>> Ceph Hammer 0.94.2
>>>> CentOS 6 (3.10.69-1)
>>>>
>>>> Ceph status says ops are blocked. I have tried checking everything I know:
>>>>
>>>> - System resources (CPU, network, disk, memory) -- all normal
>>>> - 10G public and cluster networks -- no saturation
>>>> - All disks are physically healthy
>>>> - No messages in /var/log/messages or dmesg
>>>> - Tried restarting the OSDs that are blocking operations (how I found them is sketched below), but no luck
>>>> - Tried writing through RBD and rados bench; both show the same problem
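>>>> Roughly how I located the blocked ops (osd.12 is just an example ID, and the ceph daemon commands have to run on the host that carries that OSD):
>>>>
>>>> ceph health detail | grep blocked      # shows which OSDs the blocked requests sit on
>>>> ceph daemon osd.12 dump_ops_in_flight  # what those ops are currently waiting on
>>>> ceph daemon osd.12 dump_historic_ops   # recent slow ops with timings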
>>>>
>>>> Please help me to fix this problem.
>>>>
>>>> # rados bench -p rbd 60 write
>>>> Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
>>>> Object prefix: benchmark_data_stor1_1791844
>>>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>>>> 0 0 0 0 0 0 - 0
>>>> 1 16 125 109 435.873 436 0.022076 0.0697864
>>>> 2 16 139 123 245.948 56 0.246578 0.0674407
>>>> 3 16 139 123 163.969 0 - 0.0674407
>>>> 4 16 139 123 122.978 0 - 0.0674407
>>>> 5 16 139 123 98.383 0 - 0.0674407
>>>> 6 16 139 123 81.9865 0 - 0.0674407
>>>> 7 16 139 123 70.2747 0 - 0.0674407
>>>> 8 16 139 123 61.4903 0 - 0.0674407
>>>> 9 16 139 123 54.6582 0 - 0.0674407
>>>> 10 16 139 123 49.1924 0 - 0.0674407
>>>> 11 16 139 123 44.7201 0 - 0.0674407
>>>> 12 16 139 123 40.9934 0 - 0.0674407
>>>> 13 16 139 123 37.8401 0 - 0.0674407
>>>> 14 16 139 123 35.1373 0 - 0.0674407
>>>> 15 16 139 123 32.7949 0 - 0.0674407
>>>> 16 16 139 123 30.7451 0 - 0.0674407
>>>> 17 16 139 123 28.9364 0 - 0.0674407
>>>> 18 16 139 123 27.3289 0 - 0.0674407
>>>> 19 16 139 123 25.8905 0 - 0.0674407
>>>> 2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
>>>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>>>> 20 16 139 123 24.596 0 - 0.0674407
>>>> 21 16 139 123 23.4247 0 - 0.0674407
>>>> 22 16 139 123 22.36 0 - 0.0674407
>>>> 23 16 139 123 21.3878 0 - 0.0674407
>>>> 24 16 139 123 20.4966 0 - 0.0674407
>>>> 25 16 139 123 19.6768 0 - 0.0674407
>>>> 26 16 139 123 18.92 0 - 0.0674407
>>>> 27 16 139 123 18.2192 0 - 0.0674407
>>>> 28 16 139 123 17.5686 0 - 0.0674407
>>>> 29 16 139 123 16.9628 0 - 0.0674407
>>>> 30 16 139 123 16.3973 0 - 0.0674407
>>>> 31 16 139 123 15.8684 0 - 0.0674407
>>>> 32 16 139 123 15.3725 0 - 0.0674407
>>>> 33 16 139 123 14.9067 0 - 0.0674407
>>>> 34 16 139 123 14.4683 0 - 0.0674407
>>>> 35 16 139 123 14.0549 0 - 0.0674407
>>>> 36 16 139 123 13.6645 0 - 0.0674407
>>>> 37 16 139 123 13.2952 0 - 0.0674407
>>>> 38 16 139 123 12.9453 0 - 0.0674407
>>>> 39 16 139 123 12.6134 0 - 0.0674407
>>>> 2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
>>>> sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
>>>> 40 16 139 123 12.2981 0 - 0.0674407
>>>> 41 16 139 123 11.9981 0 - 0.0674407
>>>>
>>>>
>>>>
>>>>
>>>> cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
>>>> health HEALTH_WARN
>>>> 1 requests are blocked > 32 sec
>>>> monmap e3: 3 mons at {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
>>>> election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
>>>> osdmap e19536: 50 osds: 50 up, 50 in
>>>> pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
>>>> 91513 GB used, 47642 GB / 135 TB avail
>>>> 2752 active+clean
>>>>
>>>>
>>>> Tried using RBD
>>>>
>>>>
>>>> # dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
>>>> 10000+0 records in
>>>> 10000+0 records out
>>>> 40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
>>>>
>>>> # dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
>>>> 100+0 records in
>>>> 100+0 records out
>>>> 104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
>>>>
>>>> # dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s
>>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com