Ah right... this is where it gets interesting. You are probably hitting a
full cache on a PG somewhere, which is either making everything wait until
it flushes, or something like that. What cache settings have you got set? I
assume you have SSDs for the cache tier? Can you share the size of the
pool?

If possible, could you also create a non-tiered test pool and run some
benchmarks on it, to rule out any issue with the hardware and the OSDs?

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Lincoln Bryant
> Sent: 17 September 2015 17:54
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
>
> Hi Nick,
>
> Thanks for responding. Yes, I am.
>
> —Lincoln
>
> > On Sep 17, 2015, at 11:53 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >
> > You are getting a fair amount of reads on the disks whilst doing these
> > writes. You're not using cache tiering, are you?
> >
> >> -----Original Message-----
> >> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
> >> Of Lincoln Bryant
> >> Sent: 17 September 2015 17:42
> >> To: ceph-users@xxxxxxxxxxxxxx
> >> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
> >>
> >> Hello again,
> >>
> >> Well, I disabled offloads on the NIC -- that didn't work for me. I
> >> also tried setting net.ipv4.tcp_moderate_rcvbuf = 0, as suggested
> >> elsewhere in the thread, to no avail.
> >>
> >> Today I was watching iostat on an OSD box ('iostat -xm 5') when the
> >> cluster got into the "slow" state:
> >>
> >> Device:  rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> >> sdb        0.00   13.57  84.23  167.47   0.45   2.78     26.26      2.06   8.18   3.85  96.93
> >> sdc        0.00   46.71   5.59  289.22   0.03   2.54     17.85      3.18  10.77   0.97  28.72
> >> sdd        0.00   16.57  45.11   91.62   0.25   0.55     12.01      0.75   5.51   2.45  33.47
> >> sde        0.00   13.57   6.99  143.31   0.03   2.53     34.97      1.99  13.27   2.12  31.86
> >> sdf        0.00   18.76   4.99  158.48   0.10   1.09     14.88      1.26   7.69   1.24  20.26
> >> sdg        0.00   25.55  81.64  237.52   0.44   2.89     21.36      4.14  12.99   2.58  82.22
> >> sdh        0.00   89.42  16.17  492.42   0.09   3.81     15.69     17.12  33.66   0.73  36.95
> >> sdi        0.00   20.16  17.76  189.62   0.10   1.67     17.46      3.45  16.63   1.57  32.55
> >> sdj        0.00   31.54   0.00  185.23   0.00   1.91     21.15      3.33  18.00   0.03   0.62
> >> sdk        0.00   26.15   2.40  133.33   0.01   0.84     12.79      1.07   7.87   0.85  11.58
> >> sdl        0.00   25.55   9.38  123.95   0.05   1.15     18.44      0.50   3.74   1.58  21.10
> >> sdm        0.00    6.39  92.61   47.11   0.47   0.26     10.65      1.27   9.07   6.92  96.73
> >>
> >> The %util is rather high on some disks, but I'm no expert at reading
> >> iostat, so I'm not sure how worrisome this is. Does anything here
> >> stand out to anyone?
> >>
> >> At the time of that iostat, Ceph was reporting a lot of blocked ops on
> >> the OSD associated with sde (as well as on about 30 other OSDs), but
> >> that disk doesn't look all that busy. Some simple 'dd' tests seem to
> >> indicate the disk is fine.
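> >>
> >> (By "simple dd tests" I mean direct-I/O reads of the raw device,
> >> something along these lines -- the device path and sizes here are just
> >> examples:
> >>
> >>   # dd if=/dev/sde of=/dev/null bs=4M count=256 iflag=direct
> >>
> >> iflag=direct bypasses the page cache, so the rate reported is the
> >> disk's own sequential read speed rather than the kernel's.)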
> >>
> >> Similarly, iotop seems OK on this host:
> >>
> >>     TID PRIO  USER   DISK READ  DISK WRITE  SWAPIN   IO>    COMMAND
> >>  472477 be/4  root    0.00 B/s    5.59 M/s  0.00 %  0.57 %  ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>  470621 be/4  root    0.00 B/s   10.09 M/s  0.00 %  0.40 %  ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3495447 be/4  root    0.00 B/s  272.19 K/s  0.00 %  0.36 %  ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3488389 be/4  root    0.00 B/s  596.80 K/s  0.00 %  0.16 %  ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3488060 be/4  root    0.00 B/s  600.83 K/s  0.00 %  0.15 %  ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3505573 be/4  root    0.00 B/s  528.25 K/s  0.00 %  0.10 %  ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3495434 be/4  root    0.00 B/s    2.02 K/s  0.00 %  0.10 %  ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3502327 be/4  root    0.00 B/s  506.07 K/s  0.00 %  0.09 %  ceph-osd -i 118 --pid-file /var/run/ceph/osd.118.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3489100 be/4  root    0.00 B/s  106.86 K/s  0.00 %  0.09 %  ceph-osd -i 110 --pid-file /var/run/ceph/osd.110.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3496631 be/4  root    0.00 B/s  229.85 K/s  0.00 %  0.05 %  ceph-osd -i 115 --pid-file /var/run/ceph/osd.115.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3505561 be/4  root    0.00 B/s    2.02 K/s  0.00 %  0.03 %  ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3488059 be/4  root    0.00 B/s    2.02 K/s  0.00 %  0.03 %  ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3488391 be/4  root   46.37 K/s  431.47 K/s  0.00 %  0.02 %  ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3500639 be/4  root    0.00 B/s  221.78 K/s  0.00 %  0.02 %  ceph-osd -i 117 --pid-file /var/run/ceph/osd.117.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3488392 be/4  root   34.28 K/s  185.49 K/s  0.00 %  0.02 %  ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
> >> 3488062 be/4  root    4.03 K/s   66.54 K/s  0.00 %  0.02 %  ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>
> >> These are all 6TB Seagates in single-disk RAID 0 on a PERC H730 Mini
> >> controller.
> >>
> >> I did try removing the disk with 20k non-medium errors, but that
> >> didn't seem to help.
> >>
> >> Thanks for any insight!
> >>
> >> Cheers,
> >> Lincoln Bryant
> >>
> >>> On Sep 9, 2015, at 1:09 PM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
> >>>
> >>> Hi Jan,
> >>>
> >>> I'll take a look at all of those things and report back (hopefully :))
> >>>
> >>> I did try setting all of my OSDs to writethrough instead of writeback
> >>> on the controller, which was significantly more consistent in
> >>> performance (down from 1100MB/s to 300MB/s, but still occasionally
> >>> dropping to 0MB/s). Still plenty of blocked ops.
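> >>>
> >>> (While it's wedged, it's also worth asking one of the stuck OSDs what
> >>> its blocked ops are actually waiting on, via the admin socket on the
> >>> OSD host -- the osd id below is just an example:
> >>>
> >>>   # ceph daemon osd.111 dump_ops_in_flight
> >>>   # ceph daemon osd.111 dump_historic_ops
> >>>
> >>> dump_historic_ops records a timestamp for each stage of recent slow
> >>> ops, which shows whether the time went to the journal, the disk, or
> >>> waiting on sub-ops from other OSDs.)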
> >>>
> >>> I was wondering if a not-so-nicely failing OSD might be the cause.
> >>> My controller (PERC H730 Mini) is frustratingly terse with SMART
> >>> information, but at least one disk has a "Non-medium error count" of
> >>> over 20,000.
> >>>
> >>> I'll try disabling offloads as well.
> >>>
> >>> Thanks much for the suggestions!
> >>>
> >>> Cheers,
> >>> Lincoln
> >>>
> >>>> On Sep 9, 2015, at 3:59 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> >>>>
> >>>> Just to recapitulate -- the nodes are doing "nothing" when it drops
> >>>> to zero? Not flushing something to the drives (iostat)? Not cleaning
> >>>> pagecache (kswapd and similar)? Not out of any type of memory (slab,
> >>>> min_free_kbytes)? No network link errors, no bad checksums (those
> >>>> are hard to spot, though)?
> >>>>
> >>>> Unless you find something, I suggest you try disabling offloads on
> >>>> the NICs and see if the problem goes away.
> >>>>
> >>>> Jan
> >>>>
> >>>>> On 08 Sep 2015, at 18:26, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
> >>>>>
> >>>>> For whatever it's worth, my problem has returned and is very
> >>>>> similar to yours. Still trying to figure out what's going on over
> >>>>> here.
> >>>>>
> >>>>> Performance is nice for a few seconds, then goes to 0. This is a
> >>>>> similar setup to yours (12 OSDs per box, Scientific Linux 6, Ceph
> >>>>> 0.94.3, etc.)
> >>>>>
> >>>>>  384  16  29520  29504  307.287  1188  0.0492006  0.208259
> >>>>>  385  16  29813  29797  309.532  1172  0.0469708  0.206731
> >>>>>  386  16  30105  30089  311.756  1168  0.0375764  0.205189
> >>>>>  387  16  30401  30385  314.009  1184   0.036142  0.203791
> >>>>>  388  16  30695  30679  316.231  1176  0.0372316  0.202355
> >>>>>  389  16  30987  30971   318.42  1168  0.0660476  0.200962
> >>>>>  390  16  31282  31266  320.628  1180  0.0358611  0.199548
> >>>>>  391  16  31568  31552  322.734  1144  0.0405166  0.198132
> >>>>>  392  16  31857  31841  324.859  1156  0.0360826  0.196679
> >>>>>  393  16  32090  32074  326.404   932  0.0416869   0.19549
> >>>>>  394  16  32205  32189  326.743   460  0.0251877  0.194896
> >>>>>  395  16  32302  32286  326.897   388  0.0280574  0.194395
> >>>>>  396  16  32348  32332  326.537   184  0.0256821  0.194157
> >>>>>  397  16  32385  32369  326.087   148  0.0254342  0.193965
> >>>>>  398  16  32424  32408  325.659   156  0.0263006  0.193763
> >>>>>  399  16  32445  32429  325.054    84  0.0233839  0.193655
> >>>>> 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
> >>>>>  sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> >>>>>  400  16  32445  32429  324.241  0  -  0.193655
> >>>>>  401  16  32445  32429  323.433  0  -  0.193655
> >>>>>  402  16  32445  32429  322.628  0  -  0.193655
> >>>>>  403  16  32445  32429  321.828  0  -  0.193655
> >>>>>  404  16  32445  32429  321.031  0  -  0.193655
> >>>>>  405  16  32445  32429  320.238  0  -  0.193655
> >>>>>  406  16  32445  32429   319.45  0  -  0.193655
> >>>>>  407  16  32445  32429  318.665  0  -  0.193655
> >>>>>
> >>>>> Needless to say, very strange.
> >>>>>
> >>>>> —Lincoln
> >>>>>
> >>>>>> On Sep 7, 2015, at 3:35 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Adding ceph-users.
> >>>>>>
> >>>>>> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
> >>>>>> Hi Vickey,
> >>>>>>
> >>>>>> Thanks for your time in replying to my problem.
> >>>>>>
> >>>>>> I had the same rados bench output after changing the motherboard of
> >>>>>> the monitor node with the lowest IP... Due to the new mainboard, I
> >>>>>> assume the hw-clock was wrong during startup. Ceph health showed no
> >>>>>> errors, but none of the VMs were able to do IO (very high load on
> >>>>>> the VMs -- but no traffic). I stopped that mon, but it didn't
> >>>>>> change anything. I had to restart all the other mons to get IO
> >>>>>> again. After that I started the first mon as well (with the right
> >>>>>> time now) and everything worked fine again...
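> >>>>>>
> >>>>>> (Mon clock trouble like that is quick to confirm: a skewed mon
> >>>>>> shows up in the health output, and NTP can be checked on each mon
> >>>>>> node, e.g.:
> >>>>>>
> >>>>>>   # ceph health detail | grep -i "clock skew"
> >>>>>>   # ntpq -p
> >>>>>>
> >>>>>> Skew beyond mon_clock_drift_allowed -- 0.05s by default -- raises a
> >>>>>> HEALTH_WARN.)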
> >>>>>>
> >>>>>> Thanks, I will try restarting all OSDs / MONs and report back if it
> >>>>>> solves my problem.
> >>>>>>
> >>>>>> Another possibility: do you use journals on SSDs? Perhaps the SSDs
> >>>>>> can't keep up because of garbage collection?
> >>>>>>
> >>>>>> No, I don't have journals on SSD; they are on the same disk as the
> >>>>>> OSD.
> >>>>>>
> >>>>>> Udo
> >>>>>>
> >>>>>> On 07.09.2015 16:36, Vickey Singh wrote:
> >>>>>>> Dear Experts,
> >>>>>>>
> >>>>>>> Can someone please help me figure out why my cluster is not able
> >>>>>>> to write data? See the output below: cur MB/s is 0 and avg MB/s
> >>>>>>> keeps decreasing.
> >>>>>>>
> >>>>>>> Ceph Hammer 0.94.2
> >>>>>>> CentOS 6 (3.10.69-1)
> >>>>>>>
> >>>>>>> The Ceph status says ops are blocked. I have tried checking
> >>>>>>> everything I know:
> >>>>>>>
> >>>>>>> - System resources (CPU, net, disk, memory) -- all normal
> >>>>>>> - 10G network for public and cluster network -- no saturation
> >>>>>>> - All disks are physically healthy
> >>>>>>> - No messages in /var/log/messages or dmesg
> >>>>>>> - Tried restarting the OSDs that are blocking operations -- no luck
> >>>>>>> - Tried writing through both RBD and rados bench -- same problem
> >>>>>>>
> >>>>>>> Please help me to fix this problem.
> >>>>>>>
> >>>>>>> # rados bench -p rbd 60 write
> >>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for up to 60
> >>>>>>> seconds or 0 objects
> >>>>>>> Object prefix: benchmark_data_stor1_1791844
> >>>>>>>  sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> >>>>>>>    0   0    0    0        0    0         -          0
> >>>>>>>    1  16  125  109  435.873  436  0.022076  0.0697864
> >>>>>>>    2  16  139  123  245.948   56  0.246578  0.0674407
> >>>>>>>    3  16  139  123  163.969    0  -  0.0674407
> >>>>>>>    4  16  139  123  122.978    0  -  0.0674407
> >>>>>>>    5  16  139  123   98.383    0  -  0.0674407
> >>>>>>>    6  16  139  123  81.9865    0  -  0.0674407
> >>>>>>>    7  16  139  123  70.2747    0  -  0.0674407
> >>>>>>>    8  16  139  123  61.4903    0  -  0.0674407
> >>>>>>>    9  16  139  123  54.6582    0  -  0.0674407
> >>>>>>>   10  16  139  123  49.1924    0  -  0.0674407
> >>>>>>>   11  16  139  123  44.7201    0  -  0.0674407
> >>>>>>>   12  16  139  123  40.9934    0  -  0.0674407
> >>>>>>>   13  16  139  123  37.8401    0  -  0.0674407
> >>>>>>>   14  16  139  123  35.1373    0  -  0.0674407
> >>>>>>>   15  16  139  123  32.7949    0  -  0.0674407
> >>>>>>>   16  16  139  123  30.7451    0  -  0.0674407
> >>>>>>>   17  16  139  123  28.9364    0  -  0.0674407
> >>>>>>>   18  16  139  123  27.3289    0  -  0.0674407
> >>>>>>>   19  16  139  123  25.8905    0  -  0.0674407
> >>>>>>> 2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
> >>>>>>>  sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> >>>>>>>   20  16  139  123   24.596    0  -  0.0674407
> >>>>>>>   21  16  139  123  23.4247    0  -  0.0674407
> >>>>>>>   22  16  139  123    22.36    0  -  0.0674407
> >>>>>>>   23  16  139  123  21.3878    0  -  0.0674407
> >>>>>>>   24  16  139  123  20.4966    0  -  0.0674407
> >>>>>>>   25  16  139  123  19.6768    0  -  0.0674407
> >>>>>>>   26  16  139  123    18.92    0  -  0.0674407
> >>>>>>>   27  16  139  123  18.2192    0  -  0.0674407
> >>>>>>>   28  16  139  123  17.5686    0  -  0.0674407
> >>>>>>>   29  16  139  123  16.9628    0  -  0.0674407
> >>>>>>>   30  16  139  123  16.3973    0  -  0.0674407
> >>>>>>>   31  16  139  123  15.8684    0  -  0.0674407
> >>>>>>>   32  16  139  123  15.3725    0  -  0.0674407
> >>>>>>>   33  16  139  123  14.9067    0  -  0.0674407
> >>>>>>>   34  16  139  123  14.4683    0  -  0.0674407
> >>>>>>>   35  16  139  123  14.0549    0  -  0.0674407
> >>>>>>>   36  16  139  123  13.6645    0  -  0.0674407
> >>>>>>>   37  16  139  123  13.2952    0  -  0.0674407
> >>>>>>>   38  16  139  123  12.9453    0  -  0.0674407
> >>>>>>>   39  16  139  123  12.6134    0  -  0.0674407
> >>>>>>> 2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
> >>>>>>>  sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
> >>>>>>>   40  16  139  123  12.2981    0  -  0.0674407
> >>>>>>>   41  16  139  123  11.9981    0  -  0.0674407
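> >>>>>>>
> >>>>>>> (Two stock commands that help narrow "requests are blocked" down
> >>>>>>> to specific hardware, nothing cluster-specific assumed:
> >>>>>>>
> >>>>>>>   # ceph health detail
> >>>>>>>   # ceph osd perf
> >>>>>>>
> >>>>>>> health detail names the OSDs the blocked requests are sitting on,
> >>>>>>> and osd perf lists per-OSD commit/apply latencies -- a single
> >>>>>>> outlier there usually means a single bad disk.)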
> >>>>>>>
> >>>>>>>   cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
> >>>>>>>    health HEALTH_WARN
> >>>>>>>           1 requests are blocked > 32 sec
> >>>>>>>    monmap e3: 3 mons at {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
> >>>>>>>           election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
> >>>>>>>    osdmap e19536: 50 osds: 50 up, 50 in
> >>>>>>>     pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
> >>>>>>>           91513 GB used, 47642 GB / 135 TB avail
> >>>>>>>               2752 active+clean
> >>>>>>>
> >>>>>>> Tried using RBD:
> >>>>>>>
> >>>>>>> # dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
> >>>>>>> 10000+0 records in
> >>>>>>> 10000+0 records out
> >>>>>>> 40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
> >>>>>>>
> >>>>>>> # dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
> >>>>>>> 100+0 records in
> >>>>>>> 100+0 records out
> >>>>>>> 104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
> >>>>>>>
> >>>>>>> # dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
> >>>>>>> 1+0 records in
> >>>>>>> 1+0 records out
> >>>>>>> 1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
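
For the checks suggested at the top of the thread, a rough sketch -- the
pool names and PG count below are examples only, adjust for the cluster.
First, what the cache tier is allowed to hold before it must flush or
evict (a full cache pool will block writes until it flushes):

  # ceph osd pool get <cachepool> target_max_bytes
  # ceph osd pool get <cachepool> target_max_objects
  # ceph osd pool get <cachepool> cache_target_dirty_ratio
  # ceph osd pool get <cachepool> cache_target_full_ratio

Pool sizes:

  # ceph df

And a throwaway non-tiered pool to benchmark the base OSDs directly:

  # ceph osd pool create benchtest 128 128
  # rados bench -p benchtest 60 write --no-cleanup
  # rados bench -p benchtest 60 seq
  # rados -p benchtest cleanup
  # ceph osd pool delete benchtest benchtest --yes-i-really-really-mean-it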