Hi Nick,

Thanks for the detailed response and insight. SSDs are definitely on the to-buy list. I will certainly try to rule out any hardware issues in the meantime.

Cheers,
Lincoln

> On Sep 17, 2015, at 12:53 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
> It's probably helped, but I fear that your overall design is not going to work well for you. Cache tier + base tier + journals on the same disks is going to really hurt.
>
> The problem when using cache tiering (especially with EC pools in future releases) is that to modify a block that isn't in the cache tier you have to promote it first, which often kicks another block out of the cache.
>
> So in the worst case you could have, for a single write:
>
> R from EC -> W to CT + jrnl W -> W actual data to CT + jrnl W -> R from CT -> W to EC + jrnl W
>
> Plus any metadata updates. Either way, you're looking at probably somewhere near 10x write amplification for 4MB writes, which will quickly overload your disks, leading to very slow performance. Smaller IOs would still cause 4MB blocks to be shifted between pools. What makes it worse is that these promotions/evictions tend to happen to hot PGs and are not spread around the whole cluster, meaning that a single hot OSD can hold up writes across the whole pool.
>
> I know it's not what you want to hear, but I can't think of anything you can do to help in this instance unless you are willing to get some SSD journals and maybe move the cache pool onto separate disks or SSDs. Basically, try to limit the amount of random IO the disks have to do.
>
> Of course, please do try to find a time to stop all IO and then run the test on the 3-way test pool, to rule out any hardware/OS issues.
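To make Nick's worst case concrete, here is a rough Python tally of the object-sized disk operations in his chain for one client write. It is a sketch under simplifying assumptions: a single copy only (no replication or EC shard fan-out), no metadata updates, and a FileStore journal on the same spindle counted as a second write. `Step`, `worst_case_write` and `amplification` are hypothetical helpers for illustration, not Ceph APIs.

```python
from __future__ import annotations

from dataclasses import dataclass

OBJECT_MB = 4  # 4 MB objects, matching the "4MB writes" in the mail


@dataclass(frozen=True)
class Step:
    description: str
    reads: int   # object-sized reads that reach a spindle
    writes: int  # object-sized writes that reach a spindle


def worst_case_write(journal_on_same_disk: bool = True) -> list[Step]:
    """Disk operations for one client write to an object not yet in the cache.

    Follows the chain in the mail:
    R from EC -> W to CT + jrnl W -> W data to CT + jrnl W
    -> R from CT -> W to EC + jrnl W.
    Assumption: with a co-located FileStore journal every write hits the disk twice.
    """
    per_write = 2 if journal_on_same_disk else 1
    return [
        Step("read object to be promoted from EC base pool", reads=1, writes=0),
        Step("promote: write that object into the cache tier", reads=0, writes=per_write),
        Step("apply the client write in the cache tier", reads=0, writes=per_write),
        Step("read a dirty object being evicted from the cache tier", reads=1, writes=0),
        Step("flush: write the evicted object to the EC base pool", reads=0, writes=per_write),
    ]


def amplification(steps: list[Step]) -> tuple[int, int, int]:
    """Return (reads, writes, total) object-sized disk operations."""
    reads = sum(step.reads for step in steps)
    writes = sum(step.writes for step in steps)
    return reads, writes, reads + writes


if __name__ == "__main__":
    reads, writes, total = amplification(worst_case_write())
    print(f"{reads} reads + {writes} writes = {total} disk ops, "
          f"about {total * OBJECT_MB} MB of disk I/O per {OBJECT_MB} MB client write")
```

Under those assumptions it prints 2 reads + 6 writes = 8 disk ops, about 32 MB of disk I/O per 4 MB client write. Replication, EC shards and metadata updates only push the real figure higher, so the tally should be read as a floor that sits in the same ballpark as Nick's rough 10x estimate.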
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Lincoln Bryant
>> Sent: 17 September 2015 18:36
>> To: Nick Fisk <nick@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
>>
>> Just a small update: the blocked ops did disappear after doubling the target_max_bytes. We’ll see if it sticks! I’ve thought I’d solved this blocked-ops problem about 10 times now :)
>>
>> Assuming this is the issue, is there any workaround for this problem (or is it working as intended)? (Should I set up a cron job to run cache-try-flush-evict-all every night? :))
>>
>> Another curious thing is that a rolling restart of all OSDs also seems to fix the problem, for a time. I’m not sure how that would fit in if this is the problem.
>>
>> —Lincoln
>>
>>> On Sep 17, 2015, at 12:07 PM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
>>>
>>> We have CephFS utilizing a cache tier + EC backend. The cache tier and EC pool sit on the same spinners, with no SSDs. Our cache tier has a target_max_bytes of 5TB and the total storage is about 1PB.
>>>
>>> I do have a separate test pool with 3x replication and no cache tier, and I still see significant performance drops and blocked ops with no/minimal client I/O from CephFS. Right now I have 530 blocked ops with 20MB/s of client write I/O and no active scrubs.
>>> The rados bench on my test pool looks like this:
>>>
>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>    0       0         0         0         0         0         -         0
>>>    1      31        94        63   251.934       252   0.31017  0.217719
>>>    2      31       103        72   143.969        36  0.978544  0.260631
>>>    3      31       103        72   95.9815         0         -  0.260631
>>>    4      31       111        80   79.9856        16   2.29218  0.476458
>>>    5      31       112        81   64.7886         4    2.5559   0.50213
>>>    6      31       112        81   53.9905         0         -   0.50213
>>>    7      31       115        84   47.9917         6   3.71826  0.615882
>>>    8      31       115        84   41.9928         0         -  0.615882
>>>    9      31       115        84    37.327         0         -  0.615882
>>>   10      31       117        86   34.3942   2.66667   6.73678  0.794532
>>>
>>> I’m really leaning more toward it being a weird controller/disk problem.
>>>
>>> As a test, I suppose I could double the target_max_bytes, just so the cache tier stops evicting while client I/O is writing?
>>>
>>> —Lincoln
>>>
>>>> On Sep 17, 2015, at 11:59 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>>>>
>>>> Ah, right... this is where it gets interesting.
>>>>
>>>> You are probably hitting a full cache on a PG somewhere, which is either making everything wait until it flushes or something like that.
>>>>
>>>> What cache settings have you got set?
>>>>
>>>> I assume you have SSDs for the cache tier? Can you share the size of the pool?
>>>>
>>>> If possible, could you also create a non-tiered test pool and do some benchmarks on that to rule out any issue with the hardware and OSDs?
>>>>
>>>>> -----Original Message-----
>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Lincoln Bryant
>>>>> Sent: 17 September 2015 17:54
>>>>> To: Nick Fisk <nick@xxxxxxxxxx>
>>>>> Cc: ceph-users@xxxxxxxxxxxxxx
>>>>> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
>>>>>
>>>>> Hi Nick,
>>>>>
>>>>> Thanks for responding. Yes, I am.
>>>>>
>>>>> —Lincoln
>>>>>
>>>>>> On Sep 17, 2015, at 11:53 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> You are getting a fair amount of reads on the disks whilst doing these writes.
>>>>>> You're not using cache tiering, are you?
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Lincoln Bryant
>>>>>>> Sent: 17 September 2015 17:42
>>>>>>> To: ceph-users@xxxxxxxxxxxxxx
>>>>>>> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
>>>>>>>
>>>>>>> Hello again,
>>>>>>>
>>>>>>> Well, I disabled offloads on the NIC; that didn’t work for me. I also tried setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere in the thread, to no avail.
>>>>>>>
>>>>>>> Today I was watching iostat on an OSD box ('iostat -xm 5') when the cluster got into a “slow” state:
>>>>>>>
>>>>>>> Device:   rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
>>>>>>> sdb         0.00   13.57   84.23  167.47    0.45    2.78    26.26     2.06    8.18   3.85  96.93
>>>>>>> sdc         0.00   46.71    5.59  289.22    0.03    2.54    17.85     3.18   10.77   0.97  28.72
>>>>>>> sdd         0.00   16.57   45.11   91.62    0.25    0.55    12.01     0.75    5.51   2.45  33.47
>>>>>>> sde         0.00   13.57    6.99  143.31    0.03    2.53    34.97     1.99   13.27   2.12  31.86
>>>>>>> sdf         0.00   18.76    4.99  158.48    0.10    1.09    14.88     1.26    7.69   1.24  20.26
>>>>>>> sdg         0.00   25.55   81.64  237.52    0.44    2.89    21.36     4.14   12.99   2.58  82.22
>>>>>>> sdh         0.00   89.42   16.17  492.42    0.09    3.81    15.69    17.12   33.66   0.73  36.95
>>>>>>> sdi         0.00   20.16   17.76  189.62    0.10    1.67    17.46     3.45   16.63   1.57  32.55
>>>>>>> sdj         0.00   31.54    0.00  185.23    0.00    1.91    21.15     3.33   18.00   0.03   0.62
>>>>>>> sdk         0.00   26.15    2.40  133.33    0.01    0.84    12.79     1.07    7.87   0.85  11.58
>>>>>>> sdl         0.00   25.55    9.38  123.95    0.05    1.15    18.44     0.50    3.74   1.58  21.10
>>>>>>> sdm         0.00    6.39   92.61   47.11    0.47    0.26    10.65     1.27    9.07   6.92  96.73
>>>>>>>
>>>>>>> The %util is rather high on some disks, but I’m not an expert at reading iostat output, so I’m not sure how worrisome this is.
>>>>>>> Does anything here stand out to anyone?
>>>>>>>
>>>>>>> At the time of that iostat, Ceph was reporting a lot of blocked ops on the OSD associated with sde (as well as about 30 other OSDs), but it doesn’t look all that busy. Some simple ‘dd’ tests seem to indicate the disk is fine.
>>>>>>>
>>>>>>> Similarly, iotop seems OK on this host:
>>>>>>>
>>>>>>>      TID PRIO USER  DISK READ  DISK WRITE  SWAPIN     IO> COMMAND
>>>>>>>   472477 be/4 root   0.00 B/s    5.59 M/s  0.00 %  0.57 % ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>   470621 be/4 root   0.00 B/s   10.09 M/s  0.00 %  0.40 % ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3495447 be/4 root   0.00 B/s  272.19 K/s  0.00 %  0.36 % ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3488389 be/4 root   0.00 B/s  596.80 K/s  0.00 %  0.16 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3488060 be/4 root   0.00 B/s  600.83 K/s  0.00 %  0.15 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3505573 be/4 root   0.00 B/s  528.25 K/s  0.00 %  0.10 % ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3495434 be/4 root   0.00 B/s    2.02 K/s  0.00 %  0.10 % ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3502327 be/4 root   0.00 B/s  506.07 K/s  0.00 %  0.09 % ceph-osd -i 118 --pid-file /var/run/ceph/osd.118.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3489100 be/4 root   0.00 B/s  106.86 K/s  0.00 %  0.09 % ceph-osd -i 110 --pid-file /var/run/ceph/osd.110.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3496631 be/4 root   0.00 B/s  229.85 K/s  0.00 %  0.05 % ceph-osd -i 115 --pid-file /var/run/ceph/osd.115.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3505561 be/4 root   0.00 B/s    2.02 K/s  0.00 %  0.03 % ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3488059 be/4 root   0.00 B/s    2.02 K/s  0.00 %  0.03 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3488391 be/4 root  46.37 K/s  431.47 K/s  0.00 %  0.02 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3500639 be/4 root   0.00 B/s  221.78 K/s  0.00 %  0.02 % ceph-osd -i 117 --pid-file /var/run/ceph/osd.117.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3488392 be/4 root  34.28 K/s  185.49 K/s  0.00 %  0.02 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>  3488062 be/4 root   4.03 K/s   66.54 K/s  0.00 %  0.02 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
>>>>>>>
>>>>>>> These are all 6TB Seagates in single-disk RAID 0 on a PERC H730 Mini controller.
>>>>>>>
>>>>>>> I did try removing the disk with 20k non-medium errors, but that didn’t seem to help.
>>>>>>>
>>>>>>> Thanks for any insight!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Lincoln Bryant
>>>>>>>
>>>>>>>> On Sep 9, 2015, at 1:09 PM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>>
>>>>>>>> I’ll take a look at all of those things and report back (hopefully :))
>>>>>>>>
>>>>>>>> I did try setting all of my OSDs to writethrough instead of writeback on the controller, which was significantly more consistent in performance (from 1100MB/s down to 300MB/s, but still occasionally dropping to 0MB/s). Still plenty of blocked ops.
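Nick's observation that the disks were doing a fair amount of reads during writes can be checked mechanically against the iostat snapshot in Lincoln's 17 September message. The Python sketch below assumes the older `iostat -xm` column layout shown there (newer sysstat releases print different extended columns); the 90 %util threshold and the helper names `parse_iostat` and `busy_devices` are arbitrary choices for illustration, not part of Ceph or sysstat.

```python
from __future__ import annotations

# Column order of `iostat -xm` as shown in the snapshot (older sysstat layout).
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]


def parse_iostat(text: str) -> dict[str, dict[str, float]]:
    """Map each device name to a {column: value} dict."""
    stats: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1 or fields[0].startswith("Device"):
            continue  # header, blank or non-device line
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # right field count but not numeric, so not a device row
        stats[fields[0]] = dict(zip(COLUMNS, values))
    return stats


def busy_devices(stats: dict[str, dict[str, float]],
                 util_threshold: float = 90.0) -> list[tuple[str, float, float]]:
    """Devices at or above util_threshold, with the fraction of IOPS that are reads."""
    busy = []
    for dev, s in sorted(stats.items()):
        if s["%util"] >= util_threshold:
            iops = s["r/s"] + s["w/s"]
            read_share = s["r/s"] / iops if iops else 0.0
            busy.append((dev, s["%util"], round(read_share, 2)))
    return busy


if __name__ == "__main__":
    snapshot = """\
Device:   rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb         0.00   13.57   84.23  167.47    0.45    2.78    26.26     2.06    8.18   3.85  96.93
sdg         0.00   25.55   81.64  237.52    0.44    2.89    21.36     4.14   12.99   2.58  82.22
sdj         0.00   31.54    0.00  185.23    0.00    1.91    21.15     3.33   18.00   0.03   0.62
sdm         0.00    6.39   92.61   47.11    0.47    0.26    10.65     1.27    9.07   6.92  96.73
"""
    for dev, util, read_share in busy_devices(parse_iostat(snapshot)):
        print(f"{dev}: {util:.2f}% util, {read_share:.0%} of IOPS are reads")
```

Fed the full snapshot, this flags only sdb (about 33% of IOPS are reads) and sdm (about 66%), both near 97 %util; every other device stays below the threshold.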
>>>>>>>>
>>>>>>>> I was wondering if not-so-nicely failing OSD(s) might be the cause. My controller (PERC H730 Mini) seems frustratingly terse with SMART information, but at least one disk has a “Non-medium error count” of over 20,000.
>>>>>>>>
>>>>>>>> I’ll try disabling offloads as well.
>>>>>>>>
>>>>>>>> Thanks much for the suggestions!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Lincoln
>>>>>>>>
>>>>>>>>> On Sep 9, 2015, at 3:59 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>> Just to recapitulate: the nodes are doing "nothing" when it drops to zero? Not flushing something to drives (iostat)? Not cleaning pagecache (kswapd and similar)? Not out of any type of memory (slab, min_free_kbytes)? No network link errors, no bad checksums (those are hard to spot, though)?
>>>>>>>>>
>>>>>>>>> Unless you find something, I suggest you try disabling offloads on the NICs and see if the problem goes away.
>>>>>>>>>
>>>>>>>>> Jan
>>>>>>>>>
>>>>>>>>>> On 08 Sep 2015, at 18:26, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> For whatever it’s worth, my problem has returned and is very similar to yours. Still trying to figure out what’s going on over here.
>>>>>>>>>>
>>>>>>>>>> Performance is nice for a few seconds, then goes to 0.
>>>>>>>>>> This is a similar setup to yours (12 OSDs per box, Scientific Linux 6, Ceph 0.94.3, etc.):
>>>>>>>>>>
>>>>>>>>>>  384      16     29520     29504   307.287      1188 0.0492006  0.208259
>>>>>>>>>>  385      16     29813     29797   309.532      1172 0.0469708  0.206731
>>>>>>>>>>  386      16     30105     30089   311.756      1168 0.0375764  0.205189
>>>>>>>>>>  387      16     30401     30385   314.009      1184  0.036142  0.203791
>>>>>>>>>>  388      16     30695     30679   316.231      1176 0.0372316  0.202355
>>>>>>>>>>  389      16     30987     30971    318.42      1168 0.0660476  0.200962
>>>>>>>>>>  390      16     31282     31266   320.628      1180 0.0358611  0.199548
>>>>>>>>>>  391      16     31568     31552   322.734      1144 0.0405166  0.198132
>>>>>>>>>>  392      16     31857     31841   324.859      1156 0.0360826  0.196679
>>>>>>>>>>  393      16     32090     32074   326.404       932 0.0416869   0.19549
>>>>>>>>>>  394      16     32205     32189   326.743       460 0.0251877  0.194896
>>>>>>>>>>  395      16     32302     32286   326.897       388 0.0280574  0.194395
>>>>>>>>>>  396      16     32348     32332   326.537       184 0.0256821  0.194157
>>>>>>>>>>  397      16     32385     32369   326.087       148 0.0254342  0.193965
>>>>>>>>>>  398      16     32424     32408   325.659       156 0.0263006  0.193763
>>>>>>>>>>  399      16     32445     32429   325.054        84 0.0233839  0.193655
>>>>>>>>>> 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
>>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>>  400      16     32445     32429   324.241         0         -  0.193655
>>>>>>>>>>  401      16     32445     32429   323.433         0         -  0.193655
>>>>>>>>>>  402      16     32445     32429   322.628         0         -  0.193655
>>>>>>>>>>  403      16     32445     32429   321.828         0         -  0.193655
>>>>>>>>>>  404      16     32445     32429   321.031         0         -  0.193655
>>>>>>>>>>  405      16     32445     32429   320.238         0         -  0.193655
>>>>>>>>>>  406      16     32445     32429    319.45         0         -  0.193655
>>>>>>>>>>  407      16     32445     32429   318.665         0         -  0.193655
>>>>>>>>>>
>>>>>>>>>> Needless to say, very strange.
>>>>>>>>>>
>>>>>>>>>> —Lincoln
>>>>>>>>>>
>>>>>>>>>>> On Sep 7, 2015, at 3:35 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Adding ceph-users.
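The rados bench runs in this thread share one signature: throughput collapses to 0 MB/s for many consecutive seconds while `avg lat` stays frozen, because no ops complete. A small Python sketch like the following can pull those stall windows out of saved output. It assumes the 8-column per-second layout printed by the Hammer-era `rados bench` shown here; `parse_rados_bench` and `find_stalls` are illustrative helpers, not part of Ceph.

```python
from __future__ import annotations


def parse_rados_bench(text: str) -> list[tuple[int, float]]:
    """Return (second, cur MB/s) pairs from the per-second rows of `rados bench`."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Per-second rows have 8 columns:
        # sec, cur ops, started, finished, avg MB/s, cur MB/s, last lat, avg lat.
        # Headers and the periodic "min lat / max lat" summary lines are skipped.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        second = int(fields[0])
        if second == 0:
            continue  # second 0 is printed before any I/O completes
        rows.append((second, float(fields[5])))
    return rows


def find_stalls(rows: list[tuple[int, float]],
                min_length: int = 2) -> list[tuple[int, int]]:
    """Group consecutive zero-throughput seconds into (first, last) intervals."""
    stalls: list[tuple[int, int]] = []
    run: list[int] = []  # current run of consecutive zero-throughput seconds
    for second, cur_mbs in rows:
        if cur_mbs == 0 and (not run or second == run[-1] + 1):
            run.append(second)
            continue
        if len(run) >= min_length:
            stalls.append((run[0], run[-1]))
        run = [second] if cur_mbs == 0 else []
    if len(run) >= min_length:
        stalls.append((run[0], run[-1]))
    return stalls


if __name__ == "__main__":
    sample = """\
 sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
   0       0         0         0         0         0         -         0
   1      16       125       109   435.873       436  0.022076 0.0697864
   2      16       139       123   245.948        56  0.246578 0.0674407
   3      16       139       123   163.969         0         - 0.0674407
   4      16       139       123   122.978         0         - 0.0674407
"""
    print(find_stalls(parse_rados_bench(sample)))
```

On Lincoln's 8 September excerpt this reports one stall from second 400 to 407 (where the excerpt ends), and on Vickey's 7 September run one stall from second 3 to 41. Passing `min_length=1` also catches single-second dips such as those in Lincoln's 17 September test-pool run (seconds 3 and 6, then 8 to 9).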
>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
>>>>>>>>>>>>> Hi Vickey,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your time in replying to my problem.
>>>>>>>>>>>>
>>>>>>>>>>>>> I had the same rados bench output after changing the motherboard of the monitor node with the lowest IP...
>>>>>>>>>>>>> Due to the new mainboard, I assume the hw-clock was wrong during startup. Ceph health showed no errors, but none of the VMs were able to do IO (very high load on the VMs, but no traffic).
>>>>>>>>>>>>> I stopped the mon, but this didn't change anything. I had to restart all the other mons to get IO again. After that I started the first mon as well (now with the right time) and everything worked fine again...
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, I will try restarting all OSDs/MONs and report back on whether it solves my problem.
>>>>>>>>>>>>
>>>>>>>>>>>>> Another possibility:
>>>>>>>>>>>>> Do you use journals on SSDs? Perhaps the SSDs can't keep up with writes because of garbage collection?
>>>>>>>>>>>>
>>>>>>>>>>>> No, I don't have journals on SSD; they are on the same OSD disk.
>>>>>>>>>>>>
>>>>>>>>>>>>> Udo
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07.09.2015 16:36, Vickey Singh wrote:
>>>>>>>>>>>>>> Dear Experts,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can someone please help me understand why my cluster is not able to write data?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See the output below: cur MB/s is 0 and avg MB/s is decreasing.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ceph Hammer 0.94.2
>>>>>>>>>>>>>> CentOS 6 (3.10.69-1)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Ceph status says ops are blocked. I have checked everything I know of:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> - System resources (CPU, net, disk, memory): all normal
>>>>>>>>>>>>>> - 10G network for public and cluster network: no saturation
>>>>>>>>>>>>>> - All disks are physically healthy
>>>>>>>>>>>>>> - No messages in /var/log/messages or dmesg
>>>>>>>>>>>>>> - Tried restarting the OSDs that are blocking operations, but no luck
>>>>>>>>>>>>>> - Tried writing through RBD and rados bench; both show the same problem
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please help me fix this problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # rados bench -p rbd 60 write
>>>>>>>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
>>>>>>>>>>>>>> Object prefix: benchmark_data_stor1_1791844
>>>>>>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>>>>>>    0       0         0         0         0         0         -         0
>>>>>>>>>>>>>>    1      16       125       109   435.873       436  0.022076 0.0697864
>>>>>>>>>>>>>>    2      16       139       123   245.948        56  0.246578 0.0674407
>>>>>>>>>>>>>>    3      16       139       123   163.969         0         - 0.0674407
>>>>>>>>>>>>>>    4      16       139       123   122.978         0         - 0.0674407
>>>>>>>>>>>>>>    5      16       139       123    98.383         0         - 0.0674407
>>>>>>>>>>>>>>    6      16       139       123   81.9865         0         - 0.0674407
>>>>>>>>>>>>>>    7      16       139       123   70.2747         0         - 0.0674407
>>>>>>>>>>>>>>    8      16       139       123   61.4903         0         - 0.0674407
>>>>>>>>>>>>>>    9      16       139       123   54.6582         0         - 0.0674407
>>>>>>>>>>>>>>   10      16       139       123   49.1924         0         - 0.0674407
>>>>>>>>>>>>>>   11      16       139       123   44.7201         0         - 0.0674407
>>>>>>>>>>>>>>   12      16       139       123   40.9934         0         - 0.0674407
>>>>>>>>>>>>>>   13      16       139       123   37.8401         0         - 0.0674407
>>>>>>>>>>>>>>   14      16       139       123   35.1373         0         - 0.0674407
>>>>>>>>>>>>>>   15      16       139       123   32.7949         0         - 0.0674407
>>>>>>>>>>>>>>   16      16       139       123   30.7451         0         - 0.0674407
>>>>>>>>>>>>>>   17      16       139       123   28.9364         0         - 0.0674407
>>>>>>>>>>>>>>   18      16       139       123   27.3289         0         - 0.0674407
>>>>>>>>>>>>>>   19      16       139       123   25.8905         0         - 0.0674407
>>>>>>>>>>>>>> 2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
>>>>>>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>>>>>>   20      16       139       123    24.596         0         - 0.0674407
>>>>>>>>>>>>>>   21      16       139       123   23.4247         0         - 0.0674407
>>>>>>>>>>>>>>   22      16       139       123     22.36         0         - 0.0674407
>>>>>>>>>>>>>>   23      16       139       123   21.3878         0         - 0.0674407
>>>>>>>>>>>>>>   24      16       139       123   20.4966         0         - 0.0674407
>>>>>>>>>>>>>>   25      16       139       123   19.6768         0         - 0.0674407
>>>>>>>>>>>>>>   26      16       139       123     18.92         0         - 0.0674407
>>>>>>>>>>>>>>   27      16       139       123   18.2192         0         - 0.0674407
>>>>>>>>>>>>>>   28      16       139       123   17.5686         0         - 0.0674407
>>>>>>>>>>>>>>   29      16       139       123   16.9628         0         - 0.0674407
>>>>>>>>>>>>>>   30      16       139       123   16.3973         0         - 0.0674407
>>>>>>>>>>>>>>   31      16       139       123   15.8684         0         - 0.0674407
>>>>>>>>>>>>>>   32      16       139       123   15.3725         0         - 0.0674407
>>>>>>>>>>>>>>   33      16       139       123   14.9067         0         - 0.0674407
>>>>>>>>>>>>>>   34      16       139       123   14.4683         0         - 0.0674407
>>>>>>>>>>>>>>   35      16       139       123   14.0549         0         - 0.0674407
>>>>>>>>>>>>>>   36      16       139       123   13.6645         0         - 0.0674407
>>>>>>>>>>>>>>   37      16       139       123   13.2952         0         - 0.0674407
>>>>>>>>>>>>>>   38      16       139       123   12.9453         0         - 0.0674407
>>>>>>>>>>>>>>   39      16       139       123   12.6134         0         - 0.0674407
>>>>>>>>>>>>>> 2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
>>>>>>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
>>>>>>>>>>>>>>   40      16       139       123   12.2981         0         - 0.0674407
>>>>>>>>>>>>>>   41      16       139       123   11.9981         0         - 0.0674407
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
>>>>>>>>>>>>>>      health HEALTH_WARN
>>>>>>>>>>>>>>             1 requests are blocked > 32 sec
>>>>>>>>>>>>>>      monmap e3: 3 mons at {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
>>>>>>>>>>>>>>             election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
>>>>>>>>>>>>>>      osdmap e19536: 50 osds: 50 up, 50 in
>>>>>>>>>>>>>>       pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
>>>>>>>>>>>>>>             91513 GB used, 47642 GB / 135 TB avail
>>>>>>>>>>>>>>                 2752 active+clean
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tried using RBD:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
>>>>>>>>>>>>>> 10000+0 records in
>>>>>>>>>>>>>> 10000+0 records out
>>>>>>>>>>>>>> 40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
>>>>>>>>>>>>>> 100+0 records in
>>>>>>>>>>>>>> 100+0 records out
>>>>>>>>>>>>>> 104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
>>>>>>>>>>>>>> 1+0 records in
>>>>>>>>>>>>>> 1+0 records out
>>>>>>>>>>>>>> 1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> ceph-users mailing list
>>>>>>>>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>>>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com