It's probably helped, but I fear that your overall design is not going to work
well for you. Cache tier + base tier + journals on the same disks is going to
really hurt. The problem when using cache tiering (especially with EC pools,
pending improvements in future releases) is that to modify a block that isn't
in the cache tier you have to promote it first, which often kicks another
block out of the cache. So in the worst case, a single write could involve:

  read from EC -> write to cache tier + journal write   (promotion)
  write of the actual data to cache tier + journal write
  read from cache tier -> write to EC + journal write   (eviction/flush)

plus any metadata updates. Either way you're looking at somewhere near 10x
write amplification for 4MB writes, which will quickly overload your disks
and lead to very slow performance. Smaller IOs would still cause full 4MB
blocks to be shifted between pools. What makes it worse is that these
promotions/evictions tend to hit hot PGs rather than being spread around the
whole cluster, meaning that a single hot OSD can hold up writes across the
whole pool.

I know it's not what you want to hear, but I can't think of anything that
will help in this instance unless you are willing to get some SSD journals
and maybe move the cache pool onto separate disks or SSDs. Basically, try to
limit the amount of random IO the disks have to do.

Of course, please do try to find a time to stop all IO and then run the test
on the 3-way test pool, to rule out any hardware/OS issues.

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Lincoln Bryant
> Sent: 17 September 2015 18:36
> To: Nick Fisk <nick@xxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
>
> Just a small update — the blocked ops did disappear after doubling the
> target_max_bytes. We’ll see if it sticks! I’ve thought I’ve solved this
> blocked ops problem about 10 times now :)
>
> Assuming this is the issue, is there any workaround for this problem (or is
> it working as intended)? (Should I set up a cron to run
> cache-try-flush-evict-all every night? :))
>
> Another curious thing is that a rolling restart of all OSDs also seems to
> fix the problem — for a time. I’m not sure how that would fit in if this is
> the problem.
>
> —Lincoln
>
> > On Sep 17, 2015, at 12:07 PM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
> >
> > We have CephFS utilizing a cache tier + EC backend. The cache tier and EC
> > pool sit on the same spinners — no SSDs. Our cache tier has a
> > target_max_bytes of 5TB and the total storage is about 1PB.
> >
> > I do have a separate test pool with 3x replication and no cache tier, and
> > I still see significant performance drops and blocked ops with no/minimal
> > client I/O from CephFS. Right now I have 530 blocked ops with 20MB/s of
> > client write I/O and no active scrubs. The rados bench on my test pool
> > looks like this:
> >
> >  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >    0       0         0         0         0         0         -         0
> >    1      31        94        63   251.934       252   0.31017  0.217719
> >    2      31       103        72   143.969        36  0.978544  0.260631
> >    3      31       103        72   95.9815         0         -  0.260631
> >    4      31       111        80   79.9856        16   2.29218  0.476458
> >    5      31       112        81   64.7886         4    2.5559   0.50213
> >    6      31       112        81   53.9905         0         -   0.50213
> >    7      31       115        84   47.9917         6   3.71826  0.615882
> >    8      31       115        84   41.9928         0         -  0.615882
> >    9      31       115        84    37.327         0         -  0.615882
> >   10      31       117        86   34.3942   2.66667   6.73678  0.794532
> >
> > I’m really leaning more toward it being a weird controller/disk problem.
> >
> > As a test, I suppose I could double the target_max_bytes, just so the
> > cache tier stops evicting while client I/O is writing?
> >
> > —Lincoln
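(For anyone following along, a quick sketch of the knobs involved here. The
pool name "cache-pool" is a placeholder; substitute your own, and size to
taste. The last command is the manual flush/evict that a nightly cron job
would run.)

  # raise the cache size ceiling; values are in bytes (here 10 TiB,
  # i.e. double a 5 TiB tier)
  ceph osd pool set cache-pool target_max_bytes 10995116277760

  # or make flushing start earlier, so eviction is less likely to land
  # in the client write path
  ceph osd pool set cache-pool cache_target_dirty_ratio 0.4
  ceph osd pool set cache-pool cache_target_full_ratio 0.8

  # flush and evict everything that can be flushed/evicted right now
  rados -p cache-pool cache-try-flush-evict-all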
> >
> >> On Sep 17, 2015, at 11:59 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >>
> >> Ah right....this is where it gets interesting.
> >>
> >> You are probably hitting a full cache on a PG somewhere, which is making
> >> everything wait until it flushes, or something like that.
> >>
> >> What cache settings have you got set?
> >>
> >> I assume you have SSDs for the cache tier? Can you share the size of the
> >> pool?
> >>
> >> If possible, could you also create a non-tiered test pool and do some
> >> benchmarks on that, to rule out any issue with the hardware and OSDs.
> >>
> >>> -----Original Message-----
> >>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >>> Behalf Of Lincoln Bryant
> >>> Sent: 17 September 2015 17:54
> >>> To: Nick Fisk <nick@xxxxxxxxxx>
> >>> Cc: ceph-users@xxxxxxxxxxxxxx
> >>> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
> >>>
> >>> Hi Nick,
> >>>
> >>> Thanks for responding. Yes, I am.
> >>>
> >>> —Lincoln
> >>>
> >>>> On Sep 17, 2015, at 11:53 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> >>>>
> >>>> You are getting a fair amount of reads on the disks whilst doing these
> >>>> writes. You're not using cache tiering, are you?
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
> >>>>> Behalf Of Lincoln Bryant
> >>>>> Sent: 17 September 2015 17:42
> >>>>> To: ceph-users@xxxxxxxxxxxxxx
> >>>>> Subject: Re: Ceph cluster NO read / write performance :: Ops are blocked
> >>>>>
> >>>>> Hello again,
> >>>>>
> >>>>> Well, I disabled offloads on the NIC -- didn’t work for me. I also
> >>>>> tried setting net.ipv4.tcp_moderate_rcvbuf = 0 as suggested elsewhere
> >>>>> in the thread, to no avail.
> >>>>>
> >>>>> Today I was watching iostat on an OSD box ('iostat -xm 5') when the
> >>>>> cluster got into the “slow” state:
> >>>>>
> >>>>> Device: rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
> >>>>> sdb       0.00   13.57  84.23  167.47   0.45   2.78    26.26     2.06   8.18   3.85  96.93
> >>>>> sdc       0.00   46.71   5.59  289.22   0.03   2.54    17.85     3.18  10.77   0.97  28.72
> >>>>> sdd       0.00   16.57  45.11   91.62   0.25   0.55    12.01     0.75   5.51   2.45  33.47
> >>>>> sde       0.00   13.57   6.99  143.31   0.03   2.53    34.97     1.99  13.27   2.12  31.86
> >>>>> sdf       0.00   18.76   4.99  158.48   0.10   1.09    14.88     1.26   7.69   1.24  20.26
> >>>>> sdg       0.00   25.55  81.64  237.52   0.44   2.89    21.36     4.14  12.99   2.58  82.22
> >>>>> sdh       0.00   89.42  16.17  492.42   0.09   3.81    15.69    17.12  33.66   0.73  36.95
> >>>>> sdi       0.00   20.16  17.76  189.62   0.10   1.67    17.46     3.45  16.63   1.57  32.55
> >>>>> sdj       0.00   31.54   0.00  185.23   0.00   1.91    21.15     3.33  18.00   0.03   0.62
> >>>>> sdk       0.00   26.15   2.40  133.33   0.01   0.84    12.79     1.07   7.87   0.85  11.58
> >>>>> sdl       0.00   25.55   9.38  123.95   0.05   1.15    18.44     0.50   3.74   1.58  21.10
> >>>>> sdm       0.00    6.39  92.61   47.11   0.47   0.26    10.65     1.27   9.07   6.92  96.73
> >>>>>
> >>>>> The %util is rather high on some disks, but I’m not an expert at
> >>>>> reading iostat, so I’m not sure how worrisome this is. Does anything
> >>>>> here stand out to anyone?
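(To pin down which OSDs the blocked ops are stuck on, and what step they are
actually waiting for, the OSD admin socket is more telling than iostat. A
sketch; osd.111 is just an example id taken from the iotop output below, and
the daemon commands have to run on the host that carries that OSD.)

  # which OSDs are implicated in the slow/blocked requests
  ceph health detail

  # ops currently in flight on that OSD, with their current state
  ceph daemon osd.111 dump_ops_in_flight

  # recent slow ops, with a timestamped event trail
  # (waiting for journal, waiting for sub ops, etc.)
  ceph daemon osd.111 dump_historic_ops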
> >>>>>
> >>>>> At the time of that iostat, Ceph was reporting a lot of blocked ops
> >>>>> on the OSD associated with sde (as well as on about 30 other OSDs),
> >>>>> but it doesn’t look all that busy. Some simple ‘dd’ tests seem to
> >>>>> indicate the disk is fine.
> >>>>>
> >>>>> Similarly, iotop seems OK on this host:
> >>>>>
> >>>>> TID     PRIO USER DISK READ  DISK WRITE  SWAPIN  IO>    COMMAND
> >>>>> 472477  be/4 root  0.00 B/s    5.59 M/s  0.00 %  0.57 % ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 470621  be/4 root  0.00 B/s   10.09 M/s  0.00 %  0.40 % ceph-osd -i 111 --pid-file /var/run/ceph/osd.111.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3495447 be/4 root  0.00 B/s  272.19 K/s  0.00 %  0.36 % ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3488389 be/4 root  0.00 B/s  596.80 K/s  0.00 %  0.16 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3488060 be/4 root  0.00 B/s  600.83 K/s  0.00 %  0.15 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3505573 be/4 root  0.00 B/s  528.25 K/s  0.00 %  0.10 % ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3495434 be/4 root  0.00 B/s    2.02 K/s  0.00 %  0.10 % ceph-osd -i 114 --pid-file /var/run/ceph/osd.114.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3502327 be/4 root  0.00 B/s  506.07 K/s  0.00 %  0.09 % ceph-osd -i 118 --pid-file /var/run/ceph/osd.118.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3489100 be/4 root  0.00 B/s  106.86 K/s  0.00 %  0.09 % ceph-osd -i 110 --pid-file /var/run/ceph/osd.110.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3496631 be/4 root  0.00 B/s  229.85 K/s  0.00 %  0.05 % ceph-osd -i 115 --pid-file /var/run/ceph/osd.115.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3505561 be/4 root  0.00 B/s    2.02 K/s  0.00 %  0.03 % ceph-osd -i 119 --pid-file /var/run/ceph/osd.119.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3488059 be/4 root  0.00 B/s    2.02 K/s  0.00 %  0.03 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3488391 be/4 root 46.37 K/s  431.47 K/s  0.00 %  0.02 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3500639 be/4 root  0.00 B/s  221.78 K/s  0.00 %  0.02 % ceph-osd -i 117 --pid-file /var/run/ceph/osd.117.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3488392 be/4 root 34.28 K/s  185.49 K/s  0.00 %  0.02 % ceph-osd -i 109 --pid-file /var/run/ceph/osd.109.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>> 3488062 be/4 root  4.03 K/s   66.54 K/s  0.00 %  0.02 % ceph-osd -i 108 --pid-file /var/run/ceph/osd.108.pid -c /etc/ceph/ceph.conf --cluster ceph
> >>>>>
> >>>>> These are all 6TB Seagates in single-disk RAID 0 on a PERC H730 Mini
> >>>>> controller.
> >>>>>
> >>>>> I did try removing the disk with 20k non-medium errors, but that
> >>>>> didn’t seem to help.
> >>>>>
> >>>>> Thanks for any insight!
> >>>>>
> >>>>> Cheers,
> >>>>> Lincoln Bryant
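(The H730 Mini is a MegaRAID-family controller, so smartctl can usually
interrogate the individual drives through it, which gives much more detail
than the controller's own terse counters. A sketch; the megaraid,N device
ids and the /dev/sdX name are system-specific, so expect to probe a few.)

  # enumerate the devices/buses smartctl can see
  smartctl --scan

  # full SMART output for the drive at megaraid device id 0
  smartctl -a -d megaraid,0 /dev/sda

  # SCSI/SAS error counter log, where non-medium error counts are reported
  smartctl -l error -d megaraid,0 /dev/sda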
> >>>>>
> >>>>>> On Sep 9, 2015, at 1:09 PM, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hi Jan,
> >>>>>>
> >>>>>> I’ll take a look at all of those things and report back (hopefully :))
> >>>>>>
> >>>>>> I did try setting all of my OSDs to writethrough instead of writeback
> >>>>>> on the controller, which was significantly more consistent in
> >>>>>> performance (from 1100MB/s down to 300MB/s, but still occasionally
> >>>>>> dropping to 0MB/s). Still plenty of blocked ops.
> >>>>>>
> >>>>>> I was wondering if a not-so-nicely failing OSD (or several) might be
> >>>>>> the cause. My controller (PERC H730 Mini) seems frustratingly terse
> >>>>>> with SMART information, but at least one disk has a “Non-medium error
> >>>>>> count” of over 20,000..
> >>>>>>
> >>>>>> I’ll try disabling offloads as well.
> >>>>>>
> >>>>>> Thanks much for the suggestions!
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Lincoln
> >>>>>>
> >>>>>>> On Sep 9, 2015, at 3:59 AM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>> Just to recapitulate - the nodes are doing "nothing" when it drops
> >>>>>>> to zero? Not flushing something to drives (iostat)? Not cleaning
> >>>>>>> pagecache (kswapd and similar)? Not out of any type of memory (slab,
> >>>>>>> min_free_kbytes)? Not network link errors, no bad checksums (those
> >>>>>>> are hard to spot, though)?
> >>>>>>>
> >>>>>>> Unless you find something, I suggest you try disabling offloads on
> >>>>>>> the NICs and see if the problem goes away.
> >>>>>>>
> >>>>>>> Jan
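(For reference, a sketch of checking and disabling the usual offloads, plus
the link-error counters Jan mentions. eth0 is a placeholder, and the exact
offload names available vary by NIC and driver.)

  # show current offload settings
  ethtool -k eth0

  # disable the common suspects (TSO/GSO/GRO/LRO)
  ethtool -K eth0 tso off gso off gro off lro off

  # NIC-level error/drop counters, and kernel-level stats for the link
  ethtool -S eth0 | grep -i -E 'err|drop'
  ip -s link show eth0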
> >>>>>>>
> >>>>>>>> On 08 Sep 2015, at 18:26, Lincoln Bryant <lincolnb@xxxxxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> For whatever it’s worth, my problem has returned and is very
> >>>>>>>> similar to yours. Still trying to figure out what’s going on over
> >>>>>>>> here.
> >>>>>>>>
> >>>>>>>> Performance is nice for a few seconds, then goes to 0. This is a
> >>>>>>>> similar setup to yours (12 OSDs per box, Scientific Linux 6, Ceph
> >>>>>>>> 0.94.3, etc.)
> >>>>>>>>
> >>>>>>>>  384      16     29520     29504   307.287      1188  0.0492006  0.208259
> >>>>>>>>  385      16     29813     29797   309.532      1172  0.0469708  0.206731
> >>>>>>>>  386      16     30105     30089   311.756      1168  0.0375764  0.205189
> >>>>>>>>  387      16     30401     30385   314.009      1184   0.036142  0.203791
> >>>>>>>>  388      16     30695     30679   316.231      1176  0.0372316  0.202355
> >>>>>>>>  389      16     30987     30971    318.42      1168  0.0660476  0.200962
> >>>>>>>>  390      16     31282     31266   320.628      1180  0.0358611  0.199548
> >>>>>>>>  391      16     31568     31552   322.734      1144  0.0405166  0.198132
> >>>>>>>>  392      16     31857     31841   324.859      1156  0.0360826  0.196679
> >>>>>>>>  393      16     32090     32074   326.404       932  0.0416869   0.19549
> >>>>>>>>  394      16     32205     32189   326.743       460  0.0251877  0.194896
> >>>>>>>>  395      16     32302     32286   326.897       388  0.0280574  0.194395
> >>>>>>>>  396      16     32348     32332   326.537       184  0.0256821  0.194157
> >>>>>>>>  397      16     32385     32369   326.087       148  0.0254342  0.193965
> >>>>>>>>  398      16     32424     32408   325.659       156  0.0263006  0.193763
> >>>>>>>>  399      16     32445     32429   325.054        84  0.0233839  0.193655
> >>>>>>>> 2015-09-08 11:22:31.940164 min lat: 0.0165045 max lat: 67.6184 avg lat: 0.193655
> >>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >>>>>>>>  400      16     32445     32429   324.241         0         -  0.193655
> >>>>>>>>  401      16     32445     32429   323.433         0         -  0.193655
> >>>>>>>>  402      16     32445     32429   322.628         0         -  0.193655
> >>>>>>>>  403      16     32445     32429   321.828         0         -  0.193655
> >>>>>>>>  404      16     32445     32429   321.031         0         -  0.193655
> >>>>>>>>  405      16     32445     32429   320.238         0         -  0.193655
> >>>>>>>>  406      16     32445     32429    319.45         0         -  0.193655
> >>>>>>>>  407      16     32445     32429   318.665         0         -  0.193655
> >>>>>>>>
> >>>>>>>> Needless to say, very strange.
> >>>>>>>>
> >>>>>>>> —Lincoln
> >>>>>>>>
> >>>>>>>>> On Sep 7, 2015, at 3:35 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
> >>>>>>>>>
> >>>>>>>>> Adding ceph-users.
> >>>>>>>>>
> >>>>>>>>> On Mon, Sep 7, 2015 at 11:31 PM, Vickey Singh <vickey.singh22693@xxxxxxxxx> wrote:
> >>>>>>>>>
> >>>>>>>>> On Mon, Sep 7, 2015 at 10:04 PM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
> >>>>>>>>> Hi Vickey,
> >>>>>>>>> Thanks for your time in replying to my problem.
> >>>>>>>>>
> >>>>>>>>> I had the same rados bench output after changing the motherboard
> >>>>>>>>> of the monitor node with the lowest IP...
> >>>>>>>>> Due to the new mainboard, I assume the hw-clock was wrong during
> >>>>>>>>> startup. Ceph health showed no errors, but all VMs weren't able
> >>>>>>>>> to do IO (very high load on the VMs - but no traffic).
> >>>>>>>>> I stopped the mon, but this didn't change anything. I had to
> >>>>>>>>> restart all the other mons to get IO again. After that I started
> >>>>>>>>> the first mon also (with the right time now) and all worked fine
> >>>>>>>>> again...
> >>>>>>>>>
> >>>>>>>>> Thanks, I will try to restart all OSDs / MONs and report back, if
> >>>>>>>>> it solves my problem.
> >>>>>>>>>
> >>>>>>>>> Another possibility:
> >>>>>>>>> Do you use journals on SSDs? Perhaps the SSDs can't write due to
> >>>>>>>>> garbage collection?
> >>>>>>>>>
> >>>>>>>>> No, I don't have journals on SSD, they are on the same OSD disk.
> >>>>>>>>>
> >>>>>>>>> Udo
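(Udo's clock theory is cheap to rule out. The monitors only start warning
once skew crosses the alert threshold, so it's worth checking explicitly; a
sketch, with the mon hostnames taken from the monmap quoted below as
placeholders.)

  # any "clock skew detected" warnings show up here
  ceph health detail | grep -i clock

  # on each mon host, check NTP sync state
  ntpq -p

  # compare wall clocks across the mon hosts in one shot
  for h in stor0111 stor0113 stor0115; do ssh $h date +%s.%N; done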
> >>>>>>>>>
> >>>>>>>>> On 07.09.2015 16:36, Vickey Singh wrote:
> >>>>>>>>>> Dear Experts
> >>>>>>>>>>
> >>>>>>>>>> Can someone please help me understand why my cluster is not able
> >>>>>>>>>> to write data.
> >>>>>>>>>>
> >>>>>>>>>> See the output below: cur MB/s is 0 and avg MB/s is decreasing.
> >>>>>>>>>>
> >>>>>>>>>> Ceph Hammer 0.94.2
> >>>>>>>>>> CentOS 6 (3.10.69-1)
> >>>>>>>>>>
> >>>>>>>>>> The Ceph status says OPS are blocked. I have tried checking
> >>>>>>>>>> everything I know:
> >>>>>>>>>>
> >>>>>>>>>> - System resources (CPU, net, disk, memory) -- all normal
> >>>>>>>>>> - 10G network for public and cluster network -- no saturation
> >>>>>>>>>> - All disks are physically healthy
> >>>>>>>>>> - No messages in /var/log/messages or dmesg
> >>>>>>>>>> - Tried restarting the OSDs that are blocking operations, but no luck
> >>>>>>>>>> - Tried writing through RBD and rados bench; both give the same problem
> >>>>>>>>>>
> >>>>>>>>>> Please help me to fix this problem.
> >>>>>>>>>>
> >>>>>>>>>> # rados bench -p rbd 60 write
> >>>>>>>>>> Maintaining 16 concurrent writes of 4194304 bytes for up to 60
> >>>>>>>>>> seconds or 0 objects
> >>>>>>>>>> Object prefix: benchmark_data_stor1_1791844
> >>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >>>>>>>>>>    0       0         0         0         0         0         -         0
> >>>>>>>>>>    1      16       125       109   435.873       436  0.022076 0.0697864
> >>>>>>>>>>    2      16       139       123   245.948        56  0.246578 0.0674407
> >>>>>>>>>>    3      16       139       123   163.969         0         - 0.0674407
> >>>>>>>>>>    4      16       139       123   122.978         0         - 0.0674407
> >>>>>>>>>>    5      16       139       123    98.383         0         - 0.0674407
> >>>>>>>>>>    6      16       139       123   81.9865         0         - 0.0674407
> >>>>>>>>>>    7      16       139       123   70.2747         0         - 0.0674407
> >>>>>>>>>>    8      16       139       123   61.4903         0         - 0.0674407
> >>>>>>>>>>    9      16       139       123   54.6582         0         - 0.0674407
> >>>>>>>>>>   10      16       139       123   49.1924         0         - 0.0674407
> >>>>>>>>>>   11      16       139       123   44.7201         0         - 0.0674407
> >>>>>>>>>>   12      16       139       123   40.9934         0         - 0.0674407
> >>>>>>>>>>   13      16       139       123   37.8401         0         - 0.0674407
> >>>>>>>>>>   14      16       139       123   35.1373         0         - 0.0674407
> >>>>>>>>>>   15      16       139       123   32.7949         0         - 0.0674407
> >>>>>>>>>>   16      16       139       123   30.7451         0         - 0.0674407
> >>>>>>>>>>   17      16       139       123   28.9364         0         - 0.0674407
> >>>>>>>>>>   18      16       139       123   27.3289         0         - 0.0674407
> >>>>>>>>>>   19      16       139       123   25.8905         0         - 0.0674407
> >>>>>>>>>> 2015-09-07 15:54:52.694071 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
> >>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >>>>>>>>>>   20      16       139       123    24.596         0         - 0.0674407
> >>>>>>>>>>   21      16       139       123   23.4247         0         - 0.0674407
> >>>>>>>>>>   22      16       139       123     22.36         0         - 0.0674407
> >>>>>>>>>>   23      16       139       123   21.3878         0         - 0.0674407
> >>>>>>>>>>   24      16       139       123   20.4966         0         - 0.0674407
> >>>>>>>>>>   25      16       139       123   19.6768         0         - 0.0674407
> >>>>>>>>>>   26      16       139       123     18.92         0         - 0.0674407
> >>>>>>>>>>   27      16       139       123   18.2192         0         - 0.0674407
> >>>>>>>>>>   28      16       139       123   17.5686         0         - 0.0674407
> >>>>>>>>>>   29      16       139       123   16.9628         0         - 0.0674407
> >>>>>>>>>>   30      16       139       123   16.3973         0         - 0.0674407
> >>>>>>>>>>   31      16       139       123   15.8684         0         - 0.0674407
> >>>>>>>>>>   32      16       139       123   15.3725         0         - 0.0674407
> >>>>>>>>>>   33      16       139       123   14.9067         0         - 0.0674407
> >>>>>>>>>>   34      16       139       123   14.4683         0         - 0.0674407
> >>>>>>>>>>   35      16       139       123   14.0549         0         - 0.0674407
> >>>>>>>>>>   36      16       139       123   13.6645         0         - 0.0674407
> >>>>>>>>>>   37      16       139       123   13.2952         0         - 0.0674407
> >>>>>>>>>>   38      16       139       123   12.9453         0         - 0.0674407
> >>>>>>>>>>   39      16       139       123   12.6134         0         - 0.0674407
> >>>>>>>>>> 2015-09-07 15:55:12.697124 min lat: 0.022076 max lat: 0.46117 avg lat: 0.0674407
> >>>>>>>>>>  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
> >>>>>>>>>>   40      16       139       123   12.2981         0         - 0.0674407
> >>>>>>>>>>   41      16       139       123   11.9981         0         - 0.0674407
> >>>>>>>>>>
> >>>>>>>>>> cluster 86edf8b8-b353-49f1-ab0a-a4827a9ea5e8
> >>>>>>>>>>  health HEALTH_WARN
> >>>>>>>>>>         1 requests are blocked > 32 sec
> >>>>>>>>>>  monmap e3: 3 mons at
> >>>>>>>>>> {stor0111=10.100.1.111:6789/0,stor0113=10.100.1.113:6789/0,stor0115=10.100.1.115:6789/0}
> >>>>>>>>>>         election epoch 32, quorum 0,1,2 stor0111,stor0113,stor0115
> >>>>>>>>>>  osdmap e19536: 50 osds: 50 up, 50 in
> >>>>>>>>>>  pgmap v928610: 2752 pgs, 9 pools, 30476 GB data, 4183 kobjects
> >>>>>>>>>>        91513 GB used, 47642 GB / 135 TB avail
> >>>>>>>>>>            2752 active+clean
> >>>>>>>>>>
> >>>>>>>>>> Tried using RBD:
> >>>>>>>>>>
> >>>>>>>>>> # dd if=/dev/zero of=file1 bs=4K count=10000 oflag=direct
> >>>>>>>>>> 10000+0 records in
> >>>>>>>>>> 10000+0 records out
> >>>>>>>>>> 40960000 bytes (41 MB) copied, 24.5529 s, 1.7 MB/s
> >>>>>>>>>>
> >>>>>>>>>> # dd if=/dev/zero of=file1 bs=1M count=100 oflag=direct
> >>>>>>>>>> 100+0 records in
> >>>>>>>>>> 100+0 records out
> >>>>>>>>>> 104857600 bytes (105 MB) copied, 1.05602 s, 9.3 MB/s
> >>>>>>>>>>
> >>>>>>>>>> # dd if=/dev/zero of=file1 bs=1G count=1 oflag=direct
> >>>>>>>>>> 1+0 records in
> >>>>>>>>>> 1+0 records out
> >>>>>>>>>> 1073741824 bytes (1.1 GB) copied, 293.551 s, 3.7 MB/s

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com