Re: reads while 100% write

Correct: the change to the default RBD features also merged on March 1 (a7470c8), albeit a few hours after the commit you last tested against (c1e41af).  You can revert an existing image to the pre-Jewel RBD feature set by running the following:

# rbd feature disable <image name> exclusive-lock,object-map,fast-diff,deep-flatten
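
As a quick sanity check (a sketch; substitute your image name), "rbd info" will list the features still enabled on the image:

# rbd info <image name>

and new images can be created with the pre-Jewel defaults by setting the option in the [client] section of ceph.conf:

[client]
    rbd default features = 3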

Hopefully the new PR to add the WILLNEED fadvise flag helps. 
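
For reference, the old and new default feature bitmasks in the diff quoted below decompose as:

  3 = 1 (layering) + 2 (stripingv2)
 61 = 1 (layering) + 4 (exclusive-lock) + 8 (object-map) + 16 (fast-diff) + 32 (deep-flatten)

Assuming the object map stores 2 bits of state per object, the ~6 MB rbd_object_map reads in the logs below (size 6150030) work out to roughly 6,144,000 bytes * 4 = 24,576,000 tracked objects, which is consistent with a 1.5T image at a 64K object size.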

-- 

Jason Dillaman 


----- Original Message -----
> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
> Sent: Wednesday, March 30, 2016 4:39:09 PM
> Subject: Re: reads while 100% write
> 
> I use 64K.
> Explicit settings are identical for both revisions.
> 
> Looks like the following change slows performance down by roughly 10x:
> 
> -OPTION(rbd_default_features, OPT_INT, 3)  // only applies to format 2 images
> -                                          // +1 for layering, +2 for stripingv2,
> -                                          // +4 for exclusive lock, +8 for object map
> +OPTION(rbd_default_features, OPT_INT, 61) // only applies to format 2 images
> +                                          // +1 for layering, +2 for stripingv2,
> +                                          // +4 for exclusive lock, +8 for object map
> +                                          // +16 for fast-diff, +32 for deep-flatten,
> +                                          // +64 for journaling
> 
> 
> 
> On 3/30/16, 12:10 PM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:
> 
> >Are you using the RBD default of 4MB object sizes or are you using
> >something much smaller like 64KB?  An object map of that size should be
> >tracking up to 24,576,000 objects.  When you ran your test before, did
> >you have the RBD object map disabled?  This definitely seems to be a use
> >case where the lack of a cache in front of BlueStore is hurting small IO.
> >
> >--
> >
> >Jason Dillaman
> >
> >
> >----- Original Message -----
> >> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> >> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> >> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
> >> Sent: Wednesday, March 30, 2016 3:00:47 PM
> >> Subject: Re: reads while 100% write
> >>
> >> 1.5T in that run.
> >> With 150G the behavior is the same, except it says "_do_read 0~18 size
> >> 615030" instead of 6M.
> >>
> >> Also, when the random 4k write starts there are more reads than writes:
> >>
> >> Device:  rrqm/s  wrqm/s   r/s    w/s     rkB/s     wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> >> sdd      0.00    1887.00  0.00   344.00  0.00      8924.00  51.88     0.36      1.06   0.00     1.06     0.91   31.20
> >> sde      30.00   0.00     30.00  957.00  18120.00  3828.00  44.47     0.25      0.26   3.87     0.14     0.17   16.40
> >>
> >> Logs: http://pastebin.com/gGzfR5ez
> >>
> >>
> >> On 3/30/16, 11:37 AM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:
> >>
> >> >How large is your RBD image?  100 terabytes?
> >> >
> >> >--
> >> >
> >> >Jason Dillaman
> >> >
> >> >
> >> >----- Original Message -----
> >> >> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> >> >> To: "Sage Weil" <sage@xxxxxxxxxxxx>
> >> >> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >> >> Sent: Wednesday, March 30, 2016 2:14:12 PM
> >> >> Subject: Re: reads while 100% write
> >> >>
> >> >> These are suspicious lines:
> >> >>
> >> >> 2016-03-30 10:54:23.142205 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 6144018~6012 = 6012
> >> >> 2016-03-30 10:54:23.142252 7f2e933ff700 15 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
> >> >> 2016-03-30 10:54:23.142260 7f2e933ff700 20 bluestore(src/dev/osd0) _do_read 8210~4096 size 6150030
> >> >> 2016-03-30 10:54:23.142267 7f2e933ff700  5 bdev(src/dev/osd0/block) read 8003854336~8192
> >> >> 2016-03-30 10:54:23.142609 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 = 4096
> >> >> 2016-03-30 10:54:23.142882 7f2e933ff700 15 bluestore(src/dev/osd0) _write 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
> >> >> 2016-03-30 10:54:23.142888 7f2e933ff700 20 bluestore(src/dev/osd0) _do_write #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 - have 6150030 bytes in 1 extents
> >> >>
> >> >> More logs here: http://pastebin.com/74WLzFYw
> >> >>
> >> >>
> >> >>
> >> >> On 3/30/16, 4:19 AM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:
> >> >>
> >> >> >On Wed, 30 Mar 2016, Evgeniy Firsov wrote:
> >> >> >> After pulling the master branch on Friday I started seeing odd fio
> >> >> >> behavior: I see a lot of reads while writing, and very low performance
> >> >> >> no matter whether it is a read or write workload.
> >> >> >>
> >> >> >> Output from sequential 1M write:
> >> >> >> Device:  rrqm/s  wrqm/s  r/s     w/s     rkB/s    wkB/s    avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> >> >> >> sdd      0.00    409.00  0.00    364.00  0.00     3092.00  16.99     0.28      0.78   0.00     0.78     0.76   27.60
> >> >> >> sde      0.00    242.00  365.00  363.00  2436.00  9680.00  33.29     0.18      0.24   0.42     0.07     0.23   16.80
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> block.db -> /dev/sdd
> >> >> >> block -> /dev/sde
> >> >> >>
> >> >> >> health HEALTH_OK
> >> >> >> monmap e1: 1 mons at {a=127.0.0.1:6789/0}
> >> >> >>        election epoch 3, quorum 0 a
> >> >> >> osdmap e7: 1 osds: 1 up, 1 in
> >> >> >>        flags sortbitwise
> >> >> >> pgmap v24: 64 pgs, 1 pools, 577 MB data, 9152 objects
> >> >> >>        8210 MB used, 178 GB / 186 GB avail
> >> >> >>              64 active+clean
> >> >> >> client io 1550 kB/s rd, 9559 kB/s wr, 645 op/s rd, 387 op/s wr
> >> >> >>
> >> >> >>
> >> >> >> While on an earlier revision (c1e41af) everything looks as expected:
> >> >> >>
> >> >> >> Device:  rrqm/s  wrqm/s   r/s   w/s      rkB/s  wkB/s      avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> >> >> >> sdd      0.00    4910.00  0.00  680.00   0.00   22416.00   65.93     1.05      1.55   0.00     1.55     1.18   80.00
> >> >> >> sde      0.00    0.00     0.00  3418.00  0.00   217612.00  127.33    63.78     18.18  0.00     18.18    0.25   86.40
> >> >> >>
> >> >> >> Another observation, possibly related to the issue, is that CPU load
> >> >> >> is imbalanced: a single "tp_osd_tp" thread is 100% busy while the rest
> >> >> >> are idle. It looks like all load goes to a single thread pool shard;
> >> >> >> earlier, the CPU load was well balanced.
> >> >> >
> >> >> >Hmm.  Can you capture a log with debug bluestore = 20 and debug bdev = 20?
> >> >> >
> >> >> >Thanks!
> >> >> >sage
> >> >> >
> >> >> >
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Evgeniy
> >> >> >>
> >> >> >>
> >> >> >>


