Re: reads while 100% write

I use 64K.
Explicit settings are identical for both revisions.
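
For context on the 64K object size: the object count quoted below (24,576,000) and the rbd_object_map sizes that show up in the logs can be reproduced with a quick calculation. This is only a sketch. It assumes that "1.5T" means 1500 GiB and that the object map stores 2 bits of state per object; the helper names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def object_count(image_bytes, object_bytes):
    """Number of RADOS objects needed to back an image of the given size."""
    return ceil_div(image_bytes, object_bytes)


def object_map_bytes(num_objects, bits_per_object=2):
    """Payload size of the object map (assumption: 2 bits per object,
    header and checksum overhead not included)."""
    return ceil_div(num_objects * bits_per_object, 8)


for image_gib in (1500, 150):
    for object_bytes in (64 * KIB, 4 * MIB):
        n = object_count(image_gib * GIB, object_bytes)
        print(f"{image_gib} GiB image, {object_bytes // KIB} KiB objects: "
              f"{n} objects, ~{object_map_bytes(n)} bytes of object map")
```

With 64K objects this gives 24,576,000 objects and roughly 6,144,000 bytes of map for the 1500 GiB image, and 2,457,600 objects and roughly 614,400 bytes for the 150 GiB image. Both are close to the 6150030 and 615030 sizes reported by _do_read; the remaining few KB are presumably encoding overhead, which is a guess on my part.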

It looks like the following change is what slows performance down by a factor of 10:

-OPTION(rbd_default_features, OPT_INT, 3) // only applies to format 2 images
-                                        // +1 for layering, +2 for stripingv2,
-                                        // +4 for exclusive lock, +8 for object map
+OPTION(rbd_default_features, OPT_INT, 61)   // only applies to format 2 images
+                                           // +1 for layering, +2 for stripingv2,
+                                           // +4 for exclusive lock, +8 for object map
+                                           // +16 for fast-diff, +32 for deep-flatten,
+                                           // +64 for journaling
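
To make the difference between the two defaults concrete, here is a small sketch that decodes the bitmask using only the bit values listed in the comments of that option. The dictionary and function names are mine, not anything from the Ceph tree.

```python
# Feature bit values, taken from the comments on rbd_default_features.
RBD_FEATURES = {
    1: "layering",
    2: "stripingv2",
    4: "exclusive-lock",
    8: "object-map",
    16: "fast-diff",
    32: "deep-flatten",
    64: "journaling",
}


def decode_features(mask):
    """Return the names of the feature bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(RBD_FEATURES.items()) if mask & bit]


old_default, new_default = 3, 61

print("old:", decode_features(old_default))
print("new:", decode_features(new_default))
print("newly enabled:", decode_features(new_default & ~old_default))
print("no longer enabled:", decode_features(old_default & ~new_default))
```

So 61 = 1 + 4 + 8 + 16 + 32: the new default turns on exclusive lock, object map, fast-diff and deep-flatten, and drops stripingv2. Object map is the one that matches the rbd_object_map reads in the logs quoted below.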



On 3/30/16, 12:10 PM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:

>Are you using the RBD default of 4MB object sizes or are you using
>something much smaller like 64KB?  An object map of that size should be
>tracking up to 24,576,000 objects.  When you ran your test before, did
>you have the RBD object map disabled?  This definitely seems to be a use
>case where the lack of a cache in front of BlueStore is hurting small IO.
>
>--
>
>Jason Dillaman
>
>
>----- Original Message -----
>> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
>> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
>> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
>> Sent: Wednesday, March 30, 2016 3:00:47 PM
>> Subject: Re: reads while 100% write
>>
>> 1.5T in that run.
>> With 150G the behavior is the same, except that it says
>> "_do_read 0~18 size 615030" instead of 6M.
>>
>> Also, when the random 4k write starts there are more reads than writes:
>>
>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>>
>> sdd               0.00  1887.00    0.00  344.00     0.00  8924.00    51.88     0.36    1.06    0.00    1.06   0.91  31.20
>> sde              30.00     0.00   30.00  957.00 18120.00  3828.00    44.47     0.25    0.26    3.87    0.14   0.17  16.40
>>
>> Logs: http://pastebin.com/gGzfR5ez
>>
>>
>> On 3/30/16, 11:37 AM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:
>>
>> >How large is your RBD image?  100 terabytes?
>> >
>> >--
>> >
>> >Jason Dillaman
>> >
>> >
>> >----- Original Message -----
>> >> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
>> >> To: "Sage Weil" <sage@xxxxxxxxxxxx>
>> >> Cc: ceph-devel@xxxxxxxxxxxxxxx
>> >> Sent: Wednesday, March 30, 2016 2:14:12 PM
>> >> Subject: Re: reads while 100% write
>> >>
>> >> These are suspicious lines:
>> >>
>> >> 2016-03-30 10:54:23.142205 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 6144018~6012 = 6012
>> >> 2016-03-30 10:54:23.142252 7f2e933ff700 15 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
>> >> 2016-03-30 10:54:23.142260 7f2e933ff700 20 bluestore(src/dev/osd0) _do_read 8210~4096 size 6150030
>> >> 2016-03-30 10:54:23.142267 7f2e933ff700  5 bdev(src/dev/osd0/block) read 8003854336~8192
>> >> 2016-03-30 10:54:23.142609 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 = 4096
>> >> 2016-03-30 10:54:23.142882 7f2e933ff700 15 bluestore(src/dev/osd0) _write 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
>> >> 2016-03-30 10:54:23.142888 7f2e933ff700 20 bluestore(src/dev/osd0) _do_write #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 - have 6150030 bytes in 1 extents
>> >>
>> >> More logs here: http://pastebin.com/74WLzFYw
>> >>
>> >>
>> >>
>> >> On 3/30/16, 4:19 AM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:
>> >>
>> >> >On Wed, 30 Mar 2016, Evgeniy Firsov wrote:
>> >> >> After pulling the master branch on Friday I started seeing odd fio
>> >> >> behavior: I see a lot of reads while writing, and very low performance
>> >> >> regardless of whether it is a read or a write workload.
>> >> >>
>> >> >> Output from sequential 1M write:
>> >> >> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> >> >>
>> >> >> sdd               0.00   409.00    0.00  364.00     0.00  3092.00    16.99     0.28    0.78    0.00    0.78   0.76  27.60
>> >> >> sde               0.00   242.00  365.00  363.00  2436.00  9680.00    33.29     0.18    0.24    0.42    0.07   0.23  16.80
>> >> >>
>> >> >>
>> >> >>
>> >> >> block.db -> /dev/sdd
>> >> >> block -> /dev/sde
>> >> >>
>> >> >> health HEALTH_OK
>> >> >> monmap e1: 1 mons at {a=127.0.0.1:6789/0}
>> >> >>        election epoch 3, quorum 0 a
>> >> >> osdmap e7: 1 osds: 1 up, 1 in
>> >> >>        flags sortbitwise
>> >> >> pgmap v24: 64 pgs, 1 pools, 577 MB data, 9152 objects
>> >> >>        8210 MB used, 178 GB / 186 GB avail
>> >> >>              64 active+clean
>> >> >> client io 1550 kB/s rd, 9559 kB/s wr, 645 op/s rd, 387 op/s wr
>> >> >>
>> >> >>
>> >> >> On an earlier revision (c1e41af) everything looks as expected:
>> >> >>
>> >> >> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>> >> >> sdd               0.00  4910.00    0.00  680.00     0.00 22416.00    65.93     1.05    1.55    0.00    1.55   1.18  80.00
>> >> >> sde               0.00     0.00    0.00 3418.00     0.00 217612.00   127.33    63.78   18.18    0.00   18.18   0.25  86.40
>> >> >>
>> >> >> Another observation, which may be related to the issue, is that the CPU
>> >> >> load is imbalanced. A single "tp_osd_tp" thread is 100% busy while the
>> >> >> rest are idle. It looks like all the load goes to a single thread pool
>> >> >> shard; earlier the CPU load was well balanced.
>> >> >
>> >> >Hmm.  Can you capture a log with debug bluestore = 20 and debug bdev = 20?
>> >> >
>> >> >Thanks!
>> >> >sage
>> >> >
>> >> >
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Evgeniy
>> >> >>
>> >> >>
>> >> >>
>> >> >> PLEASE NOTE: The information contained in this electronic mail message
>> >> >> is intended only for the use of the designated recipient(s) named above.
>> >> >> If the reader of this message is not the intended recipient, you are
>> >> >> hereby notified that you have received this message in error and that
>> >> >> any review, dissemination, distribution, or copying of this message is
>> >> >> strictly prohibited. If you have received this communication in error,
>> >> >> please notify the sender by telephone or e-mail (as shown above)
>> >> >> immediately and destroy any and all copies of this message in your
>> >> >> possession (whether hard copies or electronically stored copies).
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> >> >> in the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >>
>> >> >>
>> >>
