Re: reads while 100% write


On Wed, 30 Mar 2016, Jason Dillaman wrote:
> Are you using the RBD default of 4MB object sizes or are you using 
> something much smaller like 64KB?  An object map of that size should be 
> tracking up to 24,576,000 objects.  When you ran your test before, did 
> you have the RBD object map disabled?  This definitely seems to be a use 
> case where the lack of a cache in front of BlueStore is hurting small 
> IO.
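
A quick back-of-the-envelope check of those numbers (just a sketch: it
assumes 2 bits per object in the object map and takes the ~6 MB map size
from the _do_read line in the logs quoted further down):

  // Rough check of the object map numbers above.  Assumptions: 2 bits per
  // object in the map, ~6 MB of map payload as seen in the logs below, and
  // object sizes of 4 MiB (RBD default) vs 64 KiB.
  #include <cstdint>
  #include <cstdio>

  int main() {
    const uint64_t map_bytes = 6144000;         // ~6 MB object map payload
    const uint64_t objects   = map_bytes * 4;   // 2 bits/object -> 4 objects/byte
    const double   tib       = double(1ull << 40);
    std::printf("objects tracked: %llu\n", (unsigned long long)objects);
    std::printf("image @ 4 MiB objects:  %.1f TiB\n",
                objects * double(4ull << 20) / tib);
    std::printf("image @ 64 KiB objects: %.2f TiB\n",
                objects * double(64ull << 10) / tib);
    return 0;
  }
  // -> 24576000 objects; ~93.8 TiB at 4 MiB objects, ~1.46 TiB at 64 KiB

The 64 KiB case lines up with the 1.5T image reported below.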

Using the rados cache hint WILLNEED is probably appropriate here..
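
Something along these lines is how a client could attach that hint to an op
through librados (purely a sketch of the idea, not what librbd does
internally; the object name is taken from the log, the rest is made up, and
error handling is omitted):

  // Sketch: tag a small overwrite with the WILLNEED fadvise hint so the
  // store knows the object is worth keeping cached.
  #include <rados/librados.hpp>

  int main() {
    librados::Rados cluster;
    cluster.init("admin");
    cluster.conf_read_file(nullptr);        // default ceph.conf
    cluster.connect();

    librados::IoCtx ioctx;
    cluster.ioctx_create("rbd", ioctx);

    librados::bufferlist bl;
    bl.append("x");

    librados::ObjectWriteOperation op;
    op.write(8210, bl);                     // small overwrite, as in the log
    op.set_op_flags2(LIBRADOS_OP_FLAG_FADVISE_WILLNEED);  // applies to the write above

    ioctx.operate("rbd_object_map.10046b8b4567", &op);
    cluster.shutdown();
    return 0;
  }

The idea being that the store then has a reason to keep the hot object map
object in memory instead of going back to disk for every small overwrite.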

sage

> 
> -- 
> 
> Jason Dillaman 
> 
> 
> ----- Original Message -----
> > From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> > To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> > Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
> > Sent: Wednesday, March 30, 2016 3:00:47 PM
> > Subject: Re: reads while 100% write
> > 
> > 1.5T in that run.
> > With 150G the behavior is the same, except it says "_do_read 0~18 size
> > 615030" instead of 6M.
> > 
> > Also, when the random 4k write starts there are more reads than writes:
> > 
> > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > sdd               0.00  1887.00    0.00  344.00     0.00  8924.00    51.88     0.36    1.06    0.00    1.06   0.91  31.20
> > sde              30.00     0.00   30.00  957.00 18120.00  3828.00    44.47     0.25    0.26    3.87    0.14   0.17  16.40
> > 
> > Logs: http://pastebin.com/gGzfR5ez
> > 
> > 
> > On 3/30/16, 11:37 AM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:
> > 
> > >How large is your RBD image?  100 terabytes?
> > >
> > >--
> > >
> > >Jason Dillaman
> > >
> > >
> > >----- Original Message -----
> > >> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> > >> To: "Sage Weil" <sage@xxxxxxxxxxxx>
> > >> Cc: ceph-devel@xxxxxxxxxxxxxxx
> > >> Sent: Wednesday, March 30, 2016 2:14:12 PM
> > >> Subject: Re: reads while 100% write
> > >>
> > >> These are suspicious lines:
> > >>
> > >> 2016-03-30 10:54:23.142205 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 6144018~6012 = 6012
> > >> 2016-03-30 10:54:23.142252 7f2e933ff700 15 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
> > >> 2016-03-30 10:54:23.142260 7f2e933ff700 20 bluestore(src/dev/osd0) _do_read 8210~4096 size 6150030
> > >> 2016-03-30 10:54:23.142267 7f2e933ff700  5 bdev(src/dev/osd0/block) read 8003854336~8192
> > >> 2016-03-30 10:54:23.142609 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 = 4096
> > >> 2016-03-30 10:54:23.142882 7f2e933ff700 15 bluestore(src/dev/osd0) _write 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
> > >> 2016-03-30 10:54:23.142888 7f2e933ff700 20 bluestore(src/dev/osd0) _do_write #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 - have 6150030 bytes in 1 extents
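> > >>
> > >> If I read these right, every object map update becomes a read-modify-write:
> > >> the 4096-byte overwrite at offset 8210 is not block-aligned, so the two
> > >> enclosing 4 KiB blocks (8192 bytes) get read from disk first, which matches
> > >> the bdev read above. Quick check of the offsets (a sketch, assuming a 4 KiB
> > >> block size):
> > >>
> > >>   // Check of the offsets in the log above (assumes a 4 KiB block size).
> > >>   #include <cstdint>
> > >>   #include <cstdio>
> > >>
> > >>   int main() {
> > >>     const uint64_t block = 4096;
> > >>     const uint64_t off = 8210, len = 4096;           // the object map overwrite
> > >>     const uint64_t first = off / block;              // first block touched
> > >>     const uint64_t last  = (off + len - 1) / block;  // last block touched
> > >>     std::printf("overwrite %llu~%llu touches blocks %llu..%llu -> %llu-byte read\n",
> > >>                 (unsigned long long)off, (unsigned long long)len,
> > >>                 (unsigned long long)first, (unsigned long long)last,
> > >>                 (unsigned long long)((last - first + 1) * block));
> > >>     return 0;
> > >>   }
> > >>   // -> overwrite 8210~4096 touches blocks 2..3 -> 8192-byte read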
> > >>
> > >> More logs here: http://pastebin.com/74WLzFYw
> > >>
> > >>
> > >>
> > >> On 3/30/16, 4:19 AM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:
> > >>
> > >> >On Wed, 30 Mar 2016, Evgeniy Firsov wrote:
> > >> >> After pulling the master branch on Friday I started seeing odd fio
> > >> >> behavior: a lot of reads while writing, and very low performance no
> > >> >> matter whether it is a read or a write workload.
> > >> >>
> > >> >> Output from sequential 1M write:
> > >> >> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > >> >> sdd               0.00   409.00    0.00  364.00     0.00  3092.00    16.99     0.28    0.78    0.00    0.78   0.76  27.60
> > >> >> sde               0.00   242.00  365.00  363.00  2436.00  9680.00    33.29     0.18    0.24    0.42    0.07   0.23  16.80
> > >> >>
> > >> >>
> > >> >>
> > >> >> block.db -> /dev/sdd
> > >> >> block -> /dev/sde
> > >> >>
> > >> >> health HEALTH_OK
> > >> >> monmap e1: 1 mons at {a=127.0.0.1:6789/0}
> > >> >>        election epoch 3, quorum 0 a
> > >> >> osdmap e7: 1 osds: 1 up, 1 in
> > >> >>        flags sortbitwise
> > >> >> pgmap v24: 64 pgs, 1 pools, 577 MB data, 9152 objects
> > >> >>        8210 MB used, 178 GB / 186 GB avail
> > >> >>              64 active+clean
> > >> >> client io 1550 kB/s rd, 9559 kB/s wr, 645 op/s rd, 387 op/s wr
> > >> >>
> > >> >>
> > >> >> While on an earlier revision (c1e41af) everything looks as expected:
> > >> >>
> > >> >> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > >> >> sdd               0.00  4910.00    0.00  680.00     0.00 22416.00    65.93     1.05    1.55    0.00    1.55   1.18  80.00
> > >> >> sde               0.00     0.00    0.00 3418.00     0.00 217612.00  127.33    63.78   18.18    0.00   18.18   0.25  86.40
> > >> >>
> > >> >> Another observation, possibly related to the issue: CPU load is
> > >> >> imbalanced. A single "tp_osd_tp" thread is 100% busy while the rest are
> > >> >> idle. It looks like all the load goes to a single thread pool shard; on
> > >> >> the earlier revision the CPU was well balanced.
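> > >> >>
> > >> >> If I understand the sharded op queue right (ops are distributed to worker
> > >> >> shards by PG), that would explain it: every object map update hits the
> > >> >> same object, hence the same PG, hence the same shard. Illustrative snippet
> > >> >> only, not actual Ceph code, and the shard count is just an example value:
> > >> >>
> > >> >>   // Toy model of a PG-sharded op queue: one hot object -> one PG -> one
> > >> >>   // shard, so a single "tp_osd_tp" worker ends up doing all the work.
> > >> >>   #include <cstdio>
> > >> >>   #include <functional>
> > >> >>   #include <string>
> > >> >>
> > >> >>   int main() {
> > >> >>     const unsigned num_shards = 5;     // e.g. osd_op_num_shards
> > >> >>     const std::string hot_pg = "0.d";  // PG holding rbd_object_map.10046b8b4567
> > >> >>     unsigned shard = std::hash<std::string>{}(hot_pg) % num_shards;
> > >> >>     std::printf("every op for PG %s queues on shard %u\n",
> > >> >>                 hot_pg.c_str(), shard);
> > >> >>     return 0;
> > >> >>   }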
> > >> >
> > >> >Hmm.  Can you capture a log with debug bluestore = 20 and debug bdev = 20?
> > >> >
> > >> >Thanks!
> > >> >sage
> > >> >
> > >> >
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Evgeniy
> > >> >>
> > >> >>
> > >> >>
> > >> >>
> > >>
> > 
> 
> 
