Correct, the change for the default RBD features actually merged on March 1
as well (a7470c8), albeit a few hours after the commit you last tested
against (c1e41af). You can revert to pre-Jewel RBD features on an existing
image by running the following:

# rbd feature disable <image name> exclusive-lock,object-map,fast-diff,deep-flatten

Hopefully the new PR to add the WILLNEED fadvise flag helps.
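For reference, the feature value is just a bitmask, so the old and new
defaults and the effect of the command above can be decoded with a few lines
of arithmetic. This is only an illustrative sketch: the bit assignments are
the ones spelled out in the rbd_default_features comment quoted further down
the thread, and the decode() helper is not part of any Ceph API.

    # RBD image feature bit flags, per the rbd_default_features comment below.
    RBD_FEATURES = {
        1:  "layering",
        2:  "stripingv2",
        4:  "exclusive-lock",
        8:  "object-map",
        16: "fast-diff",
        32: "deep-flatten",
        64: "journaling",
    }

    def decode(mask):
        """Names of the features enabled in a feature bitmask."""
        return [name for bit, name in sorted(RBD_FEATURES.items()) if mask & bit]

    print(decode(3))    # old default: ['layering', 'stripingv2']
    print(decode(61))   # new default: ['layering', 'exclusive-lock',
                        #               'object-map', 'fast-diff', 'deep-flatten']

    # What the "rbd feature disable" command above leaves enabled on an image
    # created with the new default:
    disabled = 4 + 8 + 16 + 32
    print(decode(61 & ~disabled))   # ['layering']

Journaling (+64) is not part of the new default, so the command above
effectively takes a new-default image back to layering only.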
--

Jason Dillaman

----- Original Message -----
> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
> Sent: Wednesday, March 30, 2016 4:39:09 PM
> Subject: Re: reads while 100% write
>
> I use 64K.
> Explicit settings are identical for both revisions.
>
> Looks like the following change slows down performance 10 times:
>
> -OPTION(rbd_default_features, OPT_INT, 3)   // only applies to format 2 images
> -                                            // +1 for layering, +2 for stripingv2,
> -                                            // +4 for exclusive lock, +8 for object map
> +OPTION(rbd_default_features, OPT_INT, 61)  // only applies to format 2 images
> +                                            // +1 for layering, +2 for stripingv2,
> +                                            // +4 for exclusive lock, +8 for object map
> +                                            // +16 for fast-diff, +32 for deep-flatten,
> +                                            // +64 for journaling
>
>
> On 3/30/16, 12:10 PM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:
>
> >Are you using the RBD default of 4MB object sizes, or are you using
> >something much smaller like 64KB? An object map of that size should be
> >tracking up to 24,576,000 objects. When you ran your test before, did
> >you have the RBD object map disabled? This definitely seems to be a use
> >case where the lack of a cache in front of BlueStore is hurting small IO.
> >
> >--
> >
> >Jason Dillaman
> >
> >
> >----- Original Message -----
> >> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> >> To: "Jason Dillaman" <dillaman@xxxxxxxxxx>
> >> Cc: "Sage Weil" <sage@xxxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
> >> Sent: Wednesday, March 30, 2016 3:00:47 PM
> >> Subject: Re: reads while 100% write
> >>
> >> 1.5T in that run.
> >> With 150G the behavior is the same, except it says
> >> "_do_read 0~18 size 615030" instead of 6M.
> >>
> >> Also, when the random 4k write starts there are more reads than writes:
> >>
> >> Device:  rrqm/s   wrqm/s    r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> >> sdd        0.00  1887.00   0.00  344.00      0.00  8924.00    51.88     0.36   1.06    0.00    1.06   0.91  31.20
> >> sde       30.00     0.00  30.00  957.00  18120.00  3828.00    44.47     0.25   0.26    3.87    0.14   0.17  16.40
> >>
> >> Logs: http://pastebin.com/gGzfR5ez
> >>
> >>
> >> On 3/30/16, 11:37 AM, "Jason Dillaman" <dillaman@xxxxxxxxxx> wrote:
> >>
> >> >How large is your RBD image? 100 terabytes?
> >> >
> >> >--
> >> >
> >> >Jason Dillaman
> >> >
> >> >
> >> >----- Original Message -----
> >> >> From: "Evgeniy Firsov" <Evgeniy.Firsov@xxxxxxxxxxx>
> >> >> To: "Sage Weil" <sage@xxxxxxxxxxxx>
> >> >> Cc: ceph-devel@xxxxxxxxxxxxxxx
> >> >> Sent: Wednesday, March 30, 2016 2:14:12 PM
> >> >> Subject: Re: reads while 100% write
> >> >>
> >> >> These are the suspicious lines:
> >> >>
> >> >> 2016-03-30 10:54:23.142205 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 6144018~6012 = 6012
> >> >> 2016-03-30 10:54:23.142252 7f2e933ff700 15 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
> >> >> 2016-03-30 10:54:23.142260 7f2e933ff700 20 bluestore(src/dev/osd0) _do_read 8210~4096 size 6150030
> >> >> 2016-03-30 10:54:23.142267 7f2e933ff700  5 bdev(src/dev/osd0/block) read 8003854336~8192
> >> >> 2016-03-30 10:54:23.142609 7f2e933ff700 10 bluestore(src/dev/osd0) read 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 = 4096
> >> >> 2016-03-30 10:54:23.142882 7f2e933ff700 15 bluestore(src/dev/osd0) _write 0.d_head #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096
> >> >> 2016-03-30 10:54:23.142888 7f2e933ff700 20 bluestore(src/dev/osd0) _do_write #0:b06b5e8e:::rbd_object_map.10046b8b4567:head# 8210~4096 - have 6150030 bytes in 1 extents
> >> >>
> >> >> More logs here: http://pastebin.com/74WLzFYw
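Those sizes are consistent with the small writes also having to update the
image's object map. A rough back-of-the-envelope sketch, assuming "1.5T"
means 1500 GiB here and that the object map stores two bits of state per
data object (the header and trailer sizes are read off the log rather than
the on-disk format):

    # Rough sizing of the rbd_object_map object seen in the log lines above.
    # Assumptions: "1.5T" image taken as 1500 GiB, 64 KiB RBD objects, and
    # 2 bits of object-map state per data object.
    image_size = 1500 * 1024**3          # bytes
    object_size = 64 * 1024              # 64 KiB RBD objects
    objects = image_size // object_size
    print(objects)                       # 24576000, the figure quoted above

    bitmap_bytes = objects * 2 // 8      # 2 bits per object
    print(bitmap_bytes)                  # 6144000

    # The log reports the whole object as 6150030 bytes, and the 6144018~6012
    # read (offset~length) runs exactly to its end, i.e. just past the bitmap:
    print(6144018 + 6012)                # 6150030

So each object-map update is a small read-modify-write (4 KiB at offset 8210
here) against a ~6 MB object, and with no cache in front of BlueStore those
reads go to disk, which matches the extra read traffic in the iostat output
above.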
> >> >>
> >> >> On 3/30/16, 4:19 AM, "Sage Weil" <sage@xxxxxxxxxxxx> wrote:
> >> >>
> >> >> >On Wed, 30 Mar 2016, Evgeniy Firsov wrote:
> >> >> >> After pulling the master branch on Friday I started seeing odd fio
> >> >> >> behavior: I see a lot of reads while writing, and very low performance
> >> >> >> no matter whether it is a read or a write workload.
> >> >> >>
> >> >> >> Output from a sequential 1M write:
> >> >> >>
> >> >> >> Device:  rrqm/s   wrqm/s    r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> >> >> >> sdd        0.00   409.00   0.00  364.00     0.00   3092.00    16.99     0.28   0.78    0.00    0.78   0.76  27.60
> >> >> >> sde        0.00   242.00 365.00  363.00  2436.00   9680.00    33.29     0.18   0.24    0.42    0.07   0.23  16.80
> >> >> >>
> >> >> >> block.db -> /dev/sdd
> >> >> >> block    -> /dev/sde
> >> >> >>
> >> >> >>      health HEALTH_OK
> >> >> >>      monmap e1: 1 mons at {a=127.0.0.1:6789/0}
> >> >> >>             election epoch 3, quorum 0 a
> >> >> >>      osdmap e7: 1 osds: 1 up, 1 in
> >> >> >>             flags sortbitwise
> >> >> >>       pgmap v24: 64 pgs, 1 pools, 577 MB data, 9152 objects
> >> >> >>             8210 MB used, 178 GB / 186 GB avail
> >> >> >>                   64 active+clean
> >> >> >>   client io 1550 kB/s rd, 9559 kB/s wr, 645 op/s rd, 387 op/s wr
> >> >> >>
> >> >> >> While on an earlier revision (c1e41af) everything looks as expected:
> >> >> >>
> >> >> >> Device:  rrqm/s   wrqm/s    r/s     w/s    rkB/s     wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
> >> >> >> sdd        0.00  4910.00   0.00  680.00     0.00  22416.00    65.93     1.05   1.55    0.00    1.55   1.18  80.00
> >> >> >> sde        0.00     0.00   0.00 3418.00     0.00 217612.00   127.33    63.78  18.18    0.00   18.18   0.25  86.40
> >> >> >>
> >> >> >> Another observation, which may be related to the issue, is that the CPU
> >> >> >> load is imbalanced: a single "tp_osd_tp" thread is 100% busy while the
> >> >> >> rest are idle. It looks like all the load goes to a single thread pool
> >> >> >> shard; on the earlier revision the CPU load was well balanced.
> >> >> >
> >> >> >Hmm.  Can you capture a log with debug bluestore = 20 and debug bdev = 20?
> >> >> >
> >> >> >Thanks!
> >> >> >sage
> >> >> >
> >> >> >>
> >> >> >> --
> >> >> >> Evgeniy
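For anyone trying to reproduce this, the log Sage asks for just needs the two
debug options raised on the OSD. A minimal ceph.conf fragment for a
single-OSD test setup like the one above would be roughly:

    [osd]
        debug bluestore = 20
        debug bdev = 20

Restart the OSD (or inject the options at runtime via ceph tell/injectargs)
and re-run the fio workload to get traces like the ones in the pastebin links
above.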