RE: bluestore prefer wal size

On Fri, 3 Feb 2017, Nick Fisk wrote:
> Hi Mark,
> 
> > -----Original Message-----
> > From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> > Sent: 03 February 2017 15:20
> > To: nick@xxxxxxxxxx; 'Sage Weil' <sage@xxxxxxxxxxxx>
> > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > Subject: Re: bluestore prefer wal size
> > 
> > Hi Nick,
> > 
> > So I'm still catching up to your testing, but last night I ran through a number of tests with a single OSD, a single client, and iodepth=1 fio
> > rbd tests:
> > 
> > https://drive.google.com/uc?export=download&id=0B2gTBZrkrnpZWE85OFI3Q2xQZ00
> > 
> > I tested HDD, HDD+NVMe, and NVMe configurations, looking at filestore and bluestore with 1k and 16k prefer wal sizes.  I believe I'm
> > seeing similar results to what you saw.  On NVMe the two are pretty similar, likely due to the raw backend throughput the NVMe drives can
> > maintain, but on HDD bluestore is doing quite a bit worse than filestore for this specific use case.  During the bluestore HDD tests, I
> > watched disk access and it appears we are saturating the disk with small writes despite the low client performance.  I'll be digging into
> > this more today, but I wanted to send out an update to let you know I believe I've reproduced your results and am looking into it.
> 
> Glad to hear that you can reproduce. After debugging that IF statement, 
> I think I'm at the point where I am out of my depth. But if there is 
> anything I can do, let me know.

I finally got my dev box HDD sorted out and retested this.  The IOPS were 
being halved because bluestore was triggering a rocksdb commit (and the 
associated latency) just to retire the WAL record.  I rebased and fixed it 
so that the WAL entries are deleted lazily (in the next commit round) and 
it's 2x faster than before.  I'm getting ~110 IOPS with rados bench 4k 
writes with queue depth 1 (6TB 7200rpm WD black).  That's in the 
neighborhood of what I'd expect from a spinner... it's basically seeking 
back and forth between two write positions (the rocksdb log and the new 
object data where the allocator position is).  We could probably bring 
this up a bit by batching the WAL work on HDD (build up several 
IOs' worth and dispatch it all at once to reduce seeks).  That'll take a 
bit more work, though.

With WAL on an NVMe, I get ~450 IOPS.

Want to give it a try?
sage


> 
> > 
> > Thanks!
> > Mark
> > 
> > On 02/03/2017 03:49 AM, Nick Fisk wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> > >> Sent: 02 February 2017 21:15
> > >> To: 'Mark Nelson' <mnelson@xxxxxxxxxx>; 'Sage Weil'
> > >> <sage@xxxxxxxxxxxx>
> > >> Cc: ceph-devel@xxxxxxxxxxxxxxx
> > >> Subject: RE: bluestore prefer wal size
> > >>
> > >> Hi Mark,
> > >>> -----Original Message-----
> > >>> From: Mark Nelson [mailto:mnelson@xxxxxxxxxx]
> > >>> Sent: 02 February 2017 20:17
> > >>> To: nick@xxxxxxxxxx; 'Sage Weil' <sage@xxxxxxxxxxxx>
> > >>> Cc: ceph-devel@xxxxxxxxxxxxxxx
> > >>> Subject: Re: bluestore prefer wal size
> > >>>
> > >>> Hi Nick,
> > >>>
> > >>> On 02/02/2017 08:46 AM, Nick Fisk wrote:
> > >>>> Further update
> > >>>>
> > >>>> If I do random writes instead of sequential, I see the "defering
> > >>>> small write
> > >>>> via wal" action, but from the _do_alloc_write function, not
> > >>>> _do_write_small.  I.e. the log entry is:
> > >>>> _do_alloc_write defering small 0x1000 write via wal
> > >>>
> > >>> I just ran some tests using the branch with 4k random and sequential
> > >>> writes against a single OSD.  I see deferring small write via wal
> > >>> from _do_alloc_write for both sequential and random when
> > >>> bluestore_prefer_wal_size is set to be larger than the IO size.
> > >>> Otherwise, I don't see the messages.  I am also preconditioning with
> > >>> 4MB writes.
> > >>>
> > >>> So far I've just been testing on NVMe, but I should be able to do
> > >>> some mixed tests soon to see if there are any interesting performance implications.
> > >>
> > >> I've just looked at the RBD I ran those random tests on and I
> > >> realized that it was only about 90% pre-conditioned due to me leaving
> > >> the time_based fio option on. After filling it completely with writes, both random and sequential now do _do_write_small
> > writes and no deferred wal writes.
> > >>
> > >> From looking through the code today, as far as I understand it:
> > >>
> > >> write_small = write to an existing object under min_alloc_size
> > >> write_big = write to an existing object over min_alloc_size
> > >> alloc_write = write to a new object
> > >>
> > >> Is that correct?
> > >>
> > >> If it is, I'm interested in why you are seeing alloc_writes, as I
> > >> don't see them if the RBD is fully pre-conditioned. Could something be changing with the min_alloc_size, since you are using non-
> > rotational media?
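For reference, here is a toy sketch of the dispatch rule as I understand it (illustrative only, with made-up names; not the actual _do_write logic): a write into a newly allocated extent takes the alloc path, and writes into existing allocations split on min_alloc_size.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper modelling the write-path classification described
// above (not the real BlueStore code).
std::string classify_write(uint64_t length, bool existing_allocation,
                           uint64_t min_alloc_size) {
  if (!existing_allocation)
    return "alloc_write";   // new object / unwritten extent
  if (length < min_alloc_size)
    return "write_small";   // sub-allocation-unit overwrite, WAL candidate
  return "write_big";       // whole-allocation-unit overwrite
}
```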
> > >>
> > >> I'm adding a few extra dout's to the write_small function to try and see what's going on, hopefully I will know more tomorrow.
> > >>
> > >
> > > Morning,
> > >
> > > So, results are in. I added some debugging before the IF statement
> > > (https://github.com/liewegas/ceph/blob/944b4d3a3869a472c1e66a4fa73c127
> > > 66c7801ac/src/os/bluestore/BlueStore.cc#L7885)
> > >
> > > So the following section now looks like below:
> > >
> > >     dout(20) << __func__ << "  before IF 0x" << std::hex << b_off << "~" << b_len << " - " << b->get_blob().get_ondisk_length() << " -
> > " << b->get_blob().is_unused(b_off, b_len) << " - " << b->get_blob().is_allocated(b_off, b_len) << " - " << dendl;
> > >     if ((b_off % chunk_size == 0 && b_len % chunk_size == 0) &&
> > >         b->get_blob().get_ondisk_length() >= b_off + b_len &&
> > >         b->get_blob().is_unused(b_off, b_len) &&
> > >         b->get_blob().is_allocated(b_off, b_len)) {
> > >
> > > And the output was
> > >
> > > _do_write_small  before IF 0xf000~1000 - 10000 - 0 - 1 -
> > >
> > > So the " b->get_blob().is_unused(b_off, b_len)" call is returning false, which stops it going into the IF block and doing the prefer wal
> > write. Is that expected?
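If I model the blob's unused tracking as a simple bitmap (a toy sketch, not the real bluestore_blob_t code), that result would follow: once a chunk of the blob has been written, is_unused() returns false for it, so a fully pre-conditioned RBD could never take this branch.

```cpp
#include <cstdint>

// Toy model of a blob's "unused" tracking (illustrative only; the real
// implementation differs).  Each bit covers one 4k chunk of a 64k blob.
struct ToyBlob {
  uint16_t unused = 0xffff;  // all 16 chunks start out unused

  // Pre-conditioning (or any prior write) clears the chunk's unused bit.
  void mark_written(unsigned chunk) { unused &= ~(1u << chunk); }

  bool is_unused(unsigned chunk) const { return unused & (1u << chunk); }
};
```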
> > >
> > > Nick
> > >
> > >> Nick
> > >>
> > >>>
> > >>> Mark
> > >>>
> > >>>>
> > >>>> And that's for a 4kB random write on an RBD that has been fully
> > >>>> pre-conditioned via fio 4MB writes.
> > >>>>
> > >>>> So it's something in the sequential nature of the test which trips
> > >>>> it up, as I don't see any sign of deferred writes.
> > >>>>
> > >>>> From looking at the code in the small write function, I can only
> > >>>> see two possibilities that would stop it from reaching the prefer
> > >>>> wal IF statement
> > >>>>
> > >>>>   while (ep != o->extent_map.extent_map.end()) {
> > >>>>     if (ep->blob_start() >= end) {
> > >>>>       break;
> > >>>>
> > >>>> and
> > >>>>
> > >>>>   if ((b_off % chunk_size == 0 && b_len % chunk_size == 0) &&
> > >>>> 	b->get_blob().get_ondisk_length() >= b_off + b_len &&
> > >>>> 	b->get_blob().is_unused(b_off, b_len) &&
> > >>>> 	b->get_blob().is_allocated(b_off, b_len)) {
> > >>>>
> > >>>> Unfortunately, that's about as far as I have got, before my
> > >>>> knowledge of
> > >>> Bluestore becomes a limitation.
> > >>>>
> > >>>> Don't know if that triggers any thoughts?
> > >>>>
> > >>>>>
> > >>>>> 2017-02-02 10:39:31.988782 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) queue_transactions existing
> > >>>>> 0x56102a5a4900
> > >>>>> osr(9.73 0x56102a5db1a0)
> > >>>>> 2017-02-02 10:39:31.988786 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_create osr 0x56102a5a4900
> > >>>>> =
> > >>>>> 0x56102ab4be40 seq 260
> > >>>>> 2017-02-02 10:39:31.988797 7ff7282ff700 15
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _setattrs 9.73_head
> > >>>>> #9:cff290b9:::rbd_data.36646a74b0dc51.0000000000000001:head# 2 key
> > >>> s
> > >>>>> 2017-02-02 10:39:31.988802 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _setattrs 9.73_head
> > >>>>> #9:cff290b9:::rbd_data.36646a74b0dc51.0000000000000001:head# 2 key
> > >>> s
> > >>>>> = 0
> > >>>>> 2017-02-02 10:39:31.988806 7ff7282ff700 15
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _write 9.73_head
> > >>>>> #9:cff290b9:::rbd_data.36646a74b0dc51.0000000000000001:head#
> > >>> 0xfb000~
> > >>>>> 1000
> > >>>>> 2017-02-02 10:39:31.988809 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_write
> > >>>>> #9:cff290b9:::rbd_data.36646a74b0dc51.0000000000000001:head#
> > >>>>> 0xfb000~1000 - have 0x400000 (4194304) bytes fadvise_flags 0x0
> > >>>>> 2017-02-02 10:39:31.988812 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_write prefer csum_order 12
> > >>>>> target_blob_size
> > >>>>> 0x80000
> > >>>>> 2017-02-02 10:39:31.988814 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_write_small 0xfb000~1000
> > >>>>> 2017-02-02 10:39:31.988816 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_write_small considering
> > >>>>> Blob(0x56102aac05a0 blob([0x256480000~80000] mutable+csum
> > >>>>> crc32c/0x1000) ref_map(0x0~80000=1) SharedBlob(0x56102aa76260))
> > >>>>> bstart 0x80000
> > >>>>> 2017-02-02 10:39:31.988821 7ff7282ff700 20
> > >>>>> bluestore.BufferSpace(0x56102aa762b8 in 0x56102a1337a0) _discard
> > >>>>> 0x7b000~1000
> > >>>>> 2017-02-02 10:39:31.988827 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_write_small  wal write
> > >>>>> 0x7b000~1000 of mutable Blob(0x56102aac05a0
> > >>> blob([0x256480000~80000]
> > >>>>> mutable+csum crc32c/0x1000) ref_map(0x0~80000=1)
> > >>>>> SharedBlob(0x56102aa76260)) at [0x2564fb000~1000]
> > >>>>> 2017-02-02 10:39:31.988832 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_write_small  lex
> > >>>>> 0xfb000~1000: 0x7b000~1000
> > >>>>> Blob(0x56102aac05a0 blob([0x256480000~80000] mutable+csum
> > >>>>> crc32c/0x1000)
> > >>>>> ref_map(0x0~7b000=1,0x7b000~1000=2,0x7c000~4000=1)
> > >>>>> SharedBlob(0x56102aa76260))
> > >>>>> 2017-02-02 10:39:31.988836 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_alloc_write txc
> > >>>>> 0x56102ab4be40 0 blobs
> > >>>>> 2017-02-02 10:39:31.988837 7ff7282ff700 10 bitmapalloc:reserve
> > >>>>> instance 94627427524464 num_used 345738 total 7627973
> > >>>>> 2017-02-02 10:39:31.988840 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _wctx_finish lex_old
> > >>>>> 0xfb000~1000: 0x7b000~1000
> > >>>>> Blob(0x56102aac05a0 blob([0x256480000~80000] mutable+csum
> > >>>>> crc32c/0x1000)
> > >>>>> ref_map(0x0~7b000=1,0x7b000~1000=2,0x7c000~4000=1)
> > >>>>> SharedBlob(0x56102aa76260))
> > >>>>> 2017-02-02 10:39:31.988844 7ff7282ff700 20
> > >>>>> bluestore.extentmap(0x56102a8ee3f0) compress_extent_map
> > >>> 0xfb000~1000
> > >>>>> next shard
> > >>>>> 0x100000 merging 0x80000~7b000: 0x0~7b000 Blob(0x56102aac05a0
> > >>>>> blob([0x256480000~80000] mutable+csum crc32c/0x1000)
> > >>>>> ref_map(0x0~80000=1) SharedBlob(0x56102aa76260)) and 0xfb000~1000:
> > >>>>> 0x7b000~1000 Blob(0x56102aac05a0 blob([0x256480000~80000]
> > >>>>> mutable+csum crc32c/0x1000) ref_map(0x0~80000=1)
> > >>>>> SharedBlob(0x56102aa76260))
> > >>>>> 2017-02-02 10:39:31.988850 7ff7282ff700 20
> > >>>>> bluestore.extentmap(0x56102a8ee3f0) compress_extent_map
> > >>> 0xfb000~1000
> > >>>>> next shard
> > >>>>> 0x100000 merging 0x80000~7c000: 0x0~7c000 Blob(0x56102aac05a0
> > >>>>> blob([0x256480000~80000] mutable+csum crc32c/0x1000)
> > >>>>> ref_map(0x0~80000=1) SharedBlob(0x56102aa76260)) and 0xfc000~4000:
> > >>>>> 0x7c000~4000 Blob(0x56102aac05a0 blob([0x256480000~80000]
> > >>>>> mutable+csum crc32c/0x1000) ref_map(0x0~80000=1)
> > >>>>> SharedBlob(0x56102aa76260))
> > >>>>> 2017-02-02 10:39:31.988868 7ff7282ff700 20
> > >>>>> bluestore.extentmap(0x56102a8ee3f0) dirty_range mark shard 0x80000
> > >>>>> dirty
> > >>>>> 2017-02-02 10:39:31.988870 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _write 9.73_head
> > >>>>> #9:cff290b9:::rbd_data.36646a74b0dc51.0000000000000001:head#
> > >>>>> 0xfb000~1000 = 0
> > >>>>> 2017-02-02 10:39:31.988875 7ff7282ff700 15
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _omap_setkeys 9.73_head
> > >>>>> #9:ce000000::::head#
> > >>>>> 2017-02-02 10:39:31.988882 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _omap_setkeys 9.73_head
> > >>>>> #9:ce000000::::head# = 0
> > >>>>> 2017-02-02 10:39:31.988885 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_write_nodes txc
> > >>>>> 0x56102ab4be40 onodes
> > >>>>> 0x56102a8ee300 shared_blobs
> > >>>>> 2017-02-02 10:39:31.988894 7ff7282ff700 20
> > >>>>> bluestore.extentmap(0x56102a8ee3f0) update shard 0x80000 is 531
> > >>>>> bytes
> > >>> (was 531) from 1 extents
> > >>>>> 2017-02-02 10:39:31.988902 7ff7282ff700 20
> > >>> bluestore(/var/lib/ceph/osd/ceph-4)   onode
> > >>>>> #9:cff290b9:::rbd_data.36646a74b0dc51.0000000000000001:head# is
> > >>>>> 424
> > >>>>> (422 bytes onode + 2 bytes spanning blobs + 0 bytes inline
> > >>>>> extents)
> > >>>>> 2017-02-02 10:39:31.988914 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_state_proc txc
> > >>>>> 0x56102ab4be40 prepare
> > >>>>> 2017-02-02 10:39:31.988916 7ff7282ff700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_finish_io 0x56102ab4be40
> > >>>>> 2017-02-02 10:39:31.988917 7ff7282ff700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_state_proc txc
> > >>>>> 0x56102ab4be40 io_done
> > >>>>> 2017-02-02 10:39:31.988918 7ff7282ff700 20
> > >>>>> bluestore.BufferSpace(0x56102aa762b8 in 0x56102a1337a0)
> > >>>>> finish_write
> > >>>>> buffer(0x56102adb1830 space 0x56102aa762b8 0x7b000~1000 writing
> > >>>>> nocache)
> > >>>>> 2017-02-02 10:39:31.988939 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _kv_sync_thread wake
> > >>>>> 2017-02-02 10:39:31.988941 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _kv_sync_thread committing 1
> > >>>>> submitting 1 cleaning 0
> > >>>>> 2017-02-02 10:39:31.988952 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_finalize_kv txc
> > >>>>> 0x56102ab4be40 allocated 0x[] released 0x[]
> > >>>>> 2017-02-02 10:39:31.988999 7ff72e30b700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _balance_bluefs_freespace
> > >>>>> bluefs
> > >>>>> 9535 M free
> > >>>>> (0.999896) bluestore 222 G free (0.954675), bluefs_ratio 0.0402143
> > >>>>> 2017-02-02 10:39:31.989400 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _kv_sync_thread committed 1
> > >>>>> cleaned 0 in 0.000456
> > >>>>> 2017-02-02 10:39:31.989405 7ff72e30b700 10
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_state_proc txc
> > >>>>> 0x56102ab4be40 kv_submitted
> > >>>>> 2017-02-02 10:39:31.989407 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _txc_finish_kv txc
> > >>>>> 0x56102ab4be40
> > >>>>> 2017-02-02 10:39:31.989413 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _wal_apply txc 0x56102ab4be40
> > >>>>> seq
> > >>>>> 255
> > >>>>> 2017-02-02 10:39:31.989414 7ff72e30b700 20
> > >>>>> bluestore(/var/lib/ceph/osd/ceph-4) _do_wal_op write
> > >>>>> [0x2564fb000~1000]
> > >>>>>
> > >>>>>
> > >>>>>>
> > >>>>>> sage
> > >>>>>>
> > >>>>>>>
> > >>>>>>> sage
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> sage
> > >>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Nick
> > >>>>>>>>>>
> > >>>>>>>>>> -----Original Message-----
> > >>>>>>>>>> From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> > >>>>>>>>>> Sent: 01 February 2017 19:03
> > >>>>>>>>>> To: 'Sage Weil' <sweil@xxxxxxxxxx>; 'Mark Nelson'
> > >>>>>>>>> <mnelson@xxxxxxxxxx>
> > >>>>>>>>>> Cc: ceph-devel@xxxxxxxxxxxxxxx
> > >>>>>>>>>> Subject: RE: bluestore prefer wal size
> > >>>>>>>>>>
> > >>>>>>>>>> Hi Sage,
> > >>>>>>>>>>
> > >>>>>>>>>> First results are not looking good. It looks like write IO to the
> > >>>>>>>>>> SSDs (sdd and sdi)
> > >>>>>>>>> is now massively amplified, somewhere in the region of
> > >>>>>>>>> 10x. But I'm still only getting around 100 4kB seq
> > >>>>>>>>> write IOPS from the fio client. This is in comparison to
> > >>>>>>>>> 2500-3000 IOPS on the SSDs and ~200 IOPS per spinning disk (sdc,sde,sdg,sdh).
> > >>>>>>>>>>
> > >>>>>>>>>> rbd engine: RBD version: 0.1.11
> > >>>>>>>>>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/476KB/0KB /s]
> > >>>>>>>>>> [0/119/0 iops] [eta 00m:00s]
> > >>>>>>>>>> rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=31171: Wed
> > >>>>>>>>>> Feb
> > >>>>>>>>>> 1
> > >>>>>>>>> 18:56:51 2017
> > >>>>>>>>>>   write: io=27836KB, bw=475020B/s, iops=115, runt= 60006msec
> > >>>>>>>>>>     slat (usec): min=6, max=142, avg=10.98, stdev= 8.38
> > >>>>>>>>>>     clat (msec): min=1, max=271, avg= 8.61, stdev= 3.63
> > >>>>>>>>>>      lat (msec): min=1, max=271, avg= 8.62, stdev= 3.63
> > >>>>>>>>>>     clat percentiles (msec):
> > >>>>>>>>>>      |  1.00th=[    8],  5.00th=[    9], 10.00th=[    9], 20.00th=[    9],
> > >>>>>>>>>>      | 30.00th=[    9], 40.00th=[    9], 50.00th=[    9], 60.00th=[    9],
> > >>>>>>>>>>      | 70.00th=[    9], 80.00th=[    9], 90.00th=[    9], 95.00th=[    9],
> > >>>>>>>>>>      | 99.00th=[   17], 99.50th=[   25], 99.90th=[   33], 99.95th=[   36],
> > >>>>>>>>>>      | 99.99th=[  273]
> > >>>>>>>>>>     bw (KB  /s): min=  191, max=  480, per=100.00%,
> > >>>>>>>>>> avg=464.18,
> > >>> stdev=34.60
> > >>>>>>>>>>     lat (msec) : 2=0.01%, 4=0.04%, 10=97.69%, 20=1.55%, 50=0.69%
> > >>>>>>>>>>     lat (msec) : 500=0.01%
> > >>>>>>>>>>   cpu          : usr=0.13%, sys=0.12%, ctx=7224, majf=0, minf=5
> > >>>>>>>>>>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
> > >>> 32=0.0%,
> > >>>>>>>>>> =64=0.0%
> > >>>>>>>>>>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> > >>> 64=0.0%,
> > >>>>>>>>>> =64=0.0%
> > >>>>>>>>>>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
> > >>>>>>>>>> 64=0.0%, =64=0.0%
> > >>>>>>>>>>      issued    : total=r=0/w=6959/d=0, short=r=0/w=0/d=0,
> > >>>>>>>>> drop=r=0/w=0/d=0
> > >>>>>>>>>>      latency   : target=0, window=0, percentile=100.00%, depth=1
> > >>>>>>>>>>
> > >>>>>>>>>> I've checked via the admin socket and the prefer wal option
> > >>>>>>>>>> is set to
> > >>> 8192.
> > >>>>>>>>>>
> > >>>>>>>>>> Random capture of iostat
> > >>>>>>>>>>
> > >>>>>>>>>> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz
> > >>> avgqu-sz
> > >>>>>>>>> await r_await w_await  svctm  %util
> > >>>>>>>>>> sda               0.00    77.00    0.00    5.50     0.00   338.00   122.91     0.05
> > >>> 11.64
> > >>>>>>>>> 0.00   11.64   8.73   4.80
> > >>>>>>>>>> sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00
> > >>> 0.00
> > >>>>>>>>> 0.00    0.00   0.00   0.00
> > >>>>>>>>>> sdc               0.00     0.00    0.00  151.50     0.00   302.00     3.99     0.54
> > >>> 3.59
> > >>>>>>>>> 0.00    3.59   3.58  54.20
> > >>>>>>>>>> sdd               0.00     0.00    0.00 1474.00     0.00  4008.00     5.44     0.09
> > >>> 0.06
> > >>>>>>>>> 0.00    0.06   0.06   9.20
> > >>>>>>>>>> sde               0.00     0.00    0.00  217.00     0.00   434.00     4.00     0.91
> > >>> 4.20
> > >>>>>>>>> 0.00    4.20   4.20  91.20
> > >>>>>>>>>> sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00
> > >>> 0.00
> > >>>>>>>>> 0.00    0.00   0.00   0.00
> > >>>>>>>>>> sdg               0.00     0.00    0.00   66.50     0.00   134.00     4.03     0.18
> > >>> 2.68
> > >>>>>>>>> 0.00    2.68   2.68  17.80
> > >>>>>>>>>> sdh               0.00     0.00    0.00  217.00     0.00   434.00     4.00     0.80
> > >>> 3.71
> > >>>>>>>>> 0.00    3.71   3.71  80.40
> > >>>>>>>>>> sdi               0.00     0.00    0.00 1134.00     0.00  3082.00     5.44     0.09
> > >>> 0.08
> > >>>>>>>>> 0.00    0.08   0.07   8.40
> > >>>>>>>>>>
> > >>>>>>>>>> -----Original Message-----
> > >>>>>>>>>> From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > >>>>>>>>>> Sent: 01 February 2017 15:34
> > >>>>>>>>>> To: nick@xxxxxxxxxx
> > >>>>>>>>>> Cc: ceph-devel@xxxxxxxxxxxxxxx
> > >>>>>>>>>> Subject: bluestore prefer wal size
> > >>>>>>>>>>
> > >>>>>>>>>> Hey Nick,
> > >>>>>>>>>>
> > >>>>>>>>>> I've updated/improved the prefer wal size PR (that sends
> > >>>>>>>>>> small writes through the wal).  See
> > >>>>>>>>>>
> > >>>>>>>>>> 	https://github.com/ceph/ceph/pull/13217
> > >>>>>>>>>>
> > >>>>>>>>>> if you want to try it out.
> > >>>>>>>>>>
> > >>>>>>>>>> Thanks!
> > >>>>>>>>>> sage
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>
> > >
> 
> 
> 
