RE: bluestore prefer wal size

On Wed, 1 Feb 2017, Nick Fisk wrote:
 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > Sent: 01 February 2017 21:39
> > To: Nick Fisk <nick@xxxxxxxxxx>
> > Cc: 'Mark Nelson' <mnelson@xxxxxxxxxx>; ceph-devel@xxxxxxxxxxxxxxx
> > Subject: RE: bluestore prefer wal size
> > 
> > On Wed, 1 Feb 2017, Nick Fisk wrote:
> > > Further update,
> > >
> > > I set bluestore_debug_omit_block_device_write to true and this gave me
> > > near-filestore performance, albeit still with very high write amp on
> > > the SSDs. So it's definitely something around that part of the code
> > > waiting on the writes to the spinning disks; I'm puzzled why the commit
> > > didn't help.
> > >
> > > I also set debug logging to 20/20 for bdev, bluefs, osd, and bluestore,
> > > and a grep didn't reveal any of the debug output from that commit, e.g.
> > > "defering small 0x". So possibly something isn't working as expected?
> > 
> > Oh, yeah, the option isn't working then.  I saw the line in my debug
> > output... are you sure you set bluestore_prefer_wal_size?
> 
> Yep, 100%. I even did a config show on the admin socket to confirm the value. I will rebuild the OSDs and try again, and do a bit more digging to see if I can work out why it's not entering that if statement.
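
For anyone repeating Nick's checks, here is a minimal shell sketch of the verification steps described above, assuming an OSD id of 0 and the default log path (both assumptions, not from the thread):

    # confirm the value the running OSD actually has
    ceph daemon osd.0 config show | grep bluestore_prefer_wal_size

    # the debug option Nick toggled to bypass block-device writes
    ceph daemon osd.0 config set bluestore_debug_omit_block_device_write true

    # raise bluestore debugging, then look for the deferred-write message
    ceph daemon osd.0 config set debug_bluestore 20/20
    grep "defering small" /var/log/ceph/ceph-osd.0.log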

Oh, I know what the problem is.  Will push a fixed patch shortly.

sage

> 
> > 
> > sage
> > 
> > >
> > > Nick
> > >
> > > -----Original Message-----
> > > From: Nick Fisk [mailto:nick@xxxxxxxxxx]
> > > Sent: 01 February 2017 19:03
> > > To: 'Sage Weil' <sweil@xxxxxxxxxx>; 'Mark Nelson' <mnelson@xxxxxxxxxx>
> > > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > > Subject: RE: bluestore prefer wal size
> > >
> > > Hi Sage,
> > >
> > > First results are not looking good. It looks like write IO to the SSDs
> > > (sdd and sdi) is now massively amplified, by roughly 10x, but I'm still
> > > only getting around 100 4KB sequential write IOPS from the fio client.
> > > This is in comparison to 2500-3000 IOPS on the SSDs and ~200 IOPS per
> > > spinning disk (sdc, sde, sdg, sdh).
> > >
> > > rbd engine: RBD version: 0.1.11
> > > Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/476KB/0KB /s] [0/119/0 iops] [eta 00m:00s]
> > > rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=31171: Wed Feb  1 18:56:51 2017
> > >   write: io=27836KB, bw=475020B/s, iops=115, runt= 60006msec
> > >     slat (usec): min=6, max=142, avg=10.98, stdev= 8.38
> > >     clat (msec): min=1, max=271, avg= 8.61, stdev= 3.63
> > >      lat (msec): min=1, max=271, avg= 8.62, stdev= 3.63
> > >     clat percentiles (msec):
> > >      |  1.00th=[    8],  5.00th=[    9], 10.00th=[    9], 20.00th=[    9],
> > >      | 30.00th=[    9], 40.00th=[    9], 50.00th=[    9], 60.00th=[    9],
> > >      | 70.00th=[    9], 80.00th=[    9], 90.00th=[    9], 95.00th=[    9],
> > >      | 99.00th=[   17], 99.50th=[   25], 99.90th=[   33], 99.95th=[   36],
> > >      | 99.99th=[  273]
> > >     bw (KB  /s): min=  191, max=  480, per=100.00%, avg=464.18, stdev=34.60
> > >     lat (msec) : 2=0.01%, 4=0.04%, 10=97.69%, 20=1.55%, 50=0.69%
> > >     lat (msec) : 500=0.01%
> > >   cpu          : usr=0.13%, sys=0.12%, ctx=7224, majf=0, minf=5
> > >   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> > >      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> > >      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> > >      issued    : total=r=0/w=6959/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
> > >      latency   : target=0, window=0, percentile=100.00%, depth=1
> > >
> > > I've checked via the admin socket and the prefer wal option is set to 8192.
> > >
> > > Random capture of iostat
> > >
> > > Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> > > sda               0.00    77.00    0.00    5.50     0.00   338.00   122.91     0.05   11.64    0.00   11.64   8.73   4.80
> > > sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> > > sdc               0.00     0.00    0.00  151.50     0.00   302.00     3.99     0.54    3.59    0.00    3.59   3.58  54.20
> > > sdd               0.00     0.00    0.00 1474.00     0.00  4008.00     5.44     0.09    0.06    0.00    0.06   0.06   9.20
> > > sde               0.00     0.00    0.00  217.00     0.00   434.00     4.00     0.91    4.20    0.00    4.20   4.20  91.20
> > > sdf               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> > > sdg               0.00     0.00    0.00   66.50     0.00   134.00     4.03     0.18    2.68    0.00    2.68   2.68  17.80
> > > sdh               0.00     0.00    0.00  217.00     0.00   434.00     4.00     0.80    3.71    0.00    3.71   3.71  80.40
> > > sdi               0.00     0.00    0.00 1134.00     0.00  3082.00     5.44     0.09    0.08    0.00    0.08   0.07   8.40
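
For reference, a fio job file approximating the run above can be sketched from the output: rbd engine, 4k sequential writes, queue depth 1, 60-second time-based run. The pool, client, and image names below are assumptions, not taken from the thread:

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=fio-test
    rw=write
    bs=4k
    iodepth=1
    runtime=60
    time_based

    # job name matches the one in Nick's output
    [rbd_iodepth32]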
> > >
> > > -----Original Message-----
> > > From: Sage Weil [mailto:sweil@xxxxxxxxxx]
> > > Sent: 01 February 2017 15:34
> > > To: nick@xxxxxxxxxx
> > > Cc: ceph-devel@xxxxxxxxxxxxxxx
> > > Subject: bluestore prefer wal size
> > >
> > > Hey Nick,
> > >
> > > I've updated/improved the prefer wal size PR (which sends small writes
> > > through the WAL).  See
> > >
> > > 	https://github.com/ceph/ceph/pull/13217
> > >
> > > if you want to try it out.
> > >
> > > Thanks!
> > > sage
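
For anyone following along, the option under test is bluestore_prefer_wal_size, and Nick's replies above show it set to 8192 bytes. A minimal ceph.conf sketch for trying it, assuming a build of the PR branch (the option was not in released Ceph at the time of this thread):

    [osd]
    # writes at or below this size are sent through the bluestore WAL
    # (option from PR 13217; 8192 is the value used in Nick's test above)
    bluestore_prefer_wal_size = 8192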