Re: bluestore high latency in hdd and ssd mixed scene

On Sun, 9 Dec 2018, Ning Yao wrote:
> Still, I wonder: can we flush the deferred io to the bdev asynchronously,
> outside of the function _kv_sync_thread()?
> https://github.com/ceph/ceph/blob/master/src/os/bluestore/BlueStore.cc#L10071
> 
> The reason we might not do it is the hope of deleting the wal kv pairs in
> bluefs as soon as possible, so that the wal does not get compacted to the
> other levels (such as level 1 or 2)?  Otherwise, flushing the dirty data
> asynchronously would be much more efficient.  Can anyone explain that?

I suspect it's less a matter of which thread triggers it and more what the 
tunables are set to.  We already have things tuned for SSDs so that 
deferred ops get flushed more quickly, while on HDDs we do it more slowly 
to maximize batching (and we force small writes to be deferred).  This 
has a big impact on write latency on HDDs... and the extra compaction is, 
I suspect, less of an issue there.  But it would be interesting to see some 
benchmarks that compare different values of

bluestore_deferred_batch_ops{,_hdd,_ssd}
bluestore_prefer_deferred_size{,_hdd,_ssd}
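
For concreteness, a minimal sketch of what a sweep over those tunables 
might look like in ceph.conf (the values here are illustrative assumptions, 
not recommendations; the _hdd/_ssd variants apply according to the detected 
device type):

  [osd]
  # how many deferred writes are batched up before being submitted to
  # the slow device in one go; bigger batches amortize the flush but
  # delay trimming of the wal records
  bluestore_deferred_batch_ops_hdd = 64
  bluestore_deferred_batch_ops_ssd = 16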

> On Wed, Jul 18, 2018 at 6:26 PM Ning Yao <zay11022@xxxxxxxxx> wrote:
> >
> > Hi, Sage
> >
> > Currently, we find that separating block.db and wal onto an SSD does
> > not achieve a significant performance improvement, especially for
> > deferred IO.  The main cause is the force_flush in _kv_sync_thread.
> > In _kv_sync_thread, it flushes all previously committed IO to the hdd
> > and then cleans up the wal.  The main problem is that a single,
> > synchronous _kv_sync_thread commits all IOs waiting to be committed,
> > so lots of IO has to wait for the force_flush, and client latency is
> > quite poor in this situation.  It also hurts burst IO, compared with
> > FileStore with an SSD journal.
> > I wonder if it is possible to make the force_flush and wal cleanup
> > process asynchronous so that we can achieve burst write IO with
> > low/stable latency?  I remember that, at the beginning, it was
> > asynchronous, right?

Do you mean initiating the flush sooner but blocking in the same place?  
That might be possible, but we need to be very careful of ordering since 
this is about data safety.
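
To make the ordering constraint concrete, here is a paraphrased sketch 
(not the actual BlueStore.cc code; helper names like take_pending_batch 
are made up) of why the flush has to happen before the kv commit that 
trims the wal:

  // Invariant: a deferred write's wal record may only be deleted once
  // the data it covers is durable on the main device.
  void kv_sync_loop() {
    while (!stopping) {
      auto batch = take_pending_batch();   // txns whose aio completed

      // 1. force_flush: fdatasync the main device so that completed
      //    deferred writes are actually durable (the slow step on HDD)
      if (batch.has_completed_deferred_io)
        bdev->flush();

      // 2. only now is it safe to commit the kv transaction, which
      //    also deletes the wal records for those deferred writes;
      //    swapping steps 1 and 2 could lose data on power failure
      db->submit_transaction_sync(batch.txn);

      notify_completions(batch);
    }
  }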

The other option is to make more writes deferred, which means we would 
be able to skip the main device flush entirely.  That would only happen in 
cases where there are no large IOs on the main device, though, so it's 
not super reliable.  It might help the average, though, at the expense of 
more kv traffic, compaction, etc.
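
Illustratively (an assumed value, not a recommendation), that would amount 
to raising the deferral threshold so that larger writes also take the wal 
path:

  [osd]
  # writes at or below this size go through the deferred (wal) path
  # instead of being written directly to the main device
  bluestore_prefer_deferred_size_hdd = 131072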

sage




> > Regards
> > Ning Yao
> >
> >
> > 2018-07-14 9:46 GMT+08:00 shuhao li <letterwuyu@xxxxxxxxx>:
> > > Hi all,
> > >
> > > I tested the performance of bluestore in a mixed hdd and ssd scenario.
> > > There are three nodes; each node has 1 mon, 1 mgr, and 12 osds.
> > > Each osd has a 1.7T HDD for block and a 150G SSD partition for db and wal, with a 10Gib bond.
> > >
> > > system: CentOS Linux (3.10.0-862.3.2.el7.x86_64) 7 (Core)
> > > ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
> > >
> > > The performance I measured is very poor and the latency is huge.
> > > I found that the _kv_sync_thread force_flush took a long time, blocking the kv commit,
> > > so 4k randwrite at iodepth 32 gives IOPS=2469, BW=9878KiB/s, lat 12955.49 us.
> > >
> > > I have some thoughts about the mixed hdd and ssd scenario:
> > > Hdd fdatasync is slow; can we avoid fdatasync on the write path (_kv_sync_thread)
> > > and batch-sync to the hdd in the background?  Deferred writes can use the wal
> > > for caching; is it possible to add a cache on ssd for simple writes?
> > >
> > > Have we had a discussion about a built-in ssd cache for bluestore?