Haomai, you're right. I will add such a sync option as configurable for our test purposes.
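For context, the knob in question maps to RocksDB's per-write sync flag. Below is only a minimal sketch of that flag, not the actual KeyValueStore code, and the option name is just a placeholder for however we end up exposing it:

#include <cassert>
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/write_batch.h>

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/kvstore-sync-test", &db);
  assert(s.ok());

  rocksdb::WriteBatch batch;
  batch.Put("object-key", "object-data");  // stand-in for one OSD transaction

  rocksdb::WriteOptions wo;
  // sync = true  -> every commit fsyncs the WAL (the ~25 IOPS case below).
  // sync = false -> the WAL is still written, but only flushed by the OS
  //                 later, so data sitting in the page cache is lost on a
  //                 machine crash but survives a process crash (~700 IOPS).
  wo.sync = false;  // placeholder for the configurable sync option
  s = db->Write(wo, &batch);
  assert(s.ok());

  delete db;
  return 0;
}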
Thanks.
Zhi Zhang (David)

> Date: Tue, 20 Oct 2015 21:24:49 +0800
> From: haomaiwang@xxxxxxxxx
> To: zhangz.david@xxxxxxxxxxx
> CC: ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: [ceph-users] Write performance issue under rocksdb kvstore
>
> Actually keyvaluestore would submit transactions with the sync flag
> too (relying on the keyvaluedb impl's journal/logfile).
>
> Yes, if we disable the sync flag, keyvaluestore's performance will
> increase a lot. But we don't provide this option now.
>
> On Tue, Oct 20, 2015 at 9:22 PM, Z Zhang <zhangz.david@xxxxxxxxxxx> wrote:
> > Thanks, Sage, for pointing out the PR and ceph branch. I will take a closer
> > look. Yes, I am trying the KVStore backend. The reason we are trying it is
> > that a few users don't have such a strict requirement on occasional data
> > loss. It seems the KVStore backend without a synchronized WAL could achieve
> > better performance than filestore. And only data still in the page cache
> > would get lost on a machine crash, not a process crash, if we use the WAL
> > but no synchronization. What do you think?
> >
> > Thanks.
> > Zhi Zhang (David)
> >
> > Date: Tue, 20 Oct 2015 05:47:44 -0700
> > From: sage@xxxxxxxxxxxx
> > To: zhangz.david@xxxxxxxxxxx
> > CC: ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
> > Subject: Re: [ceph-users] Write performance issue under rocksdb kvstore
> >
> > On Tue, 20 Oct 2015, Z Zhang wrote:
> > > Hi Guys,
> > >
> > > I am trying the latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with
> > > rocksdb 3.11 as the OSD backend. I use rbd to test performance and the
> > > following is my cluster info.
> > >
> > > [ceph@xxx ~]$ ceph -s
> > >     cluster b74f3944-d77f-4401-a531-fa5282995808
> > >      health HEALTH_OK
> > >      monmap e1: 1 mons at {xxx=xxx.xxx.xxx.xxx:6789/0}
> > >             election epoch 1, quorum 0 xxx
> > >      osdmap e338: 44 osds: 44 up, 44 in
> > >             flags sortbitwise
> > >       pgmap v1476: 2048 pgs, 1 pools, 158 MB data, 59 objects
> > >             1940 MB used, 81930 GB / 81932 GB avail
> > >                 2048 active+clean
> > >
> > > All the disks are spinning ones with write cache turned on. Rocksdb's
> > > WAL and sst files are on the same disk as every OSD.
> >
> > Are you using the KeyValueStore backend?
> >
> > > Using fio to generate the following write load:
> > > fio -direct=1 -rw=randwrite -ioengine=sync -size=10M -bs=4K
> > > -group_reporting -directory /mnt/rbd_test/ -name xxx.1 -numjobs=1
> > >
> > > Test result:
> > > WAL enabled + sync: false + disk write cache: on will get ~700 IOPS.
> > > WAL enabled + sync: true (default) + disk write cache: on|off will get
> > > only ~25 IOPS.
> > >
> > > I tuned some other rocksdb options, but with no luck.
> >
> > The wip-newstore-frags branch sets some defaults for rocksdb that I think
> > look pretty reasonable (at least given how newstore is using rocksdb).
> >
> > > I tracked down the rocksdb code and found each writer's Sync operation
> > > would take ~30ms to finish. And as shown above, it is strange that
> > > performance shows not much difference no matter whether the disk write
> > > cache is on or off.
> > >
> > > Did you guys encounter a similar issue? Or am I missing something that
> > > causes rocksdb's poor write performance?
> >
> > Yes, I saw the same thing. This PR addresses the problem and is nearing
> > merge upstream:
> >
> >     https://github.com/facebook/rocksdb/pull/746
> >
> > There is also an XFS performance bug that is contributing to the problem,
> > but it looks like Dave Chinner just put together a fix for that.
> >
> > But... we likely won't be using KeyValueStore in its current form over
> > rocksdb (or any other kv backend). It stripes object data over key/value
> > pairs, which IMO is not the best approach.
> >
> > sage
>
> --
> Best Regards,
> Wheat
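For anyone curious what "stripes object data over key/value pairs" means in practice, here is a rough illustration of the idea. It is not the real KeyValueStore key schema; the key layout, stripe size, and function names are made up for the sketch:

#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Illustration only: stripe an object's byte range across fixed-size
// key/value pairs, the way a KV-backed object store might.
static const uint64_t kStripeSize = 65536;  // assumed stripe size

std::string stripe_key(const std::string& object, uint64_t idx) {
  return object + "/" + std::to_string(idx);  // hypothetical key layout
}

// Write `data` at `offset` within `object`. A real backend would turn each
// touched stripe into a Put() inside one transaction; an overwrite that is
// not stripe-aligned forces a read-modify-write of the first and last
// stripes, which is part of why this layout is costly for small random writes.
void write_striped(std::map<std::string, std::string>& kv,
                   const std::string& object, uint64_t offset,
                   const std::string& data) {
  uint64_t pos = 0;
  while (pos < data.size()) {
    uint64_t off = offset + pos;
    uint64_t idx = off / kStripeSize;
    uint64_t in_stripe = off % kStripeSize;
    uint64_t len =
        std::min<uint64_t>(kStripeSize - in_stripe, data.size() - pos);

    std::string& stripe = kv[stripe_key(object, idx)];
    if (stripe.size() < in_stripe + len)
      stripe.resize(in_stripe + len, '\0');
    stripe.replace(in_stripe, len, data, pos, len);
    pos += len;
  }
}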
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com