On Tue, 20 Oct 2015, Z Zhang wrote:
> Hi Guys,
>
> I am trying latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with
> rocksdb 3.11 as OSD backend. I use rbd to test performance and following
> is my cluster info.
>
> [ceph@xxx ~]$ ceph -s
> cluster b74f3944-d77f-4401-a531-fa5282995808
> health HEALTH_OK
> monmap e1: 1 mons at {xxx=xxx.xxx.xxx.xxx:6789/0}
> election epoch 1, quorum 0 xxx
> osdmap e338: 44 osds: 44 up, 44 in
> flags sortbitwise
> pgmap v1476: 2048 pgs, 1 pools, 158 MB data, 59 objects
> 1940 MB used, 81930 GB / 81932 GB avail
> 2048 active+clean
>
> All the disks are spinning ones with the write cache turned on. Rocksdb's
> WAL and sst files are on the same disk as each OSD.
Are you using the KeyValueStore backend?
> Using fio to generate following write load:
> fio -direct=1 -rw=randwrite -ioengine=sync -size=10M -bs=4K -group_reporting -directory /mnt/rbd_test/ -name xxx.1 -numjobs=1
>
> Test result:
> WAL enabled + sync: false + disk write cache: on will get ~700 IOPS.
> WAL enabled + sync: true (default) + disk write cache: on|off will get only ~25 IOPS.
>
> I tuned some other rocksdb options, but with no luck.
The wip-newstore-frags branch sets some defaults for rocksdb that I think
look pretty reasonable (at least given how newstore is using rocksdb).
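
For reference, the knobs being discussed map roughly onto rocksdb's C++ API
as below. This is just a minimal sketch of generic rocksdb tuning (the path
and the values are made up for illustration, not what the branch actually
sets):

#include <cassert>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  // Illustrative tuning only -- not the wip-newstore-frags defaults.
  rocksdb::Options options;
  options.create_if_missing = true;
  options.write_buffer_size = 64 << 20;          // bigger memtables
  options.max_write_buffer_number = 4;
  options.min_write_buffer_number_to_merge = 2;  // merge more before flushing
  options.IncreaseParallelism(4);                // more background threads

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_test", &db);
  assert(s.ok());

  // The test matrix above corresponds to these per-write options:
  rocksdb::WriteOptions wo;
  wo.disableWAL = false;  // "WAL enabled"
  wo.sync = true;         // fsync the WAL on every write (the ~25 IOPS case)

  s = db->Put(wo, "key", "value");
  assert(s.ok());
  delete db;
  return 0;
}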
> I tracked down the rocksdb code and found that each writer's Sync operation
> takes ~30ms to finish. And as shown above, it is strange that performance
> hardly differs no matter whether the disk write cache is on or off.
>
> Did you guys encounter a similar issue? Or am I missing something that
> causes rocksdb's poor write performance?
Yes, I saw the same thing. This PR addresses the problem and is nearing
merge upstream:
https://github.com/facebook/rocksdb/pull/746
There is also an XFS performance bug that is contributing to the problem,
but it looks like Dave Chinner just put together a fix for that.
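
One way to check whether the ~30ms is just the cost of flushing the drive
(rather than something inside rocksdb or the filesystem) is to time raw
fdatasync calls against a file on the same disk that holds the WAL. A rough
sketch, with the scratch file path being whatever sits on that disk:

// Build: g++ -O2 -o synctest synctest.cpp
#include <chrono>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char** argv) {
  // Put this file on the disk holding the rocksdb WAL (path is an example).
  const char* path = argc > 1 ? argv[1] : "/var/lib/ceph/osd/ceph-0/synctest.dat";
  int fd = open(path, O_CREAT | O_WRONLY, 0644);
  if (fd < 0) { perror("open"); return 1; }

  char buf[4096];
  memset(buf, 0, sizeof(buf));

  const int iters = 100;
  double total_ms = 0;
  for (int i = 0; i < iters; i++) {
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) { perror("write"); return 1; }
    auto t0 = std::chrono::steady_clock::now();
    if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }
    auto t1 = std::chrono::steady_clock::now();
    total_ms += std::chrono::duration<double, std::milli>(t1 - t0).count();
  }
  printf("avg fdatasync latency: %.2f ms over %d iters\n", total_ms / iters, iters);
  close(fd);
  return 0;
}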
But... we likely won't be using KeyValueStore in its current form over
rocksdb (or any other kv backend). It stripes object data over key/value
pairs, which IMO is not the best approach.
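
To illustrate what "stripes object data over key/value pairs" means (this is
only a rough sketch of the idea, not the actual KeyValueStore code): object
data gets chopped into fixed-size stripes, each stored under its own key, so
writes turn into batches of kv puts plus a WAL sync per transaction.

#include <string>
#include "rocksdb/db.h"
#include "rocksdb/write_batch.h"

// Rough illustration: split an object's data into fixed-size chunks and
// store each chunk as its own key/value pair in one atomic batch.
void write_object_striped(rocksdb::DB* db, const std::string& object,
                          const std::string& data, size_t stripe_size = 4096) {
  rocksdb::WriteBatch batch;
  for (size_t off = 0, idx = 0; off < data.size(); off += stripe_size, ++idx) {
    // Hypothetical key layout: <object name>.<stripe index>
    std::string key = object + "." + std::to_string(idx);
    batch.Put(key, data.substr(off, stripe_size));
  }
  rocksdb::WriteOptions wo;
  wo.sync = true;  // the expensive part: a WAL sync for every transaction
  db->Write(wo, &batch);
}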
sage