Re: Write performance issue under rocksdb kvstore

Thanks, Sage, for pointing out the PR and ceph branch. I will take a closer look.

Yes, I am trying the KVStore backend. The reason we are trying it is that a few of our users don't have strict requirements around occasional data loss. It seems the KVStore backend without a synchronized WAL could achieve better performance than filestore, and if we use the WAL without synchronization, only data still in the page cache would be lost on a machine crash, not on a process crash. What do you think?

Thanks.
Zhi Zhang (David)

Date: Tue, 20 Oct 2015 05:47:44 -0700
From: sage@xxxxxxxxxxxx
To: zhangz.david@xxxxxxxxxxx
CC: ceph-users@xxxxxxxxxxxxxx; ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: [ceph-users] Write performance issue under rocksdb kvstore

On Tue, 20 Oct 2015, Z Zhang wrote:
> Hi Guys,
>
> I am trying the latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with
> rocksdb 3.11 as the OSD backend. I use rbd to test performance and the
> following is my cluster info.
>
> [ceph@xxx ~]$ ceph -s
>     cluster b74f3944-d77f-4401-a531-fa5282995808
>      health HEALTH_OK
>      monmap e1: 1 mons at {xxx=xxx.xxx.xxx.xxx:6789/0}
>             election epoch 1, quorum 0 xxx
>      osdmap e338: 44 osds: 44 up, 44 in
>             flags sortbitwise
>       pgmap v1476: 2048 pgs, 1 pools, 158 MB data, 59 objects
>             1940 MB used, 81930 GB / 81932 GB avail
>                 2048 active+clean
>
> All the disks are spinning ones with the write cache turned on. Rocksdb's
> WAL and sst files are on the same disk as each OSD.

Are you using the KeyValueStore backend?

> Using fio to generate the following write load:
> fio -direct=1 -rw=randwrite -ioengine=sync -size=10M -bs=4K -group_reporting -directory /mnt/rbd_test/ -name xxx.1 -numjobs=1
>
> Test result:
> WAL enabled + sync: false + disk write cache: on  will get ~700 IOPS.
> WAL enabled + sync: true (default) + disk write cache: on|off  will get only ~25 IOPS.
>
> I tuned some other rocksdb options, but with no luck.

The wip-newstore-frags branch sets some defaults for rocksdb that I think look pretty reasonable (at least given how newstore is using rocksdb).

> I tracked down the rocksdb code and found each writer's Sync operation
> would take ~30ms to finish. And as shown above, it is strange that
> performance shows little difference whether the disk write cache is on or
> off.
>
> Do you guys encounter a similar issue? Or am I missing something that
> causes rocksdb's poor write performance?

Yes, I saw the same thing. This PR addresses the problem and is nearing merge upstream:

https://github.com/facebook/rocksdb/pull/746

There is also an XFS performance bug that is contributing to the problem, but it looks like Dave Chinner just put together a fix for that.

But... we likely won't be using KeyValueStore in its current form over rocksdb (or any other kv backend). It stripes object data over key/value pairs, which IMO is not the best approach.

sage
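
For readers following the sync numbers above: the setting being toggled corresponds to RocksDB's WriteOptions::sync flag. A minimal C++ sketch of the two cases (the database path and keys are illustrative, not taken from the thread):

#include <cassert>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;

  // Open a throwaway database; /tmp/waltest is an illustrative path.
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/waltest", &db);
  assert(s.ok());

  // The "WAL enabled + sync: false" case: the write is appended to the
  // WAL through the OS page cache, but no fsync is issued. A process
  // crash loses nothing (the kernel already holds the data); a machine
  // crash can lose whatever the kernel has not yet flushed to disk.
  rocksdb::WriteOptions async_write;
  async_write.sync = false;
  s = db->Put(async_write, "key1", "value1");
  assert(s.ok());

  // The "WAL enabled + sync: true" case: the WAL is forced to stable
  // storage before Put() returns, paying a device sync per write; that
  // is where the ~25 IOPS on a spinning disk comes from.
  rocksdb::WriteOptions sync_write;
  sync_write.sync = true;
  s = db->Put(sync_write, "key2", "value2");
  assert(s.ok());

  delete db;
  return 0;
}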
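
To make the closing point about KeyValueStore concrete: "striping object data over key/value pairs" means splitting each object into fixed-size chunks and writing each chunk under its own key. The sketch below only illustrates that shape; put_object_striped, kStripeSize, and the "<object>.<index>" key scheme are made-up names, and the real KeyValueStore stripe size and key encoding differ.

#include <cstddef>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/write_batch.h"

const size_t kStripeSize = 4096;  // hypothetical stripe size

// Split an object's data into fixed-size stripes and write each stripe
// under "<object>.<index>" in one atomic batch. Every small object
// update becomes one or more KV writes plus a WAL append; with sync
// enabled, each batch also pays a device sync.
rocksdb::Status put_object_striped(rocksdb::DB* db,
                                   const std::string& object,
                                   const std::string& data) {
  rocksdb::WriteBatch batch;
  for (size_t off = 0, idx = 0; off < data.size();
       off += kStripeSize, ++idx) {
    std::string key = object + "." + std::to_string(idx);
    batch.Put(key, data.substr(off, kStripeSize));
  }
  return db->Write(rocksdb::WriteOptions(), &batch);
}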
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
