Re: Mon backing store


 



On 06/05/2014 12:42 PM, Samuel Just wrote:
I am starting to wonder whether using leveldb for the mon is actually
introducing an excessive amount of unnecessary complexity and
non-determinism.  Given that the monitor workload is read mostly,
except for failure conditions when it becomes write latency sensitive,
might we do better with a strict b-tree style backing db such as
Berkeley DB even at the cost of some performance?  It seems like
something like that might provide more reliable latency properties.

I'm not against trying it, but I'm not convinced it's the right solution. If the 99th percentile latency is significantly better, that's obviously a win, but I think we are indeed going to take a big performance hit overall. I'm more in favor of trying rocksdb first. I'm certainly not as well versed in the leveldb interface as you or Joao are, but it appears much of our code in LevelDBStore would be reusable. I don't know that rocksdb won't have the same issues that leveldb does, but the rocksdb developers specifically cite leveldb's poor 99th percentile latencies as a driver for its development:

"By contrast, we’ve published the RocksDB benchmark results for server side workloads on Flash. We also measured the performance of LevelDB on these server-workload benchmarks and found that RocksDB solidly outperforms LevelDB for these IO bound workloads. We found that LevelDB’s single-threaded compaction process was insufficient to drive server workloads. We saw frequent write-stalls with LevelDB that caused 99-percentile latency to be tremendously large. We found that mmap-ing a file into the OS cache introduced performance bottlenecks for reads. We could not make LevelDB consume all the IOs offered by the underlying Flash storage."

Compaction performance and high mmap/page fault/kswapd utilization during reads are two big issues we've hit in leveldb, so I'm inclined to think that rocksdb is at least worthy of some attention.
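To make the mean-versus-tail point concrete, here is a toy model. All numbers are hypothetical and are not measurements of leveldb, rocksdb, or the mon. Every 50th write is assumed to hit a 200 ms compaction stall, and every other write is assumed to take 1 ms. The percentile helper uses the nearest-rank definition, which is an assumption on my part. It is not necessarily what any benchmark tool reports.

```python
import math
import statistics

def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based rank; computed as p * n / 100 to avoid float error in p / 100.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def simulate_writes(n_writes, base_ms, stall_every, stall_ms):
    """Hypothetical workload: every `stall_every`-th write waits on a compaction stall."""
    return [stall_ms if (i + 1) % stall_every == 0 else base_ms
            for i in range(n_writes)]

# Made-up parameters, chosen only to illustrate the shape of the problem.
latencies = simulate_writes(n_writes=1000, base_ms=1.0, stall_every=50, stall_ms=200.0)

print(f"mean   = {statistics.mean(latencies):.2f} ms")
print(f"median = {percentile_nearest_rank(latencies, 50):.2f} ms")
print(f"p99    = {percentile_nearest_rank(latencies, 99):.2f} ms")
```

This prints a mean of 4.98 ms, a median of 1.00 ms, and a p99 of 200.00 ms. Stalls on only 2% of writes barely move the median, but they set the 99th percentile entirely. That tail is what the mon feels when it becomes write latency sensitive during failures.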

Here are the benchmark results on their wiki:

https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks


Thoughts?
-Sam
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html





