Re: Blueprint: Add LevelDB support to ceph cluster backend store

Hi Haomai,

On Wed, 31 Jul 2013, Haomai Wang wrote:
> Every node of a ceph cluster has a backend filesystem such as btrfs,
> xfs or ext4 that provides storage for data objects, whose locations
> are determined by the CRUSH algorithm. There should exist an abstract
> interface sitting between the osd and the backend store, allowing
> different backend store implementations. Currently, we only have a
> general POSIX interface. LevelDB is a fast key-value storage library
> written at Google that provides an ordered mapping from string keys to
> string values. We could implement a LevelDB backend to support basic
> operations corresponding to POSIX operations. A LevelDB driver would
> enable the gateway to communicate with LevelDB to store objects on a
> per-node basis.
> 
> 
> A LevelDB driver is attractive to folks who have a special use case
> such as a write-heavy system. If we can abstract a general interface,
> we can choose another DBM if we find it more suitable, such as Kyoto
> Cabinet or BDB. Furthermore, we can choose the backend store for each
> OSD node, so we can have different OSD types for special purposes.
> 
> Expected Results: Objects can be stored reliably in LevelDB. The IO
> performance and recovery process should be comparable to the original
> stores. For special cases, the LevelDB driver should have much better
> performance than the local filesystem backend driver. Snapshots and
> any other features you think of are optional.

I added a comment in the wiki, but I'll reply here.

Much of what you're talking about is already in place:

 - There is an ObjectStore.h abstraction of the local storage.  The only 
   up to date implementation is FileStore, which uses a combination 
   of a local file system and leveldb, but other backends have been used 
   in the past, and new ones can easily be added.

 - We currently use leveldb for the 'omap' component of rados objects.  
   That is, each rados object has a bytestream portion (like a file), 
   attrs (like extended attributes), and an omap (keys/values).  All or 
   none of those interfaces can be used for any given object, although 
   most users only use one interface at a time.  The main limitation here 
   if you want to use leveldb only is that we still have an inode in the 
   file system to represent each object, even when it contains only 
   key/value pairs.

 - The use of leveldb itself is also well abstracted by a KeyValueDB 
   interface, so other key/value libraries could be swapped in in its 
   place.  The other main component is a middle layer that wraps the kv 
   store to provide copy-on-write type semantics for each object's set of 
   keys (to facilitate the snapshot functionality in rados/ceph).
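
To make the KeyValueDB point a bit more concrete, here is a rough Python
sketch of a swappable ordered key/value interface, with omap-style keys
grouped under a per-object prefix.  The names KeyValueStore, MemoryStore,
omap_set and omap_items are hypothetical and are not Ceph's actual C++
API; the in-memory backend only stands in for leveldb, and the
copy-on-write middle layer is left out entirely.

```python
from abc import ABC, abstractmethod
import bisect


class KeyValueStore(ABC):
    """Hypothetical ordered key/value interface (not Ceph's KeyValueDB)."""

    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str): ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def scan(self, prefix: str):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""


class MemoryStore(KeyValueStore):
    """In-memory stand-in for an ordered store such as leveldb."""

    def __init__(self):
        self._keys = []   # keys kept sorted so prefix scans are ordered
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        if key in self._data:
            del self._data[key]
            del self._keys[bisect.bisect_left(self._keys, key)]

    def scan(self, prefix):
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            k = self._keys[i]
            yield k, self._data[k]
            i += 1


def omap_set(store, obj, key, value):
    # Assumed key layout: "<object>\0<key>", so one object's keys are contiguous.
    store.put(f"{obj}\x00{key}", value)


def omap_items(store, obj):
    prefix = f"{obj}\x00"
    return [(k[len(prefix):], v) for k, v in store.scan(prefix)]
```

Any other backend that implements the same four methods could be passed
to omap_set and omap_items unchanged, which is the sense in which the
library underneath is swappable.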

If you have a workload that you want to be purely key/value based, it 
would be possible to write a much simpler ObjectStore implementation that 
ignores or trivially implements the byte and attr portions of the object 
in leveldb (or the KeyValueDB abstraction).  It would have very different 
performance characteristics than what we're doing now, of course.  You 
might also be interested in looking at the HyperLevelDB project, which is 
a fork of leveldb that focuses on multithreading and compaction 
performance.
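
As a very rough sketch of what "trivially implements the byte and attr
portions" could look like, the Python below keeps all three parts of an
object in one key/value mapping, distinguished by a key prefix.  The
class KVOnlyObjectStore, its method names, and the key layout are all
assumptions made for illustration and do not correspond to the real
ObjectStore interface; a plain dict stands in for leveldb, and there are
no transactions, snapshots, or durability here.

```python
class KVOnlyObjectStore:
    """Hypothetical object store that keeps everything in a key/value map."""

    def __init__(self, kv=None):
        # Any dict-like mapping of str -> bytes; a dict stands in for leveldb.
        self.kv = {} if kv is None else kv

    @staticmethod
    def _key(kind, obj, name=""):
        # Assumed layout: "<kind>\0<object>\0<name>"
        return f"{kind}\x00{obj}\x00{name}"

    # Bytestream portion: the whole object body lives in a single value.
    def write(self, obj, offset, data):
        key = self._key("data", obj)
        buf = bytearray(self.kv.get(key, b""))
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # zero-fill any hole
        buf[offset:offset + len(data)] = data
        self.kv[key] = bytes(buf)

    def read(self, obj, offset=0, length=None):
        buf = self.kv.get(self._key("data", obj), b"")
        end = len(buf) if length is None else offset + length
        return buf[offset:end]

    # Attr portion: one key per attribute.
    def set_attr(self, obj, name, value):
        self.kv[self._key("attr", obj, name)] = value

    def get_attr(self, obj, name):
        return self.kv.get(self._key("attr", obj, name))

    # Omap portion: one key per omap entry.
    def omap_set(self, obj, key, value):
        self.kv[self._key("omap", obj, key)] = value

    def omap_get(self, obj, key):
        return self.kv.get(self._key("omap", obj, key))
```

Rewriting the whole body value on every write is the simplest possible
choice and would be slow for large objects, which is one reason the
performance characteristics would differ so much from FileStore.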

We've heard from other people who are interested in wiring different 
key/value backends into the OSD, so any work to make it easier to do that 
would be great!

sage