Regarding key/value interface

On Fri, 12 Sep 2014, Somnath Roy wrote:
> Thanks Sage...
> Basically, we are doing similar chunking in our current implementation, which is derived from ObjectStore.
> Moving to key/value will save us from that :-)
> Also, I was thinking we may want to do compression (and maybe dedupe later?) at that key/value layer as well.
> 
> Yes, partial read/write is definitely a performance killer for object stores, and our object store is no exception. We need to see how we can counter that.
> 
> But I think these are reasons enough for me to move our implementation to the key/value interfaces now.

Sounds good.

By the way, hopefully this is a pretty painless process of wrapping your 
kv library with the KeyValueDB interface.  If not, that will be good to 
know.  I'm hoping it will fit well with a broad range of backends, but so 
far we've only done leveldb/rocksdb (same interface) and kinetic.  I'd 
like to see us try LMDB in this context as well...
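
As a rough illustration of what such a wrapper has to provide, here is a
minimal sketch in Python. It is not the actual KeyValueDB API (which is C++);
the names KVBackend, MemoryBackend, submit and scan are hypothetical and only
model the two capabilities discussed in this thread: atomic batches and
ordered range queries.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Optional, Tuple

# One operation in a batch: ("set" | "rm", key, value or None).
Op = Tuple[str, bytes, Optional[bytes]]


class KVBackend(ABC):
    """Hypothetical minimal contract a wrapped kv library would satisfy."""

    @abstractmethod
    def submit(self, ops: List[Op]) -> None:
        """Apply every operation in the batch, or none of them."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value for key, or None if it is absent."""

    @abstractmethod
    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""


class MemoryBackend(KVBackend):
    """In-memory stand-in, usable as a reference when testing other backends."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def submit(self, ops: List[Op]) -> None:
        # Stage changes on a copy so a bad batch leaves the store untouched.
        staged = dict(self._data)
        for op, key, value in ops:
            if op == "set" and value is not None:
                staged[key] = value
            elif op == "rm":
                staged.pop(key, None)
            else:
                raise ValueError(f"bad operation: {op!r}")
        self._data = staged

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        for key in sorted(self._data):
            if start <= key < end:
                yield key, self._data[key]
```

A backend that can implement these three methods with the same semantics
should, under the assumptions above, be straightforward to compare against
the in-memory version in tests.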

sage

> 
> Regards
> Somnath
> 
> 
> -----Original Message-----
> From: Sage Weil [mailto:sweil at redhat.com] 
> Sent: Thursday, September 11, 2014 6:55 PM
> To: Somnath Roy
> Cc: Haomai Wang (haomaiwang at gmail.com); ceph-users at lists.ceph.com; ceph-devel at vger.kernel.org
> Subject: RE: Regarding key/value interface
> 
> On Fri, 12 Sep 2014, Somnath Roy wrote:
> > Makes perfect sense, Sage.
> > 
> > Regarding striping of file data, are you saying the KeyValue interface will do the following for me?
> > 
> > 1. Say, for an rbd image of order 4 MB, when a write request comes to the key/value interface, it will chunk the object (say the full 4 MB) into smaller pieces (configurable?) and stripe them as multiple key/value pairs?
> > 
> > 2. Also, on read it will take care of gathering the pieces and sending the data back?
> 
> Precisely.
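
The chunk-on-write, gather-on-read behaviour confirmed above can be sketched
as follows. This is an illustration only, not the KeyValueStore code: the key
format, the tiny default stripe size, and the function names are assumptions,
and a plain dict stands in for the key/value backend. Note how a write that
covers only part of a stripe has to read the old value and rewrite it whole,
which is the small-overwrite cost discussed in this thread.

```python
from typing import Dict

STRIPE_SIZE = 4096  # hypothetical default; a real store would make this configurable


def stripe_key(obj: str, index: int) -> bytes:
    """Build the (assumed) key under which stripe `index` of `obj` is stored."""
    return f"{obj}.{index:08d}".encode()


def write(store: Dict[bytes, bytes], obj: str, offset: int, data: bytes,
          stripe_size: int = STRIPE_SIZE) -> None:
    """Split a write into per-stripe pieces and store each piece under its own key."""
    pos = 0
    while pos < len(data):
        index, within = divmod(offset + pos, stripe_size)
        n = min(stripe_size - within, len(data) - pos)
        key = stripe_key(obj, index)
        # Partial-stripe write: read the old value, patch it, write it all back.
        old = store.get(key, b"")
        buf = bytearray(old.ljust(within + n, b"\0"))
        buf[within:within + n] = data[pos:pos + n]
        store[key] = bytes(buf)
        pos += n


def read(store: Dict[bytes, bytes], obj: str, offset: int, length: int,
         stripe_size: int = STRIPE_SIZE) -> bytes:
    """Gather the requested byte range from the stripes; missing data reads as zeros."""
    out = bytearray()
    pos, end = offset, offset + length
    while pos < end:
        index, within = divmod(pos, stripe_size)
        n = min(stripe_size - within, end - pos)
        value = store.get(stripe_key(obj, index), b"")
        out += value[within:within + n].ljust(n, b"\0")
        pos += n
    return bytes(out)
```

This sketch does not track object size, so reads past the written data simply
return zeros.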
> 
> A smarter thing we might want to make it do in the future would be to take a 4 KB write, create a new key that logically overwrites part of the larger (say, 1 MB) key, and apply it on read.  And maybe give up and rewrite the entire 1 MB stripe after too many small overwrites have accumulated.
> Something along those lines to reduce the cost of small IOs to large objects.
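
One possible shape for that overlay idea, as a sketch only: nothing like this
existed in the code at the time of this thread, and the class name, the
threshold, and the use of an in-memory list in place of separate overlay keys
are all assumptions made for illustration.

```python
from typing import List, Tuple

MAX_OVERLAYS = 4  # hypothetical threshold before the stripe is rewritten


class OverlayStripe:
    """One large stripe plus a log of small writes that are applied on read."""

    def __init__(self, base: bytes, max_overlays: int = MAX_OVERLAYS) -> None:
        self.base = base                             # the large value, e.g. 1 MB
        self.overlays: List[Tuple[int, bytes]] = []  # (offset, data), oldest first
        self.max_overlays = max_overlays
        self.rewrites = 0                            # number of full-stripe rewrites

    def write(self, offset: int, data: bytes) -> None:
        """Record a small write as an overlay instead of rewriting the stripe."""
        if offset < 0 or offset + len(data) > len(self.base):
            raise ValueError("write falls outside the stripe")
        self.overlays.append((offset, data))
        if len(self.overlays) > self.max_overlays:
            self.compact()  # too many overlays: give up and rewrite the stripe

    def read(self) -> bytes:
        """Return the stripe contents with every overlay applied, newest last."""
        buf = bytearray(self.base)
        for offset, data in self.overlays:
            buf[offset:offset + len(data)] = data
        return bytes(buf)

    def compact(self) -> None:
        """Fold all overlays into the base value and drop them."""
        self.base = self.read()
        self.overlays.clear()
        self.rewrites += 1
```

In a real backend each overlay would presumably be its own key next to the
stripe key, so that a small write costs one small put rather than a
read-modify-write of the whole value; the trade-off is extra work on every
read until the stripe is compacted.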
> 
> sage
> 
> 
> 
> > 
> > Thanks & Regards
> > Somnath
> > 
> > 
> > -----Original Message-----
> > From: Sage Weil [mailto:sweil at redhat.com]
> > Sent: Thursday, September 11, 2014 6:31 PM
> > To: Somnath Roy
> > Cc: Haomai Wang (haomaiwang at gmail.com); ceph-users at lists.ceph.com; 
> > ceph-devel at vger.kernel.org
> > Subject: Re: Regarding key/value interface
> > 
> > Hi Somnath,
> > 
> > On Fri, 12 Sep 2014, Somnath Roy wrote:
> > >
> > > Hi Sage/Haomai,
> > >
> > > > If I have a key/value backend that supports transactions and range
> > > > queries (and I don't need any explicit caching etc.) and I want to
> > > > replace filestore (and leveldb omap) with it, which interface do you
> > > > recommend I derive from: ObjectStore directly, or KeyValueDB?
> > > >
> > > > I have already integrated this backend by deriving from the ObjectStore
> > > > interfaces earlier (pre-key/value-interface days), but have not tested it
> > > > thoroughly enough to see what functionality is broken (basic
> > > > RGW/RBD functionality is working fine).
> > > >
> > > > Basically, I want to know the advantages (and
> > > > disadvantages) of deriving it from the new key/value interfaces.
> > > >
> > > > Also, what state is it in? Is it feature complete, supporting
> > > > all the ObjectStore interfaces such as clone?
> > 
> > > Everything is supported, I think, except perhaps some IO hints that don't make sense in a k/v context.  The big things that you get by using KeyValueStore and plugging into the lower-level interface are:
> > 
> >  - striping of file data across keys
> >  - efficient clone
> > >  - a zillion smaller methods that aren't conceptually difficult to implement but are tedious to do.
> > 
> > The other nice thing about reusing this code is that you can use a leveldb or rocksdb backend as a reference for testing or performance or whatever.
> > 
> > > The main thing that will be a challenge going forward, I predict, is making storage of the object byte payload in key/value pairs efficient.  I think KeyValueStore is doing some simple striping, but it will suffer for small overwrites (like 512-byte or 4k writes from an RBD).  There are probably some pretty simple heuristics and tricks that can be done to mitigate the most common patterns, but there is no simple solution since the backends generally don't support partial value updates (I assume yours doesn't either?).  But any work done here will benefit the other backends too, so that would be a win.
> > 
> > sage
> > 

