Hi Loic,

Yes, the db is designed to optimize workloads on flash backends, and it uses only standard interfaces and system calls to achieve that.

Varada

-----Original Message-----
From: Loic Dachary [mailto:loic@xxxxxxxxxxx]
Sent: Tuesday, February 24, 2015 9:57 PM
To: Somnath Roy; Varada Kari; Ceph Development
Subject: Re: Adding a proprietary key value store to CEPH

Hi,

On 24/02/2015 17:13, Somnath Roy wrote:
> Hi Loic,
> This is an effort to make the ceph interface pluggable to any proprietary k/v db available. The integrator has to implement a (dynamically loadable) shim layer by implementing these interfaces. That shim layer can do the work specific to their k/v db.
> Now, regarding our k/v db, yes, it is written keeping in mind that the backend will be flash, not HDD. This is the major difference from leveldb/rocksdb etc. Our db reduces the flash write amplification (WA) dramatically, and the performance should also be similar to or better than rocksdb.
> Also, I think there should be more of these proprietary dbs that people want to integrate with Ceph, as I don't think leveldb/rocksdb will be able to serve all kinds of workloads.

Thanks for sharing these details :-) Would this db be specific to a line of products, for instance by making ioctl calls that only a specific driver for specific hardware would understand? Or is this a db that is designed to optimize workloads for flash drives using only standard and documented APIs or system calls?

> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx
> [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Loic Dachary
> Sent: Tuesday, February 24, 2015 6:30 AM
> To: Varada Kari; Ceph Development
> Subject: Re: Adding a proprietary key value store to CEPH
>
> Hi,
>
> I'm curious about the reasons why the key/value store you mention is not published as Free Software. Is it because it implements a proprietary interface to specific hardware?
> Because it has additional functionality compared to rocksdb etc.? Because it performs better under some workloads?
>
> Cheers
>
> On 24/02/2015 14:20, Varada Kari wrote:
>> Hi Sage,
>>
>> We are trying to integrate a new proprietary key value store into CEPH. To integrate this KV-store, which is a closed source shared library, we propose a new class in CEPH called PropDBStore, which does a dlopen and imports the required symbols. This framework will help in integrating vendor specific extensions into CEPH.
>>
>> The gist of the implementation is as follows.
>>
>> 1. Implement a wrapper around the proprietary KVStore. Let us call it KVExtension. This is a shared library which implements all interfaces required by the CEPH KeyValueStore.
>> 2. A new class called PropDBStore is derived from KeyValueDB; it honors the semantics of KeyValueStore and KeyValueDB. This class acts as a mediator between CEPH and KVExtension. It transforms bufferlist etc. into const char pointers or strings for the extension to understand.
>> 3. PropDBStore loads (dlopen) the KVExtension during OSD initialization. The path to the KVExtension can be specified in ceph.conf.
>> 4. The interfaces that need to be implemented in KVExtension and are imported by PropDBStore are declared in a new header called PropDBWrapper.h. This header contains the signatures of the necessary interfaces like init(), close(), submit_transaction(), get() and get_iterator(). Similarly, for the Iterator functionality, PropDBIterator.h specifies the signatures of seek_to_first(), seek_to_last(), lower_bound(), upper_bound() etc. PropDBStore includes these headers to import the symbols using dlsym().
>> 5. Choosing the proprietary DB as the backend of the OSD is controlled by ceph config options (/etc/ceph/ceph.conf), in the same way as rocksdb or leveldb.
>> 6. The rest of the existing functionality is not disturbed by this change. Changing the osd backend option will change the backend implementation.
>>    But this change is not dynamic. The type of backend must be chosen at osd creation time, and the osd will continue to use that backend until it is reformatted.
>> 7. The new KVStore we are trying to integrate works on a raw partition, so we divided the osd drive into two partitions. One partition is given to the osd metadata (super block, fsid etc.), and the other is given to the new db to manage. The OSD partition is now not the entire disk, but the 2-4GB needed for the metadata.
>>
>> Please share your thoughts on this.
>> Thanks,
>> Varada
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
>> info at http://vger.kernel.org/majordomo-info.html
>
> --
> Loïc Dachary, Artisan Logiciel Libre

--
Loïc Dachary, Artisan Logiciel Libre
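
For readers following the proposal, steps 3 and 4 (dlopen the KVExtension, then import its entry points with dlsym) could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the proposal: the class name KVExtensionLoader, the exported symbol names (kvext_init, kvext_close, kvext_get) and their signatures are all hypothetical, since the thread does not show the contents of PropDBWrapper.h. Only dlopen(), dlsym(), dlerror() and dlclose() are real POSIX calls.

```cpp
#include <dlfcn.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical C entry points a KVExtension might export. The thread only
// names init(), close() and get(); these exact signatures are assumptions.
extern "C" {
typedef int (*kvext_init_fn)(const char *raw_partition);
typedef int (*kvext_close_fn)(void);
typedef int (*kvext_get_fn)(const char *key, std::size_t key_len,
                            char *value_buf, std::size_t value_buf_len);
}

// Hypothetical helper: loads the shim once and keeps the resolved symbols.
class KVExtensionLoader {
 public:
  // Resolved entry points; valid only after the constructor succeeds.
  kvext_init_fn init = nullptr;
  kvext_close_fn close = nullptr;
  kvext_get_fn get = nullptr;

  // 'path' would come from a ceph.conf option (option name not given in
  // the thread). Throws std::runtime_error if loading or lookup fails.
  explicit KVExtensionLoader(const std::string &path) {
    handle_ = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle_ == nullptr) {
      const char *err = dlerror();
      throw std::runtime_error(err ? err : "dlopen failed");
    }
    try {
      init = resolve<kvext_init_fn>("kvext_init");
      close = resolve<kvext_close_fn>("kvext_close");
      get = resolve<kvext_get_fn>("kvext_get");
    } catch (...) {
      // The destructor does not run when the constructor throws,
      // so release the library handle here.
      dlclose(handle_);
      throw;
    }
  }

  ~KVExtensionLoader() {
    if (handle_ != nullptr) dlclose(handle_);
  }

  KVExtensionLoader(const KVExtensionLoader &) = delete;
  KVExtensionLoader &operator=(const KVExtensionLoader &) = delete;

 private:
  // Look up one symbol and cast it to the expected function pointer type.
  template <typename Fn>
  Fn resolve(const char *symbol) {
    dlerror();  // clear any stale error state before the lookup
    void *sym = dlsym(handle_, symbol);
    const char *err = dlerror();
    if (err != nullptr || sym == nullptr) {
      throw std::runtime_error(std::string("missing symbol: ") + symbol);
    }
    return reinterpret_cast<Fn>(sym);
  }

  void *handle_ = nullptr;
};
```

A wrapper in the role of PropDBStore would construct such a loader during OSD initialization and call through the stored function pointers. Failing early on a missing symbol keeps a mismatched shim from being discovered only at the first I/O. On glibc versions before 2.34 the program also needs to be linked with -ldl.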