On Dec 22, 2013, at 1:20 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:

> On Sat, 21 Dec 2013, Haomai Wang wrote:
>> On Dec 13, 2013, at 1:01 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>
>>> On Thu, 12 Dec 2013, Haomai Wang wrote:
>>>> On Thu, Dec 12, 2013 at 1:26 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>>>> [adding cc ceph-devel]
>>>
>>> [attempt 2]
>>>
>>>>>
>>>>> On Wed, 11 Dec 2013, Haomai Wang wrote:
>>>>>> Hi Sage,
>>>>>>
>>>>>> Since the last CDS, you pointed out the jobs listed below:
>>>>>>
>>>>>> ============================
>>>>>> 2. DBObjectMap: refactor interface
>>>>>>    1. expose underlying KeyValueDB transactions to caller (so they
>>>>>>       can bundle several DBObjectMap ops together and capture an entire
>>>>>>       ObjectStore::Transaction's worth of work)
>>>>>>    2. expose the user prefixes in a generic way, instead of
>>>>>>       hard-coding in the omap, xattr, and various internal namespaces
>>>>>>
>>>>>> 3. stripe file data over keys
>>>>>>    1. Build a class that will implement a file data interface (read
>>>>>>       extent, write extent, truncate, zero, etc.) on top of DBObjectMap
>>>>>>    2. stripe data over keys of size X (e.g., 1MB, which seems to be
>>>>>>       the limit people are converging around)
>>>>>>    3. store file size information in a metadata key. maybe this can
>>>>>>       be DBObjectMap::Header; maybe not
>>>>>>    4. contemplate future optimizations that put small objects
>>>>>>       "inline" in the Header (or equivalent) key
>>>>>> ============================
>>>>>>
>>>>>> I'm interested in implementing it, and I don't know whether you or
>>>>>> others have started on it. Now I want to describe my idea.
>>>>>
>>>>> Nobody is working on this just yet, although there is a lot of interest in
>>>>> this area so your timing is very good!
>>>>>
>>>>>> Following your comments, I am thinking about implementing striping of
>>>>>> file data over keys in the KeyValueStore class. Add a field called
>>>>>> "userdata" to DBObjectMap::Header, which is interpreted by the caller,
>>>>>> such as KeyValueStore.
>>>>>> Of course, we need to add CRUD operation interfaces for the
>>>>>> "userdata" field. So KeyValueStore will make use of "userdata" to
>>>>>> manage the striping layer. Maybe a metadata table to map offset->key_name.
>>>>>
>>>>> Yes. My original thought is to make the DBObjectMap type fields a bit
>>>>> more general (instead of the hard-coded #defines), but I don't think it
>>>>> matters too much.
>>>>>
>>>>> For the metadata table, yes eventually.. but I would keep it simple for
>>>>> the first pass and iterate from there.
>>>>>
>>>>>> Although DBObjectMap already implements the clone operation on
>>>>>> "USER_PREFIX" keys, I really don't like operations like lookup_parent,
>>>>>> which cause a dependent lookup chain and degrade performance,
>>>>>> just like librbd. And I suspect that using the current
>>>>>> DBObjectMap methods to manage cloned objects may cause performance
>>>>>> problems. So DBObjectMap needs to expose pure KeyValueDB interfaces,
>>>>>> called by KeyValueStore, to store striped keys, which are controlled by
>>>>>> the metadata table mentioned above. Others such as the xattr and omap
>>>>>> namespaces won't be destroyed. The clone operation will be implemented via
>>>>>> DBObjectMap::clone; actual object data won't be changed and only
>>>>>> the metadata table referenced by "userdata" will be copied. Any write
>>>>>> operation will be redirected to a new key. In other words, it may look
>>>>>> like what librbd does, but here we implement it as ROW, not COW.
>>>>>>
>>>>>> The reasons for this design are:
>>>>>> 1. Push more work into KeyValueStore rather than DBObjectMap; DBObjectMap
>>>>>> is used by FileStore, which limits big changes
>>>>>
>>>>> Yes; we need to be a bit careful here. I'm hoping the main changes though
>>>>> are really just moving the transaction create and submit boilerplate in
>>>>> each method into the FileStore callers?
>>>>
>>>> In my mind, I don't want to change the caller code such as FileStore.
>>>> It works well now. ;-)
>>>
>>> True.
>>> We can also just make a second layer of methods (_foo() instead of
>>> foo() or something) that take the transaction as an argument.
>>>
>>> Or just fork DBObjectMap entirely so that we don't need to worry about
>>> breaking FileStore ondisk compatibility; we will likely want/need to do
>>> something like that eventually anyway!
>>
>> I'm confused by the "_remove" interface in FileStore, which doesn't remove
>> the omap keys of the corresponding object. I tried dumping the transaction
>> that "rados rm object -p data" produces, and there are actually no delete
>> operations on omap keys.
>>
>> So I wonder whether it is correct that we don't remove the omap keys? I
>> notice that MemStore does an omap erase operation:
>>   c->object_map.erase(oid);
>>   c->object_hash.erase(oid);
>
> FileStore::_remove() calls lfn_unlink(), which calls
> object_map->clear(...) (if nlink == 0).
>
> I think that's what you're looking for?

Oh, it seems I missed that previously. Thank you.

>
> sage
>
>
>>
>>>
>>> sage
>>>
>>>>>
>>>>>> 2. Reading/writing an object is a much more frequent operation than
>>>>>> omap or xattr operations, so we need more specialized handlers, now or
>>>>>> in the future, to optimize it.
>>>>>> 3. Different kv backends may have different features, just like
>>>>>> FileSystemBackend; we would like to deal with these in KeyValueStore,
>>>>>> not in DBObjectMap or an upper class.
>>>>>> 4. DBObjectMap is a little complicated and maybe not suitable to do more things.
>>>>>
>>>>> I'm not fully following this description, but it sounds like you're
>>>>> thinking about the right issues. A few comments:
>>>>>
>>>>> - In the ideal case, we'd like to minimize the number of lookups/keys we
>>>>> query to access an object. This is a bit less important for objects that
>>>>> are cloned (they tend to be snapshots... mostly).
>>>>>
>>>>> - I think it makes sense to make the main header key for an object be able
>>>>> to embed various bits of useful data, like
>>>>>
>>>>>   - all of the xattrs, if there aren't many of them
>>>>>   - the file size
>>>>>   - the file content, if it is small
>>>>>
>>>>> No need for this in the initial implementation, but we should design
>>>>> something that can accommodate it.
>>>>>
>>>>> - It would be nice to capture the striping CRUD stuff in a separate class;
>>>>> a child of DBObjectMap or something similar. This will make it easy to
>>>>> swap out and/or experiment with different approaches.
>>>>>
>>>>>> So in this proposal, DBObjectMap will serve as a bridge in front
>>>>>> of KeyValueDB. KeyValueStore mainly uses the DBObjectMap API to store
>>>>>> striped objects and DBObjectMap::Header to store metadata. If so, my
>>>>>> previous implementation could be fully reused. :-)
>>>>>
>>>>> That's great news! Let me know if there is anything we can do to help
>>>>> here.
>>>>>
>>>>> sage
>>>>
>>>> Thanks for your comments!
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>
>> Best regards,
>> Wheats

Best regards,
Wheats
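
For readers following the "stripe file data over keys" item quoted in this thread, the offset arithmetic might be sketched roughly as below. This is only an illustration under stated assumptions, not code from Ceph: KVStore (a std::map standing in for a KeyValueDB), stripe_key(), the StripedObject class, and the ".data."/".size" key naming are all hypothetical names made up for this sketch. It shows a write extent and a read extent being split across keys of a fixed stripe size, with the file size kept in a separate metadata key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a KeyValueDB: an ordered in-memory map.
using KVStore = std::map<std::string, std::string>;

// Hypothetical helper: key name for stripe number n of object oid.
// The index is zero-padded hex so keys sort in stripe order.
std::string stripe_key(const std::string& oid, uint64_t n) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%016llx",
                static_cast<unsigned long long>(n));
  return oid + ".data." + buf;
}

// Hypothetical file-data interface layered over the key/value map.
class StripedObject {
 public:
  StripedObject(KVStore& db, std::string oid, uint64_t stripe_size)
      : db_(db), oid_(std::move(oid)), stripe_(stripe_size) {
    assert(stripe_ > 0);
  }

  // File size lives in its own metadata key (assumed name: "<oid>.size").
  uint64_t size() const {
    auto it = db_.find(oid_ + ".size");
    return it == db_.end() ? 0 : std::stoull(it->second);
  }

  // Write extent: split [off, off + data.size()) at stripe boundaries.
  void write(uint64_t off, const std::string& data) {
    uint64_t pos = 0;
    while (pos < data.size()) {
      uint64_t n = (off + pos) / stripe_;   // which stripe key
      uint64_t in = (off + pos) % stripe_;  // offset inside that stripe
      uint64_t len = std::min<uint64_t>(stripe_ - in, data.size() - pos);
      std::string& s = db_[stripe_key(oid_, n)];
      if (s.size() < in + len) s.resize(in + len, '\0');
      s.replace(in, len, data, pos, len);
      pos += len;
    }
    if (off + data.size() > size())
      db_[oid_ + ".size"] = std::to_string(off + data.size());
  }

  // Read extent: missing stripes or short stripes read back as zeros.
  std::string read(uint64_t off, uint64_t len) const {
    uint64_t sz = size();
    if (off >= sz) return "";
    len = std::min(len, sz - off);
    std::string out(len, '\0');
    uint64_t pos = 0;
    while (pos < len) {
      uint64_t n = (off + pos) / stripe_;
      uint64_t in = (off + pos) % stripe_;
      uint64_t chunk = std::min(stripe_ - in, len - pos);
      auto it = db_.find(stripe_key(oid_, n));
      if (it != db_.end() && in < it->second.size()) {
        uint64_t avail = std::min<uint64_t>(chunk, it->second.size() - in);
        out.replace(pos, avail, it->second, in, avail);
      }
      pos += chunk;
    }
    return out;
  }

 private:
  KVStore& db_;
  std::string oid_;
  uint64_t stripe_;
};
```

With a stripe size of 4, writing 8 bytes at offset 2 touches three stripe keys plus the size key. A real implementation would batch those puts into one backend transaction, which is what the "expose KeyValueDB transactions to the caller" item in the thread is about; the sketch leaves that out.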