On Sat, 21 Dec 2013, Haomai Wang wrote:
> On Dec 13, 2013, at 1:01 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>
> > On Thu, 12 Dec 2013, Haomai Wang wrote:
> >> On Thu, Dec 12, 2013 at 1:26 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> >>> [adding cc ceph-devel]
> >
> > [attempt 2]
> >
> >>>
> >>> On Wed, 11 Dec 2013, Haomai Wang wrote:
> >>>> Hi Sage,
> >>>>
> >>>> At the last CDS you pointed out the following jobs:
> >>>>
> >>>> ============================
> >>>> 2. DBObjectMap: refactor interface
> >>>>    1. expose underlying KeyValueDB transactions to caller, so they
> >>>>       can bundle several DBObjectMap ops together and capture an entire
> >>>>       ObjectStore::Transaction's worth of work
> >>>>    2. expose the user prefixes in a generic way, instead of
> >>>>       hard-coding in the omap, xattr, and various internal namespaces
> >>>>
> >>>> 3. stripe file data over keys
> >>>>    1. Build a class that will implement a file data interface (read
> >>>>       extent, write extent, truncate, zero, etc.) on top of DBObjectMap
> >>>>    2. stripe data over keys of size X (e.g., 1MB, which seems to be
> >>>>       the limit people are converging around)
> >>>>    3. store file size information in a metadata key. maybe this can
> >>>>       be DBObjectMap::Header; maybe not
> >>>>    4. contemplate future optimizations that put small objects
> >>>>       "inline" in the Header (or equivalent) key
> >>>> ============================
> >>>>
> >>>> I'm interested in implementing this, and I don't know whether you or
> >>>> others have started on it. Here is my idea.
> >>>
> >>> Nobody is working on this just yet, although there is a lot of interest in
> >>> this area so your timing is very good!
> >>>
> >>>> Following your comments, I'm thinking of implementing the striping of
> >>>> file data over keys in the KeyValueStore class: add a field called
> >>>> "userdata" to DBObjectMap::Header, to be interpreted by the caller (such
> >>>> as KeyValueStore). Of course, we need to add CRUD interfaces for the
> >>>> "userdata" field.
> >>>> So KeyValueStore will use "userdata" to manage the striped layer,
> >>>> perhaps with a metadata table that maps offset->key_name.
> >>>
> >>> Yes.  My original thought is to make the DBObjectMap type fields a bit
> >>> more general (instead of the hard-coded #defines), but I don't think it
> >>> matters too much.
> >>>
> >>> For the metadata table, yes eventually.. but I would keep it simple for
> >>> the first pass and iterate from there.
> >>>
> >>>> Although DBObjectMap already implements the clone operation on
> >>>> "USER_PREFIX" keys, I really don't like operations like lookup_parent,
> >>>> which cause a dependent lookup chain and degrade performance, just as
> >>>> in librbd. I suspect that using the current DBObjectMap methods to
> >>>> manage cloned objects may cause performance problems. So DBObjectMap
> >>>> needs to expose pure KeyValueDB interfaces, called by KeyValueStore, to
> >>>> store striped keys controlled by the metadata table mentioned above.
> >>>> Other namespaces such as xattr and omap won't be broken. The clone
> >>>> operation will be implemented via DBObjectMap::clone; the actual object
> >>>> data won't be changed, and only the metadata table referenced by
> >>>> "userdata" will be copied. Any write operation will be redirected to a
> >>>> new key. In other words, it may look like what librbd does, but here we
> >>>> implement it as ROW, not COW.
> >>>>
> >>>> The reasons for this design:
> >>>> 1. Move more of the work to KeyValueStore rather than DBObjectMap;
> >>>> DBObjectMap is used by FileStore, which limits big changes
> >>>
> >>> Yes; we need to be a bit careful here.  I'm hoping the main changes though
> >>> are really just moving the transaction create and submit boilerplate in
> >>> each method into the FileStore callers?
> >>
> >> In my mind, I don't want to change the caller code such as FileStore.
> >> It works well now. ;-)
> >
> > True.
> > We can also just make a second layer of methods (_foo() instead of
> > foo() or something) that take the transaction as an argument.
> >
> > Or just fork DBObjectMap entirely so that we don't need to worry about
> > breaking FileStore ondisk compatibility; we will likely want/need to do
> > something like that eventually anyway!
>
> I'm confused by the "_remove" interface in FileStore, which doesn't remove
> the omap keys of the corresponding object. I dumped the transaction for
> "rados rm object -p data", and there are no delete operations on omap keys.
>
> So I wonder whether it's proper that we don't remove omap keys? I notice
> that MemStore does erase the omap:
>   c->object_map.erase(oid);
>   c->object_hash.erase(oid);

FileStore::_remove() calls lfn_unlink(), which calls
object_map->clear(...) (if nlink == 0).  I think that's what you're
looking for?

sage

>
> >
> > sage
> >
> >>>
> >>>> 2. Reading/writing object data is a more frequent operation than omap
> >>>> or xattr operations; we need a more specialized handler, now or in the
> >>>> future, to optimize it.
> >>>> 3. Different kv backends may have different features, just like
> >>>> FileSystemBackend; we would like to deal with these in KeyValueStore,
> >>>> not in DBObjectMap or an upper class.
> >>>> 4. DBObjectMap is a little replicated and may not be suitable for doing
> >>>> more things.
> >>>
> >>> I'm not fully following this description, but it sounds like you're
> >>> thinking about the right issues.  A few comments:
> >>>
> >>> - In the ideal case, we'd like to minimize the number of lookups/keys we
> >>> query to access an object.  This is a bit less important for objects that
> >>> are cloned (they tend to be snapshots... mostly).
> >>>
> >>> - I think it makes sense to make the main header key for an object be able
> >>> to embed various bits of useful data, like
> >>>
> >>>   - all of the xattrs, if there aren't many of them
> >>>   - the file size
> >>>   - the file content, if it is small
> >>>
> >>> No need for this in the initial implementation, but we should design
> >>> something that can accommodate it.
> >>>
> >>> - It would be nice to capture the striping CRUD stuff in a separate class;
> >>> a child of DBObjectMap or something similar.  This will make it easy to
> >>> swap out and/or experiment with different approaches.
> >>>
> >>>> So in this proposal, DBObjectMap will serve as a bridge in front of
> >>>> KeyValueDB. KeyValueStore mainly uses the DBObjectMap API to store
> >>>> striped objects and DBObjectMap::Header to store metadata. If so, my
> >>>> previous implementation can be fully reused. :-)
> >>>
> >>> That's great news!  Let me know if there is anything we can do to help
> >>> here.
> >>>
> >>> sage
> >>
> >> Thanks for your comments!
> >>
> >>
> >> --
> >> Best Regards,
> >>
> >> Wheat
>
> Best regards,
> Wheats
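
For readers following the "stripe file data over keys" item quoted above, here is a minimal sketch of the idea, not code from Ceph. It assumes a plain std::map as a stand-in for KeyValueDB, a hypothetical StripedObject class, a made-up key naming scheme ("<oid>.header" and "<oid>.data.<index>"), and the object size kept as a decimal string in the header key. It only covers write extent and read extent; truncate, zero, clone, and transactions are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Assumption: an ordered in-memory map stands in for a KeyValueDB backend.
using KV = std::map<std::string, std::string>;

// Hypothetical striper (not the DBObjectMap/KeyValueStore API): one key per
// fixed-size stripe, plus a header key that records the object size.
class StripedObject {
public:
  StripedObject(KV &db, std::string oid, uint64_t stripe_size)
      : db_(db), oid_(std::move(oid)), stripe_(stripe_size) {
    assert(stripe_ > 0);
  }

  // Object size as recorded in the header key; 0 if the object is absent.
  uint64_t size() const {
    auto it = db_.find(header_key());
    return it == db_.end() ? 0 : std::stoull(it->second);
  }

  // Write an extent, splitting it at stripe boundaries.
  void write(uint64_t off, const std::string &data) {
    if (data.empty()) return;
    uint64_t pos = 0;
    while (pos < data.size()) {
      uint64_t cur = off + pos;
      uint64_t idx = cur / stripe_;  // which stripe key
      uint64_t in = cur % stripe_;   // offset inside that stripe
      uint64_t n = std::min<uint64_t>(stripe_ - in, data.size() - pos);
      std::string &s = db_[stripe_key(idx)];
      if (s.size() < in + n) s.resize(in + n, '\0');  // zero-fill any gap
      s.replace(in, n, data, pos, n);
      pos += n;
    }
    uint64_t end = off + data.size();
    if (end > size()) db_[header_key()] = std::to_string(end);
  }

  // Read an extent, clamped to the object size; holes read back as zeros.
  std::string read(uint64_t off, uint64_t len) const {
    uint64_t sz = size();
    if (off >= sz) return "";
    len = std::min(len, sz - off);
    std::string out;
    uint64_t pos = 0;
    while (pos < len) {
      uint64_t cur = off + pos;
      uint64_t idx = cur / stripe_;
      uint64_t in = cur % stripe_;
      uint64_t n = std::min<uint64_t>(stripe_ - in, len - pos);
      std::string chunk(n, '\0');
      auto it = db_.find(stripe_key(idx));
      if (it != db_.end() && it->second.size() > in) {
        uint64_t have = std::min<uint64_t>(n, it->second.size() - in);
        chunk.replace(0, have, it->second, in, have);
      }
      out += chunk;
      pos += n;
    }
    return out;
  }

private:
  std::string header_key() const { return oid_ + ".header"; }
  // Zero-padded hex index so stripe keys sort in offset order.
  std::string stripe_key(uint64_t idx) const {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%016llx",
                  static_cast<unsigned long long>(idx));
    return oid_ + ".data." + buf;
  }

  KV &db_;
  std::string oid_;
  uint64_t stripe_;
};
```

With a stripe size of 4, writing 10 bytes at offset 0 produces three stripe keys plus the header key, and a read that crosses a stripe boundary is stitched together from the keys it touches. A real implementation would use something closer to the 1MB stripe size mentioned in the thread and would batch the key updates into one KeyValueDB transaction.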