Re: Refactor DBObjectMap Proposal

Haomai Wang <haomaiwang@xxxxxxxxx> · Sun, 22 Dec 2013 17:44:12 +0800

On Dec 22, 2013, at 2:02 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:

> 
> On Dec 22, 2013, at 1:20 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> 
>> On Sat, 21 Dec 2013, Haomai Wang wrote:
>>> On Dec 13, 2013, at 1:01 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>> 
>>>> On Thu, 12 Dec 2013, Haomai Wang wrote:
>>>>> On Thu, Dec 12, 2013 at 1:26 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
>>>>>> [adding cc ceph-devel]
>>>> 
>>>> [attempt 2]
>>>> 
>>>>>> 
>>>>>> On Wed, 11 Dec 2013, Haomai Wang wrote:
>>>>>>> Hi Sage,
>>>>>>> 
>>>>>>> At the last CDS, you pointed out the jobs below:
>>>>>>> 
>>>>>>> ============================
>>>>>>> 2. DBObjectMap: refactor interface
>>>>>>>  1. expose underlying KeyValueDB transactions to caller (so they
>>>>>>> can bundle several DBObjectMap ops together and capture an entire
>>>>>>> ObjectStore::Transaction's worth of work)
>>>>>>>  2. expose the user prefixes in a generic way, instead of
>>>>>>> hard-coding in the omap, xattr, and various internal namespaces
>>>>>>> 
>>>>>>> 3. stripe file data over keys
>>>>>>>  1. Build a class that will implement a file data interface (read
>>>>>>> extent, write extent, truncate, zero, etc.) on top of DBObjectMap
>>>>>>>  2. stripe data over keys of size X (e.g., 1MB, which seems to be
>>>>>>> the limit people are converging around)
>>>>>>>  3. store file size information in a metadata key.  maybe this can
>>>>>>> be DBObjectMap::Header; maybe not
>>>>>>>  4. contemplate future optimizations that put small objects
>>>>>>> "inline" in the Header (or equivalent) key
>>>>>>> ============================
>>>>>>> 
>>>>>>> I'm interested in implementing it, and I don't know whether you or
>>>>>>> others have started on it. I'd like to describe my idea.
>>>>>> 
>>>>>> Nobody is working on this just yet, although there is a lot of interest in
>>>>>> this area so your timing is very good!
>>>>>> 
>>>>>>> Following your comments, I'm thinking about implementing the striping
>>>>>>> of file data over keys in the KeyValueStore class. Add a field called
>>>>>>> "userdata" to DBObjectMap::Header, which is interpreted by the caller,
>>>>>>> such as KeyValueStore. Of course, we need to add CRUD interfaces for
>>>>>>> the "userdata" field. KeyValueStore will then use "userdata" to
>>>>>>> manage the striping layer. Maybe a metadata table to map offset->key_name.
>>>>>> 
>>>>>> Yes.  My original thought is to make the DBObjectMap type fields a bit
>>>>>> more general (instead of the hard-coded #defines), but I don't think it
>>>>>> matters too much.
>>>>>> 
>>>>>> For the metadata table, yes eventually.. but I would keep it simple for
>>>>>> the first pass and iterate from there.
>>>>>> 
>>>>>>> Although DBObjectMap already implements the clone operation on
>>>>>>> "USER_PREFIX" keys, I really don't like operations such as lookup_parent,
>>>>>>> which create a dependent lookup chain and degrade performance,
>>>>>>> just as in librbd. I suspect that using the current
>>>>>>> DBObjectMap methods to manage cloned objects may cause performance
>>>>>>> problems.  So DBObjectMap needs to expose plain KeyValueDB interfaces
>>>>>>> that KeyValueStore calls to store striped keys, which are tracked by
>>>>>>> the metadata table mentioned above. Others such as the xattr and omap
>>>>>>> namespaces will be left intact. The clone operation will be implemented
>>>>>>> via DBObjectMap::clone; the actual object data won't be changed, and
>>>>>>> only the metadata table referenced by "userdata" will be copied. Any
>>>>>>> write operation will be redirected to a new key. In other words, it may
>>>>>>> look like what librbd does, but here we implement it as ROW, not COW.
>>>>>>> 
>>>>>>> The reasons for this design are:
>>>>>>> 1. Push more of the work into KeyValueStore rather than DBObjectMap;
>>>>>>> DBObjectMap is used by FileStore, which limits how much we can change it.
>>>>>> 
>>>>>> Yes; we need to be a bit careful here.  I'm hoping the main changes though
>>>>>> are really just moving the transaction create and submit boilerplate in
>>>>>> each method into the FileStore callers?
>>>>> 
>>>>> To my mind, I don't want to change the calling code such as FileStore.
>>>>> It works well now. ;-)
>>>> 
>>>> True.  We can also just make a second layer of methods (_foo() instead of 
>>>> foo() or something) that take the transaction as an argument.
>>>> 
>>>> Or just fork DBObjectMap entirely so that we don't need to worry about 
>>>> breaking FileStore ondisk compatibility; we will likely want/need to do 
>>>> something like that eventually anyway!
>>> 
>>> I'm confused by the "_remove" interface in FileStore, which doesn't seem
>>> to remove the omap keys of the corresponding object. I tried dumping the
>>> transaction for "rados rm object -p data", and there are actually no
>>> delete operations on omap keys.
>>> 
>>> So I wonder whether it is correct that we don't remove the omap keys?
>>> I notice that MemStore does perform an omap erase:
>>> c->object_map.erase(oid);
>>> c->object_hash.erase(oid);
>> 
>> FileStore::_remove() calls lfn_unlink(), which calls 
>> object_map->clear(...) (if nlink == 0).
>> 
>> I think that's what you're looking for?
> 
> Oh, it seems I missed that earlier. Thank you.
> 
>> 
>> sage
>> 
>> 
>>> 
>>>> 
>>>> sage
>>>> 
>>>>>> 
>>>>>>> 2. Object reads/writes are more frequent than omap or xattr
>>>>>>> operations, so we need a more specialized handler to optimize them,
>>>>>>> now or in the future.
>>>>>>> 3. Different kv backends may have different features, just like
>>>>>>> FileSystemBackend; we would like to deal with these in KeyValueStore,
>>>>>>> not in DBObjectMap or an upper class.
>>>>>>> 4. DBObjectMap is a little complicated and maybe not suitable for doing more things.
>>>>>> 
>>>>>> I'm not fully following this description, but it sounds like you're
>>>>>> thinking about the right issues.  A few comments:
>>>>>> 
>>>>>> - In the ideal case, we'd like to minimize the number of lookups/keys we
>>>>>> query to access an object.  This is a bit less important for objects that
>>>>>> are cloned (they tend to be snapshots... mostly).
>>>>>> 
>>>>>> - I think it makes sense to make the main header key for an object be able
>>>>>> to embed various bits of useful data, like
>>>>>> 
>>>>>> - all of the xattrs, if there aren't many of them
>>>>>> - the file size
>>>>>> - the file content, if it is small
>>>>>> 
>>>>>> No need for this in the initial implementation, but we should design
>>>>>> something that can accommodate it.
>>>>>> 
>>>>>> - It would be nice to capture the striping CRUD stuff in a separate class;
>>>>>> a child of DBObjectMap or something similar.  This will make it easy to
>>>>>> swap out and/or experiment with different approaches.
>>>>>> 
>>>>>>> So in this proposal, DBObjectMap will serve as a bridge in front
>>>>>>> of KeyValueDB. KeyValueStore mainly uses the DBObjectMap API to store
>>>>>>> striped objects and DBObjectMap::Header to store metadata. If so, my
>>>>>>> previous implementation could be fully reused. :-)
>>>>>> 
>>>>>> That's great news!  Let me know if there is anything we can do to help
>>>>>> here.

Another problem: the DBObjectMap API only accepts ghobject_t and has no
information about which collection the object belongs to. So with the existing
DBObjectMap API, ObjectStore calls such as "collection_list" and
"collection_empty" can't be supported.

If we add a "coll_t" argument to the DBObjectMap API and a new obj_name->key
function ("DBObjectMap::ghobject_key_v1"), it looks like most of the API would
need to be rewritten, which I don't want.
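
A minimal sketch of the first option, to make the trade-off concrete. It is
not the DBObjectMap code: plain strings stand in for coll_t and ghobject_t, a
std::map stands in for the sorted KeyValueDB, and the names FakeKV,
make_collection_key and the "\x01" separator are all hypothetical. The idea is
only that once the collection is part of the key, collection_list and
collection_empty become prefix scans.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a sorted key/value backend such as KeyValueDB.
using FakeKV = std::map<std::string, std::string>;

// Assumed separator; this sketch assumes it never appears in a name.
static const char SEP = '\x01';

// Hypothetical obj_name->key function that also encodes the collection.
std::string make_collection_key(const std::string& coll,
                                const std::string& obj) {
  return coll + SEP + obj;
}

// List the objects of one collection by scanning keys that share its prefix.
std::vector<std::string> collection_list(const FakeKV& db,
                                         const std::string& coll) {
  const std::string prefix = coll + SEP;
  std::vector<std::string> out;
  for (auto it = db.lower_bound(prefix);
       it != db.end() &&
       it->first.compare(0, prefix.size(), prefix) == 0;
       ++it) {
    out.push_back(it->first.substr(prefix.size()));
  }
  return out;
}

// A collection is empty when no key carries its prefix.
bool collection_empty(const FakeKV& db, const std::string& coll) {
  return collection_list(db, coll).empty();
}
```

The cost is the one described above: every method that builds a key from an
object name would also need the collection passed in.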

Another way is to add a collection->objects mapping, which may increase the
number of operations in each transaction.

>>>>>> 
>>>>>> sage
>>>>> 
>>>>> Thanks for your comments!
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Best Regards,
>>>>> 
>>>>> Wheat
>>> 
>>> Best regards,
>>> Wheats
> 
> Best regards,
> Wheats

Best regards,
Wheats
