Re: thought on storing bloom (hit) info

Gregory Farnum <greg@xxxxxxxxxxx> · Wed, 2 Oct 2013 17:07:23 -0700



On Wed, Oct 2, 2013 at 5:02 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> If we make this a special internal object we need to complicate recovery
> and namespacing to keep it separate from user data.  We also need to
> implement a new API for retrieving, trimming, and so forth.
>
> Instead, we could just store the in-progress and completed bloom filters
> (or even explicit hit list) as regular rados objects in a separate
> namespace.  The namespace could be '.ceph' or similar by default, but
> configurable in case the user wants something different for some reason.
>
> Normal recovery should work unmodified.
>
> The normal rados API could be used to fetch (or even delete) old info.
>
> I think the main challenge is making an object_locator_t that maps cleanly
> into a specific PG so that a particular object is always stored exactly
> with the PG.  This should be a pretty easy change to object_locator_t.
> In the mapping process, all we're doing is hashing the key string and
> mixing in the pool hash; here we'd just be able to specify the resulting
> value explicitly.
>
> Thoughts?
> sage

PG splitting?
That and other internal mechanisms are already going to need to treat
it as a special object. I think recovery will as well; what happens if
we're serving up writes during a long-running recovery but haven't
gotten to recovering that object yet when we need to persist?
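
One way to read the splitting concern, as a rough Python sketch rather than
actual Ceph code: the function names are made up for illustration,
zlib.crc32 stands in for the real object-name hash, and the pool mixing
step is left out. Only stable_mod follows the shape of ceph_stable_mod.
An object pinned to a PG by an explicit hash value stays with the parent
after pg_num doubles, so the child PG starts with no hit info of its own.

```python
import zlib


def stable_mod(x, b, bmask):
    # Same shape as ceph_stable_mod(): keeps most mappings stable as b grows.
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_mask(pg_num):
    # Smallest (2^n - 1) that covers pg_num - 1.
    return (1 << (pg_num - 1).bit_length()) - 1


def name_to_ps(name):
    # Normal path: hash the key string (crc32 is a stand-in hash here).
    return zlib.crc32(name.encode()) & 0xFFFFFFFF


def ps_to_pg(ps, pg_num):
    # Fold the 32-bit placement seed into one of pg_num PGs.
    return stable_mod(ps, pg_num, pg_mask(pg_num))


def pinned_ps(pg):
    # Proposed path (assumption): skip hashing and supply the value explicitly.
    return pg


if __name__ == "__main__":
    hit_obj = pinned_ps(3)            # hit-info object pinned to PG 3
    user_obj = 11                     # a seed that lands in PG 3 while pg_num == 8
    for pg_num in (8, 16):
        print(pg_num, ps_to_pg(hit_obj, pg_num), ps_to_pg(user_obj, pg_num))
```

With pg_num going from 8 to 16, the user object moves to the child PG 11
while the pinned object stays in PG 3, so split would have to know about
the hit-info object and either copy or rebuild it for the child.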
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com