On Wed, Oct 2, 2013 at 5:02 PM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> If we make this a special internal object we need to complicate recovery
> and namespacing to keep it separate from user data. We also need to
> implement a new API for retrieving, trimming, and so forth.
>
> Instead, we could just store the in-progress and completed bloom filters
> (or even an explicit hit list) as regular rados objects in a separate
> namespace. The namespace could be '.ceph' or similar by default, but
> configurable in case the user wants something different for some reason.
>
> Normal recovery should work unmodified.
>
> The normal rados API could be used to fetch (or even delete) old info.
>
> I think the main challenge is making an object_locator_t that maps cleanly
> into a specific PG so that a particular object is always stored exactly
> with the PG. This should be a pretty easy change to object_locator_t.
> In the mapping process, all we're doing is hashing the key string and
> mixing in the pool hash; here we'd just be able to specify the resulting
> value explicitly.
>
> Thoughts?
> sage

What about PG splitting? That and other internal mechanisms are already going to need to treat it as a special object. I think recovery will as well: what happens if we're serving writes during a long-running recovery, but haven't yet recovered that object when we need to persist it?
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
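To make the mapping Sage describes concrete, here is a rough Python sketch, not Ceph code. Assumptions: `Locator` is a hypothetical stand-in for object_locator_t with an optional explicit `hash` field (the proposed change), `object_to_pg` and the object name "hit_set_archive" are made up for illustration, and `zlib.crc32` replaces Ceph's real rjenkins string hash. Only `stable_mod` follows Ceph's actual `ceph_stable_mod()` arithmetic, which folds a hash into `pg_num` PGs so that growing `pg_num` splits PGs rather than reshuffling them.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


def stable_mod(x: int, b: int, bmask: int) -> int:
    # Same arithmetic as Ceph's ceph_stable_mod(): fold a 32-bit hash into
    # [0, b) so that growing pg_num splits PGs rather than reshuffling them.
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_num_mask(pg_num: int) -> int:
    # Smallest 2^k - 1 that covers pg_num - 1.
    return (1 << (pg_num - 1).bit_length()) - 1


@dataclass
class Locator:
    # Hypothetical stand-in for object_locator_t: a pool, an optional locator
    # key, and an optional explicit placement hash (the change proposed here).
    pool: int
    key: str = ""
    hash: Optional[int] = None


def object_to_pg(name: str, loc: Locator, pg_num: int) -> Tuple[int, int]:
    if loc.hash is not None:
        raw = loc.hash  # caller pins the object to a chosen hash value
    else:
        # crc32 is only a placeholder for Ceph's real string hash (rjenkins).
        raw = zlib.crc32((loc.key or name).encode()) & 0xFFFFFFFF
    # The PG is identified by (pool, placement seed); the pool is mixed in
    # at a later step, before the seed is handed to CRUSH.
    return loc.pool, stable_mod(raw, pg_num, pg_num_mask(pg_num))


if __name__ == "__main__":
    archive = Locator(pool=3, hash=13)  # hypothetical hit-set archive object
    print(object_to_pg("hit_set_archive", archive, 8))   # (3, 5)
    print(object_to_pg("hit_set_archive", archive, 12))  # (3, 5)
    print(object_to_pg("hit_set_archive", archive, 16))  # (3, 13)
```

The last three calls illustrate the PG-splitting question: an object pinned with hash 13 lives in PG 5 while the pool has 8 or 12 PGs, but lands in PG 13 once the pool reaches 16, so whatever pins it still has to follow the split.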