Hi Li!

On Sat, 23 May 2015, ??? wrote:
> Hello!
>
> I'm a GSoC student this year, and my project is to introduce the Miss
> Ratio Curve (more precisely, reuse distance) of objects into the OSD.
> I'm now trying to pick a suitable algorithm to implement, but there is
> a question: should I treat the number of objects tracked by an OSD as
> infinite or as constant?
>
> The point is that there is an algorithm that uses hashing to sample
> only a constant number of references for the analysis, and it is
> proven to be accurate, which makes online MRC construction feasible.
> That accuracy rests on the fact that the memory address space is
> bounded, whereas in Ceph objects can be deleted and created over and
> over again. Is it reasonable to assume that an OSD serves only a
> bounded number of objects over its lifetime (or over the period for
> which we want to compute the MRC)?

I don't remember how the object count affects the MRC, but I suspect we
will want to use a strategy similar to what the HitSets do:

- a new HitSet is generated on a periodic basis
- each time a new one is started, we size it based on the previous
  iteration: we can compare the number of HitSet (bloom filter)
  insertions we've done with the resulting filter density

I think we'll want to build periodic MRCs anyway, since the workload
will shift over time.

Ceph explicitly tracks the number of objects within each PG (see
pg_stats_t).

A few very rough sketches of what I mean are appended below
(illustration only, not real Ceph code).

Does that help?

sage
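
First sketch: the hash-based sampling you mention (this sounds like the
SHARDS approach to me). This is just my rough illustration, not actual
Ceph code, and all of the names are made up. The point is that the
tracked state is proportional to the sampling rate, not to the total
number of objects the OSD will ever see, and distances measured on the
sampled trace get scaled back up by 1/R:

#include <cstdint>
#include <functional>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <unordered_map>

class SampledReuseDist {
public:
  // sampling rate R = threshold / modulus, e.g. 1/100
  SampledReuseDist(uint64_t threshold, uint64_t modulus)
    : threshold(threshold), modulus(modulus) {}

  void access(const std::string &oid) {
    if (std::hash<std::string>{}(oid) % modulus >= threshold)
      return;                                  // not in the sample set
    auto it = last_access.find(oid);
    if (it != last_access.end()) {
      // reuse distance within the sample = number of distinct sampled
      // objects touched since the previous access (linear scan here;
      // a counting tree would make this O(log n))
      uint64_t d = std::distance(times.upper_bound(it->second), times.end());
      histogram[d * modulus / threshold]++;    // scale back up by 1/R
      times.erase(it->second);
    } else {
      cold_misses++;
    }
    times.insert(clock);
    last_access[oid] = clock++;
  }

  // approximate miss ratio for a cache holding cache_size objects
  double miss_ratio(uint64_t cache_size) const {
    uint64_t total = cold_misses, misses = cold_misses;
    for (std::map<uint64_t, uint64_t>::const_iterator p = histogram.begin();
         p != histogram.end(); ++p) {
      total += p->second;
      if (p->first >= cache_size)
        misses += p->second;
    }
    return total ? double(misses) / total : 0.0;
  }

private:
  uint64_t threshold, modulus;
  uint64_t clock = 0, cold_misses = 0;
  std::unordered_map<std::string, uint64_t> last_access; // oid -> last time
  std::set<uint64_t> times;     // one last-access time per tracked oid
  std::map<uint64_t, uint64_t> histogram;  // scaled distance -> count
};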
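
Second sketch: what I mean by sizing each period from the previous one,
by analogy with how the HitSets are resized. Again illustrative only --
the real HitSet/bloom_filter code in the tree looks different; here I
just use the standard bloom sizing formula on the previous period's
insert count plus some headroom:

#include <cmath>
#include <cstdint>

// bits needed for n expected insertions at false-positive probability p
// (standard bloom filter sizing: m = -n * ln(p) / (ln 2)^2)
inline uint64_t bloom_bits_for(uint64_t n, double p) {
  if (n == 0)
    return 64;                                 // small arbitrary floor
  return uint64_t(std::ceil(-double(n) * std::log(p) /
                            (std::log(2.0) * std::log(2.0))));
}

// seed the next period's filter from what the previous period actually
// saw, with ~25% headroom so a modest workload increase still fits
inline uint64_t next_period_bits(uint64_t prev_inserts, double fpp = 0.01) {
  uint64_t expected = prev_inserts + prev_inserts / 4 + 16;
  return bloom_bits_for(expected, fpp);
}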
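
And for the object counts: if all you need is a rough bound on the
number of live objects, the per-PG stats already carry it. Something
like the following (field names from memory, so double-check them
against src/osd/osd_types.h):

// illustrative only; see pg_stat_t / object_stat_sum_t in osd_types.h
uint64_t objects_in_pg(const pg_stat_t &st) {
  return st.stats.sum.num_objects;   // objects currently in this PG
}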