http://131.107.65.14/pubs/176690/ColdDataClassification-icde2013-cr.pdf

- identify hot/cold records for an in-memory database
- in-memory LRU is dismissed out of hand due to its overhead
- they keep a simple access log (or log a sample of, say, 10% of
  accesses) and present various algorithms for estimating the K hottest
  items from it
- their 'backward' algorithm scans the log in reverse chronological
  order. once it figures out that no further items can compete with the
  hottest found so far, it can terminate early (toy sketch at the end
  of this mail)
- they seem to assume that every record is in the log, or that anything
  not in the log is already known to be cold and not of interest. so,
  not quite the same problem as ours unless we log for all time.

Thought:

We could only trim a hitset/bloom filter/whatever once every hash key
that appears in that set but not in later sets has been demoted/purged.
In our case, that could mean:

- an initial pass that enumerates all objects and pushes down untouched
  stuff (as we've previously discussed)
- thereafter, the agent scans from 0..2^32 and enumerates any hash
  values appearing in the oldest sets but not newer ones, and only
  pushes those down

Not sure how tractable that might be. If we explicitly listed object
names in each hitset it would certainly work (sketched at the end of
this mail).

---

http://dmclab.hanyang.ac.kr/wikidata/ssd/2012_ssd_seminar/MSST_2011/HotDataIdentification_DongchulPark_MSST_2011.pdf

- identify hot data in an SSD
- bloom filters, because DRAM is precious (and mostly needed for the FTL)
- round-robin set of bloom filters (rough sketch at the end of this mail)
- estimate both frequency (how many bloom filters does it appear in?)
  and recency (oldest/newest access)

Thoughts:

- Any DRAM not spent on hot/cold tracking is spent on caching, which
  improves performance.
- We could use counting bloom filters, although that may not be that
  useful if we have multiple bins and can count how many bins an
  object's accesses appear in.
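---

Here's a toy Python sketch of the backward-scan idea, just to make the
early-termination argument concrete. It is not the paper's exact
algorithm: I'm assuming a per-entry exponential decay (DECAY is made up,
as are all the names here) instead of their time-slice estimates, and
the log is just a list of record ids, oldest first. Scanning newest to
oldest, the unscanned (older) tail can add at most weight/(1-DECAY) to
any one record, so once the current K-th best score beats every other
record's best possible final score, we can stop:

import heapq

DECAY = 0.9  # per-entry decay factor (assumed, not from the paper)

def backward_top_k(log, k):
    scores = {}           # record id -> score accumulated so far
    weight = 1.0          # contribution of the entry being scanned
    for rec in reversed(log):
        scores[rec] = scores.get(rec, 0.0) + weight
        weight *= DECAY
        # upper bound on what the unscanned entries can still add to
        # any single record: weight + weight*DECAY + ... (geometric)
        remaining = weight / (1.0 - DECAY)
        top = heapq.nlargest(k + 1, scores.values())
        if len(top) >= k:
            kth = top[k - 1]
            runner_up = top[k] if len(top) > k else 0.0
            # the top-k set is final once the k-th score beats the best
            # possible final score of every record outside it:
            # runner_up + remaining if already seen, remaining if not
            if kth >= runner_up + remaining and kth >= remaining:
                break
    return heapq.nlargest(k, scores, key=scores.get)

At the point it breaks, membership of the top-K is fixed; only the
ordering within it could still shuffle.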
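---

And a minimal sketch of the trim rule above, for the variant where each
hitset explicitly lists object names (the case that "certainly works").
hitsets is a list of sets, oldest first; demote() is a hypothetical
callback that pushes an object down to the cold tier:

def trim_oldest(hitsets, demote):
    oldest, newer = hitsets[0], hitsets[1:]
    seen_later = set().union(*newer) if newer else set()
    # anything in the oldest set but in no newer one has gone untouched
    # for the entire newer window; demote it, then the set can go
    for obj in oldest - seen_later:
        demote(obj)
    return newer              # oldest hitset is now safe to trim

With bloom filters instead of explicit sets, that same difference can
only be approximated by walking every hash value in 0..2^32 and testing
membership, which is the tractability worry above.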
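---

Finally, a rough sketch of the MSST'11 round-robin scheme: V bloom
filters, one per decay period, recycled in a ring. Frequency falls out
of how many filters a key appears in, and recency out of how young the
newest matching filter is; the hotness() below folds both into a single
weighted count, which is a simplification of the paper's weighting.
Class names, filter sizes, and hash counts are all made-up assumptions:

import hashlib

class Bloom:
    def __init__(self, m=4096, k=2):
        self.m, self.k, self.bits = m, k, bytearray(m)
    def _idx(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(key.encode(), salt=bytes([i] * 8)).digest()
            yield int.from_bytes(h[:4], "big") % self.m
    def add(self, key):
        for i in self._idx(key):
            self.bits[i] = 1
    def __contains__(self, key):
        return all(self.bits[i] for i in self._idx(key))

class HotTracker:
    def __init__(self, nfilters=4):
        self.ring = [Bloom() for _ in range(nfilters)]
        self.cur = 0                    # filter receiving new accesses
    def record(self, key):
        self.ring[self.cur].add(key)
    def advance(self):                  # call once per decay period
        self.cur = (self.cur + 1) % len(self.ring)
        self.ring[self.cur] = Bloom()   # recycle the oldest filter
    def hotness(self, key):
        n = len(self.ring)
        score = 0
        for age in range(n):            # age 0 = current filter
            if key in self.ring[(self.cur - age) % n]:
                score += n - age        # younger filters weigh more
        return score

An access is only recorded in the current filter, so a key touched in
every period shows up in all V filters (maximum score), while one last
touched long ago matches only the oldest, nearly-recycled filter.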