http://131.107.65.14/pubs/176690/ColdDataClassification-icde2013-cr.pdf

- identify hot/cold records for an in-memory database
- in-memory LRU is dismissed out of hand due to its overhead
- they keep a simple access log (or log a sample of, say, 10% of
  accesses) and present various algorithms for estimating the K hottest
  items from it
- their 'backward' algorithm scans the log in reverse chronological
  order. once it figures out that no further items can compete with the
  hottest found so far, it can terminate early (toy sketch at the end
  of this mail)
- they seem to assume that every record is in the log, or that anything
  not in the log is already known to be cold and not of interest. so,
  not quite the same problem as ours unless we log for all time.

Thought:

We could only trim a hitset/bloom filter/whatever once every hash key
that appears in that set but not in later sets has been demoted/purged.
In our case, that could mean:

- an initial pass that enumerates all objects and pushes down untouched
  stuff (as we've previously discussed)
- thereafter, the agent scans from 0..2^32 and enumerates any hash
  values appearing in the oldest sets but not newer ones, and only
  pushes those down

Not sure how tractable that might be. If we explicitly listed object
names in each hitset it would certainly work (sketched at the end of
this mail).

---

http://dmclab.hanyang.ac.kr/wikidata/ssd/2012_ssd_seminar/MSST_2011/HotDataIdentification_DongchulPark_MSST_2011.pdf

- identify hot data in an SSD
- bloom filters, because DRAM is precious (and mostly needed for the FTL)
- round-robin set of bloom filters (rough sketch at the end of this mail)
- estimate both frequency (how many bloom filters does it appear in?)
  and recency (oldest/newest access)

Thoughts:

- Any DRAM not spent on hot/cold tracking is spent on caching, which
  improves performance.
- We could use counting bloom filters, although that may not be that
  useful if we have multiple bins and can count how many bins an
  object's accesses appear in.
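---

Here's a toy Python sketch of the backward-scan idea, just to make the
early-termination argument concrete. It is not the paper's exact
algorithm: I'm assuming a per-entry exponential decay (DECAY is made up,
as are all the names here) instead of their time-slice estimates, and
the log is just a list of record ids, oldest first. Scanning newest to
oldest, the unscanned (older) tail can add at most weight/(1-DECAY) to
any one record, so once the current K-th best score beats every other
record's best possible final score, we can stop:

import heapq

DECAY = 0.9  # per-entry decay factor (assumed, not from the paper)

def backward_top_k(log, k):
    scores = {}           # record id -> score accumulated so far
    weight = 1.0          # contribution of the entry being scanned
    for rec in reversed(log):
        scores[rec] = scores.get(rec, 0.0) + weight
        weight *= DECAY
        # upper bound on what the unscanned entries can still add to
        # any single record: weight + weight*DECAY + ... (geometric)
        remaining = weight / (1.0 - DECAY)
        top = heapq.nlargest(k + 1, scores.values())
        if len(top) >= k:
            kth = top[k - 1]
            runner_up = top[k] if len(top) > k else 0.0
            # the top-k set is final once the k-th score beats the best
            # possible final score of every record outside it:
            # runner_up + remaining if already seen, remaining if not
            if kth >= runner_up + remaining and kth >= remaining:
                break
    return heapq.nlargest(k, scores, key=scores.get)

At the point it breaks, membership of the top-K is fixed; only the
ordering within it could still shuffle.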
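---

And a minimal sketch of the trim rule above, for the variant where each
hitset explicitly lists object names (the case that "certainly works").
hitsets is a list of sets, oldest first; demote() is a hypothetical
callback that pushes an object down to the cold tier:

def trim_oldest(hitsets, demote):
    oldest, newer = hitsets[0], hitsets[1:]
    seen_later = set().union(*newer) if newer else set()
    # anything in the oldest set but in no newer one has gone untouched
    # for the entire newer window; demote it, then the set can go
    for obj in oldest - seen_later:
        demote(obj)
    return newer              # oldest hitset is now safe to trim

With bloom filters instead of explicit sets, that same difference can
only be approximated by walking every hash value in 0..2^32 and testing
membership, which is the tractability worry above.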
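---

Finally, a rough sketch of the MSST'11 round-robin scheme: V bloom
filters, one per decay period, recycled in a ring. Frequency falls out
of how many filters a key appears in, and recency out of how young the
newest matching filter is; the hotness() below folds both into a single
weighted count, which is a simplification of the paper's weighting.
Class names, filter sizes, and hash counts are all made-up assumptions:

import hashlib

class Bloom:
    def __init__(self, m=4096, k=2):
        self.m, self.k, self.bits = m, k, bytearray(m)
    def _idx(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(key.encode(), salt=bytes([i] * 8)).digest()
            yield int.from_bytes(h[:4], "big") % self.m
    def add(self, key):
        for i in self._idx(key):
            self.bits[i] = 1
    def __contains__(self, key):
        return all(self.bits[i] for i in self._idx(key))

class HotTracker:
    def __init__(self, nfilters=4):
        self.ring = [Bloom() for _ in range(nfilters)]
        self.cur = 0                    # filter receiving new accesses
    def record(self, key):
        self.ring[self.cur].add(key)
    def advance(self):                  # call once per decay period
        self.cur = (self.cur + 1) % len(self.ring)
        self.ring[self.cur] = Bloom()   # recycle the oldest filter
    def hotness(self, key):
        n = len(self.ring)
        score = 0
        for age in range(n):            # age 0 = current filter
            if key in self.ring[(self.cur - age) % n]:
                score += n - age        # younger filters weigh more
        return score

An access is only recorded in the current filter, so a key touched in
every period shows up in all V filters (maximum score), while one last
touched long ago matches only the oldest, nearly-recycled filter.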