Re: Caching policy in machine learning context

Zdenek Kabelac <zdenek.kabelac@gmail.com> · Mon, 13 Feb 2017 13:55:41 +0100

On 13.2.2017 at 11:58, Jonas Degrave wrote:
Hi,

We are a group of scientists who work on reasonably sized datasets
(10-100GB). Because we had trouble managing our SSDs (everyone likes to have
their data on the SSD), I set up a caching system where the 500GB SSD caches
the 4TB HD. This way, everybody would have their data virtually on the SSD,
and only the first pass through the dataset would be slow. After that, the
data would be cached and reads would be faster.

I used lvm-cache for this. Yet it seems that the (only) smq policy is very
reluctant to promote data to the cache, whereas what we need is for data to
be promoted basically upon the first read. If someone is using the machine
on certain data, they will most likely go over the dataset a couple of
hundred times in the following hours.

Right now, after a week of testing lvm-cache with the smq-policy, it looks
like this:

    jdgrave@kat:~$ sudo ./lvmstats
    start              0
    end                7516192768
    segment_type       cache
    md_block_size      8
    md_utilization     14353/1179648
    cache_block_size   128
    cache_utilization  7208960/7208960
    read_hits          19954892
    read_misses        84623959
    read_hit_ratio     19.08%
    write_hits         672621
    write_misses       7336700
    write_hit_ratio    8.40%
    demotions          151757
    promotions         151757
    dirty              0
    features           1

    jdgrave@kat:~$ sudo ./lvmcache-statistics.sh
    -------------------------------------------------------------------------
    LVM [2.02.133(2)] cache report of found device /dev/VG/lv
    -------------------------------------------------------------------------
    - Cache Usage: 100.0% - Metadata Usage: 1.2%
    - Read Hit Rate: 19.0% - Write Hit Rate: 8.3%
    - Demotions/Promotions/Dirty: 151757/151757/0
    - Feature arguments in use: writeback
    - Core arguments in use : migration_threshold 2048 smq 0
      - Cache Policy: stochastic multiqueue (smq)
    - Cache Metadata Mode: rw
    - MetaData Operation Health: ok
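
The percentages in the report can be cross-checked from the raw counters. The following is a minimal sketch, not part of either script above. It assumes that cache_block_size is expressed in 512-byte sectors (the usual device-mapper unit), and the helper name hit_ratio is made up for illustration.

```python
# Counters copied from the lvmstats output above.
read_hits, read_misses = 19954892, 84623959
write_hits, write_misses = 672621, 7336700
cache_block_size = 128   # assumed to be in 512-byte sectors
cache_blocks = 7208960   # total blocks, from cache_utilization
promotions = 151757

SECTOR_BYTES = 512       # assumption: device-mapper sector size


def hit_ratio(hits, misses):
    """Return the fraction of accesses that were served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0


read_ratio = hit_ratio(read_hits, read_misses)
write_ratio = hit_ratio(write_hits, write_misses)

# Size of one cache block, then of the whole cache and of all promotions.
block_bytes = cache_block_size * SECTOR_BYTES
cache_gib = cache_blocks * block_bytes / 2**30
promoted_gib = promotions * block_bytes / 2**30

print(f"read hit ratio:   {read_ratio:.2%}")
print(f"write hit ratio:  {write_ratio:.2%}")
print(f"cache size:       {cache_gib:.1f} GiB")
print(f"promoted so far:  {promoted_gib:.2f} GiB")
```

If the sector assumption holds, the promotion counter corresponds to only a small fraction of the cache size, which matches the impression that the policy promotes very little.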

The number of promotions has been very low, even though the read hit rate is
low as well. This is with a cache of 450GB and currently only 614GB of data
on the cached device. A read hit rate below 20%, when caching blocks at
random would have achieved 73%, is not what I had hoped for.
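
The 73% figure follows from a simple model, sketched here under stated assumptions: reads are spread uniformly over the data, and a random subset of blocks the size of the cache is resident. The function name random_cache_hit_rate is hypothetical, and real access patterns will differ from this model.

```python
def random_cache_hit_rate(cache_gb, data_gb):
    """Expected hit rate when a random subset of the data is cached.

    Assumes uniform access over the data: the hit rate then equals the
    fraction of the data that fits in the cache, capped at 100%.
    """
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return min(1.0, cache_gb / data_gb)


baseline = random_cache_hit_rate(450, 614)  # sizes quoted in the message
observed = 0.1908                           # read_hit_ratio from lvmstats

print(f"random-caching baseline: {baseline:.0%}")
print(f"observed read hit rate:  {observed:.0%}")
```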

Is there a way to make the caching much more aggressive? Are there settings I can tweak?

Hi

You have not reported which kernel version you are using.
Please provide results with kernel 4.9.

Also note that the cache will NOT cache blocks which are already well covered
by the page cache. It is also slow to move data: blocks need a couple of
repeated uses (not served from the page cache) before they are promoted to
the cache.

Regards

Zdenek
