Re: Caching policy in machine learning context

Thanks, I tried your suggestions, and also tried going back to the mq policy and playing with its parameters. In the end, I ran:

lvchange --cachesettings 'migration_threshold=20000000 sequential_threshold=10000000 read_promote_adjustment=1 write_promote_adjustment=4' VG

With little success. This is probably because the mq policy looks only at the hit count, rather than the hit rate. At least, that is what I gather from line 595 of the code: http://lxr.free-electrons.com/source/drivers/md/dm-cache-policy-mq.c?v=3.19#L595
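
For reference, a rough way to see whether any promotions are happening at all is to watch the dm-cache counters. This is just a sketch, assuming the cached LV shows up as the DM device VG-lv (dmsetup ls shows the exact name):

# The status line follows the dm-cache format from the kernel's
# Documentation/device-mapper/cache.txt: after the used/total cache blocks
# come read hits, read misses, write hits, write misses, demotions,
# promotions and the dirty count.
watch -n 5 'dmsetup status VG-lv'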

I wrote a small script so my users can empty the cache manually if they want to:

#!/bin/bash
if [ "$(id -u)" != "0" ]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi
# Drop the existing cache pool, discarding whatever was cached.
lvremove -y VG/lv_cache
# Recreate the cache data and metadata LVs on the SSD.
lvcreate -L 445G -n lv_cache VG /dev/sda
lvcreate -L 1G -n lv_cache_meta VG /dev/sda
# Combine them into a cache pool, select the smq policy, and attach
# the pool to the origin LV again.
lvconvert -y --type cache-pool --poolmetadata VG/lv_cache_meta VG/lv_cache
lvchange --cachepolicy smq VG
lvconvert --type cache --cachepool VG/lv_cache VG/lv
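
As a sanity check after rebuilding, something along these lines should show which policy and settings actually got applied. This is just a sketch: the cache_policy and cache_settings report fields assume a reasonably recent lvm2, and VG-lv is assumed to be the DM name of the cached LV:

lvs -a -o +cache_policy,cache_settings VG
dmsetup status VG-lv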

So the only remaining option for me would be to write my own policy. This should be quite simple, as it basically needs to act as if the cache is not full yet.

Can someone point me in the right direction on how to do this? I have tried to find the latest version of the code, but the best I could find was a Red Hat CVS server, which times out when connecting.

cvs -d :pserver:cvs@sources.redhat.com:/cvs/dm login cvs
CVS password: 
cvs [login aborted]: connect to sources.redhat.com(209.132.183.64):2401 failed: Connection timed out
 
Can someone direct me to the latest source of the smq policy?

Yours sincerely,

Jonas

On 13 February 2017 at 15:33, Zdenek Kabelac <zdenek.kabelac@gmail.com> wrote:
On 13.2.2017 at 15:19, Jonas Degrave wrote:
I am on kernel version 4.4.0-62-generic. I cannot upgrade to kernel 4.9, as it did not play nicely with the CUDA drivers:
https://devtalk.nvidia.com/default/topic/974733/nvidia-linux-driver-367-57-and-up-do-not-install-on-kernel-4-9-0-rc2-and-higher/

Yes, I understand the cache needs repeated use of blocks, but my question is basically: how many? And can I lower that number?

In our use case, a user basically reads a certain group of 100GB of data completely, about 100 times. Then another user logs in and reads a different group of data about 100 times. But after a couple of such users, I observe that only 20GB in total has been promoted to the cache, even though the cache is 450GB and could easily fit all the data one user would need.

So, I come to the conclusion that I need a more aggressive policy.

I now have a reported hit rate of 19.0%, even though there is so little data on the volume that 73% of it would fit in the cache. I could probably solve this issue by making the caching policy more aggressive, and I am looking for a way to do that.

There are two 'knobs'. One is 'sequential_threshold', where the cache tries to avoid promoting 'long' sequential reads into the cache - so if you do 100G reads, these likely meet the criteria and are not promoted (and I think this one is not configurable for smq).

The other is 'migration_threshold', which limits the bandwidth load on the cache device.

You can try to change its value:

lvchange --cachesettings migration_threshold=10000000  vg/cachedlv

(check with dmsetup status)

Not sure, though, how these things are configurable with the smq cache policy.

Regards

Zdenek


_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
