Re: cephfs kernel client - page cache being invalidated.

Try looking at /proc/slabinfo, or run slabtop, during your tests.
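
For example, something along these lines (the slab cache names are taken from a typical CephFS kernel client; adjust to what your kernel actually exposes) would show whether the ceph inode/cap caches shrink between runs:

$ sudo grep ceph /proc/slabinfo   # ceph_inode_info, ceph_cap, ... object counts
$ sudo slabtop -o | head -n 20    # one-shot snapshot of the largest slab caches

If ceph_inode_info drops sharply while there is no memory pressure, the client is releasing inodes (and their cached pages) for some other reason.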


> On 14.10.2018, at 15:21, jesper@xxxxxxxx wrote:
> 
> Hi
> 
> We have a dataset of ~300 GB on CephFS which is being used for
> computations over and over again, being refreshed daily or so.
> 
> When hosting it on NFS, the files are transferred after each refresh, but
> from there they would sit in the kernel page cache of the clients until
> they are refreshed server-side.
> 
> On CephFS it looks "similar" but "different". Where the "steady state"
> operation over NFS would give client/server traffic of < 1 MB/s,
> CephFS constantly pulls 50-100 MB/s over the network. This has
> implications for the clients, which end up spending unnecessary time
> waiting for IO during execution.
> 
> This is in a setting where the CephFS client memory looks like this:
> 
> $ free -h
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        340G        1.2G         19G        354G
> Swap:          8.8G        430M        8.4G
> 
> 
> If I repeatedly run something that uses the files (within a few minutes),
> then it is served entirely out of the client page cache (2 GB/s-ish), but
> the cache appears to be evicted much faster than in the NFS setting?
> 
> This is not scientific, but CMD is a "cat /file/on/ceph > /dev/null" type
> of command over a total of 24 GB of data in 300-ish files.
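> 
> As a sanity check one could also verify residency directly, e.g. with
> util-linux's fincore (the path below is just a placeholder for the
> dataset):
> 
> $ fincore /ceph/dataset/*    # prints RES/PAGES/SIZE per file
> 
> If RES shrinks between runs despite zero memory pressure, the pages
> really are being invalidated rather than merely re-read.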
> 
> $ free -h; time CMD; sleep 1800; free -h; time CMD; free -h; sleep 3600; time CMD
> 
>               total        used        free      shared  buff/cache   available
> Mem:           377G         16G        312G        1.2G         48G        355G
> Swap:          8.8G        430M        8.4G
> 
> real    0m8.997s
> user    0m2.036s
> sys     0m6.915s
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        277G        1.2G         82G        354G
> Swap:          8.8G        430M        8.4G
> 
> real    3m25.904s
> user    0m2.794s
> sys     0m9.028s
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        283G        1.2G         76G        353G
> Swap:          8.8G        430M        8.4G
> 
> real    6m18.358s
> user    0m2.847s
> sys     0m10.651s
> 
> 
> Munin graphs of the system confirm that there has been zero memory
> pressure over the period.
> 
> Are there things in the CephFS case that can cause the page cache to be
> invalidated?
> Could less aggressive read-ahead play a role?
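> 
> (For reference: the kernel client's read-ahead is controlled by the
> rasize= mount option. Assuming the mount registers a BDI named ceph-*
> under /sys/class/bdi, the effective values can be checked with:
> 
> $ mount | grep ceph                        # look for rasize= on the mount
> $ cat /sys/class/bdi/ceph-*/read_ahead_kb
> )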
> 
> Other thoughts on what the root cause of the different behaviour could be?
> 
> Clients are using a 4.15 kernel. Is anyone aware of newer patches in this
> area that could have an impact?
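> 
> (One more data point that might be relevant: if MDS capability
> revocation is what drops the cached pages, the cap counts should show
> it. Assuming debugfs is mounted, something like
> 
> $ sudo cat /sys/kernel/debug/ceph/*/caps
> 
> before and after a run might show caps being released.)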
> 
> Jesper
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


