Try looking at /proc/slabinfo / slabtop during your tests.

> On 14.10.2018, at 15:21, jesper@xxxxxxxx wrote:
>
> Hi
>
> We have a dataset of ~300 GB on CephFS which is being used for computations
> over and over again, being refreshed daily or similar.
>
> When hosting it on NFS, the files are transferred after a refresh, but from
> there they sit in the kernel page cache of the client until they are
> refreshed server-side.
>
> On CephFS it looks "similar" but "different". Where the "steady state"
> operation over NFS gives client/server traffic of < 1 MB/s, CephFS
> constantly pulls 50-100 MB/s over the network. This has implications for
> the clients, which end up spending unnecessary time waiting for IO during
> execution.
>
> This is in a setting where the CephFS client memory looks like this:
>
> $ free -h
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        340G        1.2G         19G        354G
> Swap:          8.8G        430M        8.4G
>
> If I just repeatedly run (within a few minutes) something that uses the
> files, then it is fully served out of the client page cache (2 GB/s-ish),
> but it looks like it is being evicted way faster than in the NFS setting?
>
> This is not scientific, but the CMD is a "cat /file/on/ceph > /dev/null"
> type command over a total of 24 GB of data in 300-ish files.
>
> $ free -h; time CMD ; sleep 1800; free -h; time CMD ; free -h; sleep 3600; time CMD
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         16G        312G        1.2G         48G        355G
> Swap:          8.8G        430M        8.4G
>
> real    0m8.997s
> user    0m2.036s
> sys     0m6.915s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        277G        1.2G         82G        354G
> Swap:          8.8G        430M        8.4G
>
> real    3m25.904s
> user    0m2.794s
> sys     0m9.028s
>
>               total        used        free      shared  buff/cache   available
> Mem:           377G         17G        283G        1.2G         76G        353G
> Swap:          8.8G        430M        8.4G
>
> real    6m18.358s
> user    0m2.847s
> sys     0m10.651s
>
> Munin graphs of the system confirm that there has been zero memory
> pressure over the period.
>
> Are there things in the CephFS case that can cause the page cache to be
> invalidated?
> Could less aggressive "read-ahead" play a role?
>
> Other thoughts on what the root cause of the different behaviour could be?
>
> Clients are using a 4.15 kernel. Is anyone aware of newer patches in this
> area that could have an impact?
>
> Jesper
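For instance, something along these lines around one of the cat runs (untested sketch; ceph_inode_info is the slab name I'd expect for the CephFS inode cache on a 4.15 kernel, so adjust to whatever your /proc/slabinfo actually lists):

$ # snapshot CephFS inode / dentry slab counts before and after a run
$ sudo grep -E '^(ceph_inode_info|dentry|inode_cache)' /proc/slabinfo

$ # or take a one-shot look at the biggest slab caches (-o = once, -s c = sort by cache size)
$ sudo slabtop -o -s c | head -n 20

$ # page-cache totals, to see whether the cached pages actually survive the sleep
$ grep -E '^(Cached|MemFree)' /proc/meminfo

If the ceph inode/dentry counts collapse between runs while MemFree stays high, the pages are probably being invalidated along with the CephFS inodes (e.g. caps being dropped) rather than reclaimed under memory pressure.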