Re: CephFS and page cache

Hi,

On 10/26/2015 01:43 PM, Yan, Zheng wrote:
> On Thu, Oct 22, 2015 at 2:55 PM, Burkhard Linke
> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> On 10/22/2015 02:54 AM, Gregory Farnum wrote:
>>> On Sun, Oct 18, 2015 at 8:27 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>> On Sat, Oct 17, 2015 at 1:42 AM, Burkhard Linke
>>>> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> I've noticed that CephFS (both ceph-fuse and the kernel client in
>>>>> version 4.2.3) removes files from the page cache as soon as they are
>>>>> no longer in use by any process.
>>>>>
>>>>> Is this intended behaviour? We use CephFS as a replacement for NFS in
>>>>> our HPC cluster. It should serve large files which are read by
>>>>> multiple jobs on multiple hosts, so keeping them in the page cache
>>>>> over the duration of several job invocations is crucial.
>>>> Yes. The MDS needs resources to track the cached data. We don't want
>>>> the MDS to use too much resource.
>>> So if I'm reading things right, the code to drop the page cache for
>>> ceph-fuse was added in https://github.com/ceph/ceph/pull/1594
>>> (specifically 82015e409d09701a7048848f1d4379e51dd00892). I don't think
>>> it's actually needed for the cap trimming stuff or to prevent MDS
>>> cache pressure, and it's actually not clear to me why it was added here
>>> anyway. But you do say the PR as a whole fixed a lot of bugs. Do you
>>> know if the page cache clearing was for any bugs in particular, Zheng?
>>>
>>> In general I think proactively clearing the page cache is something we
>>> really only want to do as part of our consistency and cap handling
>>> story, and file closes don't really play into that. I've pushed a
>>> TOTALLY UNTESTED (NOT EVEN COMPILED) branch client-pagecache-norevoke
>>> based on master to the gitbuilders. If it does succeed in building you
>>> should be able to download it and use it for testing, or
>>> cherry-pick the top commit out of git and build your own packages.
>>> Then set the (new to this branch) client_preserve_pagecache config
>>> option to true (default: false) and it should avoid flushing the page
>>> cache.
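If the branch builds, the option described above would presumably be enabled in the client's ceph.conf; a minimal sketch, assuming the option keeps the name given here (it exists only in that experimental branch, not in any released Ceph):

```ini
[client]
    # hypothetical option from the client-pagecache-norevoke branch;
    # defaults to false there, so it must be set explicitly
    client preserve pagecache = true
```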

>> Thanks a lot for having a closer look at this. I'm currently preparing the
>> deployment of 0.94.4 (or 0.94.5 due to the rbd bug), and need to add some
>> patches to ceph-fuse for correct permission handling. I'll cherry-pick the
>> changes of that branch and test the package.


> I have written patches for both the kernel and fuse clients. They are under
> testing:
>
> https://github.com/ceph/ceph/pull/6380
> https://github.com/ceph/ceph-client/commit/dfbb503e4e12580fc3d2952269104f293b0ec7e8
Great! I've applied the changes of the fuse client to the current 0.94.5 source tree. Automatic cache invalidation no longer occurs:

start: 196280 kB cached Mem
cat'ing some file on CephFS (~850 MB): 1027556 kB cached Mem

After the cat command terminates, the cached size stays at about 1 GB.

Unfortunately, we're only halfway there:

dd'ing the first MB of the same file should be served from the page cache (the file has not been changed on any other node). But the cached size drops to 203244 kB (roughly the start value above), so the file's content is evicted from the page cache just by reopening the same file.
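The cat/dd experiment above can be scripted; a sketch, where /cephfs/bigfile is a placeholder path (with a fallback to a scratch file so the script runs anywhere), and `Cached:` in /proc/meminfo is the same counter top reports as "cached Mem":

```shell
#!/bin/sh
# Placeholder path to a large file on the CephFS mount; fall back to a
# scratch file so the script is runnable outside the cluster too.
FILE=/cephfs/bigfile
if [ ! -f "$FILE" ]; then
    FILE=$(mktemp)
    dd if=/dev/zero of="$FILE" bs=1M count=8 2>/dev/null
fi

grep '^Cached:' /proc/meminfo                # baseline page cache size (kB)
cat "$FILE" > /dev/null                      # read the whole file once
grep '^Cached:' /proc/meminfo                # grows by roughly the file size
dd if="$FILE" of=/dev/null bs=1M count=1 2>/dev/null  # reopen, read first MB
grep '^Cached:' /proc/meminfo                # in the report above, this dropped
                                             # back to the baseline on CephFS
```

On a local filesystem the last number should stay high; the report above shows it dropping back to the baseline on CephFS, which is the eviction-on-reopen being discussed.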

Debug output of ceph-fuse (debug_client = 10/10):

2015-10-28 17:40:38.647653 7f3a1ffff700 10 client.904899 renew_caps()
2015-10-28 17:40:38.647764 7f3a1ffff700 10 client.904899 renew_caps mds.0
2015-10-28 17:40:38.650445 7f3a1e7fc700 10 client.904899 handle_client_session client_session(renewcaps seq 24) v1 from mds.0
2015-10-28 17:40:43.529085 7f39f387b700  3 client.904899 ll_getattr 1.head
2015-10-28 17:40:43.529149 7f39f387b700 10 client.904899 _getattr mask pAsLsXsFs issued=1
2015-10-28 17:40:43.529370 7f39f387b700 10 client.904899 fill_stat on 1 snap/devhead mode 040755 mtime 2015-09-18 16:06:20.645030 ctime 2015-09-18 16:06:20.645030
2015-10-28 17:40:43.529407 7f39f387b700  3 client.904899 ll_getattr 1.head = 0
2015-10-28 17:40:43.529441 7f39f387b700  3 client.904899 ll_forget 1 1
2015-10-28 17:40:43.529876 7f3a01ffb700  3 client.904899 ll_lookup 0x7f3a0c01b320 volumes
2015-10-28 17:40:43.529911 7f3a01ffb700 10 client.904899 _lookup 1.head(ref=3 ll_ref=14 cap_refs={} open={} mode=40755 size=0/0 mtime=2015-09-18 16:06:20.645030 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x7f3a0c01b320) volumes = 100009de0f2.head(ref=3 ll_ref=3 cap_refs={} open={} mode=40755 size=0/0 mtime=2015-09-18 10:28:37.519639 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c01dfd0 has_dir_layout 0x7f3a0c01d210)
2015-10-28 17:40:43.529998 7f3a01ffb700 10 client.904899 fill_stat on 100009de0f2 snap/devhead mode 040755 mtime 2015-09-18 10:28:37.519639 ctime 2015-09-18 10:28:37.519639
2015-10-28 17:40:43.530014 7f3a01ffb700  3 client.904899 ll_lookup 0x7f3a0c01b320 volumes -> 0 (100009de0f2)
2015-10-28 17:40:43.530036 7f3a01ffb700  3 client.904899 ll_forget 1 1
2015-10-28 17:40:43.530527 7f3a017fa700  3 client.904899 ll_getattr 100009de0f2.head
2015-10-28 17:40:43.530570 7f3a017fa700 10 client.904899 _getattr mask pAsLsXsFs issued=1
2015-10-28 17:40:43.530584 7f3a017fa700 10 client.904899 fill_stat on 100009de0f2 snap/devhead mode 040755 mtime 2015-09-18 10:28:37.519639 ctime 2015-09-18 10:28:37.519639
2015-10-28 17:40:43.530602 7f3a017fa700  3 client.904899 ll_getattr 100009de0f2.head = 0
2015-10-28 17:40:43.530635 7f3a017fa700  3 client.904899 ll_forget 100009de0f2 1
2015-10-28 17:40:43.531104 7f39fb180700  3 client.904899 ll_lookup 0x7f3a0c01d210 biodb
2015-10-28 17:40:43.531153 7f39fb180700 10 client.904899 _lookup 100009de0f2.head(ref=3 ll_ref=5 cap_refs={} open={} mode=40755 size=0/0 mtime=2015-09-18 10:28:37.519639 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c01dfd0 has_dir_layout 0x7f3a0c01d210) biodb = 100008169e0.head(ref=3 ll_ref=3 cap_refs={} open={} mode=42775 size=0/0 mtime=2015-10-08 10:17:04.202030 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c01f740 has_dir_layout 0x7f3a0c01e9e0)
2015-10-28 17:40:43.531230 7f39fb180700 10 client.904899 fill_stat on 100008169e0 snap/devhead mode 042775 mtime 2015-10-08 10:17:04.202030 ctime 2015-10-08 10:17:04.202030
2015-10-28 17:40:43.531241 7f39fb180700  3 client.904899 ll_lookup 0x7f3a0c01d210 biodb -> 0 (100008169e0)
2015-10-28 17:40:43.531271 7f39fb180700  3 client.904899 ll_forget 100009de0f2 1
2015-10-28 17:40:43.531748 7f39fb981700  3 client.904899 ll_getattr 100008169e0.head
2015-10-28 17:40:43.531771 7f39fb981700 10 client.904899 _getattr mask pAsLsXsFs issued=1
2015-10-28 17:40:43.531794 7f39fb981700 10 client.904899 fill_stat on 100008169e0 snap/devhead mode 042775 mtime 2015-10-08 10:17:04.202030 ctime 2015-10-08 10:17:04.202030
2015-10-28 17:40:43.531900 7f39fb981700  3 client.904899 ll_getattr 100008169e0.head = 0
2015-10-28 17:40:43.531947 7f39fb981700  3 client.904899 ll_forget 100008169e0 1
2015-10-28 17:40:43.532261 7f39f55a2700  3 client.904899 ll_lookup 0x7f3a0c01e9e0 asn1
2015-10-28 17:40:43.532299 7f39f55a2700 10 client.904899 _lookup 100008169e0.head(ref=3 ll_ref=5 cap_refs={} open={} mode=42775 size=0/0 mtime=2015-10-08 10:17:04.202030 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c01f740 has_dir_layout 0x7f3a0c01e9e0) asn1 = 100025145a2.head(ref=3 ll_ref=3 cap_refs={} open={} mode=42775 size=0/0 mtime=2015-09-15 15:26:59.173825 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c020e40 has_dir_layout 0x7f3a0c020150)
2015-10-28 17:40:43.532400 7f39f55a2700 10 client.904899 fill_stat on 100025145a2 snap/devhead mode 042775 mtime 2015-09-15 15:26:59.173825 ctime 2015-09-15 15:35:47.286314
2015-10-28 17:40:43.532413 7f39f55a2700  3 client.904899 ll_lookup 0x7f3a0c01e9e0 asn1 -> 0 (100025145a2)
2015-10-28 17:40:43.532428 7f39f55a2700  3 client.904899 ll_forget 100008169e0 1
2015-10-28 17:40:43.532523 7f3a00ff9700  3 client.904899 ll_getattr 100025145a2.head
2015-10-28 17:40:43.532536 7f3a00ff9700 10 client.904899 _getattr mask pAsLsXsFs issued=1
2015-10-28 17:40:43.532544 7f3a00ff9700 10 client.904899 fill_stat on 100025145a2 snap/devhead mode 042775 mtime 2015-09-15 15:26:59.173825 ctime 2015-09-15 15:35:47.286314
2015-10-28 17:40:43.532585 7f3a00ff9700  3 client.904899 ll_getattr 100025145a2.head = 0
2015-10-28 17:40:43.532609 7f3a00ff9700  3 client.904899 ll_forget 100025145a2 1
2015-10-28 17:40:43.532676 7f39fa97f700  3 client.904899 ll_lookup 0x7f3a0c020150 nr.01.psq
2015-10-28 17:40:43.532695 7f39fa97f700 10 client.904899 _lookup 100025145a2.head(ref=3 ll_ref=5 cap_refs={} open={} mode=42775 size=0/0 mtime=2015-09-15 15:26:59.173825 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c020e40 has_dir_layout 0x7f3a0c020150) nr.01.psq = 1000261ff71.head(ref=2 ll_ref=3 cap_refs={1024=0,2048=0} open={1=0} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.532748 7f39fa97f700 10 client.904899 fill_stat on 1000261ff71 snap/devhead mode 0100664 mtime 2015-09-09 21:05:17.000000 ctime 2015-09-15 15:26:39.155881
2015-10-28 17:40:43.532758 7f39fa97f700  3 client.904899 ll_lookup 0x7f3a0c020150 nr.01.psq -> 0 (1000261ff71)
2015-10-28 17:40:43.532796 7f39fa97f700  3 client.904899 ll_forget 100025145a2 1
2015-10-28 17:40:43.532847 7f39f387b700  3 client.904899 ll_getattr 1000261ff71.head
2015-10-28 17:40:43.532858 7f39f387b700 10 client.904899 _getattr mask pAsLsXsFs issued=1
2015-10-28 17:40:43.532867 7f39f387b700 10 client.904899 fill_stat on 1000261ff71 snap/devhead mode 0100664 mtime 2015-09-09 21:05:17.000000 ctime 2015-09-15 15:26:39.155881
2015-10-28 17:40:43.532880 7f39f387b700  3 client.904899 ll_getattr 1000261ff71.head = 0
2015-10-28 17:40:43.532894 7f39f387b700  3 client.904899 ll_forget 1000261ff71 1
2015-10-28 17:40:43.532956 7f3a01ffb700  3 client.904899 ll_open 1000261ff71.head 32768
2015-10-28 17:40:43.535627 7f3a01ffb700 10 client.904899 choose_target_mds from caps on inode 1000261ff71.head(ref=3 ll_ref=5 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.535694 7f3a01ffb700 10 client.904899 send_request rebuilding request 8 for mds.0
2015-10-28 17:40:43.535714 7f3a01ffb700 10 client.904899 send_request client_request(unknown.0:8 open #1000261ff71 2015-10-28 17:40:43.535600) v2 to mds.0
2015-10-28 17:40:43.537945 7f3a1e7fc700 10 client.904899  mds.0 seq now 3
2015-10-28 17:40:43.538043 7f3a1e7fc700  5 client.904899 handle_cap_grant on in 1000261ff71 mds.0 seq 6 caps now pAsLsXsFscr was pAsLsXsFsc
2015-10-28 17:40:43.538065 7f3a1e7fc700 10 client.904899 update_inode_file_bits 1000261ff71.head(ref=3 ll_ref=5 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) pAsLsXsFsc mtime 2015-09-09 21:05:17.000000
2015-10-28 17:40:43.538106 7f3a1e7fc700 10 client.904899 grant, new caps are Fr
2015-10-28 17:40:43.538212 7f3a1e7fc700 10 client.904899 insert_trace from 2015-10-28 17:40:43.535710 mds.0 is_target=1 is_dentry=0
2015-10-28 17:40:43.538224 7f3a1e7fc700 10 client.904899 features 0x3ffffffffffff
2015-10-28 17:40:43.538228 7f3a1e7fc700 10 client.904899 update_snap_trace len 48
2015-10-28 17:40:43.538275 7f3a1e7fc700 10 client.904899 update_snap_trace snaprealm(1 nref=6 c=0 seq=1 parent=0 my_snaps=[] cached_snapc=1=[]) seq 1 <= 1 and same parent, SKIPPING
2015-10-28 17:40:43.538296 7f3a1e7fc700 10 client.904899 hrm is_target=1 is_dentry=0
2015-10-28 17:40:43.538320 7f3a1e7fc700 10 client.904899 add_update_cap issued pAsLsXsFscr -> pAsLsXsFscr from mds.0 on 1000261ff71.head(ref=3 ll_ref=5 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.538412 7f3a01ffb700 10 client.904899 _create_fh 1000261ff71 mode 1
2015-10-28 17:40:43.538471 7f3a01ffb700  3 client.904899 ll_open 1000261ff71.head 32768 = 0 (0x7f39fc0f1e30)
2015-10-28 17:40:43.545244 7f3a01ffb700  3 client.904899 ll_forget 1000261ff71 1
2015-10-28 17:40:43.545282 7f3a1e7fc700 10 client.904899 put_inode on 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.634719 7f3a017fa700  3 client.904899 ll_flush 0x7f39fc0f1e30 1000261ff71
2015-10-28 17:40:43.634765 7f3a017fa700 10 client.904899 _flush: 0x7f39fc0f1e30 on inode 1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) no async_err state
2015-10-28 17:40:43.635217 7f39fb180700  3 client.904899 ll_read 0x7f39fc0f1e30 1000261ff71 0~131072
2015-10-28 17:40:43.635261 7f39fb180700 10 client.904899 get_caps 1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr need Fr want Fc but not Fc revoking -
2015-10-28 17:40:43.635294 7f39fb180700 10 client.904899 _read_async 1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=1} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 0~131072
2015-10-28 17:40:43.635309 7f39fb180700 10 client.904899 max_byes=0 max_periods=4
2015-10-28 17:40:43.635865 7f39fb180700  5 client.904899 get_cap_ref got first FILE_CACHE ref on 1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=1} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.636382 7f39fb981700  3 client.904899 ll_read 0x7f39fc0f1e30 1000261ff71 131072~131072
2015-10-28 17:40:43.636398 7f39fb981700 10 client.904899 get_caps 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=1} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr need Fr want Fc but not Fc revoking -
2015-10-28 17:40:43.636436 7f39fb981700 10 client.904899 _read_async 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=2} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 131072~131072
2015-10-28 17:40:43.636451 7f39fb981700 10 client.904899 max_byes=0 max_periods=4
2015-10-28 17:40:43.641047 7f39f55a2700  3 client.904899 ll_read 0x7f39fc0f1e30 1000261ff71 262144~131072
2015-10-28 17:40:43.641060 7f39f55a2700 10 client.904899 get_caps 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=1} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr need Fr want Fc but not Fc revoking -
2015-10-28 17:40:43.641111 7f39f55a2700 10 client.904899 _read_async 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=2} open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 262144~131072
2015-10-28 17:40:43.641126 7f39f55a2700 10 client.904899 max_byes=0 max_periods=4
2015-10-28 17:40:43.641932 7f3a00ff9700  3 client.904899 ll_read 0x7f39fc0f1e30 1000261ff71 393216~131072


.... (more read calls)

I tried to dig into the ceph-fuse code, but I was unable to find the fragment that is responsible for flushing the data from the page cache.
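Independent of the ceph-fuse internals, per-file page-cache residency can be checked from userspace; a sketch using fincore(1) from util-linux (an assumption on my part that it is installed -- any mincore()-based tool such as vmtouch works too; the CephFS path is a placeholder, with a fallback to a scratch file so the script runs anywhere):

```shell
#!/bin/sh
# fincore is part of util-linux >= 2.30; bail out quietly if missing.
command -v fincore >/dev/null 2>&1 || { echo "fincore not installed"; exit 0; }

# Placeholder path on the CephFS mount; fall back to a scratch file.
FILE=/cephfs/bigfile
[ -f "$FILE" ] || { FILE=$(mktemp); dd if=/dev/zero of="$FILE" bs=1M count=4 2>/dev/null; }

cat "$FILE" > /dev/null    # warm the page cache with the whole file
fincore "$FILE"            # RES column: how much of the file is resident

dd if="$FILE" of=/dev/null bs=1M count=1 2>/dev/null   # reopen, read 1 MB
fincore "$FILE"            # on the CephFS file, a RES near zero here would
                           # confirm the eviction happens at page-cache level
```

This narrows down whether the data is dropped from the kernel's page cache itself, as opposed to ceph-fuse's own ObjectCacher.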


Regards,
Burkhard

--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


