On Thu, Oct 29, 2015 at 1:10 AM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 10/26/2015 01:43 PM, Yan, Zheng wrote:
>>
>> On Thu, Oct 22, 2015 at 2:55 PM, Burkhard Linke
>> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> On 10/22/2015 02:54 AM, Gregory Farnum wrote:
>>>>
>>>> On Sun, Oct 18, 2015 at 8:27 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>>>
>>>>> On Sat, Oct 17, 2015 at 1:42 AM, Burkhard Linke
>>>>> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've noticed that CephFS (both ceph-fuse and the kernel client in
>>>>>> version 4.2.3) removes files from the page cache as soon as they are
>>>>>> no longer in use by a process.
>>>>>>
>>>>>> Is this intended behaviour? We use CephFS as a replacement for NFS in
>>>>>> our HPC cluster. It should serve large files which are read by
>>>>>> multiple jobs on multiple hosts, so keeping them in the page cache
>>>>>> over the duration of several job invocations is crucial.
>>>>>
>>>>> Yes. The MDS needs resources to track the cached data. We don't want
>>>>> the MDS to use too many resources.
>>>>
>>>> So if I'm reading things right, the code to drop the page cache for
>>>> ceph-fuse was added in https://github.com/ceph/ceph/pull/1594
>>>> (specifically 82015e409d09701a7048848f1d4379e51dd00892). I don't think
>>>> it's actually needed for the cap trimming stuff or to prevent MDS
>>>> cache pressure and it's actually not clear to me why it was added here
>>>> anyway. But you do say the PR as a whole fixed a lot of bugs. Do you
>>>> know if the page cache clearing was for any bugs in particular, Zheng?
>>>>
>>>> In general I think proactively clearing the page cache is something we
>>>> really only want to do as part of our consistency and cap handling
>>>> story, and file closes don't really play into that. I've pushed a
>>>> TOTALLY UNTESTED (NOT EVEN COMPILED) branch client-pagecache-norevoke
>>>> based on master to the gitbuilders. If it does succeed in building you
>>>> should be able to download it and you can use it for testing, or
>>>> cherry-pick the top commit out of git and build your own packages.
>>>> Then set the (new to this branch) client_preserve_pagecache config
>>>> option to true (default: false) and it should avoid flushing the page
>>>> cache.
>>>
>>> Thanks a lot for having a closer look at this. I'm currently preparing
>>> the deployment of 0.94.4 (or 0.94.5 due to the rbd bug), and need to add
>>> some patches to ceph-fuse for correct permission handling. I'll
>>> cherry-pick the changes of that branch and test the package.
>>>
>> I have written patches for both the kernel and fuse clients. They are
>> under testing:
>>
>> https://github.com/ceph/ceph/pull/6380
>>
>> https://github.com/ceph/ceph-client/commit/dfbb503e4e12580fc3d2952269104f293b0ec7e8
>
> Great! I've applied the changes of the fuse client to the current 0.94.5
> source tree. Automatic cache invalidation does not occur any more:
>
> start: 196280 cached Mem
> cat'ing of some file on cephfs (~850MB): 1027556 cached Mem
>
> After termination of the cat command the cached size stays at about 1 GB.
>
> Unfortunately we're only halfway there:
>
> dd'ing the first MB of the same file should be handled by the page cache
> (file is not changed on any other node). But the cache size drops to 203244
> (roughly the start value above), so the file's content is evicted from the
> cache by reopening the same file.
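The before/after comparison described above (cached memory at start, after reading the whole file, and after re-reading the first MB) can be reproduced with a small script. This is only a minimal sketch, assuming that the "cached Mem" figures correspond to the Cached field of /proc/meminfo (in kB) on a Linux client; the function names are illustrative and not part of Ceph, and other activity on the host will also move the numbers.

```python
import re


def parse_cached_kb(meminfo_text):
    """Return the 'Cached' value (kB) from /proc/meminfo-style text."""
    # '^' with MULTILINE anchors at line start, so 'SwapCached:' is not matched.
    match = re.search(r"^Cached:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Cached:' line found")
    return int(match.group(1))


def read_cached_kb(meminfo_path="/proc/meminfo"):
    """Read the current page cache size in kB (Linux only)."""
    with open(meminfo_path) as f:
        return parse_cached_kb(f.read())


def read_bytes(path, limit=None, chunk=1 << 20):
    """Read up to `limit` bytes (the whole file if None) and discard them."""
    total = 0
    with open(path, "rb") as f:
        while limit is None or total < limit:
            want = chunk if limit is None else min(chunk, limit - total)
            data = f.read(want)
            if not data:
                break
            total += len(data)
    return total


def cache_sizes_around_reopen(path):
    """Return (start, after full read, after reopen + 1 MB read) in kB."""
    start = read_cached_kb()
    read_bytes(path)                   # comparable to cat'ing the file
    after_full = read_cached_kb()
    read_bytes(path, limit=1 << 20)    # comparable to dd'ing the first MB
    after_reopen = read_cached_kb()
    return start, after_full, after_reopen
```

If the third value falls back towards the first, the file's pages were dropped when the file was reopened, which is the behaviour reported here.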
>
> Debug output of ceph-fuse (debug_client = 10/10):
>
> 2015-10-28 17:40:38.647653 7f3a1ffff700 10 client.904899 renew_caps()
> 2015-10-28 17:40:38.647764 7f3a1ffff700 10 client.904899 renew_caps mds.0
> 2015-10-28 17:40:38.650445 7f3a1e7fc700 10 client.904899
> handle_client_session client_session(renewcaps seq 24) v1 from mds.0
> 2015-10-28 17:40:43.529085 7f39f387b700 3 client.904899 ll_getattr 1.head
> 2015-10-28 17:40:43.529149 7f39f387b700 10 client.904899 _getattr mask
> pAsLsXsFs issued=1
> 2015-10-28 17:40:43.529370 7f39f387b700 10 client.904899 fill_stat on 1
> snap/devhead mode 040755 mtime 2015-09-18 16:06:20.645030 ctime 2015-09-18
> 16:06:20.645030
> 2015-10-28 17:40:43.529407 7f39f387b700 3 client.904899 ll_getattr 1.head =
> 0
> 2015-10-28 17:40:43.529441 7f39f387b700 3 client.904899 ll_forget 1 1
> 2015-10-28 17:40:43.529876 7f3a01ffb700 3 client.904899 ll_lookup
> 0x7f3a0c01b320 volumes
> 2015-10-28 17:40:43.529911 7f3a01ffb700 10 client.904899 _lookup
> 1.head(ref=3 ll_ref=14 cap_refs={} open={} mode=40755 size=0/0
> mtime=2015-09-18 16:06:20.645030 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout
> 0x7f3a0c01b320) volumes = 100009de0f2.head(ref=3 ll_ref=3 cap_refs={}
> open={} mode=40755 size=0/0 mtime=2015-09-18 10:28:37.519639
> caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c01dfd0 has_dir_layout
> 0x7f3a0c01d210)
> 2015-10-28 17:40:43.529998 7f3a01ffb700 10 client.904899 fill_stat on
> 100009de0f2 snap/devhead mode 040755 mtime 2015-09-18 10:28:37.519639 ctime
> 2015-09-18 10:28:37.519639
> 2015-10-28 17:40:43.530014 7f3a01ffb700 3 client.904899 ll_lookup
> 0x7f3a0c01b320 volumes -> 0 (100009de0f2)
> 2015-10-28 17:40:43.530036 7f3a01ffb700 3 client.904899 ll_forget 1 1
> 2015-10-28 17:40:43.530527 7f3a017fa700 3 client.904899 ll_getattr
> 100009de0f2.head
> 2015-10-28 17:40:43.530570 7f3a017fa700 10 client.904899 _getattr mask
> pAsLsXsFs issued=1
> 2015-10-28 17:40:43.530584 7f3a017fa700 10 client.904899 fill_stat on
> 100009de0f2 snap/devhead mode 040755 mtime 2015-09-18 10:28:37.519639 ctime
> 2015-09-18 10:28:37.519639
> 2015-10-28 17:40:43.530602 7f3a017fa700 3 client.904899 ll_getattr
> 100009de0f2.head = 0
> 2015-10-28 17:40:43.530635 7f3a017fa700 3 client.904899 ll_forget
> 100009de0f2 1
> 2015-10-28 17:40:43.531104 7f39fb180700 3 client.904899 ll_lookup
> 0x7f3a0c01d210 biodb
> 2015-10-28 17:40:43.531153 7f39fb180700 10 client.904899 _lookup
> 100009de0f2.head(ref=3 ll_ref=5 cap_refs={} open={} mode=40755 size=0/0
> mtime=2015-09-18 10:28:37.519639 caps=pAsLsXsFs(0=pAsLsXsFs)
> parents=0x7f3a0c01dfd0 has_dir_layout 0x7f3a0c01d210) biodb =
> 100008169e0.head(ref=3 ll_ref=3 cap_refs={} open={} mode=42775 size=0/0
> mtime=2015-10-08 10:17:04.202030 caps=pAsLsXsFs(0=pAsLsXsFs)
> parents=0x7f3a0c01f740 has_dir_layout 0x7f3a0c01e9e0)
> 2015-10-28 17:40:43.531230 7f39fb180700 10 client.904899 fill_stat on
> 100008169e0 snap/devhead mode 042775 mtime 2015-10-08 10:17:04.202030 ctime
> 2015-10-08 10:17:04.202030
> 2015-10-28 17:40:43.531241 7f39fb180700 3 client.904899 ll_lookup
> 0x7f3a0c01d210 biodb -> 0 (100008169e0)
> 2015-10-28 17:40:43.531271 7f39fb180700 3 client.904899 ll_forget
> 100009de0f2 1
> 2015-10-28 17:40:43.531748 7f39fb981700 3 client.904899 ll_getattr
> 100008169e0.head
> 2015-10-28 17:40:43.531771 7f39fb981700 10 client.904899 _getattr mask
> pAsLsXsFs issued=1
> 2015-10-28 17:40:43.531794 7f39fb981700 10 client.904899 fill_stat on
> 100008169e0 snap/devhead mode 042775 mtime 2015-10-08 10:17:04.202030 ctime
> 2015-10-08 10:17:04.202030
> 2015-10-28 17:40:43.531900 7f39fb981700 3 client.904899 ll_getattr
> 100008169e0.head = 0
> 2015-10-28 17:40:43.531947 7f39fb981700 3 client.904899 ll_forget
> 100008169e0 1
> 2015-10-28 17:40:43.532261 7f39f55a2700 3 client.904899 ll_lookup
> 0x7f3a0c01e9e0 asn1
> 2015-10-28 17:40:43.532299 7f39f55a2700 10 client.904899 _lookup
> 100008169e0.head(ref=3 ll_ref=5 cap_refs={} open={} mode=42775 size=0/0
> mtime=2015-10-08 10:17:04.202030 caps=pAsLsXsFs(0=pAsLsXsFs)
> parents=0x7f3a0c01f740 has_dir_layout 0x7f3a0c01e9e0) asn1 =
> 100025145a2.head(ref=3 ll_ref=3 cap_refs={} open={} mode=42775 size=0/0
> mtime=2015-09-15 15:26:59.173825 caps=pAsLsXsFs(0=pAsLsXsFs)
> parents=0x7f3a0c020e40 has_dir_layout 0x7f3a0c020150)
> 2015-10-28 17:40:43.532400 7f39f55a2700 10 client.904899 fill_stat on
> 100025145a2 snap/devhead mode 042775 mtime 2015-09-15 15:26:59.173825 ctime
> 2015-09-15 15:35:47.286314
> 2015-10-28 17:40:43.532413 7f39f55a2700 3 client.904899 ll_lookup
> 0x7f3a0c01e9e0 asn1 -> 0 (100025145a2)
> 2015-10-28 17:40:43.532428 7f39f55a2700 3 client.904899 ll_forget
> 100008169e0 1
> 2015-10-28 17:40:43.532523 7f3a00ff9700 3 client.904899 ll_getattr
> 100025145a2.head
> 2015-10-28 17:40:43.532536 7f3a00ff9700 10 client.904899 _getattr mask
> pAsLsXsFs issued=1
> 2015-10-28 17:40:43.532544 7f3a00ff9700 10 client.904899 fill_stat on
> 100025145a2 snap/devhead mode 042775 mtime 2015-09-15 15:26:59.173825 ctime
> 2015-09-15 15:35:47.286314
> 2015-10-28 17:40:43.532585 7f3a00ff9700 3 client.904899 ll_getattr
> 100025145a2.head = 0
> 2015-10-28 17:40:43.532609 7f3a00ff9700 3 client.904899 ll_forget
> 100025145a2 1
> 2015-10-28 17:40:43.532676 7f39fa97f700 3 client.904899 ll_lookup
> 0x7f3a0c020150 nr.01.psq
> 2015-10-28 17:40:43.532695 7f39fa97f700 10 client.904899 _lookup
> 100025145a2.head(ref=3 ll_ref=5 cap_refs={} open={} mode=42775 size=0/0
> mtime=2015-09-15 15:26:59.173825 caps=pAsLsXsFs(0=pAsLsXsFs)
> parents=0x7f3a0c020e40 has_dir_layout 0x7f3a0c020150) nr.01.psq =
> 1000261ff71.head(ref=2 ll_ref=3 cap_refs={1024=0,2048=0} open={1=0}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
> 2015-10-28 17:40:43.532748 7f39fa97f700 10 client.904899 fill_stat on
> 1000261ff71 snap/devhead mode 0100664 mtime 2015-09-09 21:05:17.000000 ctime
> 2015-09-15 15:26:39.155881
> 2015-10-28 17:40:43.532758 7f39fa97f700 3 client.904899 ll_lookup
> 0x7f3a0c020150 nr.01.psq -> 0 (1000261ff71)
> 2015-10-28 17:40:43.532796 7f39fa97f700 3 client.904899 ll_forget
> 100025145a2 1
> 2015-10-28 17:40:43.532847 7f39f387b700 3 client.904899 ll_getattr
> 1000261ff71.head
> 2015-10-28 17:40:43.532858 7f39f387b700 10 client.904899 _getattr mask
> pAsLsXsFs issued=1
> 2015-10-28 17:40:43.532867 7f39f387b700 10 client.904899 fill_stat on
> 1000261ff71 snap/devhead mode 0100664 mtime 2015-09-09 21:05:17.000000 ctime
> 2015-09-15 15:26:39.155881
> 2015-10-28 17:40:43.532880 7f39f387b700 3 client.904899 ll_getattr
> 1000261ff71.head = 0
> 2015-10-28 17:40:43.532894 7f39f387b700 3 client.904899 ll_forget
> 1000261ff71 1
> 2015-10-28 17:40:43.532956 7f3a01ffb700 3 client.904899 ll_open
> 1000261ff71.head 32768
> 2015-10-28 17:40:43.535627 7f3a01ffb700 10 client.904899 choose_target_mds
> from caps on inode 1000261ff71.head(ref=3 ll_ref=5 cap_refs={1024=0,2048=0}
> open={1=1} mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
> 2015-10-28 17:40:43.535694 7f3a01ffb700 10 client.904899 send_request
> rebuilding request 8 for mds.0
> 2015-10-28 17:40:43.535714 7f3a01ffb700 10 client.904899 send_request
> client_request(unknown.0:8 open #1000261ff71 2015-10-28 17:40:43.535600) v2
> to mds.0
> 2015-10-28 17:40:43.537945 7f3a1e7fc700 10 client.904899 mds.0 seq now 3
> 2015-10-28 17:40:43.538043 7f3a1e7fc700 5 client.904899 handle_cap_grant on
> in 1000261ff71 mds.0 seq 6 caps now pAsLsXsFscr was pAsLsXsFsc
> 2015-10-28 17:40:43.538065 7f3a1e7fc700 10 client.904899
> update_inode_file_bits 1000261ff71.head(ref=3 ll_ref=5
> cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0
> mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFsc(0=pAsLsXsFsc)
> objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
> parents=0x7f3a0c0226b0 0x7f3a0c0217d0) pAsLsXsFsc mtime 2015-09-09
> 21:05:17.000000
> 2015-10-28 17:40:43.538106 7f3a1e7fc700 10 client.904899 grant, new caps
> are Fr
> 2015-10-28 17:40:43.538212 7f3a1e7fc700 10 client.904899 insert_trace from
> 2015-10-28 17:40:43.535710 mds.0 is_target=1 is_dentry=0
> 2015-10-28 17:40:43.538224 7f3a1e7fc700 10 client.904899 features
> 0x3ffffffffffff
> 2015-10-28 17:40:43.538228 7f3a1e7fc700 10 client.904899 update_snap_trace
> len 48
> 2015-10-28 17:40:43.538275 7f3a1e7fc700 10 client.904899 update_snap_trace
> snaprealm(1 nref=6 c=0 seq=1 parent=0 my_snaps=[] cached_snapc=1=[]) seq 1
> <= 1 and same parent, SKIPPING
> 2015-10-28 17:40:43.538296 7f3a1e7fc700 10 client.904899 hrm is_target=1
> is_dentry=0
> 2015-10-28 17:40:43.538320 7f3a1e7fc700 10 client.904899 add_update_cap
> issued pAsLsXsFscr -> pAsLsXsFscr from mds.0 on 1000261ff71.head(ref=3
> ll_ref=5 cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0
> mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr)
> objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
> parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
> 2015-10-28 17:40:43.538412 7f3a01ffb700 10 client.904899 _create_fh
> 1000261ff71 mode 1
> 2015-10-28 17:40:43.538471 7f3a01ffb700 3 client.904899 ll_open
> 1000261ff71.head 32768 = 0 (0x7f39fc0f1e30)
> 2015-10-28 17:40:43.545244 7f3a01ffb700 3 client.904899 ll_forget
> 1000261ff71 1
> 2015-10-28 17:40:43.545282 7f3a1e7fc700 10 client.904899 put_inode on
> 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
> 2015-10-28 17:40:43.634719 7f3a017fa700 3 client.904899 ll_flush
> 0x7f39fc0f1e30 1000261ff71
> 2015-10-28 17:40:43.634765 7f3a017fa700 10 client.904899 _flush:
> 0x7f39fc0f1e30 on inode 1000261ff71.head(ref=3 ll_ref=4
> cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0
> mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr)
> objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
> parents=0x7f3a0c0226b0 0x7f3a0c0217d0) no async_err state
> 2015-10-28 17:40:43.635217 7f39fb180700 3 client.904899 ll_read
> 0x7f39fc0f1e30 1000261ff71 0~131072
> 2015-10-28 17:40:43.635261 7f39fb180700 10 client.904899 get_caps
> 1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr need
> Fr want Fc but not Fc revoking -
> 2015-10-28 17:40:43.635294 7f39fb180700 10 client.904899 _read_async
> 1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=1} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 0~131072
> 2015-10-28 17:40:43.635309 7f39fb180700 10 client.904899 max_byes=0
> max_periods=4
> 2015-10-28 17:40:43.635865 7f39fb180700 5 client.904899 get_cap_ref got
> first FILE_CACHE ref on 1000261ff71.head(ref=3 ll_ref=4
> cap_refs={1024=0,2048=1} open={1=1} mode=100664 size=845295759/0
> mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr)
> objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
> parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
> 2015-10-28 17:40:43.636382 7f39fb981700 3 client.904899 ll_read
> 0x7f39fc0f1e30 1000261ff71 131072~131072
> 2015-10-28 17:40:43.636398 7f39fb981700 10 client.904899 get_caps
> 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=1} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr
> need Fr want Fc but not Fc revoking -
> 2015-10-28 17:40:43.636436 7f39fb981700 10 client.904899 _read_async
> 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=2} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 131072~131072
> 2015-10-28 17:40:43.636451 7f39fb981700 10 client.904899 max_byes=0
> max_periods=4
> 2015-10-28 17:40:43.641047 7f39f55a2700 3 client.904899 ll_read
> 0x7f39fc0f1e30 1000261ff71 262144~131072
> 2015-10-28 17:40:43.641060 7f39f55a2700 10 client.904899 get_caps
> 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=1} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr need
> Fr want Fc but not Fc revoking -
> 2015-10-28 17:40:43.641111 7f39f55a2700 10 client.904899 _read_async
> 1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=2} open={1=1}
> mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
> caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
> dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 262144~131072
> 2015-10-28 17:40:43.641126 7f39f55a2700 10 client.904899 max_byes=0
> max_periods=4
> 2015-10-28 17:40:43.641932 7f3a00ff9700 3 client.904899 ll_read
> 0x7f39fc0f1e30 1000261ff71 393216~131072
>
>
> .... (more read calls)
>
> I tried to dig into the ceph-fuse code, but I was unable to find the
> fragment that is responsible for flushing the data from the page cache.
>

The FUSE kernel code invalidates the page cache when a file is opened. You
can disable this behaviour by setting the "fuse use invalidate cb" config
option to true.

Regards
Yan, Zheng

>
> Regards,
> Burkhard
>
> --
> Dr. rer. nat. Burkhard Linke
> Bioinformatics and Systems Biology
> Justus-Liebig-University Giessen
> 35392 Giessen, Germany
> Phone: (+49) (0)641 9935810
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com