Hi,
On 10/26/2015 01:43 PM, Yan, Zheng wrote:
On Thu, Oct 22, 2015 at 2:55 PM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,
On 10/22/2015 02:54 AM, Gregory Farnum wrote:
On Sun, Oct 18, 2015 at 8:27 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Sat, Oct 17, 2015 at 1:42 AM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi,
I've noticed that CephFS (both ceph-fuse and the kernel client in version
4.2.3) removes files from the page cache as soon as they are no longer in
use by any process.
Is this intended behaviour? We use CephFS as a replacement for NFS in our
HPC cluster. It should serve large files that are read by multiple jobs on
multiple hosts, so keeping them in the page cache across several job
invocations is crucial.
Yes. The MDS needs resources to track the cached data, and we don't want
the MDS to use too much resource.
So if I'm reading things right, the code to drop the page cache for
ceph-fuse was added in https://github.com/ceph/ceph/pull/1594
(specifically 82015e409d09701a7048848f1d4379e51dd00892). I don't think
it's actually needed for cap trimming or to prevent MDS cache pressure,
and it's not clear to me why it was added there. But you do say the PR
as a whole fixed a lot of bugs. Do you know if the page cache clearing
addressed any bug in particular, Zheng?
In general I think proactively clearing the page cache is something we
really only want to do as part of our consistency and cap handling
story, and file closes don't really play into that. I've pushed a
TOTALLY UNTESTED (NOT EVEN COMPILED) branch client-pagecache-norevoke
based on master to the gitbuilders. If it does succeed in building, you
should be able to download it and use it for testing, or cherry-pick the
top commit out of git and build your own packages. Then set the (new to
this branch) client_preserve_pagecache config option to true (default:
false) and it should avoid flushing the page cache.
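In ceph.conf that would amount to a fragment along these lines (a sketch
only; the option exists solely in that experimental branch, not in any
release):

    [client]
        # only available in the client-pagecache-norevoke test branch
        client_preserve_pagecache = true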
Thanks a lot for having a closer look at this. I'm currently preparing the
deployment of 0.94.4 (or 0.94.5, due to the rbd bug) and need to add some
patches to ceph-fuse for correct permission handling. I'll cherry-pick the
changes from that branch and test the package.
I have written patches for both the kernel and fuse clients; they are under testing:
https://github.com/ceph/ceph/pull/6380
https://github.com/ceph/ceph-client/commit/dfbb503e4e12580fc3d2952269104f293b0ec7e8
Great! I've applied the fuse client changes to the current 0.94.5 source
tree. Automatic cache invalidation no longer occurs:
start: 196280 cached Mem
after cat'ing a file on CephFS (~850 MB): 1027556 cached Mem
After the cat command terminates, the cached size stays at about 1 GB.
Unfortunately we're only halfway there:
dd'ing the first MB of the same file should be served from the page cache
(the file is not changed on any other node), but the cache size drops to
203244 (roughly the start value above), so the file's content is evicted
from the cache when the same file is reopened.
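To make that reproducible, the test boils down to roughly this (assuming
the filesystem is mounted under /cephfs, which is just an example; the
path matches the lookups in the debug output below):

    grep ^Cached: /proc/meminfo                               # baseline
    cat /cephfs/volumes/biodb/asn1/nr.01.psq > /dev/null      # ~850 MB file
    grep ^Cached: /proc/meminfo                               # stays at ~1 GB after cat exits
    dd if=/cephfs/volumes/biodb/asn1/nr.01.psq of=/dev/null bs=1M count=1
    grep ^Cached: /proc/meminfo                               # drops back to the baseline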
Debug output of ceph-fuse (debug_client = 10/10):
2015-10-28 17:40:38.647653 7f3a1ffff700 10 client.904899 renew_caps()
2015-10-28 17:40:38.647764 7f3a1ffff700 10 client.904899 renew_caps mds.0
2015-10-28 17:40:38.650445 7f3a1e7fc700 10 client.904899
handle_client_session client_session(renewcaps seq 24) v1 from mds.0
2015-10-28 17:40:43.529085 7f39f387b700 3 client.904899 ll_getattr 1.head
2015-10-28 17:40:43.529149 7f39f387b700 10 client.904899 _getattr mask
pAsLsXsFs issued=1
2015-10-28 17:40:43.529370 7f39f387b700 10 client.904899 fill_stat on 1
snap/devhead mode 040755 mtime 2015-09-18 16:06:20.645030 ctime
2015-09-18 16:06:20.645030
2015-10-28 17:40:43.529407 7f39f387b700 3 client.904899 ll_getattr
1.head = 0
2015-10-28 17:40:43.529441 7f39f387b700 3 client.904899 ll_forget 1 1
2015-10-28 17:40:43.529876 7f3a01ffb700 3 client.904899 ll_lookup
0x7f3a0c01b320 volumes
2015-10-28 17:40:43.529911 7f3a01ffb700 10 client.904899 _lookup
1.head(ref=3 ll_ref=14 cap_refs={} open={} mode=40755 size=0/0
mtime=2015-09-18 16:06:20.645030 caps=pAsLsXsFs(0=pAsLsXsFs)
has_dir_layout 0x7f3a0c01b320) volumes = 100009de0f2.head(ref=3 ll_ref=3
cap_refs={} open={} mode=40755 size=0/0 mtime=2015-09-18 10:28:37.519639
caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x7f3a0c01dfd0 has_dir_layout
0x7f3a0c01d210)
2015-10-28 17:40:43.529998 7f3a01ffb700 10 client.904899 fill_stat on
100009de0f2 snap/devhead mode 040755 mtime 2015-09-18 10:28:37.519639
ctime 2015-09-18 10:28:37.519639
2015-10-28 17:40:43.530014 7f3a01ffb700 3 client.904899 ll_lookup
0x7f3a0c01b320 volumes -> 0 (100009de0f2)
2015-10-28 17:40:43.530036 7f3a01ffb700 3 client.904899 ll_forget 1 1
2015-10-28 17:40:43.530527 7f3a017fa700 3 client.904899 ll_getattr
100009de0f2.head
2015-10-28 17:40:43.530570 7f3a017fa700 10 client.904899 _getattr mask
pAsLsXsFs issued=1
2015-10-28 17:40:43.530584 7f3a017fa700 10 client.904899 fill_stat on
100009de0f2 snap/devhead mode 040755 mtime 2015-09-18 10:28:37.519639
ctime 2015-09-18 10:28:37.519639
2015-10-28 17:40:43.530602 7f3a017fa700 3 client.904899 ll_getattr
100009de0f2.head = 0
2015-10-28 17:40:43.530635 7f3a017fa700 3 client.904899 ll_forget
100009de0f2 1
2015-10-28 17:40:43.531104 7f39fb180700 3 client.904899 ll_lookup
0x7f3a0c01d210 biodb
2015-10-28 17:40:43.531153 7f39fb180700 10 client.904899 _lookup
100009de0f2.head(ref=3 ll_ref=5 cap_refs={} open={} mode=40755 size=0/0
mtime=2015-09-18 10:28:37.519639 caps=pAsLsXsFs(0=pAsLsXsFs)
parents=0x7f3a0c01dfd0 has_dir_layout 0x7f3a0c01d210) biodb =
100008169e0.head(ref=3 ll_ref=3 cap_refs={} open={} mode=42775 size=0/0
mtime=2015-10-08 10:17:04.202030 caps=pAsLsXsFs(0=pAsLsXsFs)
parents=0x7f3a0c01f740 has_dir_layout 0x7f3a0c01e9e0)
2015-10-28 17:40:43.531230 7f39fb180700 10 client.904899 fill_stat on
100008169e0 snap/devhead mode 042775 mtime 2015-10-08 10:17:04.202030
ctime 2015-10-08 10:17:04.202030
2015-10-28 17:40:43.531241 7f39fb180700 3 client.904899 ll_lookup
0x7f3a0c01d210 biodb -> 0 (100008169e0)
2015-10-28 17:40:43.531271 7f39fb180700 3 client.904899 ll_forget
100009de0f2 1
2015-10-28 17:40:43.531748 7f39fb981700 3 client.904899 ll_getattr
100008169e0.head
2015-10-28 17:40:43.531771 7f39fb981700 10 client.904899 _getattr mask
pAsLsXsFs issued=1
2015-10-28 17:40:43.531794 7f39fb981700 10 client.904899 fill_stat on
100008169e0 snap/devhead mode 042775 mtime 2015-10-08 10:17:04.202030
ctime 2015-10-08 10:17:04.202030
2015-10-28 17:40:43.531900 7f39fb981700 3 client.904899 ll_getattr
100008169e0.head = 0
2015-10-28 17:40:43.531947 7f39fb981700 3 client.904899 ll_forget
100008169e0 1
2015-10-28 17:40:43.532261 7f39f55a2700 3 client.904899 ll_lookup
0x7f3a0c01e9e0 asn1
2015-10-28 17:40:43.532299 7f39f55a2700 10 client.904899 _lookup
100008169e0.head(ref=3 ll_ref=5 cap_refs={} open={} mode=42775 size=0/0
mtime=2015-10-08 10:17:04.202030 caps=pAsLsXsFs(0=pAsLsXsFs)
parents=0x7f3a0c01f740 has_dir_layout 0x7f3a0c01e9e0) asn1 =
100025145a2.head(ref=3 ll_ref=3 cap_refs={} open={} mode=42775 size=0/0
mtime=2015-09-15 15:26:59.173825 caps=pAsLsXsFs(0=pAsLsXsFs)
parents=0x7f3a0c020e40 has_dir_layout 0x7f3a0c020150)
2015-10-28 17:40:43.532400 7f39f55a2700 10 client.904899 fill_stat on
100025145a2 snap/devhead mode 042775 mtime 2015-09-15 15:26:59.173825
ctime 2015-09-15 15:35:47.286314
2015-10-28 17:40:43.532413 7f39f55a2700 3 client.904899 ll_lookup
0x7f3a0c01e9e0 asn1 -> 0 (100025145a2)
2015-10-28 17:40:43.532428 7f39f55a2700 3 client.904899 ll_forget
100008169e0 1
2015-10-28 17:40:43.532523 7f3a00ff9700 3 client.904899 ll_getattr
100025145a2.head
2015-10-28 17:40:43.532536 7f3a00ff9700 10 client.904899 _getattr mask
pAsLsXsFs issued=1
2015-10-28 17:40:43.532544 7f3a00ff9700 10 client.904899 fill_stat on
100025145a2 snap/devhead mode 042775 mtime 2015-09-15 15:26:59.173825
ctime 2015-09-15 15:35:47.286314
2015-10-28 17:40:43.532585 7f3a00ff9700 3 client.904899 ll_getattr
100025145a2.head = 0
2015-10-28 17:40:43.532609 7f3a00ff9700 3 client.904899 ll_forget
100025145a2 1
2015-10-28 17:40:43.532676 7f39fa97f700 3 client.904899 ll_lookup
0x7f3a0c020150 nr.01.psq
2015-10-28 17:40:43.532695 7f39fa97f700 10 client.904899 _lookup
100025145a2.head(ref=3 ll_ref=5 cap_refs={} open={} mode=42775 size=0/0
mtime=2015-09-15 15:26:59.173825 caps=pAsLsXsFs(0=pAsLsXsFs)
parents=0x7f3a0c020e40 has_dir_layout 0x7f3a0c020150) nr.01.psq =
1000261ff71.head(ref=2 ll_ref=3 cap_refs={1024=0,2048=0} open={1=0}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFsc(0=pAsLsXsFsc) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.532748 7f39fa97f700 10 client.904899 fill_stat on
1000261ff71 snap/devhead mode 0100664 mtime 2015-09-09 21:05:17.000000
ctime 2015-09-15 15:26:39.155881
2015-10-28 17:40:43.532758 7f39fa97f700 3 client.904899 ll_lookup
0x7f3a0c020150 nr.01.psq -> 0 (1000261ff71)
2015-10-28 17:40:43.532796 7f39fa97f700 3 client.904899 ll_forget
100025145a2 1
2015-10-28 17:40:43.532847 7f39f387b700 3 client.904899 ll_getattr
1000261ff71.head
2015-10-28 17:40:43.532858 7f39f387b700 10 client.904899 _getattr mask
pAsLsXsFs issued=1
2015-10-28 17:40:43.532867 7f39f387b700 10 client.904899 fill_stat on
1000261ff71 snap/devhead mode 0100664 mtime 2015-09-09 21:05:17.000000
ctime 2015-09-15 15:26:39.155881
2015-10-28 17:40:43.532880 7f39f387b700 3 client.904899 ll_getattr
1000261ff71.head = 0
2015-10-28 17:40:43.532894 7f39f387b700 3 client.904899 ll_forget
1000261ff71 1
2015-10-28 17:40:43.532956 7f3a01ffb700 3 client.904899 ll_open
1000261ff71.head 32768
2015-10-28 17:40:43.535627 7f3a01ffb700 10 client.904899
choose_target_mds from caps on inode 1000261ff71.head(ref=3 ll_ref=5
cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0
mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFsc(0=pAsLsXsFsc)
objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.535694 7f3a01ffb700 10 client.904899 send_request
rebuilding request 8 for mds.0
2015-10-28 17:40:43.535714 7f3a01ffb700 10 client.904899 send_request
client_request(unknown.0:8 open #1000261ff71 2015-10-28 17:40:43.535600)
v2 to mds.0
2015-10-28 17:40:43.537945 7f3a1e7fc700 10 client.904899 mds.0 seq now 3
2015-10-28 17:40:43.538043 7f3a1e7fc700 5 client.904899
handle_cap_grant on in 1000261ff71 mds.0 seq 6 caps now pAsLsXsFscr was
pAsLsXsFsc
2015-10-28 17:40:43.538065 7f3a1e7fc700 10 client.904899
update_inode_file_bits 1000261ff71.head(ref=3 ll_ref=5
cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0
mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFsc(0=pAsLsXsFsc)
objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
parents=0x7f3a0c0226b0 0x7f3a0c0217d0) pAsLsXsFsc mtime 2015-09-09
21:05:17.000000
2015-10-28 17:40:43.538106 7f3a1e7fc700 10 client.904899 grant, new
caps are Fr
2015-10-28 17:40:43.538212 7f3a1e7fc700 10 client.904899 insert_trace
from 2015-10-28 17:40:43.535710 mds.0 is_target=1 is_dentry=0
2015-10-28 17:40:43.538224 7f3a1e7fc700 10 client.904899 features
0x3ffffffffffff
2015-10-28 17:40:43.538228 7f3a1e7fc700 10 client.904899
update_snap_trace len 48
2015-10-28 17:40:43.538275 7f3a1e7fc700 10 client.904899
update_snap_trace snaprealm(1 nref=6 c=0 seq=1 parent=0 my_snaps=[]
cached_snapc=1=[]) seq 1 <= 1 and same parent, SKIPPING
2015-10-28 17:40:43.538296 7f3a1e7fc700 10 client.904899 hrm
is_target=1 is_dentry=0
2015-10-28 17:40:43.538320 7f3a1e7fc700 10 client.904899 add_update_cap
issued pAsLsXsFscr -> pAsLsXsFscr from mds.0 on 1000261ff71.head(ref=3
ll_ref=5 cap_refs={1024=0,2048=0} open={1=1} mode=100664
size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.538412 7f3a01ffb700 10 client.904899 _create_fh
1000261ff71 mode 1
2015-10-28 17:40:43.538471 7f3a01ffb700 3 client.904899 ll_open
1000261ff71.head 32768 = 0 (0x7f39fc0f1e30)
2015-10-28 17:40:43.545244 7f3a01ffb700 3 client.904899 ll_forget
1000261ff71 1
2015-10-28 17:40:43.545282 7f3a1e7fc700 10 client.904899 put_inode on
1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.634719 7f3a017fa700 3 client.904899 ll_flush
0x7f39fc0f1e30 1000261ff71
2015-10-28 17:40:43.634765 7f3a017fa700 10 client.904899 _flush:
0x7f39fc0f1e30 on inode 1000261ff71.head(ref=3 ll_ref=4
cap_refs={1024=0,2048=0} open={1=1} mode=100664 size=845295759/0
mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr)
objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
parents=0x7f3a0c0226b0 0x7f3a0c0217d0) no async_err state
2015-10-28 17:40:43.635217 7f39fb180700 3 client.904899 ll_read
0x7f39fc0f1e30 1000261ff71 0~131072
2015-10-28 17:40:43.635261 7f39fb180700 10 client.904899 get_caps
1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=0} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr
need Fr want Fc but not Fc revoking -
2015-10-28 17:40:43.635294 7f39fb180700 10 client.904899 _read_async
1000261ff71.head(ref=3 ll_ref=4 cap_refs={1024=0,2048=1} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 0~131072
2015-10-28 17:40:43.635309 7f39fb180700 10 client.904899 max_byes=0
max_periods=4
2015-10-28 17:40:43.635865 7f39fb180700 5 client.904899 get_cap_ref got
first FILE_CACHE ref on 1000261ff71.head(ref=3 ll_ref=4
cap_refs={1024=0,2048=1} open={1=1} mode=100664 size=845295759/0
mtime=2015-09-09 21:05:17.000000 caps=pAsLsXsFscr(0=pAsLsXsFscr)
objectset[1000261ff71 ts 0/0 objects 202 dirty_or_tx 0]
parents=0x7f3a0c0226b0 0x7f3a0c0217d0)
2015-10-28 17:40:43.636382 7f39fb981700 3 client.904899 ll_read
0x7f39fc0f1e30 1000261ff71 131072~131072
2015-10-28 17:40:43.636398 7f39fb981700 10 client.904899 get_caps
1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=1} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr
need Fr want Fc but not Fc revoking -
2015-10-28 17:40:43.636436 7f39fb981700 10 client.904899 _read_async
1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=2} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 131072~131072
2015-10-28 17:40:43.636451 7f39fb981700 10 client.904899 max_byes=0
max_periods=4
2015-10-28 17:40:43.641047 7f39f55a2700 3 client.904899 ll_read
0x7f39fc0f1e30 1000261ff71 262144~131072
2015-10-28 17:40:43.641060 7f39f55a2700 10 client.904899 get_caps
1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=1} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) have pAsLsXsFscr
need Fr want Fc but not Fc revoking -
2015-10-28 17:40:43.641111 7f39f55a2700 10 client.904899 _read_async
1000261ff71.head(ref=4 ll_ref=4 cap_refs={1024=1,2048=2} open={1=1}
mode=100664 size=845295759/0 mtime=2015-09-09 21:05:17.000000
caps=pAsLsXsFscr(0=pAsLsXsFscr) objectset[1000261ff71 ts 0/0 objects 202
dirty_or_tx 0] parents=0x7f3a0c0226b0 0x7f3a0c0217d0) 262144~131072
2015-10-28 17:40:43.641126 7f39f55a2700 10 client.904899 max_byes=0
max_periods=4
2015-10-28 17:40:43.641932 7f3a00ff9700 3 client.904899 ll_read
0x7f39fc0f1e30 1000261ff71 393216~131072
.... (more read calls)
I tried to dig into the ceph-fuse code, but I was unable to find the code
path that is responsible for flushing the data from the page cache.
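For what it's worth, my understanding (an assumption on my part, not
verified against the 0.94.5 sources) is that the client ultimately drops
an inode's pages through libfuse's low-level invalidation call; a minimal
sketch of that mechanism, outside of any Ceph code:

    /* Sketch only: how a low-level FUSE daemon can drop an inode's pages
       from the kernel page cache (libfuse 2.8+). The helper name is made
       up; ceph-fuse wires this up through its own callback plumbing. */
    #define FUSE_USE_VERSION 26
    #include <fuse/fuse_lowlevel.h>

    static struct fuse_chan *chan;   /* channel obtained from fuse_mount() */

    static void drop_cached_pages(fuse_ino_t ino)
    {
        /* off = 0, len = 0 invalidates everything cached for this inode */
        fuse_lowlevel_notify_inval_inode(chan, ino, 0, 0);
    }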
Regards,
Burkhard
--
Dr. rer. nat. Burkhard Linke
Bioinformatics and Systems Biology
Justus-Liebig-University Giessen
35392 Giessen, Germany
Phone: (+49) (0)641 9935810
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com