Dear CephFSers,
We are running Ceph/CephFS 10.2.2. All infrastructure is on the
same version (RADOS cluster, MONs, MDS and CephFS clients). We
mount CephFS using ceph-fuse.
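The mounts look roughly like this (the monitor addresses come from
ceph.conf; the id, root and mount point are the ones reported in the
session metadata at the end of this email):
# ceph-fuse --id mount_user -r /cephfs /coepp/cephfs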
Last week I asked some of my heavy users to delete data. In the
example below, the user in question decreased his usage from
~4.9 TB to ~600 GB. However, some clients have still not updated
the usage (even though several days have passed), while others are
fine.
From the point of view of the MDS, both types of clients have
healthy sessions. See the detailed info at the end of this email.
Kicking the session does not solve the issue. A remount would
probably fix it, but users are heavily using the filesystem and I do
not want to break things for them right now.
The only difference I can actually dig out between "good" and "bad"
clients is that the user still has active bash sessions on the
"bad" client (the one from which he triggered the deletions):
# lsof | grep user1 | grep ceph
bash 15737 user1 cwd DIR 0,24 5285584388909 1099514070586 /coepp/cephfs/mel/user1
vim  19233 user1 cwd DIR 0,24      24521126 1099514340633 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts
vim  19233 user1  5u REG 0,24         16384 1099557935412 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts/.histmgr.py.swp
bash 24187 user1 cwd DIR 0,24     826758558 1099514314315 /coepp/cephfs/mel/user1/Analysis
bash 24256 user1 cwd DIR 0,24        147600 1099514340621 /coepp/cephfs/mel/user1/Analysis/ssdilep/run
bash 24327 user1 cwd DIR 0,24        151068 1099514340590 /coepp/cephfs/mel/user1/Analysis/ssdilep/algs
bash 24394 user1 cwd DIR 0,24        151068 1099514340590 /coepp/cephfs/mel/user1/Analysis/ssdilep/algs
bash 24461 user1 cwd DIR 0,24        356436 1099514340614 /coepp/cephfs/mel/user1/Analysis/ssdilep/samples
bash 24528 user1 cwd DIR 0,24      24521126 1099514340633 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts
bash 24601 user1 cwd DIR 0,24      24521126 1099514340633 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts
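A quicker cross-check on each client for anything still sitting
inside the mount is something like the following (fuser, from
psmisc, is just a generic alternative to the lsof call above and not
CephFS specific):
# fuser -vm /coepp/cephfs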
Is there a particular way to force the client to update this info?
Do we actually know why it is taking so long to update?
Cheers
Goncalo
--- * ---
1) Reports from a client which shows "obsolete" file/directory
sizes:
# ll -h /coepp/cephfs/mel/ | grep user1
drwxr-xr-x 1 user1 coepp_mel 4.9T Oct 7 00:20 user1
# getfattr -d -m ceph /coepp/cephfs/mel/user1
getfattr: Removing leading '/' from absolute path names
# file: coepp/cephfs/mel/user1
ceph.dir.entries="10"
ceph.dir.files="1"
ceph.dir.rbytes="5285584388909"
ceph.dir.rctime="1480390891.09882864298"
ceph.dir.rentries="161047"
ceph.dir.rfiles="149669"
ceph.dir.rsubdirs="11378"
ceph.dir.subdirs="9"
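The single value can also be queried by name, which is handy for
watching whether this client eventually converges (same xattr as
above, just the raw value):
# getfattr -n ceph.dir.rbytes --only-values /coepp/cephfs/mel/user1; echo
5285584388909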
---> Running the following command on the client:
# ceph daemon /var/run/ceph/ceph-client.mount_user.asok mds_sessions
{
    "id": 616794,
    "sessions": [
        {
            "mds": 0,
            "addr": "<MDS IP>:6800\/1457",
            "seq": 4884237,
            "cap_gen": 0,
            "cap_ttl": "2016-12-04 22:45:53.046697",
            "last_cap_renew_request": "2016-12-04 22:44:53.046697",
            "cap_renew_seq": 166765,
            "num_caps": 1567318,
            "state": "open"
        }
    ],
    "mdsmap_epoch": 5224
}
---> Running the following command on the MDS:
# ceph daemon mds.rccephmds session ls
(...)
{
    "id": 616794,
    "num_leases": 0,
    "num_caps": 21224,
    "state": "open",
    "replay_requests": 0,
    "completed_requests": 0,
    "reconnecting": false,
    "inst": "client.616794 <BAD CLIENT IP>:0\/68088301",
    "client_metadata": {
        "ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
        "ceph_version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)",
        "entity_id": "mount_user",
        "hostname": "badclient.my.domain",
        "mount_point": "\/coepp\/cephfs",
        "root": "\/cephfs"
    }
},
2) Reports from a client which shows "good" file/directory sizes:
# ll -h /coepp/cephfs/mel/ | grep user1
drwxr-xr-x 1 user1 coepp_mel 576G Oct 7 00:20 user1
# getfattr -d -m ceph /coepp/cephfs/mel/user1
getfattr: Removing leading '/' from absolute path names
# file: coepp/cephfs/mel/user1
ceph.dir.entries="10"
ceph.dir.files="1"
ceph.dir.rbytes="617756983774"
ceph.dir.rctime="1480844101.09560671770"
ceph.dir.rentries="96519"
ceph.dir.rfiles="95091"
ceph.dir.rsubdirs="1428"
ceph.dir.subdirs="9"
---> Running the following command on the client:
# ceph daemon /var/run/ceph/ceph-client.mount_user.asok mds_sessions
{
    "id": 616338,
    "sessions": [
        {
            "mds": 0,
            "addr": "<MDS IP>:6800\/1457",
            "seq": 7851161,
            "cap_gen": 0,
            "cap_ttl": "2016-12-04 23:32:30.041978",
            "last_cap_renew_request": "2016-12-04 23:31:30.041978",
            "cap_renew_seq": 169143,
            "num_caps": 311386,
            "state": "open"
        }
    ],
    "mdsmap_epoch": 5224
}
---> Running the same session ls command on the MDS:
{
    "id": 616338,
    "num_leases": 0,
    "num_caps": 16078,
    "state": "open",
    "replay_requests": 0,
    "completed_requests": 0,
    "reconnecting": false,
    "inst": "client.616338 <GOOD CLIENT IP>:0\/3807825927",
    "client_metadata": {
        "ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
        "ceph_version": "ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)",
        "entity_id": "mount_user",
        "hostname": "goodclient.my.domain",
        "mount_point": "\/coepp\/cephfs",
        "root": "\/cephfs"
    }
},
--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW 2006
T: +61 2 93511937