Re: ceph-fuse clients taking too long to update dir sizes

On Sun, Dec 4, 2016 at 11:51 PM, Goncalo Borges
<goncalo.borges@xxxxxxxxxxxxx> wrote:
> Dear CephFSers.
>
> We are running Ceph/CephFS 10.2.2. All of the infrastructure is on the same
> version (RADOS cluster, mons, MDS and CephFS clients). We mount CephFS using
> ceph-fuse.
>
> Last week I asked some of my heavy users to delete data. In the
> following example, the user in question decreased his usage from ~4.5TB to
> ~600GB. However, some clients still have not updated the usage (although
> several days have passed) while others are fine.
>
> From the MDS's point of view, both types of client have healthy sessions;
> see the detailed info at the end of this email.
>
> Kicking the session does not solve the issue. A remount probably would, but
> users are heavily using the filesystem and I do not want to break things for
> them right now.
>
>
> The only difference I can actually dig out between the "good" and "bad"
> clients is that the user still has active bash sessions on the "bad" client
> (from where he triggered the deletions):

You're saying that the clients that actually did the deletions are the
ones with the bad rstats, but the other clients are getting the
updates?  Really weird.
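
On the remount point: if you do end up having to do one, I'd expect
something along these lines to work on each affected client, judging
purely from the mount_point and root in your session metadata below
(untested here, so treat it as a sketch rather than a recipe):

# fusermount -u /coepp/cephfs
# ceph-fuse -n client.mount_user -r /cephfs /coepp/cephfs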

> # lsof | grep user1 | grep ceph
> bash      15737   user1  cwd       DIR               0,24 5285584388909
> 1099514070586 /coepp/cephfs/mel/user1
> vim       19233   user1  cwd       DIR               0,24      24521126
> 1099514340633 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts
> vim       19233   user1    5u      REG               0,24         16384
> 1099557935412
> /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts/.histmgr.py.swp
> bash      24187   user1  cwd       DIR               0,24     826758558
> 1099514314315 /coepp/cephfs/mel/user1/Analysis
> bash      24256   user1  cwd       DIR               0,24        147600
> 1099514340621 /coepp/cephfs/mel/user1/Analysis/ssdilep/run
> bash      24327   user1  cwd       DIR               0,24        151068
> 1099514340590 /coepp/cephfs/mel/user1/Analysis/ssdilep/algs
> bash      24394   user1  cwd       DIR               0,24        151068
> 1099514340590 /coepp/cephfs/mel/user1/Analysis/ssdilep/algs
> bash      24461   user1  cwd       DIR               0,24        356436
> 1099514340614 /coepp/cephfs/mel/user1/Analysis/ssdilep/samples
> bash      24528   user1  cwd       DIR               0,24      24521126
> 1099514340633 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts
> bash      24601   user1  cwd       DIR               0,24      24521126
> 1099514340633 /coepp/cephfs/mel/user1/Analysis/ssdilep/scripts
>
> Is there a particular way to force the client to update this info? Do we
> actually know why it is taking so long to update?

Recursive statistics are meant to be updated somewhat lazily, but
obviously they are meant to *eventually* update, so if days are going
by without them catching up then that's a bug.
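
In the meantime, one thing you could try on the "bad" client is asking
the kernel to drop its dentry/inode cache, which can nudge ceph-fuse
into releasing and re-fetching the directory inode:

# echo 2 > /proc/sys/vm/drop_caches

That is only a guess at a workaround, though: I haven't verified that
it helps in this situation, and it may do nothing while those bash
sessions keep the directories pinned.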

Could you try and come up with a simple reproducer, perhaps with just
two clients involved?
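
Something along these lines is what I have in mind (the directory name
is just an example; keeping a shell cd'd into it on the first client
would mimic what you saw with the bash sessions):

On client A:
# mkdir /coepp/cephfs/mel/rstat-test
# cd /coepp/cephfs/mel/rstat-test
# dd if=/dev/zero of=blob bs=1M count=1024
# rm blob

On client B, watch whether the recursive byte count ever drops back:
# watch -n 60 getfattr -n ceph.dir.rbytes /coepp/cephfs/mel/rstat-test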

John

> Cheers
>
> Goncalo
>
> --- * ---
>
>
> 1) Reports from a client which shows "obsolete" file/directory sizes:
>
> # ll -h /coepp/cephfs/mel/ | grep user1
> drwxr-xr-x 1 user1      coepp_mel 4.9T Oct  7 00:20 user1
>
> # getfattr -d -m ceph /coepp/cephfs/mel/user1
> getfattr: Removing leading '/' from absolute path names
> # file: coepp/cephfs/mel/user1
> ceph.dir.entries="10"
> ceph.dir.files="1"
> ceph.dir.rbytes="5285584388909"
> ceph.dir.rctime="1480390891.09882864298"
> ceph.dir.rentries="161047"
> ceph.dir.rfiles="149669"
> ceph.dir.rsubdirs="11378"
> ceph.dir.subdirs="9"
>
> ---> Running the following command on the client:
> # ceph daemon /var/run/ceph/ceph-client.mount_user.asok mds_sessions
> {
>     "id": 616794,
>     "sessions": [
>         {
>             "mds": 0,
>             "addr": "<MDS IP>:6800\/1457",
>             "seq": 4884237,
>             "cap_gen": 0,
>             "cap_ttl": "2016-12-04 22:45:53.046697",
>             "last_cap_renew_request": "2016-12-04 22:44:53.046697",
>             "cap_renew_seq": 166765,
>             "num_caps": 1567318,
>             "state": "open"
>         }
>     ],
>     "mdsmap_epoch": 5224
> }
>
> ---> Running the following command on the MDS:
> # ceph daemon mds.rccephmds session ls
> (...)
>
>    {
>         "id": 616794,
>         "num_leases": 0,
>         "num_caps": 21224,
>         "state": "open",
>         "replay_requests": 0,
>         "completed_requests": 0,
>         "reconnecting": false,
>         "inst": "client.616794 <BAD CLIENT IP>:0\/68088301",
>         "client_metadata": {
>             "ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
>             "ceph_version": "ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374)",
>             "entity_id": "mount_user",
>             "hostname": "badclient.my.domain",
>             "mount_point": "\/coepp\/cephfs",
>             "root": "\/cephfs"
>         }
>     },
>
> 2) Reports from a client which shows "good" file/directory sizes:
>
> # ll -h /coepp/cephfs/mel/ | grep user1
> drwxr-xr-x 1 user1      coepp_mel 576G Oct  7 00:20 user1
>
> # getfattr -d -m ceph /coepp/cephfs/mel/user1
> getfattr: Removing leading '/' from absolute path names
> # file: coepp/cephfs/mel/user1
> ceph.dir.entries="10"
> ceph.dir.files="1"
> ceph.dir.rbytes="617756983774"
> ceph.dir.rctime="1480844101.09560671770"
> ceph.dir.rentries="96519"
> ceph.dir.rfiles="95091"
> ceph.dir.rsubdirs="1428"
> ceph.dir.subdirs="9"
>
> ---> Running the following command on the client:
> # ceph daemon /var/run/ceph/ceph-client.mount_user.asok mds_sessions
> {
>     "id": 616338,
>     "sessions": [
>         {
>             "mds": 0,
>             "addr": "<MDS IP>:6800\/1457",
>             "seq": 7851161,
>             "cap_gen": 0,
>             "cap_ttl": "2016-12-04 23:32:30.041978",
>             "last_cap_renew_request": "2016-12-04 23:31:30.041978",
>             "cap_renew_seq": 169143,
>             "num_caps": 311386,
>             "state": "open"
>         }
>     ],
>     "mdsmap_epoch": 5224
> }
>
>
>     ---> Running the following command on the MDS:
>
>     {
>         "id": 616338,
>         "num_leases": 0,
>         "num_caps": 16078,
>         "state": "open",
>         "replay_requests": 0,
>         "completed_requests": 0,
>         "reconnecting": false,
>         "inst": "client.616338 <GOOD CLIENT IP>:0\/3807825927",
>         "client_metadata": {
>             "ceph_sha1": "45107e21c568dd033c2f0a3107dec8f0b0e58374",
>             "ceph_version": "ceph version 10.2.2
> (45107e21c568dd033c2f0a3107dec8f0b0e58374)",
>             "entity_id": "mount_user",
>             "hostname": "goodclient.my.domain",
>             "mount_point": "\/coepp\/cephfs",
>             "root": "\/cephfs"
>         }
>     },
>
>
> --
> Goncalo Borges
> Research Computing
> ARC Centre of Excellence for Particle Physics at the Terascale
> School of Physics A28 | University of Sydney, NSW  2006
> T: +61 2 93511937
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


