Hi,

You said you dropped caches -- can you try again?

    echo 3 > /proc/sys/vm/drop_caches

Otherwise, does umount then mount from one of the clients clear the warning?

(I don't believe this is due to a "busy client", but rather a kernel client bug where it doesn't release caps in some cases -- we've seen this in the past but not recently).

-- Dan

On Fri, Oct 30, 2020 at 10:13 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Dear cephers,
>
> I have a somewhat strange situation. I have the health warning:
>
> # ceph health detail
> HEALTH_WARN 3 clients failing to respond to capability release
> MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release
>     mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30716617
>     mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30717358
>     mdsceph-12(mds.0): Client sn009.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30749150
>
> However, these clients are not busy right now. Also, they hold almost nothing; see snippets from "session ls" below. It is possible that a very IO-intensive application was running on these nodes and these release requests got stuck. How do I resolve this issue? Can I just evict the client?
>
> Version is mimic 13.2.8. Note that we execute a drop cache command after a job finishes on these clients. It's possible that the clients dropped the caps already before the MDS request was handled/received.
>
> Best regards,
> Frank
>
> {
>     "id": 30717358,
>     "num_leases": 0,
>     "num_caps": 44,
>     "state": "open",
>     "request_load_avg": 0,
>     "uptime": 6632206.332307,
>     "replay_requests": 0,
>     "completed_requests": 0,
>     "reconnecting": false,
>     "inst": "client.30717358 192.168.57.140:0/3212676185",
>     "client_metadata": {
>         "features": "00000000000000ff",
>         "entity_id": "con-fs2-hpc",
>         "hostname": "sn269.hpc.ait.dtu.dk",
>         "kernel_version": "3.10.0-957.12.2.el7.x86_64",
>         "root": "/hpc/home"
>     }
> },
> --
> {
>     "id": 30716617,
>     "num_leases": 0,
>     "num_caps": 48,
>     "state": "open",
>     "request_load_avg": 1,
>     "uptime": 6632206.336307,
>     "replay_requests": 0,
>     "completed_requests": 1,
>     "reconnecting": false,
>     "inst": "client.30716617 192.168.56.233:0/2770977433",
>     "client_metadata": {
>         "features": "00000000000000ff",
>         "entity_id": "con-fs2-hpc",
>         "hostname": "sn106.hpc.ait.dtu.dk",
>         "kernel_version": "3.10.0-957.12.2.el7.x86_64",
>         "root": "/hpc/home"
>     }
> },
> --
> {
>     "id": 30749150,
>     "num_leases": 0,
>     "num_caps": 44,
>     "state": "open",
>     "request_load_avg": 0,
>     "uptime": 6632206.338307,
>     "replay_requests": 0,
>     "completed_requests": 0,
>     "reconnecting": false,
>     "inst": "client.30749150 192.168.56.136:0/2578719015",
>     "client_metadata": {
>         "features": "00000000000000ff",
>         "entity_id": "con-fs2-hpc",
>         "hostname": "sn009.hpc.ait.dtu.dk",
>         "kernel_version": "3.10.0-957.12.2.el7.x86_64",
>         "root": "/hpc/home"
>     }
> },
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
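
To match the warned clients against their "session ls" entries, the client ids can be pulled out of the health report with a small filter. This is only a sketch: the function name late_release_client_ids is made up for illustration, and it assumes the "ceph health detail" line format stays as quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): reads `ceph health detail` output
# on stdin and prints one client_id per line for every client reported as
# "failing to respond to capability release".
# Assumes the line format quoted in this thread; other releases may differ.
late_release_client_ids() {
    sed -n 's/.*failing to respond to capability release client_id: \([0-9][0-9]*\).*/\1/p'
}

# Possible use on a node with admin access:
#   ceph health detail | late_release_client_ids
#
# On each affected client (as root), the cache drop suggested above is:
#   echo 3 > /proc/sys/vm/drop_caches
```

The summary lines (HEALTH_WARN, MDS_CLIENT_LATE_RELEASE) carry no client_id field, so the filter skips them and prints only the per-client ids.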