MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

Dear cephers,

I have a somewhat strange situation. The cluster shows this health warning:

# ceph health detail
HEALTH_WARN 3 clients failing to respond to capability release
MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release
    mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30716617
    mdsceph-12(mds.0): Client sn269.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30717358
    mdsceph-12(mds.0): Client sn009.hpc.ait.dtu.dk:con-fs2-hpc failing to respond to capability release client_id: 30749150

However, these clients are not busy right now, and they hold almost no caps; see the snippets from "session ls" below. It is possible that a very I/O-intensive application was running on these nodes and that the release requests got stuck. How do I resolve this issue? Can I just evict the clients?
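
In case eviction is indeed the way to go, I assume it would look something like the two commands below (mds.0 being the active rank from the warning above, and the id taken from "session ls"); please correct me if this is not the right approach:

# ceph tell mds.0 client ls
# ceph tell mds.0 client evict id=30716617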

The version is Mimic 13.2.8. Note that we execute a drop cache command on these clients after a job finishes. It is possible that the clients had already dropped the caps before the MDS request was handled/received.
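
For reference, the drop cache command is just the usual kernel mechanism, run as root on the client node, something along the lines of:

# sync; echo 3 > /proc/sys/vm/drop_caches   # frees dentries/inodes, which should also make the CephFS kernel client release its caps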

Best regards,
Frank

    {
        "id": 30717358,
        "num_leases": 0,
        "num_caps": 44,
        "state": "open",
        "request_load_avg": 0,
        "uptime": 6632206.332307,
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.30717358 192.168.57.140:0/3212676185",
        "client_metadata": {
            "features": "00000000000000ff",
            "entity_id": "con-fs2-hpc",
            "hostname": "sn269.hpc.ait.dtu.dk",
            "kernel_version": "3.10.0-957.12.2.el7.x86_64",
            "root": "/hpc/home"
        }
    },
--
    {
        "id": 30716617,
        "num_leases": 0,
        "num_caps": 48,
        "state": "open",
        "request_load_avg": 1,
        "uptime": 6632206.336307,
        "replay_requests": 0,
        "completed_requests": 1,
        "reconnecting": false,
        "inst": "client.30716617 192.168.56.233:0/2770977433",
        "client_metadata": {
            "features": "00000000000000ff",
            "entity_id": "con-fs2-hpc",
            "hostname": "sn106.hpc.ait.dtu.dk",
            "kernel_version": "3.10.0-957.12.2.el7.x86_64",
            "root": "/hpc/home"
        }
    },
--
    {
        "id": 30749150,
        "num_leases": 0,
        "num_caps": 44,
        "state": "open",
        "request_load_avg": 0,
        "uptime": 6632206.338307,
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.30749150 192.168.56.136:0/2578719015",
        "client_metadata": {
            "features": "00000000000000ff",
            "entity_id": "con-fs2-hpc",
            "hostname": "sn009.hpc.ait.dtu.dk",
            "kernel_version": "3.10.0-957.12.2.el7.x86_64",
            "root": "/hpc/home"
        }
    },

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



