ceph fs service outage: currently failed to authpin, subtree is being exported

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Guten Tag.

Our file system is out of operation (mimic 13.2.10). Our MDSes are choking on an operation:

2021-09-19 02:23:36.432664 mon.ceph-01 mon.0 192.168.32.65:6789/0 185676 : cluster [WRN] Health check failed: 1 MDSs repor
t slow requests (MDS_SLOW_REQUEST)
[...]
2021-09-19 02:23:34.909269 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1662 : cluster [WRN] 33 slow requests, 5 included below; oldest blocked for > 32.729621 secs
2021-09-19 02:23:34.909277 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1663 : cluster [WRN] slow request 31.104289 seconds old, received at 2021-09-19 02:23:03.804307: client_request(client.44559846:1121833 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909280 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1664 : cluster [WRN] slow request 31.104254 seconds old, received at 2021-09-19 02:23:03.804343: client_request(client.44559846:1121834 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909283 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1665 : cluster [WRN] slow request 31.104231 seconds old, received at 2021-09-19 02:23:03.804365: client_request(client.44559846:1121835 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909285 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1666 : cluster [WRN] slow request 31.104213 seconds old, received at 2021-09-19 02:23:03.804384: client_request(client.44559846:1121836 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909288 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1667 : cluster [WRN] slow request 31.104142 seconds old, received at 2021-09-19 02:23:03.804455: client_request(client.44559846:1121837 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported

By now, several thousand authpin operations are stuck for hours already. The file system is basically inoperational and work is piling up:

# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming; 20 large omap objects
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsceph-15(mds.3): 1554 slow requests are blocked > 30 secs
MDS_TRIM 2 MDSs behind on trimming
    mdsceph-23(mds.0): Behind on trimming (7651/128) max_segments: 128, num_segments: 7651
    mdsceph-15(mds.3): Behind on trimming (4888/128) max_segments: 128, num_segments: 4888

I would be grateful for advice on how to get out of this. Current fs status is:

# ceph fs status
con-fs2 - 1636 clients
=======
+------+--------+---------+---------------+-------+-------+
| Rank | State  |   MDS   |    Activity   |  dns  |  inos |
+------+--------+---------+---------------+-------+-------+
|  0   | active | ceph-23 | Reqs:    2 /s | 2024k | 2019k |
|  1   | active | ceph-12 | Reqs:    0 /s | 1382k | 1374k |
|  2   | active | ceph-08 | Reqs:    0 /s |  998k |  926k |
|  3   | active | ceph-15 | Reqs:    0 /s | 1373k | 1272k |
+------+--------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
|    con-fs2-meta1    | metadata |  102G | 1252G |
|    con-fs2-meta2    |   data   |    0  | 1252G |
|     con-fs2-data    |   data   | 1359T | 6003T |
| con-fs2-data-ec-ssd |   data   |  239G | 4006G |
|    con-fs2-data2    |   data   | 56.0T | 5457T |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   ceph-16   |
|   ceph-14   |
|   ceph-13   |
|   ceph-17   |
|   ceph-10   |
|   ceph-24   |
|   ceph-09   |
|   ceph-11   |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux