It looks similar to these:

https://tracker.ceph.com/issues/39987 [the user reporting this was me as well]
https://tracker.ceph.com/issues/42338

Issue 39987 was fixed a long time ago by Zheng Yan. A search for "currently failed to authpin, subtree is being exported" only returns hits regarding the two issues above.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 19 September 2021 10:11:27
To: ceph-users
Subject: ceph fs service outage: currently failed to authpin, subtree is being exported

Good day,

Our file system is out of operation (mimic 13.2.10). Our MDSes are choking on an operation:

2021-09-19 02:23:36.432664 mon.ceph-01 mon.0 192.168.32.65:6789/0 185676 : cluster [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
[...]
2021-09-19 02:23:34.909269 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1662 : cluster [WRN] 33 slow requests, 5 included below; oldest blocked for > 32.729621 secs
2021-09-19 02:23:34.909277 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1663 : cluster [WRN] slow request 31.104289 seconds old, received at 2021-09-19 02:23:03.804307: client_request(client.44559846:1121833 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909280 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1664 : cluster [WRN] slow request 31.104254 seconds old, received at 2021-09-19 02:23:03.804343: client_request(client.44559846:1121834 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909283 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1665 : cluster [WRN] slow request 31.104231 seconds old, received at 2021-09-19 02:23:03.804365: client_request(client.44559846:1121835 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909285 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1666 : cluster [WRN] slow request 31.104213 seconds old, received at 2021-09-19 02:23:03.804384: client_request(client.44559846:1121836 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported
2021-09-19 02:23:34.909288 mds.ceph-15 mds.3 192.168.32.79:6872/3838509256 1667 : cluster [WRN] slow request 31.104142 seconds old, received at 2021-09-19 02:23:03.804455: client_request(client.44559846:1121837 lookup #0x100000167f6/naikar 2021-09-19 02:23:03.803359 caller_uid=260500, caller_gid=260500{}) currently failed to authpin, subtree is being exported

By now, several thousand authpin operations have been stuck for hours. The file system is basically inoperable and work is piling up:

# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming; 20 large omap objects
MDS_SLOW_REQUEST 1 MDSs report slow requests
    mdsceph-15(mds.3): 1554 slow requests are blocked > 30 secs
MDS_TRIM 2 MDSs behind on trimming
    mdsceph-23(mds.0): Behind on trimming (7651/128) max_segments: 128, num_segments: 7651
    mdsceph-15(mds.3): Behind on trimming (4888/128) max_segments: 128, num_segments: 4888

I would be grateful for advice on how to get out of this.

Current fs status is:

# ceph fs status
con-fs2 - 1636 clients
=======
+------+--------+---------+---------------+-------+-------+
| Rank | State  |   MDS   |    Activity   |  dns  |  inos |
+------+--------+---------+---------------+-------+-------+
|  0   | active | ceph-23 | Reqs:    2 /s | 2024k | 2019k |
|  1   | active | ceph-12 | Reqs:    0 /s | 1382k | 1374k |
|  2   | active | ceph-08 | Reqs:    0 /s |  998k |  926k |
|  3   | active | ceph-15 | Reqs:    0 /s | 1373k | 1272k |
+------+--------+---------+---------------+-------+-------+
+---------------------+----------+-------+-------+
|         Pool        |   type   |  used | avail |
+---------------------+----------+-------+-------+
|    con-fs2-meta1    | metadata |  102G | 1252G |
|    con-fs2-meta2    |   data   |    0  | 1252G |
|     con-fs2-data    |   data   | 1359T | 6003T |
| con-fs2-data-ec-ssd |   data   |  239G | 4006G |
|    con-fs2-data2    |   data   | 56.0T | 5457T |
+---------------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   ceph-16   |
|   ceph-14   |
|   ceph-13   |
|   ceph-17   |
|   ceph-10   |
|   ceph-24   |
|   ceph-09   |
|   ceph-11   |
+-------------+
MDS version: ceph version 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable)

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx