"Failed to authpin" results in large number of blocked requests

We're running a Ceph Mimic (13.2.4) cluster that is used predominantly for CephFS. We recently switched to multiple active MDSes to cope with the load on the cluster, but we are now seeing large numbers of blocked requests whenever research staff run large experiments. The warnings associated with the blocked requests are:

2019-03-28 09:31:34.246326 [WRN]  6 slow requests, 0 included below; oldest blocked for > 423.987868 secs
2019-03-28 09:31:29.246202 [WRN]  slow request 62.572806 seconds old, received at 2019-03-28 09:30:26.673298: client_request(client.5882168:1404749 lookup #0x10000000441/run_output 2019-03-28 09:30:26.653089 caller_uid=0, caller_gid=0{}) currently failed to authpin, subtree is being exported

Eventually, many hundreds of requests are blocked for hours.
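
For anyone who wants to tally these warnings by directory, here is a minimal Python sketch. The regular expression is modelled only on the single warning line quoted above, so it is an assumption that other lines (or other Ceph releases) follow the same layout; the function names are purely illustrative.

```python
import re
from collections import Counter

# Pattern modelled on the one slow-request warning quoted in this post.
# Assumption: other warnings use the same layout; adjust if they do not.
SLOW_REQUEST = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"client_request\((?P<client>client\.\d+):\d+ (?P<op>\w+) (?P<path>\S+) .*\) "
    r"currently (?P<reason>.+)$"
)


def parse_slow_request(line):
    """Return the fields of a slow-request warning as a dict, or None."""
    match = SLOW_REQUEST.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["age"] = float(fields["age"])  # seconds the request has waited
    return fields


def summarize(lines):
    """Count slow requests per (reason, path) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_slow_request(line)
        if fields is not None:
            counts[(fields["reason"], fields["path"])] += 1
    return counts
```

Feeding the cluster log through `summarize` and looking at `most_common()` should show whether the "failed to authpin" requests cluster around a few directories.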

It appears (as the "subtree is being exported" message suggests) that this is related to the MDSes migrating subtrees between ranks under load, as it is always accompanied by messages along the lines of "mds.0.migrator nicely exporting to mds.1". Migrations that occur when the cluster is not under heavy load complete without problems, but under load the export seems either never to complete or to deadlock for some reason.

We can clear the immediate problem by restarting the affected MDS, and pinning every subtree to a rank works as a partial workaround, but this is far from ideal. Does anyone have any pointers on where else we should look to troubleshoot this?
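
For reference, subtree pinning is done by setting the `ceph.dir.pin` extended attribute on a directory to the rank that should own it. The following is only a rough Python sketch of how such pins could be spread over the active ranks; the round-robin assignment and the function names are illustrative assumptions, not a description of our actual setup.

```python
import os

# CephFS reads this extended attribute to pin a directory to an MDS rank.
PIN_XATTR = "ceph.dir.pin"


def plan_pins(directories, num_ranks):
    """Assign each directory to a rank, round-robin over the sorted paths.

    Round-robin is an illustrative choice; it ignores how busy each
    directory actually is.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return {path: i % num_ranks for i, path in enumerate(sorted(directories))}


def apply_pins(plan, setxattr=None):
    """Write the planned pins; setxattr is injectable for dry runs."""
    if setxattr is None:
        setxattr = os.setxattr  # available on Linux only
    for path, rank in plan.items():
        # The attribute value is the rank number as text.
        setxattr(path, PIN_XATTR, str(rank).encode())
```

With two active ranks, `plan_pins(["/mnt/cephfs/a", "/mnt/cephfs/b", "/mnt/cephfs/c"], 2)` (hypothetical paths) assigns ranks 0, 1 and 0, and `apply_pins` then writes them on a mounted CephFS.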

Thanks,

Zoe.


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com