Dear Cephers,

We've upgraded the back end of our cluster from Jewel (10.2.10) to Luminous (12.2.2). The upgrade went smoothly for the most part, except we seem to be hitting an issue with CephFS. After about a day or two of use, the MDS starts complaining about clients failing to respond to cache pressure:

[root@cephmon00 ~]# ceph -s

We are using exclusively the 12.2.2 FUSE client on roughly 350 nodes (of which about 100 are not responding to cache pressure in this log). When this happens, clients also appear quite sluggish (listing directories, etc.). After bouncing the MDS, everything returns to normal after the failover, at least for a while. Please ignore the message about 1 OSD being down; that corresponds to a failed drive, and all its data has since been re-replicated.

We were also using the 12.2.2 FUSE client with the Jewel back end before the upgrade and did not see this issue then. We are running with a larger MDS cache than usual: mds_cache_size is set to 4 million. All other MDS settings are at their defaults.

Is this a known issue? If not, any hints on how to further diagnose the problem?

Andras
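For reference, the non-default cache setting described above would correspond to a ceph.conf fragment like this (a sketch only; the exact section placement on our MDS hosts is assumed):

```ini
# Hypothetical ceph.conf fragment on the MDS hosts.
# mds_cache_size caps the number of inodes the MDS keeps cached
# (here raised from the default 100k to 4 million, as described above).
[mds]
mds_cache_size = 4000000
```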
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com