Dear All,

We have a new cluster based on v12.2.1.

After three days of copying 300TB of data into cephfs, we have started
getting the following health errors:

# ceph health
HEALTH_WARN 9 clients failing to advance oldest client/flush tid;
1 MDSs report slow requests; 1 MDSs behind on trimming

ceph-mds.ceph1.log shows entries like:

2017-10-09 08:42:30.935955 7feeaf263700 0 log_channel(cluster) log [WRN] :
client.5023 does not advance its oldest_client_tid (5760998), 100000
completed requests recorded in session

Performance has been very good; the parallel rsync was running at
1.1 to 2 GB/s, allowing us to copy the 300TB of data in 72 hours.

[root@ceph1 ceph]# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    730T      330T         400T         54.80
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    ecpool        1     316T     62.24          153T     89269703
    mds_nvme      2     188G      8.18          706G       368806

The cluster has 10 nodes, each with 10x 8TB drives. We are using EC 8+2
with no upper tier, i.e. allow_ec_overwrites is set to true. Four nodes
have NVMe drives, used for the 3x replicated MDS metadata pool.

We have a single MDS server. We snapshot cephfs every 10 minutes, then
delete all snapshots older than 24 hours, apart from the midnight
snapshots.

We use the ceph-fuse client on all OSD nodes; the parallel rsync runs
directly on them.

Hardware is dual Xeon E5-2620 v4 per node, with 64GB RAM and 10Gb
Ethernet; the OS is SL 7.4.

Any ideas?

thanks,

Jake

--
Jake Grimmett
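
P.S. In case it is relevant, the snapshot rotation is just a small cron
job. A rough sketch of that kind of script is below; the mount point
(/cephfs) and the YYYYmmdd-HHMM snapshot naming are illustrative
assumptions rather than our exact production setup.

#!/bin/bash
# Take a CephFS snapshot (run from cron every 10 minutes) and prune
# snapshots older than 24 hours, keeping the midnight ones.
# NOTE: /cephfs and the YYYYmmdd-HHMM naming are assumptions for this sketch.

FSROOT=/cephfs                    # ceph-fuse mount point (assumed)
SNAPDIR="$FSROOT/.snap"
NOW=$(date +%Y%m%d-%H%M)

# CephFS creates a snapshot when a directory is created inside .snap
mkdir "$SNAPDIR/$NOW"

# Prune snapshots older than 24 hours, except the midnight (HHMM=0000) ones
cutoff=$(( $(date +%s) - 86400 ))
for snap in "$SNAPDIR"/*; do
    name=$(basename "$snap")
    t=${name#*-}                                   # HHMM part of the name
    [ "$t" = "0000" ] && continue                  # keep midnight snapshots
    when=$(date -d "${name%-*} ${t:0:2}:${t:2:2}" +%s 2>/dev/null) || continue
    if [ "$when" -lt "$cutoff" ]; then
        rmdir "$snap"             # removing the directory deletes the snapshot
    fi
done

The convenient part is that CephFS snapshots are created and removed
simply with mkdir/rmdir inside .snap, so the script needs no extra
tooling.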