On Mon, Oct 9, 2017 at 9:21 AM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
> Dear All,
>
> We have a new cluster based on v12.2.1.
>
> After three days of copying 300TB of data into cephfs,
> we have started getting the following health errors:
>
> # ceph health
> HEALTH_WARN 9 clients failing to advance oldest client/flush tid;
> 1 MDSs report slow requests; 1 MDSs behind on trimming
>
> ceph-mds.ceph1.log shows entries like:
>
> 2017-10-09 08:42:30.935955 7feeaf263700  0 log_channel(cluster) log
> [WRN] : client.5023 does not advance its oldest_client_tid (5760998),
> 100000 completed requests recorded in session

This is something to be quite wary of: because the client is not
advancing its oldest tid to acknowledge its completed requests, the MDS
is unable to drop its in-memory record of those requests, so it will
consume an increasing amount of memory over time and write ever-larger
session tables to disk. Eventually the MDS will become unable to write
its session table at all, which is a pretty bad position to be in.

If it were my cluster, I would be inclined to schedule a nightly
unmount/remount of the clients, to keep the system safe while you
investigate the issue.

> Performance has been very good; parallel rsync was running at 1.1 -
> 2GB/s, allowing us to copy 300TB of data in 72 hours.
>
> [root@ceph1 ceph]# ceph df
> GLOBAL:
>     SIZE     AVAIL     RAW USED     %RAW USED
>     730T     330T      400T         54.80
> POOLS:
>     NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
>     ecpool       1      316T     62.24     153T          89269703
>     mds_nvme     2      188G     8.18      706G          368806
>
> The cluster has 10 nodes, each with 10x 8TB drives.
> We are using EC 8+2, no upper tier, i.e. allow_ec_overwrites true.
> Four nodes have NVMe drives, used for 3x replicated MDS metadata.
>
> We have a single MDS server, snapshot cephfs every 10 minutes, then
> delete all snapshots older than 24 hours, apart from midnight snapshots.

The use of snapshots is where I'd start investigating: if you stop
making snapshots and mount a fresh client, does that client still show
the issue after it has done a bunch of requests?

You can check how a client is doing from the "ceph tell mds.<id>
session ls" output: if its "completed requests" value keeps going up
indefinitely, you're hitting the buggy behaviour. (Hopefully you got
the message about snapshots being experimental when you enabled the
feature.)

> We use the ceph-fuse client on all OSD nodes. The parallel rsync is run
> directly on them. Hardware consists of dual Xeon E5-2620 v4, 64GB RAM
> and 10Gb ethernet; the OS is SL 7.4.

Just to check, are the ceph-fuse packages on the clients also 12.2.1?

John

> Any ideas?
>
> thanks,
>
> Jake
>
> --
> Jake Grimmett

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
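
A minimal sketch of the check John describes above: poll "ceph tell
mds.<id> session ls" and watch each client's completed-requests counter
over time. The MDS id "0", the --format=json flag and the per-session
field names used here are assumptions rather than details from the
thread, so treat this as illustrative only and adjust it to the actual
cluster.

    #!/usr/bin/env python
    # Minimal sketch only: poll the MDS session list and print each
    # client's completed-requests counter, per the suggestion above.
    # Assumptions: the active MDS is "mds.0", the command emits JSON
    # when asked via --format=json, and each session entry carries a
    # "num_completed_requests" or "completed_requests" field (field
    # names can differ between releases).
    import json
    import subprocess
    import time

    MDS_ID = "0"   # assumed MDS name/rank; change to match your cluster

    def completed_requests():
        out = subprocess.check_output(
            ["ceph", "tell", "mds." + MDS_ID, "session", "ls",
             "--format=json"])
        counts = {}
        for session in json.loads(out):
            counts[session.get("id")] = session.get(
                "num_completed_requests",
                session.get("completed_requests"))
        return counts

    # Print the counters once a minute; a client whose value only ever
    # grows is one failing to advance its oldest_client_tid.
    while True:
        print(time.strftime("%H:%M:%S"), completed_requests())
        time.sleep(60)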