Hi John,

Many thanks for getting back to me.

Yes, I did see the "experimental" label on snapshots... After reading
other posts, I got the impression that cephfs snapshots might be OK,
provided you used a single active MDS and the latest ceph-fuse client,
both of which we have.

Anyhow, as you predicted, the flush errors led to our MDS server
crashing, and crashing badly: the MDS now refuses to restart, giving
journal replay errors in its log like this:

/root/ceph/ceph-12.2.1/src/mds/journal.cc: In function 'virtual void
EOpen::replay(MDSRank*)' thread 7f7950d4a700 time 2017-10-09
17:14:54.115094
/root/ceph/ceph-12.2.1/src/mds/journal.cc: 2214: FAILED assert(in)

Thankfully this cluster is only used to mirror scratch data, so nothing
of great value has been lost. I can just wipe everything... :)

However, given the fantastic performance we were getting, and the
economy of erasure coding, *if* cephfs snapshots were bullet-proof, I
could easily see ourselves, and other sites, using ceph for large data
sets. This is frustratingly close to perfect, so can I ask how far away
reliable snapshots are? Is this something that could be patched in
Luminous, or do we have to wait for Mimic?

In the meantime, can anything else be done to reduce the failure rate?
For instance, would it be significantly safer to take a single daily
snapshot, and keep only 7 of these? Does snapshot reliability decrease
with a large delta in the number of files, or a large amount of data in
each snapshot?

Any other tricks you can suggest are most welcome...
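To make the daily-snapshot question concrete, what I have in mind is
just a cron'd script around the .snap directory, roughly like this
(only a sketch; /cephfs is where we mount the filesystem, and the
"daily-" prefix is my own naming):

#!/bin/bash
# Daily cephfs snapshot rotation (sketch).
# Assumes the filesystem is mounted at /cephfs.
shopt -s nullglob
MNT=/cephfs
CUTOFF=$(date -d '7 days ago' +%Y%m%d)

# taking a cephfs snapshot is just a mkdir inside the magic .snap dir
mkdir "$MNT/.snap/daily-$(date +%Y%m%d)"

# and deleting one is just an rmdir of the snapshot name
for snap in "$MNT"/.snap/daily-*; do
    day=${snap##*daily-}
    [ "$day" -lt "$CUTOFF" ] && rmdir "$snap"
done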
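We'll also follow your suggestion and cycle the ceph-fuse mounts
nightly while we investigate; something like this on each client
(again a sketch: it assumes nothing is writing at 03:00, so the
unmount doesn't fail on a busy mount, and that ceph-fuse picks up the
monitors and keyring from ceph.conf):

# /etc/cron.d/cephfs-remount
0 3 * * * root fusermount -u /cephfs && ceph-fuse /cephfs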
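And to catch this before it kills the MDS next time, we can watch the
per-session counter you pointed at (mds.ceph1 is our active MDS; the
exact field name may differ between releases, hence the grep):

# ceph tell mds.ceph1 session ls | grep -i completed

If that value keeps climbing indefinitely for a client, that client is
hitting the bug and is due for a remount.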
Again, many thanks for your time,

Jake

On 09/10/17 16:37, John Spray wrote:
> On Mon, Oct 9, 2017 at 9:21 AM, Jake Grimmett <jog@xxxxxxxxxxxxxxxxx> wrote:
>> Dear All,
>>
>> We have a new cluster based on v12.2.1.
>>
>> After three days of copying 300TB of data into cephfs,
>> we have started getting the following health errors:
>>
>> # ceph health
>> HEALTH_WARN 9 clients failing to advance oldest client/flush tid;
>> 1 MDSs report slow requests; 1 MDSs behind on trimming
>>
>> ceph-mds.ceph1.log shows entries like:
>>
>> 2017-10-09 08:42:30.935955 7feeaf263700  0 log_channel(cluster) log
>> [WRN] : client.5023 does not advance its oldest_client_tid (5760998),
>> 100000 completed requests recorded in session
>
> This is something to be quite wary of -- because something is going
> wrong with the client completing its requests, the MDS is unable to
> drop its in-memory record of the client requests, and it will be
> consuming an increasing amount of memory over time, and trying to
> write ever-larger sessions to disk. Eventually, the MDS will become
> unable to write its session table, which is a pretty bad position to
> be in.
>
> If it was my cluster, I would be inclined to schedule a nightly
> unmount/mount of the client, to keep the system safe while you're
> investigating the issue.
>
>> Performance has been very good; parallel rsync was running at
>> 1.1-2 GB/s, allowing us to copy 300TB of data in 72 hours.
>>
>> [root@ceph1 ceph]# ceph df
>> GLOBAL:
>>     SIZE     AVAIL     RAW USED     %RAW USED
>>     730T     330T      400T         54.80
>> POOLS:
>>     NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
>>     ecpool       1      316T     62.24     153T          89269703
>>     mds_nvme     2      188G     8.18      706G          368806
>>
>> The cluster has 10 nodes, each with 10x 8TB drives.
>> We are using EC 8+2, no upper tier, i.e. allow_ec_overwrites true.
>> Four nodes have NVMe drives, used for 3x replicated MDS metadata.
>>
>> We have a single MDS server, snapshot cephfs every 10 minutes, then
>> delete all snapshots older than 24 hours, apart from midnight
>> snapshots.
>
> The use of snapshots would be where I'd start investigating: if you
> stop making snapshots, and mount a fresh client, does that client
> still have the issue when it does a bunch of requests? You can check
> how the client is doing with the "ceph tell mds.<id> session ls"
> output: if the "completed requests" value keeps going up indefinitely,
> you're having the buggy behaviour.
>
> (Hopefully you got the message about snapshots being experimental when
> you enabled the feature.)
>
>> We use the ceph-fuse client on all OSD nodes. The parallel rsync is
>> run directly on them. Hardware consists of dual Xeon E5-2620v4, with
>> 64GB RAM, 10Gb eth; OS is SL 7.4.
>
> Just to check, the ceph-fuse packages on the clients are also 12.2.1?
>
> John
>
>> Any ideas?
>>
>> thanks,
>>
>> Jake
>>
>> --
>> Jake Grimmett

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com