On Fri, Oct 7, 2016 at 8:20 AM, Kjetil Jørgensen <kjetil@xxxxxxxxxxxx> wrote:
> And - I just saw another recent thread -
> http://tracker.ceph.com/issues/17177 - could that be an explanation of
> most/all of the above?
>
> Next question(s) would then be:
>
> How would one deal with duplicate stray(s)

Here is an untested method:

List the omap keys in objects 600.00000000 ~ 609.00000000 (mds.0's stray
directories) and find all duplicated keys.

For each duplicated key, use ceph-dencoder to decode its values, find the
one that has the biggest version, and delete the rest.
(ceph-dencoder type inode_t skip 9 import /tmp/ decode dump_json)
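For example, something along these lines (untested sketch; it assumes the
dirfrag omap keys are named <ino>_head - the listomapkeys output will show
you the exact form - and it reuses the pool name and the 10003f25eaf inode
from the log excerpt below; since the listing is not an atomic snapshot,
re-check a key right before removing anything):

  # collect the stray dirfrag omap keys, tagged with the object they came from
  for i in 0 1 2 3 4 5 6 7 8 9; do
      rados -p cephfs_metadata listomapkeys 60${i}.00000000 \
        | sed "s/^/60${i}.00000000 /"
  done > /tmp/stray-keys.txt

  # a key that shows up under more than one stray object is a duplicate
  awk '{print $2}' /tmp/stray-keys.txt | sort | uniq -d > /tmp/dup-keys.txt

  # for one duplicate (10003f25eaf, seen in stray0 and stray3 below),
  # fetch both values and compare the decoded versions
  rados -p cephfs_metadata getomapval 600.00000000 10003f25eaf_head /tmp/600.val
  rados -p cephfs_metadata getomapval 603.00000000 10003f25eaf_head /tmp/603.val
  ceph-dencoder type inode_t skip 9 import /tmp/600.val decode dump_json | grep '"version"'
  ceph-dencoder type inode_t skip 9 import /tmp/603.val decode dump_json | grep '"version"'

  # keep the entry with the biggest version, remove the other(s) -
  # here stray0's v38836572 wins over stray3's v36792929
  rados -p cephfs_metadata rmomapkey 603.00000000 10003f25eaf_head

The "loaded dup inode" log lines already print both versions (v36792929 vs
v38836572 in your first excerpt), which is a handy cross-check for which
copy to keep.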
Regards
Yan, Zheng

> How would one deal with mismatch between head items and fnode.fragstat,
> ceph daemon mds.foo scrub_path ?
>
> -KJ
>
> On Thu, Oct 6, 2016 at 5:05 PM, Kjetil Jørgensen <kjetil@xxxxxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> Context (i.e. what we're doing): We're migrating (or trying to migrate)
>> off of an nfs server onto cephfs, for a workload that's best described as
>> "big piles" of hardlinks. Essentially, we have a set of "sources":
>> foo/01/<aa><rest-of-md5>
>> foo/0b/<0b><rest-of-md5>
>> .. and so on
>> bar/02/..
>> bar/0c/..
>> .. and so on
>>
>> foo/bar/friends have been "cloned" numerous times to a set of names that
>> over the course of weeks end up being recycled again; the clone is
>> essentially cp -L foo copy-1-of-foo.
>>
>> We're doing "incremental" rsyncs of this onto cephfs, so the sense of
>> "the original source of the hardlink" will end up moving around,
>> depending on the whims of rsync. (If it matters, I found some allusion to
>> "if the original file hardlinked is deleted, ...".)
>>
>> For RBD the ceph cluster has mostly been rather well behaved; the
>> problems we have had have for the most part been self-inflicted. Before
>> introducing the hardlink spectacle to cephfs, the same filesystem was
>> used for light-ish read-mostly loads, being mostly uneventful. (That
>> being said, we did patch it for
>>
>> Cluster is v10.2.2 (mds v10.2.2+4d15eb12298e007744486e28924a6f0ae071bd06),
>> clients are ubuntu's 4.4.0-32 kernel(s), and elrepo v4.4.4.
>>
>> The problems we're facing:
>>
>> Maybe a "non-problem": I have ~6M strays sitting around.
>> Slightly more problematic: I have duplicate stray(s)? See log excerpts
>> below. Also, rados -p cephfs_metadata listomapkeys 60X.00000000 did/does
>> seem to agree with there being duplicate strays (assuming 60X.00000000 is
>> the directory indexes for the stray catalogs), caveat "not a perfect
>> snapshot", listomapkeys issued in serial fashion.
>> We stumbled across http://tracker.ceph.com/issues/17177 (mostly here for
>> more context).
>> There's been a couple of instances of invalid backtrace(s), mostly solved
>> by either mds:scrub_path or just unlinking the files/directories in
>> question and re-rsync-ing.
>> Mismatch between head items and fnode.fragstat (see below for more of the
>> log excerpt); appeared to have been solved by mds:scrub_path.
>>
>>
>> Duplicate stray(s), ceph-mds complains (a lot, during rsync):
>> 2016-09-30 20:00:21.978314 7ffb653b8700 0 mds.0.cache.dir(603) _fetched
>> badness: got (but i already had) [inode 10003f25eaf [...2,head]
>> ~mds0/stray0/10003f25eaf auth v38836572 s=8998 nl=5 n(v0 b8998 1=1+0)
>> (iversion lock) 0x561082e6b520] mode 33188 mtime 2016-07-25 03:02:50.000000
>> 2016-09-30 20:00:21.978336 7ffb653b8700 -1 log_channel(cluster) log [ERR]
>> : loaded dup inode 10003f25eaf [2,head] v36792929 at
>> ~mds0/stray3/10003f25eaf, but inode 10003f25eaf.head v38836572 already
>> exists at ~mds0/stray0/10003f25eaf
>>
>> I briefly ran ceph-mds with debug_mds=20/20, which didn't yield anything
>> immediately useful beyond making the control flow of src/mds/CDir.cc
>> slightly easier to follow, without my becoming much wiser.
>> 2016-09-30 20:43:51.910754 7ffb653b8700 20 mds.0.cache.dir(606) _fetched
>> pos 310473 marker 'I' dname '100022e8617 [2,head]
>> 2016-09-30 20:43:51.910757 7ffb653b8700 20 mds.0.cache.dir(606) lookup
>> (head, '100022e8617')
>> 2016-09-30 20:43:51.910759 7ffb653b8700 20 mds.0.cache.dir(606) miss ->
>> (10002a81c10,head)
>> 2016-09-30 20:43:51.910762 7ffb653b8700 0 mds.0.cache.dir(606) _fetched
>> badness: got (but i already had) [inode 100022e8617 [...2,head]
>> ~mds0/stray9/100022e8617 auth v39303851 s=11470 nl=10 n(v0 b11470 1=1+0)
>> (iversion lock) 0x560c013904b8] mode 33188 mtime 2016-07-25 03:38:01.000000
>> 2016-09-30 20:43:51.910772 7ffb653b8700 -1 log_channel(cluster) log [ERR]
>> : loaded dup inode 100022e8617 [2,head] v39284583 at
>> ~mds0/stray6/100022e8617, but inode 100022e8617.head v39303851 already
>> exists at ~mds0/stray9/100022e8617
>>
>>
>> 2016-09-25 06:23:50.947761 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
>> mismatch between head items and fnode.fragstat! printing dentries
>> 2016-09-25 06:23:50.947779 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
>> get_num_head_items() = 36; fnode.fragstat.nfiles=53
>> fnode.fragstat.nsubdirs=0
>> 2016-09-25 06:23:50.947782 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
>> mismatch between child accounted_rstats and my rstats!
>> 2016-09-25 06:23:50.947803 7ffb653b8700 1 mds.0.cache.dir(10003439a33)
>> total of child dentrys: n(v0 b19365007 36=36+0)
>> 2016-09-25 06:23:50.947806 7ffb653b8700 1 mds.0.cache.dir(10003439a33) my
>> rstats: n(v2 rc2016-08-28 04:48:37.685854 b49447206 53=53+0)
>>
>> The slightly sad thing is: I suspect all of this is probably from
>> something that "happened at some time in the past", and running the mds
>> with debugging will make my users very unhappy, as writing/formatting all
>> that log is not exactly cheap (debug_mds=20/20 quickly ended up with the
>> mds beacon marked as laggy).
>>
>> Bonus question: in terms of "understanding how cephfs works", is
>> doc/dev/mds_internals it? :) Given that making "minimal reproducible
>> test-cases" has so far turned out to be quite elusive from the "top down"
>> approach, I'm finding myself looking inside the box to try to figure out
>> how we got where we are.
>>
>> (And many thanks for ceph-dencoder, it satisfies my pathological need to
>> look inside of things.)
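Re the "mismatch between head items and fnode.fragstat" above: forward scrub
via the mds admin socket should be the right tool, as you already found. A
rough example (the path is a placeholder for the real directory, and the set
of optional flags varies by release - "ceph daemon mds.foo help" shows what
your mds accepts):

  # ask the mds to re-scrub the directory whose fragstat/rstats look wrong
  ceph daemon mds.foo scrub_path /path/to/that/directory
  # depending on the release, flags such as recursive/repair may also be
  # accepted, e.g.:
  # ceph daemon mds.foo scrub_path /path/to/that/directory recursive repair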
>>
>> Cheers,
>> --
>> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
>> SRE, Medallia Inc
>> Phone: +1 (650) 739-6580
>
>
> --
> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
> SRE, Medallia Inc
> Phone: +1 (650) 739-6580
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com