Is there any way to repair the PGs/CephFS gracefully?

-Mykola

From: Yan, Zheng

On Wed, Oct 5, 2016 at 2:27 PM, Mykola Dvornik <mykola.dvornik@xxxxxxxxx> wrote:
> Hi Zheng,
>
> Many thanks for your reply.
>
> This indicates the MDS metadata is corrupted. Did you do any unusual
> operation on the cephfs? (e.g. reset journal, create new fs using
> existing metadata pool)
>
> No, nothing has been explicitly done to the MDS. I had a few inconsistent
> PGs that belonged to the (3-replica) metadata pool. The symptoms were
> similar to http://tracker.ceph.com/issues/17177 . The PGs were eventually
> repaired and no data corruption was expected, as explained in the ticket.

I'm afraid that issue does cause corruption.

> BTW, when I posted this issue on the ML the number of ground-state stray
> objects was around 7.5K. Now it has gone up to 23K. No inconsistent PGs or
> any other problems happened to the cluster within this time scale.
>
> -Mykola
>
> On 5 October 2016 at 05:49, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>
>> On Mon, Oct 3, 2016 at 5:48 AM, Mykola Dvornik <mykola.dvornik@xxxxxxxxx> wrote:
>> > Hi John,
>> >
>> > Many thanks for your reply. I will try to play with the MDS tunables and
>> > report back to you ASAP.
>> >
>> > So far I see that the MDS log contains a lot of errors of the following
>> > kind:
>> >
>> > 2016-10-02 11:58:03.002769 7f8372d54700  0 mds.0.cache.dir(100056ddecd)
>> >     _fetched badness: got (but i already had) [inode 10005729a77 [2,head]
>> >     ~mds0/stray1/10005729a77 auth v67464942 s=196728 nl=0 n(v0 b196728 1=1+0)
>> >     (iversion lock) 0x7f84acae82a0] mode 33204 mtime 2016-08-07 23:06:29.776298
>> >
>> > 2016-10-02 11:58:03.002789 7f8372d54700 -1 log_channel(cluster) log [ERR] :
>> >     loaded dup inode 10005729a77 [2,head] v68621 at
>> >     /users/mykola/mms/NCSHNO/final/120nm-uniform-h8200/j002654.out/m_xrange192-320_yrange192-320_016232.dump,
>> >     but inode 10005729a77.head v67464942 already exists at
>> >     ~mds0/stray1/10005729a77
>>
>> This indicates the MDS metadata is corrupted. Did you do any unusual
>> operation on the cephfs? (e.g. reset journal, create new fs using
>> existing metadata pool)
>>
>> > Those folders within mds.0.cache.dir that got badness report a size of
>> > 16EB on the clients. rm on them fails with 'Directory not empty'.
>> >
>> > As for the "Client failing to respond to cache pressure", I have 2 kernel
>> > clients on 4.4.21, 1 on 4.7.5, and 16 fuse clients always running the most
>> > recent release version of ceph-fuse. The funny thing is that every single
>> > client misbehaves from time to time. I am aware of quite a discussion about
>> > this issue on the ML, but cannot really follow how to debug it.
>> >
>> > Regards,
>> >
>> > -Mykola
>> >
>> > On 2 October 2016 at 22:27, John Spray <jspray@xxxxxxxxxx> wrote:
>> >>
>> >> On Sun, Oct 2, 2016 at 11:09 AM, Mykola Dvornik
>> >> <mykola.dvornik@xxxxxxxxx> wrote:
>> >> > After upgrading to 10.2.3 we frequently see messages like
>> >>
>> >> From which version did you upgrade?
>> >>
>> >> > 'rm: cannot remove '...': No space left on device'
>> >> >
>> >> > The folders we are trying to delete contain approx. 50K files, 193 KB each.
>> >>
>> >> My guess would be that you are hitting the new
>> >> mds_bal_fragment_size_max check. This limits the number of entries
>> >> that the MDS will create in a single directory fragment, to avoid
>> >> overwhelming the OSD with oversized objects. It is 100000 by default.
>> >>
>> >> This limit also applies to "stray" directories, where unlinked files
>> >> are put while they wait to be purged, so you could get into this state
>> >> while doing lots of deletions. There are ten stray directories that
>> >> get a roughly even share of files, so if you have more than about one
>> >> million files waiting to be purged, you could see this condition.
>> >>
>> >> The "Client failing to respond to cache pressure" messages may play a
>> >> part here -- if you have misbehaving clients then they may cause the
>> >> MDS to delay purging stray files, leading to a backlog. If your
>> >> clients are by any chance older kernel clients, you should upgrade
>> >> them. You can also unmount/remount them to clear this state, although
>> >> it will reoccur until the clients are updated (or until the bug is
>> >> fixed, if you're running the latest clients already).
>> >>
>> >> The high-level counters for strays are part of the default output of
>> >> "ceph daemonperf mds.<id>" when run on the MDS server (the "stry" and
>> >> "purg" columns). You can look at these to watch how fast the MDS is
>> >> clearing out strays. If your backlog is just because it's not purging
>> >> fast enough, then you can look at tuning mds_max_purge_files and
>> >> mds_max_purge_ops to adjust the throttles on purging. Those settings
>> >> can be adjusted without restarting the MDS using the "injectargs" command
>> >> (http://docs.ceph.com/docs/master/rados/operations/control/#mds-subsystem).
>> >>
>> >> Let us know how you get on.
>> >>
>> >> John
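A minimal way to check both the limit and the stray backlog John describes, assuming a
Jewel (10.2.x) cluster, a single active MDS referred to here as "mds.0" (a placeholder,
not a name taken from this thread), and shell access to the MDS host and its admin socket:

    # Current per-fragment entry limit enforced by the MDS (100000 unless overridden)
    ceph daemon mds.0 config get mds_bal_fragment_size_max

    # Rough view of the stray backlog; exact counter names can differ slightly between
    # releases, so the grep is only a convenience filter over the full perf dump
    ceph daemon mds.0 perf dump | grep -i stray

If the stray count is anywhere near the million or so entries John mentions, the
'No space left on device' errors on rm point at the stray fragments hitting
mds_bal_fragment_size_max rather than at the pools actually being full.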
>> >> > The cluster state and storage available are both OK:
>> >> >
>> >> >     cluster 98d72518-6619-4b5c-b148-9a781ef13bcb
>> >> >      health HEALTH_WARN
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >             mds0: Client XXX.XXX.XXX.XXX failing to respond to cache pressure
>> >> >      monmap e1: 1 mons at {000-s-ragnarok=XXX.XXX.XXX.XXX:6789/0}
>> >> >             election epoch 11, quorum 0 000-s-ragnarok
>> >> >       fsmap e62643: 1/1/1 up {0=000-s-ragnarok=up:active}
>> >> >      osdmap e20203: 16 osds: 16 up, 16 in
>> >> >             flags sortbitwise
>> >> >       pgmap v15284654: 1088 pgs, 2 pools, 11263 GB data, 40801 kobjects
>> >> >             23048 GB used, 6745 GB / 29793 GB avail
>> >> >                 1085 active+clean
>> >> >                    2 active+clean+scrubbing
>> >> >                    1 active+clean+scrubbing+deep
>> >> >
>> >> > Has anybody experienced this issue so far?
>> >> >
>> >> > Regards,
>> >> > --
>> >> > Mykola
>> >
>> > --
>> > Mykola
>
> --
> Mykola
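To put concrete commands next to John's advice above -- a sketch only, again assuming a
Jewel (10.2.x) cluster and a single active MDS addressed as "mds.0" (a placeholder), with
the throttle values below chosen purely as illustrations rather than recommendations:

    # Watch the stray ("stry") and purge ("purg") columns update live; run on the MDS host
    ceph daemonperf mds.0

    # Raise the purge throttles at runtime if purging itself is the bottleneck
    # (example numbers only; increase gradually and watch OSD load while doing so)
    ceph tell mds.0 injectargs '--mds_max_purge_files 128 --mds_max_purge_ops 16384'

    # Confirm the new values via the admin socket on the MDS host
    ceph daemon mds.0 config get mds_max_purge_files
    ceph daemon mds.0 config get mds_max_purge_ops

Note that injectargs only changes the running daemon; to keep higher throttles across an
MDS restart, the same options would also need to be set in ceph.conf on the MDS host.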
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com