On Tue, Jan 22, 2019 at 10:42 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi Zheng,
> > >
> > > We also just saw this today and got a bit worried.
> > > Should we change to:
> > >
> >
> > What is the error message (on stray dir or other dir)? Does the
> > cluster ever enable multi-active mds?
> >
>
> It was during an upgrade from v12.2.8 to v12.2.10. 5 active MDS's
> during the upgrade.
>
> 2019-01-22 10:08:22.629545 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 36 : cluster [WRN] replayed op
> client.54045065:2282648,2282514 used ino 0x3001c85b193 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:22.629617 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 37 : cluster [WRN] replayed op
> client.54045065:2282649,2282514 used ino 0x3001c85b194 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:22.629652 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 38 : cluster [WRN] replayed op
> client.54045065:2282650,2282514 used ino 0x3001c85b195 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:37.373704 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2748 : cluster [INF] daemon mds.p01001532184554
> is now active in filesystem cephfs as rank 2
> 2019-01-22 10:08:37.805675 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2749 : cluster [INF] Health check cleared:
> FS_DEGRADED (was: 1 filesystem is degraded)
> 2019-01-22 10:08:39.784260 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 547 : cluster [ERR] bad/negative dir
> size on 0x61b f(v27 m2019-01-22 10:07:38.509466 0=-1+1)
> 2019-01-22 10:08:39.784271 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 548 : cluster [ERR] unmatched fragstat
> on 0x61b, inode has f(v28 m2019-01-22 10:07:38.509466 0=-1+1),
> dirfrags have f(v0 m2019-01-22 10:07:38.509466 1=0+1)
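
For anyone decoding the two [ERR] lines quoted above: the trailing
"0=-1+1" appears to be <size>=<nfiles>+<nsubdirs>, which is consistent
with the arithmetic in the log (0 = -1 + 1) and with the nfiles/nsubdirs
check in the diff quoted in this thread. Below is a minimal standalone
sketch of the two conditions, assuming that reading of the format. The
names FragCounts, parse_fragstat, is_negative and is_mismatch are
hypothetical helpers for illustration and are not Ceph code.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the counters printed in the log line;
// this is NOT Ceph's own fragstat type.
struct FragCounts {
  long long size = 0;      // number printed before '='
  long long nfiles = 0;    // number printed between '=' and '+'
  long long nsubdirs = 0;  // number printed after '+'
};

// Parse the trailing "<size>=<nfiles>+<nsubdirs>)" of a logged fragstat,
// e.g. "f(v27 m2019-01-22 10:07:38.509466 0=-1+1)".
// Assumption: the counters are the last space-separated token.
// Returns false when the token does not have that shape.
bool parse_fragstat(const std::string& s, FragCounts& out) {
  const std::size_t pos = s.rfind(' ');
  if (pos == std::string::npos) return false;
  char close = 0;
  const int n = std::sscanf(s.c_str() + pos + 1, "%lld=%lld+%lld%c",
                            &out.size, &out.nfiles, &out.nsubdirs, &close);
  return n == 4 && close == ')';
}

// Same condition as the one visible in the quoted diff:
// either counter below zero.
bool is_negative(const FragCounts& f) {
  return f.nfiles < 0 || f.nsubdirs < 0;
}

// Assumed form of the "unmatched fragstat" comparison: the inode's
// counters disagree with the counters summed over its dirfrags.
bool is_mismatch(const FragCounts& inode, const FragCounts& frags) {
  return inode.nfiles != frags.nfiles || inode.nsubdirs != frags.nsubdirs;
}
```

Feeding it the two strings from the 10:08:39 entries, the inode side
(-1 files, 1 subdir) trips both checks, while the dirfrag side
(0 files, 1 subdir) is sane.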

An incorrect fragstat on a stray dir is not a big deal; the MDS uses it
only for printing debug/warning messages. But an incorrect fragstat on
any other dir may need manual intervention, so I'd rather not change it
to a 'warning' message.

Regards
Yan, Zheng

> 2019-01-22 10:10:02.605036 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2803 : cluster [INF] Health check cleared:
> MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons
> available)
> 2019-01-22 10:10:02.605089 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2804 : cluster [INF] Cluster is now healthy
>
> >
> >
> > > diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
> > > index e8c1bc8bc1..e2539390fb 100644
> > > --- a/src/mds/CInode.cc
> > > +++ b/src/mds/CInode.cc
> > > @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type)
> > >
> > >        if (pf->fragstat.nfiles < 0 ||
> > >            pf->fragstat.nsubdirs < 0) {
> > > -        clog->error() << "bad/negative dir size on "
> > > +        clog->warn() << "bad/negative dir size on "
> > >              << dir->dirfrag() << " " << pf->fragstat;
> > >          assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter);
> > >
> > > @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type)
> > >        if (state_test(CInode::STATE_REPAIRSTATS)) {
> > >          dout(20) << " dirstat mismatch, fixing" << dendl;
> > >        } else {
> > > -        clog->error() << "unmatched fragstat on " << ino() << ", inode has "
> > > +        clog->warn() << "unmatched fragstat on " << ino() << ", inode has "
> > >                       << pi->dirstat << ", dirfrags have " << dirstat;
> > >          assert(!"unmatched fragstat" == g_conf->mds_verify_scatter);
> > >        }
> > >
> > >
> > > Cheers, Dan
> > >
> > >
> > > On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > >>
> > >> no action is required. mds fixes this type of error atomically.
> > >> On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke
> > >> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> >
> > >> > upon failover or restart, our MDS complains that something is wrong with
> > >> > one of the stray directories:
> > >> >
> > >> >
> > >> > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log
> > >> > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19
> > >> > 12:51:12.016360 -4=-5+1)
> > >> > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log
> > >> > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19
> > >> > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360
> > >> > 1=0+1)
> > >> >
> > >> >
> > >> > How do we handle this problem?
> > >> >
> > >> >
> > >> > Regards,
> > >> >
> > >> > Burkhard
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > ceph-users mailing list
> > >> > ceph-users@xxxxxxxxxxxxxx
> > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com