Re: Broken CephFS stray entries?

"Yan, Zheng" <ukernel@xxxxxxxxx> · Wed, 23 Jan 2019 09:43:34 +0800

On Tue, Jan 22, 2019 at 10:42 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi Zheng,
> > >
> > > We also just saw this today and got a bit worried.
> > > Should we change to:
> > >
> >
> > What is the error message (on a stray dir or another dir)? Has the
> > cluster ever enabled multi-active MDS?
> >
>
> It was during an upgrade from v12.2.8 to v12.2.10. 5 active MDS's
> during the upgrade.
>
> 2019-01-22 10:08:22.629545 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 36 : cluster [WRN]  replayed op
> client.54045065:2282648,2282514 used ino 0x3001c85b193 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:22.629617 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 37 : cluster [WRN]  replayed op
> client.54045065:2282649,2282514 used ino 0x3001c85b194 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:22.629652 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 38 : cluster [WRN]  replayed op
> client.54045065:2282650,2282514 used ino 0x3001c85b195 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:37.373704 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2748 : cluster [INF] daemon mds.p01001532184554
> is now active in filesystem cephfs as rank 2
> 2019-01-22 10:08:37.805675 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2749 : cluster [INF] Health check cleared:
> FS_DEGRADED (was: 1 filesystem is degraded)
> 2019-01-22 10:08:39.784260 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 547 : cluster [ERR] bad/negative dir
> size on 0x61b f(v27 m2019-01-22 10:07:38.509466 0=-1+1)
> 2019-01-22 10:08:39.784271 mds.p01001532184554 mds.2
> 128.142.39.144:6800/2644448398 548 : cluster [ERR] unmatched fragstat
> on 0x61b, inode has f(v28 m2019-01-22 10:07:38.509466 0=-1+1),
> dirfrags have f(v0 m2019-01-22 10:07:38.509466 1=0+1)

An incorrect fragstat on a stray dir is not a big deal; the MDS uses it
only for printing debug/warning messages. But an incorrect fragstat on
any other dir may need manual intervention, so I'd rather not downgrade
it to a 'warning' message.
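
To tell the two cases apart when triaging logs, something like the
following rough sketch may help. It pulls the inode out of the two
error messages quoted in this thread and checks whether it falls in the
stray-directory range. The constants (MAX_MDS = 0x100, NUM_STRAY = 10,
stray offset = 6 * MAX_MDS) are what I recall from src/mds/mdstypes.h
and should be treated as assumptions to verify against your release;
the function names are made up for illustration.

```python
import re

# Assumed values, recalled from Ceph's src/mds/mdstypes.h; verify for your release.
MAX_MDS = 0x100
NUM_STRAY = 10
MDS_INO_STRAY_OFFSET = 6 * MAX_MDS  # 0x600

# Matches the two cluster-log errors quoted in this thread and captures the inode.
# Handles both "[ERR] bad/negative ..." and "[ERR] : bad/negative ..." forms.
FRAGSTAT_ERR = re.compile(
    r"\[ERR\]\s*:?\s*(?:bad/negative dir size|unmatched fragstat) on (0x[0-9a-fA-F]+)"
)


def stray_owner(ino):
    """Return (mds_rank, stray_index) if ino is a stray dir inode, else None."""
    if MDS_INO_STRAY_OFFSET <= ino < MDS_INO_STRAY_OFFSET + MAX_MDS * NUM_STRAY:
        return divmod(ino - MDS_INO_STRAY_OFFSET, NUM_STRAY)
    return None


def classify(line):
    """Describe a fragstat error line, or return None if the line is not one."""
    m = FRAGSTAT_ERR.search(line)
    if m is None:
        return None
    owner = stray_owner(int(m.group(1), 16))
    if owner is None:
        return f"{m.group(1)}: not a stray dir, may need manual intervention"
    rank, index = owner
    return f"{m.group(1)}: stray{index} of mds.{rank}, used only for debug/warning output"


if __name__ == "__main__":
    # Log line from this thread, joined back onto a single line.
    print(classify(
        "cluster [ERR] bad/negative dir size on 0x61b "
        "f(v27 m2019-01-22 10:07:38.509466 0=-1+1)"
    ))
```

Under those assumed constants, 0x61b from Dan's log decodes to stray7
of mds.2 (the rank that logged it), and 0x607 from Burkhard's report to
stray7 of mds.0.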

Regards
Yan, Zheng

> 2019-01-22 10:10:02.605036 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2803 : cluster [INF] Health check cleared:
> MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons
> available)
> 2019-01-22 10:10:02.605089 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2804 : cluster [INF] Cluster is now healthy
>
> > > diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
> > > index e8c1bc8bc1..e2539390fb 100644
> > > --- a/src/mds/CInode.cc
> > > +++ b/src/mds/CInode.cc
> > > @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type)
> > >
> > >         if (pf->fragstat.nfiles < 0 ||
> > >             pf->fragstat.nsubdirs < 0) {
> > > -         clog->error() << "bad/negative dir size on "
> > > +         clog->warn() << "bad/negative dir size on "
> > >               << dir->dirfrag() << " " << pf->fragstat;
> > >           assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter);
> > >
> > > @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type)
> > >           if (state_test(CInode::STATE_REPAIRSTATS)) {
> > >             dout(20) << " dirstat mismatch, fixing" << dendl;
> > >           } else {
> > > -           clog->error() << "unmatched fragstat on " << ino() << ", inode has "
> > > +           clog->warn() << "unmatched fragstat on " << ino() << ", inode has "
> > >                           << pi->dirstat << ", dirfrags have " << dirstat;
> > >             assert(!"unmatched fragstat" == g_conf->mds_verify_scatter);
> > >           }
> > >
> > >
> > > Cheers, Dan
> > >
> > >
> > > On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> > >>
> > >> No action is required. The MDS fixes this type of error automatically.
> > >> On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke
> > >> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> >
> > >> > upon failover or restart, our MDS complains that something is wrong with
> > >> > one of the stray directories:
> > >> >
> > >> >
> > >> > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log
> > >> > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19
> > >> > 12:51:12.016360 -4=-5+1)
> > >> > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log
> > >> > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19
> > >> > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360
> > >> > 1=0+1)
> > >> >
> > >> >
> > >> > How do we handle this problem?
> > >> >
> > >> >
> > >> > Regards,
> > >> >
> > >> > Burkhard
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > ceph-users mailing list
> > >> > ceph-users@xxxxxxxxxxxxxx
> > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com