Re: Broken CephFS stray entries?

On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>
> On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi Zheng,
> >
> > We also just saw this today and got a bit worried.
> > Should we change to:
> >
>
> What is the error message (on stray dir or other dir)? Does the
> cluster ever enable multi-active MDS?
>

It was during an upgrade from v12.2.8 to v12.2.10. There were 5 active
MDSs during the upgrade.

2019-01-22 10:08:22.629545 mds.p01001532184554 mds.2
128.142.39.144:6800/2644448398 36 : cluster [WRN]  replayed op
client.54045065:2282648,2282514 used ino 0x3001c85b193 but session
next is 0x3001c28f018
2019-01-22 10:08:22.629617 mds.p01001532184554 mds.2
128.142.39.144:6800/2644448398 37 : cluster [WRN]  replayed op
client.54045065:2282649,2282514 used ino 0x3001c85b194 but session
next is 0x3001c28f018
2019-01-22 10:08:22.629652 mds.p01001532184554 mds.2
128.142.39.144:6800/2644448398 38 : cluster [WRN]  replayed op
client.54045065:2282650,2282514 used ino 0x3001c85b195 but session
next is 0x3001c28f018
2019-01-22 10:08:37.373704 mon.cephflax-mon-9b406e0261 mon.0
137.138.121.135:6789/0 2748 : cluster [INF] daemon mds.p01001532184554
is now active in filesystem cephfs as rank 2
2019-01-22 10:08:37.805675 mon.cephflax-mon-9b406e0261 mon.0
137.138.121.135:6789/0 2749 : cluster [INF] Health check cleared:
FS_DEGRADED (was: 1 filesystem is degraded)
2019-01-22 10:08:39.784260 mds.p01001532184554 mds.2
128.142.39.144:6800/2644448398 547 : cluster [ERR] bad/negative dir
size on 0x61b f(v27 m2019-01-22 10:07:38.509466 0=-1+1)
2019-01-22 10:08:39.784271 mds.p01001532184554 mds.2
128.142.39.144:6800/2644448398 548 : cluster [ERR] unmatched fragstat
on 0x61b, inode has f(v28 m2019-01-22 10:07:38.509466 0=-1+1),
dirfrags have f(v0 m2019-01-22 10:07:38.509466 1=0+1)
2019-01-22 10:10:02.605036 mon.cephflax-mon-9b406e0261 mon.0
137.138.121.135:6789/0 2803 : cluster [INF] Health check cleared:
MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons
available)
2019-01-22 10:10:02.605089 mon.cephflax-mon-9b406e0261 mon.0
137.138.121.135:6789/0 2804 : cluster [INF] Cluster is now healthy
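
For anyone reading along: if I'm reading the frag_info_t output right,
"0=-1+1" above means nfiles=-1 and nsubdirs=1, i.e. the stray dir's
dirstat thinks it holds minus one file. Below is a rough standalone
sketch of the two invariants the MDS is checking when it prints these
messages -- simplified for illustration, not the actual CInode
accumulation code (only the nfiles/nsubdirs names are real):

#include <cstdint>
#include <iostream>
#include <vector>

// simplified stand-in for Ceph's frag_info_t
struct frag_stat {
  int64_t nfiles;
  int64_t nsubdirs;
};

int main() {
  // what the dirfrags report: "1=0+1" -> nfiles=0, nsubdirs=1
  std::vector<frag_stat> dirfrags = {{0, 1}};

  // what the inode's dirstat claims: "0=-1+1" -> nfiles=-1, nsubdirs=1
  frag_stat inode{-1, 1};

  frag_stat sum{0, 0};
  for (const auto &f : dirfrags) {
    sum.nfiles += f.nfiles;
    sum.nsubdirs += f.nsubdirs;
  }

  // invariant 1: the counters must never go negative
  if (inode.nfiles < 0 || inode.nsubdirs < 0)
    std::cout << "bad/negative dir size\n";

  // invariant 2: the inode's dirstat must match the sum of its dirfrags
  if (sum.nfiles != inode.nfiles || sum.nsubdirs != inode.nsubdirs)
    std::cout << "unmatched fragstat\n";
}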





> > diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
> > index e8c1bc8bc1..e2539390fb 100644
> > --- a/src/mds/CInode.cc
> > +++ b/src/mds/CInode.cc
> > @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type)
> >
> >         if (pf->fragstat.nfiles < 0 ||
> >             pf->fragstat.nsubdirs < 0) {
> > -         clog->error() << "bad/negative dir size on "
> > +         clog->warn() << "bad/negative dir size on "
> >               << dir->dirfrag() << " " << pf->fragstat;
> >           assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter);
> >
> > @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type)
> >           if (state_test(CInode::STATE_REPAIRSTATS)) {
> >             dout(20) << " dirstat mismatch, fixing" << dendl;
> >           } else {
> > -           clog->error() << "unmatched fragstat on " << ino() << ", inode has "
> > +           clog->warn() << "unmatched fragstat on " << ino() << ", inode has "
> >                           << pi->dirstat << ", dirfrags have " << dirstat;
> >             assert(!"unmatched fragstat" == g_conf->mds_verify_scatter);
> >           }
> >
> >
> > Cheers, Dan
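
One note on the patch quoted above (my reading of the code, so correct
me if I'm wrong): demoting clog->error() to clog->warn() only changes
the severity in the cluster log; the assert on the following line still
fires only when mds_verify_scatter is enabled, which as far as I know
is a debug option that is off by default. A tiny sketch of why that
assert is normally a no-op:

#include <cassert>

// stand-in for the relevant config option (normally false)
struct Conf { bool mds_verify_scatter = false; };

int main() {
  Conf g_conf;
  // A string literal is a non-null pointer, so !"bad/negative fragstat"
  // evaluates to 'false'; the text only exists so the reason shows up
  // in the assert expression.  With mds_verify_scatter=false the assert
  // always passes; enable the option and the MDS would abort here
  // instead of just logging.
  assert(!"bad/negative fragstat" == g_conf.mds_verify_scatter);
  return 0;
}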
> >
> >
> > On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> >>
> >> No action is required. The MDS fixes this type of error automatically.
> >> On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke
> >> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >> >
> >> > Hi,
> >> >
> >> >
> >> > upon failover or restart, our MDS complains that something is wrong with
> >> > one of the stray directories:
> >> >
> >> >
> >> > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log
> >> > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19
> >> > 12:51:12.016360 -4=-5+1)
> >> > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log
> >> > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19
> >> > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360
> >> > 1=0+1)
> >> >
> >> >
> >> > How do we handle this problem?
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Burkhard
> >> >
> >> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


