Re: mds crash loop

"Yan, Zheng" <ukernel@xxxxxxxxx> · Wed, 6 Nov 2019 21:15:45 +0800

On Wed, Nov 6, 2019 at 4:42 PM Karsten Nielsen <karsten@xxxxxxxxxx> wrote:
>
> -----Original message-----
> From:   Yan, Zheng <ukernel@xxxxxxxxx>
> Sent:   Wed 06-11-2019 08:15
> Subject:        Re:  mds crash loop
> To:     Karsten Nielsen <karsten@xxxxxxxxxx>;
> CC:     ceph-users@xxxxxxx;
> > On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen <karsten@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > Last week I upgraded my ceph cluster from luminous to mimic 13.2.6.
> > > It was running fine for a while, but yesterday my mds went into a crash loop.
> > >
> > > I have 1 active and 1 standby mds for my cephfs, both of which are stuck in
> > > the same crash loop.
> > > I am running ceph based on https://hub.docker.com/r/ceph/daemon version
> > > v3.2.7-stable-3.2-minic-centos-7-x86_64 with an etcd kv store.
> > >
> > > Log details are: https://paste.debian.net/1113943/
> > >
> >
> > Please try again with debug_mds=20. Thanks.
> >
> > Yan, Zheng
>
> Yes, I have set that, and had to move to pastebin.com as paste.debian.net apparently only supports 150k.
>
>
> https://pastebin.com/Gv7c5h54
>

It looks like the on-disk root inode is corrupted. Did you encounter
anything unusual during the upgrade?

Please run 'rados -p <cephfs metadata pool> stat 1.00000000.inode' and
check whether the object was last modified before or after the
'luminous -> 13.2.6' upgrade.

To fix the corrupted object, run 'cephfs-data-scan init --force-init',
then restart the mds. After the mds becomes active, run 'ceph daemon
mds.x scrub_path / force repair'.
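
For reference, the steps above could be wrapped in a small dry-run script
along these lines. This is only a sketch under assumptions: the pool name
"cephfs_metadata" and the daemon id "x" are placeholders, and the helper
names (run, recovery_plan, and so on) are made up for illustration. By
default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Dry-run sketch of the recovery sequence described above.
# Assumptions (not from the thread): the metadata pool is called
# "cephfs_metadata" and the mds daemon id is "x". Override via environment.
METADATA_POOL="${METADATA_POOL:-cephfs_metadata}"
MDS_ID="${MDS_ID:-x}"

# Print a command; execute it only when APPLY=1 is set.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Step 1 (read-only): show mtime and size of the root inode object, to
# compare against the date of the upgrade.
check_root_inode() {
    run rados -p "$METADATA_POOL" stat 1.00000000.inode
}

# Step 2: rewrite the root inode object.
reinit_root_inode() {
    run cephfs-data-scan init --force-init
}

# Step 3: only after the mds has been restarted and is active again.
# 'ceph daemon' talks to the local admin socket, so this has to run on
# the host (or inside the container) where that mds lives.
repair_scrub() {
    run ceph daemon "mds.$MDS_ID" scrub_path / force repair
}

# Always prints the whole sequence without executing anything, because
# the mds restart between steps 2 and 3 has to be done by hand.
recovery_plan() {
    (
        APPLY=0
        check_root_inode
        reinit_root_inode
        echo "# restart the mds and wait until it is active"
        repair_scrub
    )
}

recovery_plan
```

To actually execute a single step, source the file and call it with APPLY
set, for example 'APPLY=1; check_root_inode'. Keeping the steps separate is
deliberate: the scrub should not run until the mds is active again.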

> - Karsten
>
> >
> > > Thanks for any hints
> > > - Karsten
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >