Re: MDS stuck in up:rejoin

Hi Eric,

On Tue, Dec 5, 2023 at 3:43 PM Eric Tittley <Eric.Tittley@xxxxxxxx> wrote:
>
> Hi Venky,
>
> > The recently crashed daemon is likely the MDS which you mentioned in
> > your subsequent email.
>
> The "recently crashed daemon" was the osd.51 daemon which was in the
> metadata pool.
>
> But yes, in the process of trying to get the system running, I probably
> did a few steps that were unnecessary. The steps generally moved me in
> the right direction until I got to MDS state "up:rejoin" where things
> paused, then got much worse.
>
> Now I'm certainly in the phase of monkeying around trying desperately to
> get a heartbeat out of the system and probably doing more damage than
> good. If only I could ask the system "what are you actually trying to
> do?" Scrolling through the source code doesn't help too much. My next
> step will be to insert some useful debugging messages in the vicinity of
> the error to extract more information. Failing on an assert() has
> advantages, but also massive disadvantages when it comes to debugging.

Those asserts have significant value: they stop the system from doing funny
things at a later point in time.
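
That said, rather than patching extra debug messages into the source, it is
usually quicker to crank up the MDS log verbosity and watch what rank 0 is
doing while it sits in rejoin. Purely as a sketch (20 and 1 are just commonly
used levels, and the daemon name below is the one from your "ceph mds stat"
output):

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
ceph daemon mds.cephfs.ceph00.uvlkrw objecter_requests

The first two make the MDS log much chattier (so keep an eye on /var) and
should be reverted afterwards. The objecter_requests dump has to be run on the
MDS host (under cephadm, from inside "cephadm shell") and lists the RADOS
operations the MDS is waiting on, which should line up with the repeating
getxattr traffic you pasted.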

As far as your issue is concerned, is it possible to just throw away this
fs and use a new one?
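
If that is on the table, a rough sketch of what it could look like (the pool
and fs names below are placeholders, and this assumes anything you still need
from the old fs is backed up or copied out first):

ceph fs fail cephfs
ceph osd pool create cephfs_new_metadata
ceph osd pool create cephfs_new_data
ceph fs new cephfs_new cephfs_new_metadata cephfs_new_data

Depending on the release you may also need "ceph fs flag set enable_multiple
true" while both filesystems exist. Once you are done salvaging data, the old
fs can be dropped with "ceph fs rm cephfs --yes-i-really-mean-it" and its
pools deleted.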

>
> Cheers,
> Eric
>
> On 05/12/2023 06:10, Venky Shankar wrote:
> >
> > Hi Eric,
> >
> > On Mon, Nov 27, 2023 at 8:00 PM Eric Tittley <Eric.Tittley@xxxxxxxx> wrote:
> >> Hi all,
> >>
> >> For about a week our CephFS has experienced issues with its MDS.
> >>
> >> Currently the MDS is stuck in "up:rejoin"
> >>
> >> Issues became apparent when simple commands like "mv foo bar/" hung.
> > I assume the MDS was active at the point when the command hung. Would
> > that be correct?
> >
> >> I unmounted CephFS on the clients, evicted those that remained, and then issued
> >>
> >> ceph config set mds.0 mds_wipe_sessions true
> >> ceph config set mds.1 mds_wipe_sessions true
> >>
> >> which allowed me to delete the hung requests.
> > Most likely, the above steps weren't really required. The hung command
> > was possibly due to a deadlock in the MDS (during the rename).
> >
> >> I've lost the exact commands I used, but something like
> >> rados -p cephfs_metadata ls | grep mds
> >> rados rm -p cephfs_metadata mds0_openfiles.0
> >>
> >> etc
> >>
> >> This allowed the MDS to get to "up:rejoin", where it has been stuck ever since; it is now getting on for five days.
> >>
> >> # ceph mds stat
> >> cephfs:1/1 {0=cephfs.ceph00.uvlkrw=up:rejoin} 2 up:standby
> >>
> >>
> >>
> >> root@ceph00:/var/log/ceph/a614303a-5eb5-11ed-b492-011f01e12c9a# ceph -s
> >>    cluster:
> >>      id:     a614303a-5eb5-11ed-b492-011f01e12c9a
> >>      health: HEALTH_WARN
> >>              1 filesystem is degraded
> >>              1 pgs not deep-scrubbed in time
> >>              2 pool(s) do not have an application enabled
> >>              1 daemons have recently crashed
> >>
> >>    services:
> >>      mon: 3 daemons, quorum ceph00,ceph01,ceph02 (age 57m)
> >>      mgr: ceph01.lvdgyr(active, since 2h), standbys: ceph00.gpwpgs
> >>      mds: 1/1 daemons up, 2 standby
> >>      osd: 91 osds: 90 up (since 78m), 90 in (since 112m)
> >>
> >>    data:
> >>      volumes: 0/1 healthy, 1 recovering
> >>      pools:   5 pools, 1539 pgs
> >>      objects: 138.83M objects, 485 TiB
> >>      usage:   971 TiB used, 348 TiB / 1.3 PiB avail
> >>      pgs:     1527 active+clean
> >>               12   active+clean+scrubbing+deep
> >>
> >>    io:
> >>      client:   3.1 MiB/s rd, 3.16k op/s rd, 0 op/s wr
> >>
> >>
> >> # ceph --version
> >> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
> >>
> >>
> >> I've tried failing the MDS so it switches.  Rebooted a couple of times.
> >> I've added more OSDs to the metadata pool and took one out, as I thought it might be a bad metadata OSD (the "recently crashed" daemon).
> > This isn't really going to do any good btw.
> >
> > The recently crashed daemon is likely the MDS which you mentioned in
> > your subsequent email.
> >
> >> The error logs are full of lines like the following.
> >> (The prefix to each is:
> >> Nov 27 14:02:44 ceph00 bash[2145]: debug 2023-11-27T14:02:44.619+0000 7f74e845e700  1 -- [v2:192.168.1.128:6800/2157301677,v1:192.168.1.128:6801/2157301677] --> [v2:192.168.1.133:6896/4289132926,v1:192.168.1.133:6897/4289132926]
> >> )
> >>
> >> crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_message enqueueing message m=0x559be00adc00 type=42 osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
> >> crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).write_message sending message m=0x559be00adc00 seq=8142643 osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8
> >> crc :-1 s=THROTTLE_DONE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message got 154 + 0 + 30 byte message. envelope type=43 src osd.89 off 0
> >> crc :-1 s=READ_MESSAGE_COMPLETE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message received message m=0x559be01f4480 seq=8142643 from=osd.89 type=43 osd_op_reply(8142873 1.00000000 [getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8
> >> osd_op_reply(8142873 1.00000000 [getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8 ==== 154+0+30 (crc 0 0 0) 0x559be01f4480 con 0x559be00ad800
> >> osd_op(unknown.0.36244:8142874 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8 -- 0x559be2caec00 con 0x559be00ad800
> >>
> >>
> >>
> >>
> >> These repeat multiple times a second (and are filling /var).
> >> Prior to taking one of the cephfs_metadata OSDs offline, they came from communications between ceph00 and the node hosting the suspected bad OSD.
> >> Now they are between ceph00 and the host of the replacement metadata OSD.
> >>
> >> Does anyone have any suggestion on how to get the MDS to switch from "up:rejoin" to "up:active"?
> >>
> >> Is there any way to debug this, to determine what the issue really is? I'm unable to interpret the debug log.
> >>
> >> Cheers,
> >> Eric
> >>
> >> ________________________________________________________
> >> Dr Eric Tittley
> >> Research Computing Officer    www.roe.ac.uk/~ert
> >> Institute for Astronomy Royal Observatory, Edinburgh
> >>
> >>
> >>
> >>
> >> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
> > --
> > Cheers,
> > Venky
> >
>
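
One more suggestion, whichever way you go: before experimenting further, it
would be worth taking an offline copy of the rank 0 journal so that the
metadata repair tools still have something to work with later. Roughly (the
output path is just an example, the journal tools should only be run with the
MDS daemons stopped, e.g. after "ceph fs fail cephfs", and under cephadm this
is run from inside "cephadm shell"):

cephfs-journal-tool --rank=cephfs:0 journal export /root/cephfs-rank0-journal.bin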


-- 
Cheers,
Venky
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



