Re: MDS stuck in up:rejoin


 



I rebooted the servers and now the MDS won't start at all.

They give the (truncated) error:
    -1> 2023-12-04T14:44:41.354+0000 7f6351715640 -1 ./src/mds/MDCache.cc: In function 'void MDCache::rejoin_send_rejoins()' thread 7f6351715640 time 2023-12-04T14:44:41.354292+0000
./src/mds/MDCache.cc: 4084: FAILED ceph_assert(auth >= 0)

Is this a file permission problem?
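If it is, I'm not sure where to look. Assuming the standard cephadm layout (daemon data under /var/lib/ceph/<fsid>/), I'd check ownership with something like

    ls -ln /var/lib/ceph/a614303a-5eb5-11ed-b492-011f01e12c9a/mds.cephfs.ceph00.uvlkrw/

where, as far as I understand, everything should belong to the container's ceph user (uid/gid 167).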

Eric


On 27/11/2023 14:29, Eric Tittley wrote:
Hi all,

For about a week our CephFS has experienced issues with its MDS.

Currently the MDS is stuck in "up:rejoin"

Issues became apparent when simple commands like "mv foo bar/" hung.

I unmounted CephFS on the clients, evicted the remaining sessions, and then issued

ceph config set mds.0 mds_wipe_sessions true
ceph config set mds.1 mds_wipe_sessions true

which allowed me to delete the hung requests.
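(I assume the flag should be reverted once the hung sessions are gone, something like

    ceph config rm mds.0 mds_wipe_sessions
    ceph config rm mds.1 mds_wipe_sessions

so it doesn't keep wiping sessions on every restart.)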

I've lost the exact commands I used, but something like
rados -p cephfs_metadata ls | grep mds
rados rm -p cephfs_metadata mds0_openfiles.0

etc

This allowed the MDS to get to "up:rejoin", where it has been stuck ever since, which is now getting on for five days.

# ceph mds stat
cephfs:1/1 {0=cephfs.ceph00.uvlkrw=up:rejoin} 2 up:standby



root@ceph00:/var/log/ceph/a614303a-5eb5-11ed-b492-011f01e12c9a# ceph -s
 cluster:
   id:     a614303a-5eb5-11ed-b492-011f01e12c9a
   health: HEALTH_WARN
           1 filesystem is degraded
           1 pgs not deep-scrubbed in time
           2 pool(s) do not have an application enabled
           1 daemons have recently crashed

 services:
   mon: 3 daemons, quorum ceph00,ceph01,ceph02 (age 57m)
   mgr: ceph01.lvdgyr(active, since 2h), standbys: ceph00.gpwpgs
   mds: 1/1 daemons up, 2 standby
   osd: 91 osds: 90 up (since 78m), 90 in (since 112m)

 data:
   volumes: 0/1 healthy, 1 recovering
   pools:   5 pools, 1539 pgs
   objects: 138.83M objects, 485 TiB
   usage:   971 TiB used, 348 TiB / 1.3 PiB avail
   pgs:     1527 active+clean
            12   active+clean+scrubbing+deep

 io:
   client:   3.1 MiB/s rd, 3.16k op/s rd, 0 op/s wr


# ceph --version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)


I've tried failing the MDS so that it switches, and rebooted a couple of times.
I've added more OSDs to the metadata pool and taken one out, as I thought it might be a bad metadata OSD (the "recently crashed" daemon).
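(By "failing" I mean something along the lines of

    ceph mds fail cephfs:0

and, for the restarts,

    ceph orch daemon restart mds.cephfs.ceph00.uvlkrw

in case the exact commands matter.)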

The error logs are full of
(the prefix to each entry is:
Nov 27 14:02:44 ceph00 bash[2145]: debug 2023-11-27T14:02:44.619+0000 7f74e845e700  1 -- [v2:192.168.1.128:6800/2157301677,v1:192.168.1.128:6801/2157301677] --> [v2:192.168.1.133:6896/4289132926,v1:192.168.1.133:6897/4289132926]
)

crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).send_message enqueueing message m=0x559be00adc00 type=42 osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8 crc :-1 s=READY pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).write_message sending message m=0x559be00adc00 seq=8142643 osd_op(mds.0.36244:8142873 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8 crc :-1 s=THROTTLE_DONE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message got 154 + 0 + 30 byte message. envelope type=43 src osd.89 off 0 crc :-1 s=READ_MESSAGE_COMPLETE pgs=12 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 tx=0).handle_message received message m=0x559be01f4480 seq=8142643 from=osd.89 type=43 osd_op_reply(8142873 1.00000000 [getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8 osd_op_reply(8142873 1.00000000 [getxattr (30) out=30b] v0'0 uv560123 ondisk = 0) v8 ==== 154+0+30 (crc 0 0 0) 0x559be01f4480 con 0x559be00ad800 osd_op(unknown.0.36244:8142874 3.ff 3:ff5b34d6:::1.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e32465) v8 -- 0x559be2caec00 con 0x559be00ad800




These repeat multiple times a second (and are filling /var).
Prior to taking one of the cephfs_metadata OSDs offline, they were between ceph00 and the node hosting the suspected bad OSD.
Now they are between ceph00 and the host of the replacement metadata OSD.

Does anyone have any suggestion on how to get the MDS to switch from "up:rejoin" to "up:active"?

Is there any way to debug this, to determine what the issue really is? I'm unable to interpret the debug log.
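I'm happy to crank up the MDS logging if that would help; I assume something like

    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1

(reverted afterwards with "ceph config rm ...") would capture the rejoin in detail, but I don't know what to look for in the output.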

Cheers,
Eric

________________________________________________________
Dr Eric Tittley
Research Computing Officer    www.roe.ac.uk/~ert
Institute for Astronomy Royal Observatory, Edinburgh




The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



