Re: mds crashes with 18.2.1

Actually, there is a problem with this tarball:
https://github.com/ceph/ceph/archive/refs/tags/v18.2.1.tar.gz
It corresponds to an older commit, e3fce6809130d78ac0058fc87e537ecd926cd213, which is missing some important fixes.

Maybe it should be fixed there.

The src.rpms use 7fe91d5d5842e04be3b4f514d6dd990c54b29c76, and with the tarball from there, the mds works better and does not crash as described below. Still, lost+found is problematic and cannot be touched.
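
For anyone who wants to check which commit a downloaded tarball was generated from, here is a minimal Python sketch. It assumes the archive was produced by `git archive` (as GitHub tag tarballs are), which stores the commit ID as a comment in the pax global header; this is the same field that `git get-tar-commit-id` reads. The function name `tarball_commit_id` is only illustrative.

```python
import tarfile


def tarball_commit_id(path):
    """Return the commit ID stored in the tarball's pax global header, or None."""
    # Opening the archive for reading parses the leading global header,
    # where `git archive` records the commit as "comment=<sha>".
    with tarfile.open(path, "r:*") as tf:
        return tf.pax_headers.get("comment")
```

Calling `tarball_commit_id("v18.2.1.tar.gz")` on a local copy of the file should return the commit hash, which can then be compared with the one used by the src.rpms. A return value of `None` means the archive carries no such header.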

Best,
Andrej

On 27/12/2023 15:27, Andrej Filipčič wrote:

Hi,

I just upgraded from 17.2.6 to 18.2.1 and have some issues with mds.

mds started crashing with
2023-12-27T13:21:30.491+0100 7f717b5886c0  1 mds.f9sn015 Updating MDS map to version 2689280 from mon.5
2023-12-27T13:21:30.491+0100 7f717b5886c0  1 mds.0.2689276 handle_mds_map i am now mds.0.2689276
2023-12-27T13:21:30.491+0100 7f717b5886c0  1 mds.0.2689276 handle_mds_map state change up:clientreplay --> up:active
2023-12-27T13:21:30.491+0100 7f717b5886c0  1 mds.0.2689276 active_start
2023-12-27T13:21:30.524+0100 7f717b5886c0  1 mds.0.2689276 cluster recovered.
2023-12-27T13:21:30.551+0100 7f7176d7f6c0 -1 /var/tmp/portage/sys-cluster/ceph-18.2.1-r2/work/ceph-18.2.1/src/mds/Server.cc: In function 'CInode* Server::prepare_new_inode(MDRequestRef&, CDir*, inodeno_t, unsigned int, const file_layout_t*)' thread 7f7176d7f6c0 time 2023-12-27T13:21:30.548697+0100
/var/tmp/portage/sys-cluster/ceph-18.2.1-r2/work/ceph-18.2.1/src/mds/Server.cc: 3441: FAILED ceph_assert(_inode->gid != (unsigned)-1)

and I could not bring it back again. As a workaround I was able to start mds 17.2.6 and it somehow recovered.

Then I started the 18.2.1 mds again, which soon after startup found this corruption:
[
    {
        "damage_type": "dentry",
        "id": 4247331390,
        "ino": 1,
        "frag": "*",
        "dname": "lost+found",
        "snap_id": "head",
        "path": "/lost+found"
    }
]
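
A listing like the one above can also be filtered programmatically, for example to pull out the damaged dentries and their paths. The following is a minimal Python sketch. It assumes the JSON has already been captured as text and relies only on the fields shown in the listing; the function name `damaged_dentries` is hypothetical.

```python
import json


def damaged_dentries(damage_json):
    """Return (id, path) pairs for every entry whose damage_type is "dentry"."""
    entries = json.loads(damage_json)
    return [
        (entry["id"], entry.get("path", ""))
        for entry in entries
        if entry.get("damage_type") == "dentry"
    ]


# The entry reported by the mds, copied from the listing above.
sample = """
[
    {
        "damage_type": "dentry",
        "id": 4247331390,
        "ino": 1,
        "frag": "*",
        "dname": "lost+found",
        "snap_id": "head",
        "path": "/lost+found"
    }
]
"""

for damage_id, path in damaged_dentries(sample):
    print(damage_id, path)  # prints: 4247331390 /lost+found
```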

There are a few corrupted files in some other directories (leftovers from several releases ago that I never managed to fix), and if I start an mds scrub there, the mds crashes again, maybe because of the corrupted lost+found.

If I try to remove lost+found, mds crashes again.

Do you have any hints on how to recover from this?

Best regards,
Andrej



--
_____________________________________________________________
   prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-477-3166
-------------------------------------------------------------