Dear list,
We run a 7-node Proxmox cluster with Ceph Nautilus (14.2.18) and two
CephFS filesystems, mounted in Debian Buster VMs using the cephfs kernel
module.
Four times in the last six months, all MDS servers failed one after the
other with an assert, either in the rename_prepare or the unlink_local
function.
Here is the latest one, from yesterday:
-1> 2021-07-26 17:05:37.046 7fc518787700 -1 /build/ceph/ceph-14.2.18/src/mds/Server.cc: In function 'void Server::_rename_prepare(MDRequestRef&, EMetaBlob*, ceph::bufferlist*, CDentry*, CDentry*, CDentry*)' thread 7fc518787700 time 2021-07-26 17:05:37.049487
/build/ceph/ceph-14.2.18/src/mds/Server.cc: 8435: FAILED ceph_assert(srci->first <= destdn->first)

ceph version 14.2.18 (0cf7f22162b9b2809afe64b2e01779bdd70b850c) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x152) [0x7fc522ac5cf2]
2: (()+0x277eca) [0x7fc522ac5eca]
3: (Server::_rename_prepare(boost::intrusive_ptr<MDRequestImpl>&, EMetaBlob*, ceph::buffer::v14_2_0::list*, CDentry*, CDentry*, CDentry*)+0x2d3e) [0x561fd3be626e]
4: (Server::handle_client_rename(boost::intrusive_ptr<MDRequestImpl>&)+0xc21) [0x561fd3be6f71]
5: (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xcd4) [0x561fd3bf6454]
6: (MDCache::dispatch_request(boost::intrusive_ptr<MDRequestImpl>&)+0x38) [0x561fd3c82c68]
7: (MDSContext::complete(int)+0x7f) [0x561fd3df745f]
8: (MDSRank::_advance_queues()+0xac) [0x561fd3b724dc]
9: (MDSRank::ProgressThread::entry()+0x3d) [0x561fd3b72a6d]
10: (()+0x7fa3) [0x7fc521e8dfa3]
11: (clone()+0x3f) [0x7fc52181f4cf]
Restarting the MDS is not sufficient: the in-kernel CephFS client
re-issues the request and the crash re-occurs.
It is "solved" by identifying the VM issuing the rename, shutting it
down and restarting the MDS.
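For reference, this is roughly how the offending client can be tracked
down via the MDS admin socket (a sketch only; the MDS name below is a
placeholder):

  # on the host running the active MDS: list the requests currently in
  # flight; the stuck rename shows up here together with a client id
  ceph daemon mds.<active-mds> dump_ops_in_flight

  # map that client id to a session (mount, IP address) to find the VM
  ceph daemon mds.<active-mds> session ls

  # after shutting the VM down, restart the MDS
  systemctl restart ceph-mds@<active-mds>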
The folder containing the offending file(s) is then identified, moved to
quarantine and replaced by a copy.
Each time we were able to copy the folder using rsync; the copies did
not trigger the assert.
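The quarantine step is nothing fancy; roughly the following, run from a
client with the filesystem mounted (paths are made up for illustration):

  # move the broken folder aside and put an rsync'ed copy in its place
  mv /mnt/cephfs/data/projects/foo /mnt/cephfs/quarantine/foo
  rsync -a /mnt/cephfs/quarantine/foo/ /mnt/cephfs/data/projects/foo/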
There wasn't any other failure prior to the assert: yesterday I had
marked 4 OSDs out a few minutes before the crash and had checked ceph
status; they were backfilling fine.
1. Has this happened to anybody else?
2. How can we identify the broken files? How can we delete them or
repair the metadata?
I'll be happy to provide more information if necessary.
Thanks,