Re: MDS crashes to damaged metadata

You can run this tool. Be sure to read the comments.

https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
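The script's header comments describe the full recovery sequence; as a rough sketch (the pool name below is an example, and the flags should be checked against the script's own comments before running anything):

```shell
# Read-only scan of the metadata pool for damaged dentries.
# "cephfs_metadata" is an example pool name -- substitute your own.
python3 first-damage.py --memo run.1 cephfs_metadata

# If the scan confirms damage, the script can also remove the damaged
# dentries. This is destructive -- read the header comments first.
python3 first-damage.py --memo run.2 --remove cephfs_metadata
```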

The cause of the damage is not yet known, but we are trying to
reproduce it. If your workload reliably produces the damage, an MDS log
captured with debug_mds=20 would be extremely helpful.
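One way to capture such a log is to raise the MDS debug level cluster-wide until the crash reproduces (debug_ms=1 is commonly requested alongside; adjust to taste):

```shell
# Raise MDS debug logging so the next crash is captured in full detail.
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1

# After reproducing the crash and collecting the log, revert to defaults:
ceph config rm mds debug_mds
ceph config rm mds debug_ms
```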

On Wed, Nov 30, 2022 at 6:15 PM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:
>
> Hi Patrick,
>
> it does seem like it. We are not using postgres on cephfs as far as I know. We narrowed it down to three damaged inodes, but the files in question were xlsx, pdf, or pst.
>
> Do you have any suggestion how to fix this?
>
> Is there a way to scan the cephfs for damaged inodes?
>
>
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
>
> Am 30.11.2022 um 22:49 schrieb Patrick Donnelly <pdonnell@xxxxxxxxxx>:
>
> On Wed, Nov 30, 2022 at 3:10 PM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:
>
>
> Hey guys,
>
> our MDS daemons crash constantly when someone tries to delete a file:
>
> -26> 2022-11-29T12:32:58.807+0100 7f081b458700 -1 /build/ceph-16.2.10/src/mds/Server.cc: In function 'void Server::_unlink_local(MDRequestRef&, CDentry*, CDentry*)' thread 7f081b458700 time 2022-11-29T12:32:58.808844+0100
>
> 2022-11-29T12:32:58.807+0100 7f081b458700  4 mds.0.server handle_client_request client_request(client.1189402075:14014394 unlink #0x100197fa8e0/~$29.11. T.xlsx 2022-11-29T12:32:23.711889+0100 RETRY=1 caller_uid=133365,
>
> I observed that the corresponding object in the cephfs data pool does not exist. Basically, our MDS daemons crash each time someone tries to delete a file that does not exist in the data pool although the metadata says otherwise.
>
> Any suggestions on how to fix this problem?
>
>
> Is this it?
>
> https://tracker.ceph.com/issues/38452
>
> Are you running postgres on CephFS by chance?
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
