Hi Janek,

All this indicates is that you have some files with binary keys that
cannot be decoded as utf-8. Unfortunately, the rados python library
assumes that omap keys can be decoded this way. I have a ticket here:

https://tracker.ceph.com/issues/59716

I hope to have a fix soon.

On Thu, May 4, 2023 at 3:15 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> After running the tool for 11 hours straight, it exited with the
> following exception:
>
> Traceback (most recent call last):
>    File "/home/webis/first-damage.py", line 156, in <module>
>      traverse(f, ioctx)
>    File "/home/webis/first-damage.py", line 84, in traverse
>      for (dnk, val) in it:
>    File "rados.pyx", line 1389, in rados.OmapIterator.__next__
>    File "rados.pyx", line 318, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8:
> invalid start byte
>
> Does that mean that the last inode listed in the output file is corrupt?
> Any way I can fix it?
>
> The output file has 14 million lines. We have about 24.5 million objects
> in the metadata pool.
>
> Janek
>
>
> On 03/05/2023 14:20, Patrick Donnelly wrote:
> > On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
> > <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> >> Hi Patrick,
> >>
> >>> I'll try that tomorrow and let you know, thanks!
> >> I was unable to reproduce the crash today. Even with
> >> mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
> >> correctly (though they took forever to rejoin with logs set to 20).
> >>
> >> To me it looks like the issue has resolved itself overnight. I had run a
> >> recursive scrub on the file system and another snapshot was taken, in
> >> case any of those might have had an effect on this. It could also be the
> >> case that the (supposedly) corrupt journal entry has simply been
> >> committed now and hence doesn't trigger the assertion any more. Is there
> >> any way I can verify this?
> > You can run:
> >
> > https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
> >
> > Just do:
> >
> > python3 first-damage.py --memo run.1 <meta pool>
> >
> > No need to do any of the other steps if you just want a read-only check.
> >
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>

--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
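
A note on the UnicodeDecodeError quoted in this thread: the failure can be
reproduced in plain Python without touching Ceph at all. The sketch below
is only an illustration of why a key containing the byte 0xff cannot be
decoded as utf-8. The key bytes and both helper names are made up for the
example, and the tolerant variant is just one possible way to handle such
keys; it is not the fix tracked in the ticket and does not use the rados
library.

```python
# Illustrative only: plain Python, no Ceph or rados calls.
# The key below is a made-up byte string, not one read from a real pool.
raw_key = b"10000000" + b"\xff" + b"_head"


def decode_key_strict(key: bytes) -> str:
    """Decode as strict utf-8; raises UnicodeDecodeError on binary keys."""
    return key.decode("utf-8")


def decode_key_tolerant(key: bytes) -> str:
    """Hypothetical helper: map undecodable bytes to lone surrogates."""
    return key.decode("utf-8", errors="surrogateescape")


try:
    decode_key_strict(raw_key)
except UnicodeDecodeError as exc:
    # 0xff is never valid in utf-8, so strict decoding stops here.
    print(f"strict decode failed: {exc}")

# The tolerant decode is lossless: encoding with the same error handler
# gives back the original bytes.
text = decode_key_tolerant(raw_key)
assert text.encode("utf-8", errors="surrogateescape") == raw_key
print(repr(text))
```

The surrogateescape handler is chosen here because it round-trips, so a
binary key could still be written back unchanged; whether that suits the
tool is a separate question.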