Re: MDS "newly corrupt dentry" after patch version upgrade

Hi Janek,

All this indicates is that you have some files with binary keys that
cannot be decoded as UTF-8. Unfortunately, the rados Python library
assumes that omap keys can be decoded this way. I have a ticket here:

https://tracker.ceph.com/issues/59716

I hope to have a fix soon.
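
To illustrate the failure mode in isolation, here is a minimal sketch in
plain Python. It does not use or patch the rados binding, where the decode
happens internally in rados.decode_cstr. The key bytes below are a made-up
stand-in, not a real key from your pool; they only place a 0xff byte at
offset 8, as in the traceback. The two error handlers shown are standard
Python codec options, not something first-damage.py is known to use.

```python
# Hypothetical omap key: the real key is not shown in this thread.
# "testfile" is 8 bytes long, so the 0xff byte sits at offset 8.
raw_key = b"testfile\xff_head"

# Strict UTF-8 decoding fails because 0xff can never start a UTF-8 sequence.
failure_position = None
try:
    raw_key.decode("utf-8")
except UnicodeDecodeError as exc:
    failure_position = exc.start
    print(f"strict decode failed: {exc}")

# Lossless alternative: undecodable bytes become lone surrogates and are
# restored byte-for-byte when encoding with the same error handler.
tolerant = raw_key.decode("utf-8", errors="surrogateescape")
round_tripped = tolerant.encode("utf-8", errors="surrogateescape")

# Readable alternative for logs: undecodable bytes are shown as \xNN escapes.
printable = raw_key.decode("utf-8", errors="backslashreplace")
print(printable)
```

The surrogateescape form keeps the original bytes recoverable, which
matters if the decoded key is later used to address the same omap entry;
the backslashreplace form is only suitable for display.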

On Thu, May 4, 2023 at 3:15 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> After running the tool for 11 hours straight, it exited with the
> following exception:
>
> Traceback (most recent call last):
>    File "/home/webis/first-damage.py", line 156, in <module>
>      traverse(f, ioctx)
>    File "/home/webis/first-damage.py", line 84, in traverse
>      for (dnk, val) in it:
>    File "rados.pyx", line 1389, in rados.OmapIterator.__next__
>    File "rados.pyx", line 318, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8:
> invalid start byte
>
> Does that mean that the last inode listed in the output file is corrupt?
> Any way I can fix it?
>
> The output file has 14 million lines. We have about 24.5 million objects
> in the metadata pool.
>
> Janek
>
>
> On 03/05/2023 14:20, Patrick Donnelly wrote:
> > On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
> > <janek.bevendorff@xxxxxxxxxxxxx> wrote:
> >> Hi Patrick,
> >>
> >>> I'll try that tomorrow and let you know, thanks!
> >> I was unable to reproduce the crash today. Even with
> >> mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
> >> correctly (though they took forever to rejoin with logs set to 20).
> >>
> >> To me it looks like the issue has resolved itself overnight. I had run a
> >> recursive scrub on the file system and another snapshot was taken, in
> >> case any of those might have had an effect on this. It could also be the
> >> case that the (supposedly) corrupt journal entry has simply been
> >> committed now and hence doesn't trigger the assertion any more. Is there
> >> any way I can verify this?
> > You can run:
> >
> > https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
> >
> > Just do:
> >
> > python3 first-damage.py --memo run.1 <meta pool>
> >
> > No need to do any of the other steps if you just want a read-only check.
> >
> --
>
> Bauhaus-Universität Weimar
> Bauhausstr. 9a, R308
> 99423 Weimar, Germany
>
> Phone: +49 3643 58 3577
> www.webis.de
>


-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D