After running for 11 hours straight, the tool exited with the
following exception:
Traceback (most recent call last):
  File "/home/webis/first-damage.py", line 156, in <module>
    traverse(f, ioctx)
  File "/home/webis/first-damage.py", line 84, in traverse
    for (dnk, val) in it:
  File "rados.pyx", line 1389, in rados.OmapIterator.__next__
  File "rados.pyx", line 318, in rados.decode_cstr
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 8: invalid start byte
Does that mean that the last inode listed in the output file is corrupt?
Any way I can fix it?
The output file has 14 million lines. We have about 24.5 million objects
in the metadata pool.
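For reference, here is a minimal sketch in plain Python (no rados involved) of what the decoder is complaining about. The key bytes and the helper name try_decode are made up for illustration; the traceback does not tell us the actual key. It only shows that a 0xff byte can never start a UTF-8 sequence, and how such bytes could be made printable for inspection. It does not tell us whether the dentry behind the key is really damaged.

```python
# Hypothetical omap key: 8 ASCII bytes followed by a stray 0xff byte.
# The real key that tripped first-damage.py is not known from the traceback.
raw_key = b"10000000\xff_head"


def try_decode(raw: bytes) -> str:
    """Decode strictly as UTF-8; on failure, fall back to an escaped form."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the offending byte, as in the traceback.
        print(f"strict decode failed at position {exc.start}: {exc.reason}")
        # backslashreplace keeps the undecodable bytes visible as \xNN.
        return raw.decode("utf-8", errors="backslashreplace")


print(try_decode(raw_key))
```

The strict decode is what rados.decode_cstr appears to do according to the traceback, which is why a single non-UTF-8 key stops the whole iteration.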
Janek
On 03/05/2023 14:20, Patrick Donnelly wrote:
> On Wed, May 3, 2023 at 4:33 AM Janek Bevendorff
> <janek.bevendorff@xxxxxxxxxxxxx> wrote:
>> Hi Patrick,
>> I'll try that tomorrow and let you know, thanks!
>> I was unable to reproduce the crash today. Even with
>> mds_abort_on_newly_corrupt_dentry set to true, all MDS booted up
>> correctly (though they took forever to rejoin with logs set to 20).
>> To me it looks like the issue has resolved itself overnight. I had run a
>> recursive scrub on the file system and another snapshot was taken, in
>> case any of those might have had an effect on this. It could also be the
>> case that the (supposedly) corrupt journal entry has simply been
>> committed now and hence doesn't trigger the assertion any more. Is there
>> any way I can verify this?
> You can run:
> https://github.com/ceph/ceph/blob/main/src/tools/cephfs/first-damage.py
> Just do:
> python3 first-damage.py --memo run.1 <meta pool>
> No need to do any of the other steps if you just want a read-only check.
--
Bauhaus-Universität Weimar
Bauhausstr. 9a, R308
99423 Weimar, Germany
Phone: +49 3643 58 3577
www.webis.de