Re: Production 12.2.2 CephFS Cluster still broken, new Details

Thank you very much! I feel optimistic that I now have what I need to get this thing working again.

I'll report back...

Best regards,
Tobi



On 12/12/2017 02:08 PM, Yan, Zheng wrote:
On Tue, Dec 12, 2017 at 8:29 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx> wrote:
Hi Zheng,

the more you tell me, the more what I see begins to make sense to me. Thank you very much.

Could you please be a little more verbose about how to use rados rmomapkey? What should I use for <name> and what for <>? Here is what my dir_frag damage entry looks like:

     {
         "damage_type": "dir_frag",
         "id": 1418581248,
         "ino": 1099733590290,
         "frag": "*",
         "path":
"/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-backup"
     }

Find the inode number of the parent directory
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/ in this
case) and print it in hex. You will get something like 1000xxxxxxx.

run 'rados -p cephfs_metadatapool listomapkeys 1000xxxxxxx.00000000'

the output should include one entry named safebrowsing-backup_head

run 'rados -p cephfs_metadatapool rmomapkey 1000xxxxxxx.00000000
safebrowsing-backup_head'

before doing rmomapkey, run 'ceph daemon mds.x flush journal' and
stop the mds. you'd better do this after the scrub
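Put together, that sequence looks roughly like this (the pool name cephfs_metadatapool, mds.x and the hex inode 1000xxxxxxx are the placeholders from above; the printf step for converting the decimal inode number to hex is an assumption about how to derive the object name, not something spelled out above):

  # flush the MDS journal and stop the MDS, ideally after the scrub has run
  ceph daemon mds.x flush journal

  # convert the parent directory's decimal inode number to hex; the metadata
  # pool object for its first dirfrag is named <hex ino>.00000000
  printf '%x\n' <parent inode number>

  # list the omap keys of that dirfrag object, then remove the entry for the
  # damaged directory
  rados -p cephfs_metadatapool listomapkeys 1000xxxxxxx.00000000
  rados -p cephfs_metadatapool rmomapkey 1000xxxxxxx.00000000 safebrowsing-backup_head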

I cannot simply remove that dir through the filesystem, as it refuses to delete that folder.

Then you say it's easy to fix backtraces, yet here it looks like some backtraces get fixed by the online MDS scrub while most of them fail to be fixed and stay in damage_type "backtrace".

Once again, thank you so much for your help!

Best regards,
Tobi




On 12/12/2017 01:10 PM, Yan, Zheng wrote:
On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx>
wrote:
Hi there,

regarding my ML post from yesterday (Upgrade from 12.2.1 to 12.2.2 broke my CephFS), I was able to get a little further with the suggested "cephfs-table-tool take_inos <max ino>". This made the whole issue with loads of "falsely free-marked inodes" go away.
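(For reference, a rough sketch of that command's full form; the 'all' rank spec and the max ino value are placeholders/assumptions here, and the tool is meant to be run against an offline MDS:)

  cephfs-table-tool all take_inos <max ino>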

I then restarted the MDS and kept all clients down so no client had the FS mounted. Then I started an online MDS scrub:

ceph daemon mds.a scrub_path / recursive repair

This again ran for about 3 hours, then the MDS again marked the FS damaged and changed its own state to standby (at least that is how I interpret what I see). This happened exactly at the moment when the scrub hit a missing object. See the end of the logfile (default log level):

2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
bad backtrace on inode

0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
rewriting it
2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
Scrub error on inode 0x1000d3aede3

(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]

/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
dirtyparent=1
scrubqueue=0 0x55ef37c83200]:

{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
v382>,<0x10002de79e8/safebrowsing
v7142119>,<0x10002de79df/dsjf5siv.default
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
v4933047>,<0x100018bd837/.cache v115551644>,<0x10000000000/some_username
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
disk;
see

retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
bad backtrace on inode

0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
rewriting it
2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
Scrub error on inode 0x1000d3aedf1

(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]

/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
dirtyparent=1
scrubqueue=0 0x55ef3aa38a00]:

{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedf1:[<0x1000d3aeda7/testexcept-flashsubdoc-simple.sbstore
v384>,<0x10002de79e8/safebrowsing
v7142119>,<0x10002de79df/dsjf5siv.default
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
v4933047>,<0x100018bd837/.cache v115551644>,<0x10000000000/some_username
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
disk;
see

retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.733389 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
bad backtrace on inode

0x1000d3aedb6(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache),
rewriting it
2017-12-11 22:29:05.733420 7fc2342bc700  0 log_channel(cluster) log [WRN]
:
Scrub error on inode 0x1000d3aedb6

(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache)
see mds.b log and `damage ls` output for details
2017-12-11 22:29:05.733475 7fc2342bc700 -1 mds.0.scrubstack
_validate_inode_done scrub error on inode [inode 0x1000d3aedb6 [2,head]

/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache
auth v366 dirtyparent s=44 n(v0 b44 1=1+0) (iversion lock) |
dirtyparent=1
scrubqueue=0 0x55ef37c78a00]:

{"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedb6:[<0x1000d3aeda7/test-malware-simple.cache
v366>,<0x10002de79e8/safebrowsing
v7142119>,<0x10002de79df/dsjf5siv.default
v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
v4933047>,<0x100018bd837/.cache v115551644>,<0x10000000000/some_username
v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
disk;
see

retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
2017-12-11 22:29:05.772351 7fc2342bc700  0 mds.0.cache.dir(0x1000d3ae112)
_fetched missing object for [dir 0x1000d3ae112

/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-to_delete/
[2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0
|
waiter=1 authpin=1 0x55eedee27a80]
2017-12-11 22:29:05.772385 7fc2342bc700 -1 log_channel(cluster) log [ERR]
:
dir 0x1000d3ae112 object missing on disk; some files may be lost

(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-to_delete)
2017-12-11 22:29:05.778009 7fc2342bc700  1 mds.b respawn
2017-12-11 22:29:05.778028 7fc2342bc700  1 mds.b  e: '/usr/bin/ceph-mds'
2017-12-11 22:29:05.778031 7fc2342bc700  1 mds.b  0: '/usr/bin/ceph-mds'
2017-12-11 22:29:05.778036 7fc2342bc700  1 mds.b  1: '-i'
2017-12-11 22:29:05.778038 7fc2342bc700  1 mds.b  2: 'b'
2017-12-11 22:29:05.778040 7fc2342bc700  1 mds.b  3: '--pid-file'
2017-12-11 22:29:05.778042 7fc2342bc700  1 mds.b  4:
'/var/run/ceph/mds.b.pid'
2017-12-11 22:29:05.778044 7fc2342bc700  1 mds.b  5: '-c'
2017-12-11 22:29:05.778046 7fc2342bc700  1 mds.b  6:
'/etc/ceph/ceph.conf'
2017-12-11 22:29:05.778048 7fc2342bc700  1 mds.b  7: '--cluster'
2017-12-11 22:29:05.778050 7fc2342bc700  1 mds.b  8: 'ceph'
2017-12-11 22:29:05.778051 7fc2342bc700  1 mds.b  9: '--setuser'
2017-12-11 22:29:05.778053 7fc2342bc700  1 mds.b  10: 'ceph'
2017-12-11 22:29:05.778055 7fc2342bc700  1 mds.b  11: '--setgroup'
2017-12-11 22:29:05.778057 7fc2342bc700  1 mds.b  12: 'ceph'
2017-12-11 22:29:05.778104 7fc2342bc700  1 mds.b respawning with exe
/usr/bin/ceph-mds
2017-12-11 22:29:05.778107 7fc2342bc700  1 mds.b  exe_path /proc/self/exe
2017-12-11 22:29:06.186020 7f9ad28f41c0  0 ceph version 12.2.2
(cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
(unknown), pid 3214
2017-12-11 22:29:10.604701 7f9acbb38700  1 mds.b handle_mds_map standby

As long as the MDS was still active, "damage ls" again gave me exactly 10001 damages of damage_type "backtrace". The log implies that those backtraces cannot be fixed automatically. I could live with losing those 10k files, but I do not get why the MDS switches to "standby" and marks the FS damaged, rendering it offline.
ceph -s then reports something like: mds: cephfs-0/1/1 1:damaged 1:standby
(not pasted but manually typed from my memory)
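(The damage listing above comes from the MDS admin socket; a minimal sketch, assuming mds.b is the active daemon:)

  ceph daemon mds.b damage ls
  # individual entries can be removed by id once handled; use with care
  ceph daemon mds.b damage rm <id>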

Btw. in the log the MDS encountered two more "object missing on disk; some files may be lost" errors much earlier during that scrub (so three in total), but the first two did not make the MDS go to standby.
I marked the FS repaired, restarted the MDS with MDS debug level 20 and reran a scrub on that particular path, but this time the MDS wouldn't mark the whole FS damaged and stayed active. Will it only do so when finding three of those damages in a row?
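(For context, "marked FS repaired" means clearing the damaged flag on the rank before restarting the MDS; a rough sketch, assuming rank 0 and mds.b, with the scrubbed path left as a placeholder:)

  ceph mds repaired 0
  # once the MDS is active again, re-run the scrub on the affected path
  ceph daemon mds.b scrub_path <path> recursive repair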

Is this a bug, or is there something I would have to do to my cluster to get it back into a stable working condition? Again, all this began with upgrading from 12.2.1 to 12.2.2.

Furthermore, is there a way to get rid of those "broken" files (either bad backtrace or, even more important, those with missing objects), as I could live with losing certain files if it helps to get CephFS working stably again.

Due to the mds_damage_table_max_entries config, the MDS became damaged after
it encountered 10000 errors (most errors are bad backtraces). Your
cephfs was created before backtraces were introduced. It's likely you
didn't create backtraces for all files when upgrading from a pre-firefly
release (http://ceph.com/geen-categorie/v0-81-released/). The really
harmful corruption is "object missing on disk": if the missing object
is a dirfrag, all files and sub-directories under it become
inaccessible. 'cephfs-data-scan scan_inodes' can recover these
inaccessible files/directories. If you can live with losing those
files/directories, you can use 'rados rmomapkey' to remove inodes with
bad objects.
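(To make those two options concrete, a hedged sketch; the exact pool name, raising the damage limit via ceph.conf and stopping the MDS for cephfs-data-scan are assumptions, and the documented recovery sequence runs scan_extents before scan_inodes:)

  # in ceph.conf on the MDS host, then restart the MDS:
  #   [mds]
  #   mds damage table max entries = 100000

  # offline recovery of files/dirs under missing dirfrags (MDS stopped):
  cephfs-data-scan scan_extents <data pool>
  cephfs-data-scan scan_inodes <data pool>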

good luck
Yan, Zheng




Again, any help is highly appreciated; I need to get the FS back up as soon as possible. Thank you very much!

Best regards,
Tobi



--
-----------------------------------------------------------
Dipl.-Inf. (FH) Tobias Prousa
Leiter Entwicklung Datenlogger

CAETEC GmbH
Industriestr. 1
D-82140 Olching
www.caetec.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Olching
Handelsregister: Amtsgericht München, HRB 183929
Geschäftsführung: Stephan Bacher, Andreas Wocke

Tel.: +49 (0)8142 / 50 13 60
Fax.: +49 (0)8142 / 50 13 69

eMail: tobias.prousa@xxxxxxxxx
Web:   http://www.caetec.de
------------------------------------------------------------

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



