Re: Production 12.2.2 CephFS Cluster still broken, new Details

On Tue, Dec 12, 2017 at 8:29 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx> wrote:
> Hi Zheng,
>
> the more you tell me, the more what I see begins to make sense to me.
> Thank you very much.
>
> Could you please be a little more verbose about how to use rados rmomapkey?
> What to use for <name> and what to use for <key>? Here is what my dir_frag
> looks like:
>
>     {
>         "damage_type": "dir_frag",
>         "id": 1418581248,
>         "ino": 1099733590290,
>         "frag": "*",
>         "path":
> "/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-backup"
>     }


Find the inode number of the parent directory
(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/ in this
case) and print it in hex. You will get something like 1000xxxxxxx.

Run 'rados -p cephfs_metadatapool listomapkeys 1000xxxxxxx.00000000'.

The output should include one entry named safebrowsing-backup_head.

Then run 'rados -p cephfs_metadatapool rmomapkey 1000xxxxxxx.00000000
safebrowsing-backup_head'.

Before doing the rmomapkey, run 'ceph daemon mds.x flush journal' and
stop the mds. It is best to do this after the scrub has finished.
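
Putting the steps together, a minimal sketch of the whole sequence
(assumptions: the FS is still mounted at /mnt/cephfs when you resolve
the inode, the MDS runs under systemd, and 1000xxxxxxx stands in for
the real hex inode of the parent directory):

  # resolve the parent directory's inode number and print it in hex
  printf '%x\n' "$(stat -c %i /mnt/cephfs/home/some_username/.cache/mozilla/firefox/dsjf5siv.default)"

  # flush the journal and stop the MDS before touching omap keys
  ceph daemon mds.x flush journal
  systemctl stop ceph-mds@x

  # confirm the dentry key exists, then remove it
  rados -p cephfs_metadatapool listomapkeys 1000xxxxxxx.00000000
  rados -p cephfs_metadatapool rmomapkey 1000xxxxxxx.00000000 safebrowsing-backup_head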

>
> I cannot simply remove that dir through the filesystem, as it refuses to
> delete that folder.
>
> Then you say it's easy to fix backtraces, yet here it looks like some
> backtraces get fixed by the online MDS scrub while most of them fail to
> be fixed and stay in damage_type "backtrace".
>
> Once again, thank you so much for your help!
>
> Best regards,
> Tobi
>
>
>
>
> On 12/12/2017 01:10 PM, Yan, Zheng wrote:
>>
>> On Tue, Dec 12, 2017 at 4:22 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx>
>> wrote:
>>>
>>> Hi there,
>>>
>>> regarding my ML post from yesterday ("Upgrade from 12.2.1 to 12.2.2 broke
>>> my CephFS"), I was able to get a little further with the suggested
>>> "cephfs-table-tool take_inos <max ino>". This made the whole issue with
>>> loads of "falsely free-marked inodes" go away.
>>>
>>> I then restarted the MDS and kept all clients down, so no client had the
>>> FS mounted. Then I started an online MDS scrub:
>>>
>>> ceph daemon mds.a scrub_path / recursive repair
>>>
>>> This again ran for about 3 hours, then the MDS again marked the FS
>>> damaged and changed its own state to standby (at least that is what I
>>> interpret from what I see). This happened exactly at the moment when the
>>> scrub hit a missing object. See the end of the logfile (default log
>>> level):
>>>
>>> 2017-12-11 22:29:05.725484 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> bad backtrace on inode
>>>
>>> 0x1000d3aede3(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore),
>>> rewriting it
>>> 2017-12-11 22:29:05.725507 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> Scrub error on inode 0x1000d3aede3
>>>
>>> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore)
>>> see mds.b log and `damage ls` output for details
>>> 2017-12-11 22:29:05.725569 7fc2342bc700 -1 mds.0.scrubstack
>>> _validate_inode_done scrub error on inode [inode 0x1000d3aede3 [2,head]
>>>
>>> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-unwanted-simple.sbstore
>>> auth v382 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
>>> dirtyparent=1
>>> scrubqueue=0 0x55ef37c83200]:
>>>
>>> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aede3:[<0x1000d3aeda7/test-unwanted-simple.sbstore
>>> v382>,<0x10002de79e8/safebrowsing
>>> v7142119>,<0x10002de79df/dsjf5siv.default
>>> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
>>> v4933047>,<0x100018bd837/.cache v115551644>,<0x10000000000/some_username
>>> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
>>> disk;
>>> see
>>>
>>> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
>>> 2017-12-11 22:29:05.729992 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> bad backtrace on inode
>>>
>>> 0x1000d3aedf1(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore),
>>> rewriting it
>>> 2017-12-11 22:29:05.730022 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> Scrub error on inode 0x1000d3aedf1
>>>
>>> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore)
>>> see mds.b log and `damage ls` output for details
>>> 2017-12-11 22:29:05.730077 7fc2342bc700 -1 mds.0.scrubstack
>>> _validate_inode_done scrub error on inode [inode 0x1000d3aedf1 [2,head]
>>>
>>> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/testexcept-flashsubdoc-simple.sbstore
>>> auth v384 dirtyparent s=232 n(v0 b232 1=1+0) (iversion lock) |
>>> dirtyparent=1
>>> scrubqueue=0 0x55ef3aa38a00]:
>>>
>>> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedf1:[<0x1000d3aeda7/testexcept-flashsubdoc-simple.sbstore
>>> v384>,<0x10002de79e8/safebrowsing
>>> v7142119>,<0x10002de79df/dsjf5siv.default
>>> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
>>> v4933047>,<0x100018bd837/.cache v115551644>,<0x10000000000/some_username
>>> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
>>> disk;
>>> see
>>>
>>> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
>>> 2017-12-11 22:29:05.733389 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> bad backtrace on inode
>>>
>>> 0x1000d3aedb6(/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache),
>>> rewriting it
>>> 2017-12-11 22:29:05.733420 7fc2342bc700  0 log_channel(cluster) log [WRN]
>>> :
>>> Scrub error on inode 0x1000d3aedb6
>>>
>>> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache)
>>> see mds.b log and `damage ls` output for details
>>> 2017-12-11 22:29:05.733475 7fc2342bc700 -1 mds.0.scrubstack
>>> _validate_inode_done scrub error on inode [inode 0x1000d3aedb6 [2,head]
>>>
>>> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing/test-malware-simple.cache
>>> auth v366 dirtyparent s=44 n(v0 b44 1=1+0) (iversion lock) |
>>> dirtyparent=1
>>> scrubqueue=0 0x55ef37c78a00]:
>>>
>>> {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":false,"read_ret_val":-61,"ondisk_value":"(-1)0x0:[]//","memoryvalue":"(0)0x1000d3aedb6:[<0x1000d3aeda7/test-malware-simple.cache
>>> v366>,<0x10002de79e8/safebrowsing
>>> v7142119>,<0x10002de79df/dsjf5siv.default
>>> v4089757>,<0x10002de79de/firefox v3998050>,<0x10002de79dd/mozilla
>>> v4933047>,<0x100018bd837/.cache v115551644>,<0x10000000000/some_username
>>> v444724510>,<0x1/home v228039388>]//","error_str":"failed to read off
>>> disk;
>>> see
>>>
>>> retval"},"raw_stats":{"checked":false,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f()","ondisk_value.rstat":"n()","memory_value.dirrstat":"f()","memory_value.rstat":"n()","error_str":""},"return_code":-61}
>>> 2017-12-11 22:29:05.772351 7fc2342bc700  0 mds.0.cache.dir(0x1000d3ae112)
>>> _fetched missing object for [dir 0x1000d3ae112
>>>
>>> /home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-to_delete/
>>> [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0
>>> |
>>> waiter=1 authpin=1 0x55eedee27a80]
>>> 2017-12-11 22:29:05.772385 7fc2342bc700 -1 log_channel(cluster) log [ERR]
>>> :
>>> dir 0x1000d3ae112 object missing on disk; some files may be lost
>>>
>>> (/home/some_username/.cache/mozilla/firefox/dsjf5siv.default/safebrowsing-to_delete)
>>> 2017-12-11 22:29:05.778009 7fc2342bc700  1 mds.b respawn
>>> 2017-12-11 22:29:05.778028 7fc2342bc700  1 mds.b  e: '/usr/bin/ceph-mds'
>>> 2017-12-11 22:29:05.778031 7fc2342bc700  1 mds.b  0: '/usr/bin/ceph-mds'
>>> 2017-12-11 22:29:05.778036 7fc2342bc700  1 mds.b  1: '-i'
>>> 2017-12-11 22:29:05.778038 7fc2342bc700  1 mds.b  2: 'b'
>>> 2017-12-11 22:29:05.778040 7fc2342bc700  1 mds.b  3: '--pid-file'
>>> 2017-12-11 22:29:05.778042 7fc2342bc700  1 mds.b  4:
>>> '/var/run/ceph/mds.b.pid'
>>> 2017-12-11 22:29:05.778044 7fc2342bc700  1 mds.b  5: '-c'
>>> 2017-12-11 22:29:05.778046 7fc2342bc700  1 mds.b  6:
>>> '/etc/ceph/ceph.conf'
>>> 2017-12-11 22:29:05.778048 7fc2342bc700  1 mds.b  7: '--cluster'
>>> 2017-12-11 22:29:05.778050 7fc2342bc700  1 mds.b  8: 'ceph'
>>> 2017-12-11 22:29:05.778051 7fc2342bc700  1 mds.b  9: '--setuser'
>>> 2017-12-11 22:29:05.778053 7fc2342bc700  1 mds.b  10: 'ceph'
>>> 2017-12-11 22:29:05.778055 7fc2342bc700  1 mds.b  11: '--setgroup'
>>> 2017-12-11 22:29:05.778057 7fc2342bc700  1 mds.b  12: 'ceph'
>>> 2017-12-11 22:29:05.778104 7fc2342bc700  1 mds.b respawning with exe
>>> /usr/bin/ceph-mds
>>> 2017-12-11 22:29:05.778107 7fc2342bc700  1 mds.b  exe_path /proc/self/exe
>>> 2017-12-11 22:29:06.186020 7f9ad28f41c0  0 ceph version 12.2.2
>>> (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
>>> (unknown), pid 3214
>>> 2017-12-11 22:29:10.604701 7f9acbb38700  1 mds.b handle_mds_map standby
>>>
>>> As long as the MDS was still active, "damage ls" again gave me exactly
>>> 10001 damages of damage_type "backtrace". The log implies that those
>>> backtraces cannot be fixed automatically. I could live with losing those
>>> 10k files, but I do not get why the MDS switches to "standby" and marks
>>> the FS damaged, rendering it offline.
>>> ceph -s then reports something like: mds: cephfs-0/1/1 1:damaged
>>> 1:standby
>>> (not pasted but manually typed from memory)
>>>
>>> Btw. in the log the MDS encountered two more "object missing on disk;
>>> some files may be lost" errors much earlier during that scrub (so three
>>> in total), but the first two did not make the MDS go to standby.
>>> I marked the FS repaired, restarted the MDS with mds debug level 20 and
>>> reran a scrub on that particular path, but this time the MDS wouldn't
>>> mark the whole FS damaged and stayed active. Will it only do so when
>>> finding three of those damages in a row?
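>>>
>>> (Marking the FS repaired is done with 'ceph mds repaired <rank>', rank 0
>>> in my case.)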
>>>
>>> Is this a bug, or is there something I would have to do to my cluster to
>>> get it back to a stable working condition? Again, all this began with
>>> upgrading from 12.2.1 to 12.2.2.
>>>
>>> Furthermore, is there a way to get rid of those "broken" files (either
>>> bad backtrace or, even more importantly, those with missing objects)? I
>>> could live with losing certain files if it helps get CephFS working
>>> stably again.
>>>
>> Due to the mds_damage_table_max_entries config, the mds became damaged
>> after it encountered 10000 errors (most of them bad backtraces). Your
>> cephfs was created before backtraces were introduced. It's likely you
>> didn't create backtraces for all files when upgrading from a pre-firefly
>> release (http://ceph.com/geen-categorie/v0-81-released/). The really
>> harmful corruption is "object missing on disk"; if the missing object
>> is a dirfrag, all files and sub-directories under it become
>> inaccessible. 'cephfs-data-scan scan_inodes' can recover these
>> inaccessible files/directories. If you can live with losing those
>> files/directories, you can use 'rados rmomapkey' to remove inodes with
>> bad objects.
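>>
>> A minimal sketch of that data-scan recovery path, should you need it
>> (the data pool name cephfs_data is a placeholder, and the MDS must be
>> stopped while these run):
>>
>>   # rebuild size/mtime metadata by scanning the data pool objects
>>   cephfs-data-scan scan_extents cephfs_data
>>   # re-link recovered inodes/dentries into the metadata pool
>>   cephfs-data-scan scan_inodes cephfs_data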
>>
>> good luck
>> Yan, Zheng
>>
>>
>>
>>
>>> Again, any help is highly appreciated, I need to get the FS back up as
>>> soon
>>> as possible. Thank you very much!
>>>
>>> Best regards,
>>> Tobi
>>>
>>>
>>>
>
> --
> -----------------------------------------------------------
> Dipl.-Inf. (FH) Tobias Prousa
> Leiter Entwicklung Datenlogger
>
> CAETEC GmbH
> Industriestr. 1
> D-82140 Olching
> www.caetec.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Olching
> Handelsregister: Amtsgericht München, HRB 183929
> Geschäftsführung: Stephan Bacher, Andreas Wocke
>
> Tel.: +49 (0)8142 / 50 13 60
> Fax.: +49 (0)8142 / 50 13 69
>
> eMail: tobias.prousa@xxxxxxxxx
> Web:   http://www.caetec.de
> ------------------------------------------------------------
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



