Re: CephFS FAILED assert(dn->get_linkage()->is_null())

On Sat, Dec 10, 2016 at 1:50 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi Goncalo,
>
> With the output from "ceph tell mds.0 damage ls" we tracked the inodes of
> two damaged directories using 'find /mnt/ceph/ -inum $inode', after
> reviewing the paths involved we confirmed a backup was available for this
> data so we ran "ceph tell mds.0 damage rm $inode" on the two inodes. We then
> marked the mds as repaired "ceph mds repaired 0".

You're going to see the damage pop back up again just as soon as you
touch the file in question.  "damage rm" doesn't fix anything, it just
removes the record of damage (i.e. it's how you tell the MDS "I fixed
this for you").
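
The sequence Sean describes can be sketched as a shell dry run (each command is echoed for review rather than executed, since they act on a live cluster; rank 0 and the /mnt/ceph mount point come from the thread, and the inode value is a placeholder for whatever "damage ls" reports):

```shell
#!/bin/sh
# Dry run of the damage-inspection workflow: cluster commands are
# echoed instead of executed. MDS rank 0 and the /mnt/ceph mount
# point are taken from the thread; the inode is a placeholder.
run() { echo "$@"; }           # change the body to "$@" to execute

inode=12345                    # placeholder: take this from "damage ls"

run ceph tell mds.0 damage ls            # list recorded damage entries
run find /mnt/ceph/ -inum "$inode"       # map the inode back to a path
run ceph tell mds.0 damage rm "$inode"   # removes only the damage *record*
```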

"mds repaired" is for when a rank is entirely offline due to
catastrophic damage (i.e. something too bad for the damage table to
report nicely from a live MDS) -- it will presumably have been a no-op
for you.

Can you say exactly what operations you have done and exactly what
damage is being reported?

How did you conclude that your journal was corrupt?
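
(For reference, one way to check journal integrity without replaying it is cephfs-journal-tool, run while the MDS rank is offline. A dry-run sketch of its read-only subcommands:)

```shell
#!/bin/sh
# Dry run: read-only journal checks, echoed rather than executed.
# cephfs-journal-tool should be run while the MDS rank is offline.
run() { echo "$@"; }   # change the body to "$@" to execute

run cephfs-journal-tool journal inspect   # reports overall journal integrity
run cephfs-journal-tool header get        # dumps the journal header as JSON
```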

John

P.S. I went quiet at the end of last week because I was out of the
office, it's not that I don't care :-)
P.P.S. Any chance you guys could use your work mail addresses?  It's
not always obvious that a series of different people posting from
@gmail.com addresses are working on the same system.

> We have restarted the mds to confirm it is not hitting any asserts, we are
> now just enabling scrubs and running a "ls -R /mnt/ceph" to see if we hit
> any further problems.
>
> Thanks
>
> On Fri, Dec 9, 2016 at 11:37 PM, Chris Sarginson <csargiso@xxxxxxxxx> wrote:
>>
>> Hi Goncalo,
>>
>> In the end we ascertained that the assert was coming from reading corrupt
>> data in the mds journal.  We have followed the sections at the following
>> link (http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/) in order
>> down to (and including) MDS Table wipes (only wiping the "session" table in
>> the final step).  This resolved the problem we had with our mds asserting.
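
[Editor's note: for readers following along, the jewel disaster-recovery sequence Chris references looks roughly like the below. This is a dry-run sketch assuming a single MDS rank taken offline first; always take the journal backup before anything else, and see the linked docs for the caveats on each step.]

```shell
#!/bin/sh
# Dry run of the jewel disaster-recovery sequence (commands echoed,
# not executed). Assumes a single MDS rank, taken offline first.
run() { echo "$@"; }   # change the body to "$@" to execute

run cephfs-journal-tool journal export backup.bin      # back up the journal first
run cephfs-journal-tool event recover_dentries summary # salvage dentries from it
run cephfs-journal-tool journal reset                  # truncate the damaged journal
run cephfs-table-tool all reset session                # wipe only the session table
```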
>>
>> We have also run a cephfs scrub to validate the data (ceph daemon mds.0
>> scrub_path / recursive repair), which has resulted in a "metadata damage
>> detected" health warning.  This seems to perform a read of all objects
>> involved in the cephfs rados pools (anecdotal: the scan of the data pool
>> completed much faster than the scan of the metadata pool).
>>
>> We are now working with the output of "ceph tell mds.0 damage ls", and
>> looking at the following mailing list post as a starting point for
>> proceeding with that:
>> http://ceph-users.ceph.narkive.com/EfFTUPyP/how-to-fix-the-mds-damaged-issue
>>
>> Chris
>>
>> On Fri, 9 Dec 2016 at 19:26 Goncalo Borges <goncalo.borges@xxxxxxxxxxxxx>
>> wrote:
>>>
>>> Hi Sean, Rob.
>>>
>>> I saw on the tracker that you were able to resolve the mds assert by
>>> manually cleaning the corrupted metadata. Since I am also hitting that issue
>>> and I suspect that I will face an mds assert of the same type sooner or
>>> later, can you please explain a bit further what operations you did to
>>> clean up the problem?
>>> Cheers
>>> Goncalo
>>> ________________________________________
>>> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of Rob
>>> Pickerill [r.pickerill@xxxxxxxxx]
>>> Sent: 09 December 2016 07:13
>>> To: Sean Redmond; John Spray
>>> Cc: ceph-users
>>> Subject: Re:  CephFS FAILED
>>> assert(dn->get_linkage()->is_null())
>>>
>>> Hi John / All
>>>
>>> Thank you for the help so far.
>>>
>>> To add a further point to Sean's previous email, I see this log entry
>>> before the assertion failure:
>>>
>>>     -6> 2016-12-08 15:47:08.483700 7fb133dca700 12 mds.0.cache.dir(1000a453344) remove_dentry [dentry #100/stray9/1000a453344/config [2,head] auth NULL (dversion lock) v=540 inode=0 0x55e8664fede0]
>>>     -5> 2016-12-08 15:47:08.484882 7fb133dca700 -1 mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7fb133dca700 time 2016-12-08 15:47:08.483704
>>> mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
>>>
>>> And I can reference this with:
>>>
>>> root@ceph-mon1:~/1000a453344# rados -p ven-ceph-metadata-1 listomapkeys
>>> 1000a453344.00000000
>>> 1470734502_head
>>> config_head
>>>
>>> Would we also need to clean up this object, and if so, is there a safe way
>>> we can do this?
>>>
>>> Rob
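
[Editor's note: the thread does not answer this directly, but for anyone in the same position: if that "config" dentry is confirmed stale and the data is backed up, the omap key could in principle be removed with `rados rmomapkey` while the MDS is offline. A heavily hedged dry-run sketch, using the pool and object names from Rob's output above; verify against your own cluster before executing anything.]

```shell
#!/bin/sh
# Dry run (echoed, not executed): removing a single stale dentry key
# from a directory object's omap. DESTRUCTIVE if actually run; only
# do this with the MDS offline and the affected data backed up.
run() { echo "$@"; }   # change the body to "$@" to execute

run rados -p ven-ceph-metadata-1 listomapkeys 1000a453344.00000000  # confirm the key exists
run rados -p ven-ceph-metadata-1 rmomapkey 1000a453344.00000000 config_head
```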
>>>
>>> On Thu, 8 Dec 2016 at 19:58 Sean Redmond
>>> <sean.redmond1@xxxxxxxxx<mailto:sean.redmond1@xxxxxxxxx>> wrote:
>>> Hi John,
>>>
>>> Thanks for your pointers, I have extracted the omap keys and
>>> omap values for an object I found in the metadata pool called
>>> '600.00000000' and dropped them at the below location:
>>>
>>> https://www.dropbox.com/sh/wg6irrjg7kie95p/AABk38IB4PXsn2yINpNa9Js5a?dl=0
>>>
>>> Could you explain how it is possible to identify stray directory
>>> fragments?
>>>
>>> Thanks
>>>
>>> On Thu, Dec 8, 2016 at 6:30 PM, John Spray
>>> <jspray@xxxxxxxxxx<mailto:jspray@xxxxxxxxxx>> wrote:
>>> On Thu, Dec 8, 2016 at 3:45 PM, Sean Redmond
>>> <sean.redmond1@xxxxxxxxx<mailto:sean.redmond1@xxxxxxxxx>> wrote:
>>> > Hi,
>>> >
>>> > We had no changes going on with the ceph pools or ceph servers at the
>>> > time.
>>> >
>>> > We have however been hitting this in the last week and it may be
>>> > related:
>>> >
>>> > http://tracker.ceph.com/issues/17177
>>>
>>> Oh, okay -- so you've got corruption in your metadata pool as a result
>>> of hitting that issue, presumably.
>>>
>>> I think in the past people have managed to get past this by taking
>>> their MDSs offline and manually removing the omap entries in their
>>> stray directory fragments (i.e. using the `rados` cli on the objects
>>> starting "600.").
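
[Editor's note: concretely, MDS rank 0's ten stray directories live at inodes 0x600 through 0x609, so their dirfrag objects in the metadata pool are named 600.00000000 through 609.00000000 when unfragmented. A dry-run sketch of enumerating their omap entries; the pool name is a placeholder.]

```shell
#!/bin/sh
# Dry run (echoed, not executed): enumerate the omap entries of MDS
# rank 0's stray directory fragments. Assumes unfragmented stray
# dirs, i.e. one ".00000000" object per stray directory.
run() { echo "$@"; }       # change the body to "$@" to execute
pool=cephfs_metadata       # placeholder: your metadata pool name

for ino in 600 601 602 603 604 605 606 607 608 609; do
    run rados -p "$pool" listomapkeys "$ino.00000000"
done
# A specific stale entry would then be removed (MDS offline!) with:
#   rados -p "$pool" rmomapkey <object> <dentry key>
```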
>>>
>>> John
>>>
>>>
>>>
>>> > Thanks
>>> >
>>> > On Thu, Dec 8, 2016 at 3:34 PM, John Spray
>>> > <jspray@xxxxxxxxxx<mailto:jspray@xxxxxxxxxx>> wrote:
>>> >>
>>> >> On Thu, Dec 8, 2016 at 3:11 PM, Sean Redmond
>>> >> <sean.redmond1@xxxxxxxxx<mailto:sean.redmond1@xxxxxxxxx>>
>>> >> wrote:
>>> >> > Hi,
>>> >> >
>>> >> > I have a CephFS cluster that is currently unable to start the mds
>>> >> > server
>>> >> > as
>>> >> > it is hitting an assert, the extract from the mds log is below, any
>>> >> > pointers
>>> >> > are welcome:
>>> >> >
>>> >> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>>> >> >
>>> >> > 2016-12-08 14:50:18.577038 7f7d9faa3700  1 mds.0.47077
>>> >> > handle_mds_map
>>> >> > state
>>> >> > change up:rejoin --> up:active
>>> >> > 2016-12-08 14:50:18.577048 7f7d9faa3700  1 mds.0.47077 recovery_done
>>> >> > --
>>> >> > successful recovery!
>>> >> > 2016-12-08 14:50:18.577166 7f7d9faa3700  1 mds.0.47077 active_start
>>> >> > 2016-12-08 14:50:19.460208 7f7d9faa3700  1 mds.0.47077 cluster
>>> >> > recovered.
>>> >> > 2016-12-08 14:50:19.495685 7f7d9abfc700 -1 mds/CDir.cc: In function
>>> >> > 'void
>>> >> > CDir::try_remove_dentries_for_stray()' thread 7f7d9abfc700 time 2016-12-08 14:50:19.494508
>>> >> > mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
>>> >> >
>>> >> >  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>>> >> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> >> > const*)+0x80) [0x55f0f789def0]
>>> >> >  2: (CDir::try_remove_dentries_for_stray()+0x1a0) [0x55f0f76666c0]
>>> >> >  3: (StrayManager::__eval_stray(CDentry*, bool)+0x8c9)
>>> >> > [0x55f0f75e7799]
>>> >> >  4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f0f75e7cf2]
>>> >> >  5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f0f753b30d]
>>> >> >  6: (MDSInternalContextBase::complete(int)+0x18b) [0x55f0f76e93db]
>>> >> >  7: (MDSRank::_advance_queues()+0x6a7) [0x55f0f749bf27]
>>> >> >  8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f0f749c45a]
>>> >> >  9: (()+0x770a) [0x7f7da6bdc70a]
>>> >> >  10: (clone()+0x6d) [0x7f7da509d82d]
>>> >> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> >> > needed to
>>> >> > interpret this.
>>> >>
>>> >> Last time someone had this issue they had tried to create a filesystem
>>> >> using pools that had another filesystem's old objects in:
>>> >> http://tracker.ceph.com/issues/16829
>>> >>
>>> >> What was going on on your system before you hit this?
>>> >>
>>> >> John
>>> >>
>>> >> > Thanks
>>> >> >
>>> >> > _______________________________________________
>>> >> > ceph-users mailing list
>>> >> > ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
>>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> >
>>> >
>>> >
>>>
>
>