Hi John / All
Thank you for the help so far.
To add a further point to Sean's previous email, I see this log entry before the assertion failure:
-6> 2016-12-08 15:47:08.483700 7fb133dca700 12 mds.0.cache.dir(1000a453344) remove_dentry [dentry #100/stray9/1000a453344/config [2,head] auth NULL (dversion lock) v=540 inode=0 0x55e8664fede0]
-5> 2016-12-08 15:47:08.484882 7fb133dca700 -1 mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7fb133dca700 time 2016-12-08 15:47:08.483704
mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
I can cross-reference this against the omap keys on that stray directory's object:
root@ceph-mon1:~/1000a453344# rados -p ven-ceph-metadata-1 listomapkeys 1000a453344.00000000
1470734502_head
config_head
Would we also need to clean up this object? If so, is there a safe way we can do this?
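In case it helps frame the question, my assumption is that the cleanup would look something like the below, with the MDS stopped first. The getomapval is just to keep a backup of the value before removing the key, and the object/key names are taken from the output above. I have not run this yet:

root@ceph-mon1:~/1000a453344# rados -p ven-ceph-metadata-1 getomapval 1000a453344.00000000 config_head config_head.bak
root@ceph-mon1:~/1000a453344# rados -p ven-ceph-metadata-1 rmomapkey 1000a453344.00000000 config_head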
Rob
On Thu, 8 Dec 2016 at 19:58 Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
Hi John,

Thanks for your pointers, I have extracted the omap_keys and omap_values for an object I found in the metadata pool called '600.00000000' and dropped them at the below location.

Could you explain how it is possible to identify stray directory fragments?

Thanks

On Thu, Dec 8, 2016 at 6:30 PM, John Spray <jspray@xxxxxxxxxx> wrote:
On Thu, Dec 8, 2016 at 3:45 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi,
>
> We had no changes going on with the ceph pools or ceph servers at the time.
>
> We have however been hitting this in the last week and it maybe related:
>
> http://tracker.ceph.com/issues/17177
Oh, okay -- so you've got corruption in your metadata pool as a result
of hitting that issue, presumably.
I think in the past people have managed to get past this by taking
their MDSs offline and manually removing the omap entries in their
stray directory fragments (i.e. using the `rados` cli on the objects
starting "600.").
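To identify them: the stray directories for rank 0 are inodes 0x600 through 0x609, so their dirfrag objects in the metadata pool are named 600.00000000 through 609.00000000 (the '600.00000000' object you found is the first of these). A rough sketch of listing their entries, assuming a bash shell and substituting your metadata pool's name for the placeholder:

# List the dentry keys held in each rank-0 stray directory fragment.
for i in {600..609}; do
    echo "== $i.00000000 =="
    rados -p <metadata-pool> listomapkeys $i.00000000
done

Back each value up with `rados getomapval` before any `rmomapkey`, do it with the MDSs stopped, and treat this as a sketch rather than a tested procedure.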
John
> Thanks
>
> On Thu, Dec 8, 2016 at 3:34 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>
>> On Thu, Dec 8, 2016 at 3:11 PM, Sean Redmond <sean.redmond1@xxxxxxxxx>
>> wrote:
>> > Hi,
>> >
>> > I have a CephFS cluster that is currently unable to start the mds
>> > server, as it is hitting an assert. The extract from the mds log is
>> > below; any pointers are welcome:
>> >
>> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> >
>> > 2016-12-08 14:50:18.577038 7f7d9faa3700 1 mds.0.47077 handle_mds_map state change up:rejoin --> up:active
>> > 2016-12-08 14:50:18.577048 7f7d9faa3700 1 mds.0.47077 recovery_done -- successful recovery!
>> > 2016-12-08 14:50:18.577166 7f7d9faa3700 1 mds.0.47077 active_start
>> > 2016-12-08 14:50:19.460208 7f7d9faa3700 1 mds.0.47077 cluster recovered.
>> > 2016-12-08 14:50:19.495685 7f7d9abfc700 -1 mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f7d9abfc700 time 2016-12-08 14:50:19.494508
>> > mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
>> >
>> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f0f789def0]
>> > 2: (CDir::try_remove_dentries_for_stray()+0x1a0) [0x55f0f76666c0]
>> > 3: (StrayManager::__eval_stray(CDentry*, bool)+0x8c9) [0x55f0f75e7799]
>> > 4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f0f75e7cf2]
>> > 5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f0f753b30d]
>> > 6: (MDSInternalContextBase::complete(int)+0x18b) [0x55f0f76e93db]
>> > 7: (MDSRank::_advance_queues()+0x6a7) [0x55f0f749bf27]
>> > 8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f0f749c45a]
>> > 9: (()+0x770a) [0x7f7da6bdc70a]
>> > 10: (clone()+0x6d) [0x7f7da509d82d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Last time someone had this issue they had tried to create a filesystem
>> using pools that had another filesystem's old objects in them:
>> http://tracker.ceph.com/issues/16829
>>
>> What was going on on your system before you hit this?
>>
>> John
>>
>> > Thanks
>> >
>
>
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com