Hi John,
Thanks for your pointers. I have extracted the omap keys and omap values for an object I found in the metadata pool called '600.00000000' and dropped them at the below location.
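For reference, this is roughly how I extracted them (assuming the metadata
pool is named 'metadata' -- substitute your own pool name):

    # dump the omap keys on the stray dirfrag object
    rados -p metadata listomapkeys 600.00000000 > 600.00000000.keys

    # dump the omap keys together with their values
    rados -p metadata listomapvals 600.00000000 > 600.00000000.vals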
Could you explain how to identify the stray directory fragments?
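My working assumption (please correct me if this is wrong) is that rank 0's
stray directories are inodes 0x600 through 0x609, so their fragments would
show up in the metadata pool as objects named 600.00000000 through
609.00000000, e.g.:

    # list candidate stray dirfrag objects (pool name is a placeholder)
    rados -p metadata ls | grep -E '^60[0-9]\.'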
Thanks
On Thu, Dec 8, 2016 at 6:30 PM, John Spray <jspray@xxxxxxxxxx> wrote:
On Thu, Dec 8, 2016 at 3:45 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi,
>
> We had no changes going on with the ceph pools or ceph servers at the time.
>
> We have, however, been hitting this in the last week, and it may be related:
>
> http://tracker.ceph.com/issues/17177
Oh, okay -- so you've got corruption in your metadata pool as a result
of hitting that issue, presumably.
I think in the past people have managed to get past this by taking
their MDSs offline and manually removing the omap entries in their
stray directory fragments (i.e. using the `rados` CLI on the objects
whose names start with "600.").
John
> Thanks
>
> On Thu, Dec 8, 2016 at 3:34 PM, John Spray <jspray@xxxxxxxxxx> wrote:
>>
>> On Thu, Dec 8, 2016 at 3:11 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> > Hi,
>> >
>> > I have a CephFS cluster that is currently unable to start the MDS
>> > server, as it is hitting an assert. An extract from the MDS log is
>> > below; any pointers are welcome:
>> >
>> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> >
>> > 2016-12-08 14:50:18.577038 7f7d9faa3700 1 mds.0.47077 handle_mds_map state change up:rejoin --> up:active
>> > 2016-12-08 14:50:18.577048 7f7d9faa3700 1 mds.0.47077 recovery_done -- successful recovery!
>> > 2016-12-08 14:50:18.577166 7f7d9faa3700 1 mds.0.47077 active_start
>> > 2016-12-08 14:50:19.460208 7f7d9faa3700 1 mds.0.47077 cluster recovered.
>> > 2016-12-08 14:50:19.495685 7f7d9abfc700 -1 mds/CDir.cc: In function 'void CDir::try_remove_dentries_for_stray()' thread 7f7d9abfc700 time 2016-12-08 14:50:19.494508
>> > mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
>> >
>> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> > 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x55f0f789def0]
>> > 2: (CDir::try_remove_dentries_for_stray()+0x1a0) [0x55f0f76666c0]
>> > 3: (StrayManager::__eval_stray(CDentry*, bool)+0x8c9) [0x55f0f75e7799]
>> > 4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f0f75e7cf2]
>> > 5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f0f753b30d]
>> > 6: (MDSInternalContextBase::complete(int)+0x18b) [0x55f0f76e93db]
>> > 7: (MDSRank::_advance_queues()+0x6a7) [0x55f0f749bf27]
>> > 8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f0f749c45a]
>> > 9: (()+0x770a) [0x7f7da6bdc70a]
>> > 10: (clone()+0x6d) [0x7f7da509d82d]
>> > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> Last time someone had this issue, they had tried to create a filesystem
>> using pools that still contained another filesystem's old objects:
>> http://tracker.ceph.com/issues/16829
>>
>> What was going on on your system before you hit this?
>>
>> John
>>
>> > Thanks
>> >
>
>