Re: "ceph fs" commands hang forever and kill monitors

On 27/09/17 12:32, John Spray wrote:
> On Wed, Sep 27, 2017 at 12:15 PM, Richard Hesketh
> <richard.hesketh@xxxxxxxxxxxx> wrote:
>> As the subject says... any ceph fs administrative command I try to run hangs forever and kills monitors in the background. Sometimes they come back; on a couple of occasions I had to manually stop/restart a suffering mon. Trying to load the filesystem tab in the ceph-mgr dashboard dumps an error and can also kill a monitor. However, clients can mount the filesystem and read/write data without issue.
>>
>> Relevant excerpt from logs on an affected monitor, just trying to run 'ceph fs ls':
>>
>> 2017-09-26 13:20:50.716087 7fc85fdd9700  0 mon.vm-ds-01@0(leader) e19 handle_command mon_command({"prefix": "fs ls"} v 0) v1
>> 2017-09-26 13:20:50.727612 7fc85fdd9700  0 log_channel(audit) log [DBG] : from='client.? 10.10.10.1:0/2771553898' entity='client.admin' cmd=[{"prefix": "fs ls"}]: dispatch
>> 2017-09-26 13:20:50.950373 7fc85fdd9700 -1 /build/ceph-12.2.0/src/osd/OSDMap.h: In function 'const string& OSDMap::get_pool_name(int64_t) const' thread 7fc85fdd9700 time 2017-09-26 13:20:50.727676
>> /build/ceph-12.2.0/src/osd/OSDMap.h: 1176: FAILED assert(i != pool_name.end())
>>
>>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x55a8ca0bb642]
>>  2: (()+0x48165f) [0x55a8c9f4165f]
>>  3: (MDSMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0x1d18) [0x55a8ca047688]
>>  4: (MDSMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x2a8) [0x55a8ca048008]
>>  5: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x700) [0x55a8c9f9d1b0]
>>  6: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x1f93) [0x55a8c9e63193]
>>  7: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0xa0e) [0x55a8c9e6a52e]
>>  8: (Monitor::_ms_dispatch(Message*)+0x6db) [0x55a8c9e6b57b]
>>  9: (Monitor::ms_dispatch(Message*)+0x23) [0x55a8c9e9a053]
>>  10: (DispatchQueue::entry()+0xf4a) [0x55a8ca3b5f7a]
>>  11: (DispatchQueue::DispatchThread::entry()+0xd) [0x55a8ca16bc1d]
>>  12: (()+0x76ba) [0x7fc86b3ac6ba]
>>  13: (clone()+0x6d) [0x7fc869bd63dd]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> I'm running Luminous. The cluster and FS have been in service since Hammer and have default data/metadata pool names. I discovered the issue after attempting to enable directory sharding.
> 
> Well that's not good...
> 
> The assertion is because your FSMap is referring to a pool that
> apparently no longer exists in the OSDMap.  This should be impossible
> in current Ceph (we forbid removing pools if they're in use), but
> could perhaps have been caused in an earlier version of Ceph when it
> was possible to remove a pool even if CephFS was referring to it?
> 
> Alternatively, perhaps something more severe is going on that's
> causing your mons to see a wrong/inconsistent view of the world.  Has
> the cluster ever been through any traumatic disaster recovery type
> activity involving hand-editing any of the cluster maps?  What
> intermediate versions has it passed through on the way from Hammer to
> Luminous?
> 
> Opened a ticket here: http://tracker.ceph.com/issues/21568
> 
> John

I've reviewed my notes (i.e. I've grepped my IRC logs); I actually inherited this cluster from a colleague who left shortly after I joined, so unfortunately there is some of its history I cannot fill in.

Turns out the cluster actually predates Firefly. Looking at the dates, my suspicion is that it went Emperor -> Firefly -> Giant -> Hammer. I inherited it at Hammer and took it Hammer -> Infernalis -> Jewel -> Luminous myself. I know I made sure to do the tmap_upgrade step on CephFS, but I can't remember whether I did it at Infernalis or Jewel.

Infernalis was a tricky upgrade: the first attempt was aborted after the first set of OSDs didn't come back up after the upgrade (they had to be removed, downgraded and re-added), and after a successful second attempt, setting sortbitwise as the documentation suggested caused everything to break and slowly degrade until it was unset and the cluster recovered. Disaster recovery never involved mucking around with the pools while I was administering it, but unfortunately I cannot speak for the cluster's pre-Hammer history. The only pools I have removed were ones I created temporarily for testing CRUSH rules and benchmarking.

I have at times hand-edited the CRUSH map (extract, decompile, modify, recompile, inject) because I found that more convenient than the CLI tools for creating new CRUSH rules, but I have never hand-edited the OSD map.
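
For reference, the hand-editing was just the standard decompile/edit/recompile cycle, roughly as below (the file names are just whatever scratch files I use at the time):

    ceph osd getcrushmap -o crush.bin          # extract the compiled CRUSH map
    crushtool -d crush.bin -o crush.txt        # decompile to text
    # ...edit crush.txt to add/adjust rules...
    crushtool -c crush.txt -o crush-new.bin    # recompile
    ceph osd setcrushmap -i crush-new.bin      # inject it back into the cluster

As far as I understand it that only touches devices, buckets and rules; pool definitions live in the OSDMap itself, so it shouldn't have been able to remove a pool out from under CephFS.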

Why would the cephfs have been referring to other pools?
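
In case it helps with the ticket, this is how I was planning to sanity-check that theory by hand: compare the pool IDs the filesystem map references against the pools the OSDMap actually contains (assuming 'ceph fs dump' doesn't go through the same pool-name lookup as 'fs ls' and trip the identical assert):

    ceph fs dump              # check the metadata_pool and data_pools lines
    ceph osd pool ls detail   # pool IDs and names the OSDMap currently knows about

If a pool ID turns up in the first output but not the second, that would presumably confirm the stale reference you're describing.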

Rich
