osd crash: trim_object could not find coid

Hi Greg,

An attempt to recover pg 3.3ef by copying it from broken osd.6 to
working osd.32 resulted in one more broken osd :(

Here's what was actually done:

root@storage1:~# ceph pg 3.3ef list_missing | head
{ "offset": { "oid": "",
      "key": "",
      "snapid": 0,
      "hash": 0,
      "max": 0,
      "pool": -1,
      "namespace": ""},
  "num_missing": 219,
  "num_unfound": 219,
  "objects": [
[...]
root@storage1:~# ceph pg 3.3ef query
[...]
          "might_have_unfound": [
                { "osd": 6,
                  "status": "osd is down"},
                { "osd": 19,
                  "status": "already probed"},
                { "osd": 32,
                  "status": "already probed"},
                { "osd": 42,
                  "status": "already probed"}],
[...]
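
(Side note: the same information can be pulled out quickly with grep; a rough
sketch, nothing more:)

# List every pg that currently reports unfound objects (17 of them here)
ceph health detail | grep unfound
# Show only the might_have_unfound section of the query output
ceph pg 3.3ef query | grep -A 10 might_have_unfound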

# Export pg 3.3ef from broken osd.6

root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-6/ \
    --journal-path /var/lib/ceph/osd/ssd0/6.journal \
    --pgid 3.3ef --op export --file ~/backup/osd-6.pg-3.3ef.export
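
(A sanity check of the export file before doing anything destructive on
osd.32; plain file-level checks only, no verification by the tool itself is
assumed:)

ls -lh ~/backup/osd-6.pg-3.3ef.export      # make sure it is not empty
md5sum ~/backup/osd-6.pg-3.3ef.export      # keep the sum to compare later copies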

# Remove the empty pg 3.3ef which was already present on osd.32

root@storage2:~# service ceph stop osd.32
root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ \
    --journal-path /var/lib/ceph/osd/ssd0/32.journal \
    --pgid 3.3ef --op remove
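
(In hindsight, exporting osd.32's empty copy of the pg first would have made
this step reversible; same tool and flags as above, the file name below is
just illustrative:)

ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ \
    --journal-path /var/lib/ceph/osd/ssd0/32.journal \
    --pgid 3.3ef --op export --file ~/backup/osd-32.pg-3.3ef.empty.export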

# Import pg 3.3ef from dump

root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ \
    --journal-path /var/lib/ceph/osd/ssd0/32.journal \
    --op import --file ~/backup/osd-6.pg-3.3ef.export
root@storage2:~# service ceph start osd.32

    -1> 2014-09-10 18:53:37.196262 7f13fdd7d780  5 osd.32 pg_epoch:
48366 pg[3.3ef(unlocked)] enter Initial
     0> 2014-09-10 18:53:37.239479 7f13fdd7d780 -1 *** Caught signal
(Aborted) **
 in thread 7f13fdd7d780

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: /usr/bin/ceph-osd() [0x8843da]
 2: (()+0xfcb0) [0x7f13fcfabcb0]
 3: (gsignal()+0x35) [0x7f13fb98a0d5]
 4: (abort()+0x17b) [0x7f13fb98d83b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f13fc2dc69d]
 6: (()+0xb5846) [0x7f13fc2da846]
 7: (()+0xb5873) [0x7f13fc2da873]
 8: (()+0xb596e) [0x7f13fc2da96e]
 9: /usr/bin/ceph-osd() [0x94b34f]
 10:
(pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x12c)
[0x691b6c]
 11: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&,
std::map<eversion_t, hobject_t, std::less<eversion_t>,
std::allocator<std::pair<eversion_t const,
 hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
std::basic_ostringstream<char, std::char_traits<char>,
std::allocator<char> >&, std::set<std::string, std::less<std::
string>, std::allocator<std::string> >*)+0x16d4) [0x7d3ef4]
 12: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2c1) [0x7951b1]
 13: (OSD::load_pgs()+0x18f3) [0x61e143]
 14: (OSD::init()+0x1b9a) [0x62726a]
 15: (main()+0x1e8d) [0x5d2d0d]
 16: (__libc_start_main()+0xed) [0x7f13fb97576d]
 17: /usr/bin/ceph-osd() [0x5d69d9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
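
(Should anyone want to retry this, running the OSD in the foreground with
verbose logging ought to show exactly which pg log entry trips
decode_with_checksum; the debug levels below are ordinary config overrides
passed on the command line:)

ceph-osd -i 32 -f --debug-osd 20 --debug-filestore 20 --debug-ms 1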

Fortunately it was possible to bring osd.32 back into a working state
simply by removing this pg.

root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ \
    --journal-path /var/lib/ceph/osd/ssd0/32.journal \
    --pgid 3.3ef --op remove
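
(A quick way to double-check that osd.32 is really healthy again after
restarting it; standard status commands:)

service ceph start osd.32
ceph osd stat                      # osd.32 should be counted as up and in again
ceph health detail | grep 3.3ef    # pg 3.3ef of course still reports its unfound objects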

Did I miss something in this procedure, or does it mean that this pg is
definitely lost?

Thanks!

François

On 09. 09. 14 00:23, Gregory Farnum wrote:
> On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
> <francois@ctrlaltdel.ch> wrote:
>> Hi Greg,
>>
>> Thanks for your support!
>>
>> On 08. 09. 14 20:20, Gregory Farnum wrote:
>>
>>> The first one is not caused by the same thing as the ticket you
>>> reference (it was fixed well before emperor), so it appears to be some
>>> kind of disk corruption.
>>> The second one is definitely corruption of some kind as it's missing
>>> an OSDMap it thinks it should have. It's possible that you're running
>>> into bugs in emperor that were fixed after we stopped doing regular
>>> support releases of it, but I'm more concerned that you've got disk
>>> corruption in the stores. What kind of crashes did you see previously;
>>> are there any relevant messages in dmesg, etc?
>>
>> Nothing special in dmesg except probably irrelevant XFS warnings:
>>
>> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> 
> Hmm, I'm not sure what the outcome of that could be. Googling for the
> error message returns this as the first result, though:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/58429
> Which indicates that it's a real deadlock and capable of messing up
> your OSDs pretty good.
> 
>>
>> All logs from before the disaster are still there, do you have any
>> advise on what would be relevant?
>>
>>> Given these issues, you might be best off identifying exactly which
>>> PGs are missing, carefully copying them to working OSDs (use the osd
>>> store tool), and killing these OSDs. Do lots of backups at each
>>> stage...
>>
>> This sounds scary, I'll keep fingers crossed and will do a bunch of
>> backups. There are 17 pg with missing objects.
>>
>> What do you exactly mean by the osd store tool? Is it the
>> 'ceph_filestore_tool' binary?
> 
> Yeah, that one.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 


