Hi Greg,

An attempt to recover pg 3.3ef by copying it from broken osd.6 to
working osd.32 resulted in one more broken osd :(

Here's what was actually done:

root@storage1:~# ceph pg 3.3ef list_missing | head
{ "offset": { "oid": "",
      "key": "",
      "snapid": 0,
      "hash": 0,
      "max": 0,
      "pool": -1,
      "namespace": ""},
  "num_missing": 219,
  "num_unfound": 219,
  "objects": [
[...]

root@storage1:~# ceph pg 3.3ef query
[...]
          "might_have_unfound": [
                { "osd": 6,
                  "status": "osd is down"},
                { "osd": 19,
                  "status": "already probed"},
                { "osd": 32,
                  "status": "already probed"},
                { "osd": 42,
                  "status": "already probed"}],
[...]

# Exporting pg 3.3ef from broken osd.6
root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-6/ --journal-path /var/lib/ceph/osd/ssd0/6.journal --pgid 3.3ef --op export --file ~/backup/osd-6.pg-3.3ef.export

# Remove the empty pg 3.3ef which was already present on osd.32
root@storage2:~# service ceph stop osd.32
root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ --journal-path /var/lib/ceph/osd/ssd0/32.journal --pgid 3.3ef --op remove

# Import pg 3.3ef from the dump
root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ --journal-path /var/lib/ceph/osd/ssd0/32.journal --op import --file ~/backup/osd-6.pg-3.3ef.export

root@storage2:~# service ceph start osd.32

osd.32 then crashed at startup with the following backtrace:

    -1> 2014-09-10 18:53:37.196262 7f13fdd7d780  5 osd.32 pg_epoch: 48366 pg[3.3ef(unlocked)] enter Initial
     0> 2014-09-10 18:53:37.239479 7f13fdd7d780 -1 *** Caught signal (Aborted) **
 in thread 7f13fdd7d780

 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
 1: /usr/bin/ceph-osd() [0x8843da]
 2: (()+0xfcb0) [0x7f13fcfabcb0]
 3: (gsignal()+0x35) [0x7f13fb98a0d5]
 4: (abort()+0x17b) [0x7f13fb98d83b]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f13fc2dc69d]
 6: (()+0xb5846) [0x7f13fc2da846]
 7: (()+0xb5873) [0x7f13fc2da873]
 8: (()+0xb596e) [0x7f13fc2da96e]
 9: /usr/bin/ceph-osd() [0x94b34f]
 10: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x12c) [0x691b6c]
 11: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&, std::map<eversion_t, hobject_t, std::less<eversion_t>, std::allocator<std::pair<eversion_t const, hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&, std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, std::set<std::string, std::less<std::string>, std::allocator<std::string> >*)+0x16d4) [0x7d3ef4]
 12: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2c1) [0x7951b1]
 13: (OSD::load_pgs()+0x18f3) [0x61e143]
 14: (OSD::init()+0x1b9a) [0x62726a]
 15: (main()+0x1e8d) [0x5d2d0d]
 16: (__libc_start_main()+0xed) [0x7f13fb97576d]
 17: /usr/bin/ceph-osd() [0x5d69d9]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Fortunately it was possible to bring osd.32 back into a working state simply by removing this pg:

root@storage2:~# ceph_objectstore_tool --data-path /var/lib/ceph/osd/ceph-32/ --journal-path /var/lib/ceph/osd/ssd0/32.journal --pgid 3.3ef --op remove

Did I miss something in this procedure, or does it mean that this pg is definitely lost?

Thanks!

François
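P.S. In case the export/import approach can't be made to work, my understanding from the documentation (not yet tested here, so please correct me if I'm wrong) is that the only remaining option would be to explicitly give up on the unfound objects, along these lines:

root@storage1:~# ceph pg 3.3ef list_missing | head    # double-check what would be given up
root@storage1:~# ceph pg 3.3ef mark_unfound_lost revert

I'd obviously rather avoid that if the pg can still be recovered from osd.6's data.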
On 09. 09. 14 00:23, Gregory Farnum wrote:
> On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
> <francois@ctrlaltdel.ch> wrote:
>> Hi Greg,
>>
>> Thanks for your support!
>>
>> On 08. 09. 14 20:20, Gregory Farnum wrote:
>>
>>> The first one is not caused by the same thing as the ticket you
>>> reference (it was fixed well before emperor), so it appears to be some
>>> kind of disk corruption.
>>> The second one is definitely corruption of some kind as it's missing
>>> an OSDMap it thinks it should have. It's possible that you're running
>>> into bugs in emperor that were fixed after we stopped doing regular
>>> support releases of it, but I'm more concerned that you've got disk
>>> corruption in the stores. What kind of crashes did you see previously;
>>> are there any relevant messages in dmesg, etc?
>>
>> Nothing special in dmesg except probably irrelevant XFS warnings:
>>
>> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>
> Hmm, I'm not sure what the outcome of that could be. Googling for the
> error message returns this as the first result, though:
> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/58429
> Which indicates that it's a real deadlock and capable of messing up
> your OSDs pretty good.
>
>>
>> All logs from before the disaster are still there, do you have any
>> advise on what would be relevant?
>>
>>> Given these issues, you might be best off identifying exactly which
>>> PGs are missing, carefully copying them to working OSDs (use the osd
>>> store tool), and killing these OSDs. Do lots of backups at each
>>> stage...
>>
>> This sounds scary, I'll keep fingers crossed and will do a bunch of
>> backups. There are 17 pg with missing objects.
>>
>> What do you exactly mean by the osd store tool? Is it the
>> 'ceph_filestore_tool' binary?
>
> Yeah, that one.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>