On Fri, Sep 12, 2014 at 4:41 AM, Francois Deppierraz
<francois at ctrlaltdel.ch> wrote:
> Hi,
>
> Following up on this issue, I've identified that almost all unfound
> objects belong to a single RBD volume (with the help of the script
> below).
>
> Now what's the best way to try to recover the filesystem stored on this
> RBD volume?
>
> 'mark_unfound_lost revert' or 'mark_unfound_lost lost' and then running
> fsck?
>
> By the way, I'm also still interested to know whether the procedure I've
> tried with ceph_objectstore_tool was correct.

Yeah, that was the correct procedure. I believe you should just need to
mark osd.6 as lost and remove it from the cluster, and it will give up
on getting the pg back. (You may also need to force_create_pgs or
something; I don't recall. The docs should discuss that, though.)

Once you've given up on the objects, recovering data from rbd images
which included them is just like recovering from a lost hard drive
sector or whatever. Hopefully fsck in the VM leaves you with a working
filesystem, and however many files are still present...
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> Thanks!
>
> François
>
> [1] ceph-list-unfound.sh
>
> #!/bin/sh
> for pg in $(ceph health detail | awk '/unfound$/ { print $2; }'); do
>     ceph pg $pg list_missing | jq .objects
> done | jq -s add | jq '.[] | .oid.oid'
>
>
> On 11. 09. 14 11:05, Francois Deppierraz wrote:
>> Hi Greg,
>>
>> An attempt to recover pg 3.3ef by copying it from broken osd.6 to
>> working osd.32 resulted in one more broken osd :(
>>
>> Here's what was actually done:
>>
>> root at storage1:~# ceph pg 3.3ef list_missing | head
>> { "offset": { "oid": "",
>>       "key": "",
>>       "snapid": 0,
>>       "hash": 0,
>>       "max": 0,
>>       "pool": -1,
>>       "namespace": ""},
>>   "num_missing": 219,
>>   "num_unfound": 219,
>>   "objects": [
>> [...]
>> root at storage1:~# ceph pg 3.3ef query
>> [...]
>>       "might_have_unfound": [
>>             { "osd": 6,
>>               "status": "osd is down"},
>>             { "osd": 19,
>>               "status": "already probed"},
>>             { "osd": 32,
>>               "status": "already probed"},
>>             { "osd": 42,
>>               "status": "already probed"}],
>> [...]
>>
>> # Exporting pg 3.3ef from broken osd.6
>>
>> root at storage2:~# ceph_objectstore_tool --data-path
>> /var/lib/ceph/osd/ceph-6/ --journal-path
>> /var/lib/ceph/osd/ssd0/6.journal --pgid 3.3ef --op export --file
>> ~/backup/osd-6.pg-3.3ef.export
>>
>> # Remove an empty pg 3.3ef which was already present on this OSD
>>
>> root at storage2:~# service ceph stop osd.32
>> root at storage2:~# ceph_objectstore_tool --data-path
>> /var/lib/ceph/osd/ceph-32/ --journal-path
>> /var/lib/ceph/osd/ssd0/32.journal --pgid 3.3ef --op remove
>>
>> # Import pg 3.3ef from dump
>>
>> root at storage2:~# ceph_objectstore_tool --data-path
>> /var/lib/ceph/osd/ceph-32/ --journal-path
>> /var/lib/ceph/osd/ssd0/32.journal --op import --file
>> ~/backup/osd-6.pg-3.3ef.export
>> root at storage2:~# service ceph start osd.32
>>
>>     -1> 2014-09-10 18:53:37.196262 7f13fdd7d780  5 osd.32 pg_epoch:
>> 48366 pg[3.3ef(unlocked)] enter Initial
>>      0> 2014-09-10 18:53:37.239479 7f13fdd7d780 -1 *** Caught signal
>> (Aborted) **
>>  in thread 7f13fdd7d780
>>
>>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>>  1: /usr/bin/ceph-osd() [0x8843da]
>>  2: (()+0xfcb0) [0x7f13fcfabcb0]
>>  3: (gsignal()+0x35) [0x7f13fb98a0d5]
>>  4: (abort()+0x17b) [0x7f13fb98d83b]
>>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f13fc2dc69d]
>>  6: (()+0xb5846) [0x7f13fc2da846]
>>  7: (()+0xb5873) [0x7f13fc2da873]
>>  8: (()+0xb596e) [0x7f13fc2da96e]
>>  9: /usr/bin/ceph-osd() [0x94b34f]
>>  10: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x12c)
>> [0x691b6c]
>>  11: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&,
>> std::map<eversion_t, hobject_t, std::less<eversion_t>,
>> std::allocator<std::pair<eversion_t const, hobject_t> > >&,
>> PGLog::IndexedLog&, pg_missing_t&,
>> std::basic_ostringstream<char, std::char_traits<char>,
>> std::allocator<char> >&, std::set<std::string, std::less<std::string>,
>> std::allocator<std::string> >*)+0x16d4) [0x7d3ef4]
>>  12: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2c1) [0x7951b1]
>>  13: (OSD::load_pgs()+0x18f3) [0x61e143]
>>  14: (OSD::init()+0x1b9a) [0x62726a]
>>  15: (main()+0x1e8d) [0x5d2d0d]
>>  16: (__libc_start_main()+0xed) [0x7f13fb97576d]
>>  17: /usr/bin/ceph-osd() [0x5d69d9]
>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> needed to interpret this.
>>
>> Fortunately it was possible to bring osd.32 back into a working state
>> simply by removing this pg.
>>
>> root at storage2:~# ceph_objectstore_tool --data-path
>> /var/lib/ceph/osd/ceph-32/ --journal-path
>> /var/lib/ceph/osd/ssd0/32.journal --pgid 3.3ef --op remove
>>
>> Did I miss something in this procedure, or does it mean that this pg is
>> definitely lost?
>>
>> Thanks!
>>
>> François
>>
>> On 09. 09. 14 00:23, Gregory Farnum wrote:
>>> On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
>>> <francois at ctrlaltdel.ch> wrote:
>>>> Hi Greg,
>>>>
>>>> Thanks for your support!
>>>>
>>>> On 08. 09. 14 20:20, Gregory Farnum wrote:
>>>>
>>>>> The first one is not caused by the same thing as the ticket you
>>>>> reference (it was fixed well before emperor), so it appears to be some
>>>>> kind of disk corruption.
>>>>> The second one is definitely corruption of some kind as it's missing
>>>>> an OSDMap it thinks it should have. It's possible that you're running
>>>>> into bugs in emperor that were fixed after we stopped doing regular
>>>>> support releases of it, but I'm more concerned that you've got disk
>>>>> corruption in the stores.
>>>>> What kind of crashes did you see previously; are there any relevant
>>>>> messages in dmesg, etc?
>>>>
>>>> Nothing special in dmesg except probably irrelevant XFS warnings:
>>>>
>>>> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>>>
>>> Hmm, I'm not sure what the outcome of that could be. Googling for the
>>> error message returns this as the first result, though:
>>> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/58429
>>> Which indicates that it's a real deadlock and capable of messing up
>>> your OSDs pretty good.
>>>
>>>>
>>>> All logs from before the disaster are still there; do you have any
>>>> advice on what would be relevant?
>>>>
>>>>> Given these issues, you might be best off identifying exactly which
>>>>> PGs are missing, carefully copying them to working OSDs (use the osd
>>>>> store tool), and killing these OSDs. Do lots of backups at each
>>>>> stage...
>>>>
>>>> This sounds scary; I'll keep my fingers crossed and will do a bunch of
>>>> backups. There are 17 pgs with missing objects.
>>>>
>>>> What exactly do you mean by the osd store tool? Is it the
>>>> 'ceph_filestore_tool' binary?
>>>
>>> Yeah, that one.
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>
>
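
For anyone landing on this thread later, here is a minimal sketch of the
"give up and fsck" path Greg describes at the top. It is untested against
this cluster: the pg id (3.3ef) and osd id (6) come from the thread, the
device name inside the VM (/dev/vdb1) is only a placeholder, and whether
force_create_pg is actually needed depends on the cluster state, so check
the documentation before running any of it.

#!/bin/sh
# Rough sketch only -- adjust the osd id, pg id and device names to
# your own environment, and back up / snapshot at each step.

# Tell the cluster that osd.6 will never come back, then remove it so
# peering stops waiting on it.
ceph osd lost 6 --yes-i-really-mean-it
ceph osd out 6
ceph osd crush remove osd.6
ceph auth del osd.6
ceph osd rm 6

# Give up on the unfound objects; 'revert' rolls each object back to a
# previous version where one exists.
ceph pg 3.3ef mark_unfound_lost revert

# Only if the pg still cannot be brought up at all (see the docs first,
# as Greg notes):
# ceph pg force_create_pg 3.3ef

# Then, inside the affected VM, check the filesystem on the RBD-backed
# disk. /dev/vdb1 is a placeholder device name.
# fsck -f /dev/vdb1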