Hi,

Following up on this issue, I've identified that almost all unfound
objects belong to a single RBD volume (with the help of the script
below [1]).

Now, what's the best way to try to recover the filesystem stored on
this RBD volume? 'mark_unfound_lost revert' or 'mark_unfound_lost lost'
and then running fsck? A rough sketch of what I have in mind is at the
bottom of this mail, after the quoted thread [2].

By the way, I'm also still interested to know whether the procedure
I tried with ceph_objectstore_tool was correct.

Thanks!

François

[1] ceph-list-unfound.sh

#!/bin/sh
for pg in $(ceph health detail | awk '/unfound$/ { print $2; }'); do
    ceph pg $pg list_missing | jq .objects
done | jq -s add | jq '.[] | .oid.oid'

On 11. 09. 14 11:05, Francois Deppierraz wrote:
> Hi Greg,
>
> An attempt to recover pg 3.3ef by copying it from broken osd.6 to
> working osd.32 resulted in one more broken osd :(
>
> Here's what was actually done:
>
> root@storage1:~# ceph pg 3.3ef list_missing | head
> { "offset": { "oid": "",
>       "key": "",
>       "snapid": 0,
>       "hash": 0,
>       "max": 0,
>       "pool": -1,
>       "namespace": ""},
>   "num_missing": 219,
>   "num_unfound": 219,
>   "objects": [
> [...]
> root@storage1:~# ceph pg 3.3ef query
> [...]
>   "might_have_unfound": [
>         { "osd": 6,
>           "status": "osd is down"},
>         { "osd": 19,
>           "status": "already probed"},
>         { "osd": 32,
>           "status": "already probed"},
>         { "osd": 42,
>           "status": "already probed"}],
> [...]
>
> # Exporting pg 3.3ef from broken osd.6
>
> root@storage2:~# ceph_objectstore_tool --data-path
> /var/lib/ceph/osd/ceph-6/ --journal-path
> /var/lib/ceph/osd/ssd0/6.journal --pgid 3.3ef --op export --file
> ~/backup/osd-6.pg-3.3ef.export
>
> # Remove an empty pg 3.3ef which was already present on this OSD
>
> root@storage2:~# service ceph stop osd.32
> root@storage2:~# ceph_objectstore_tool --data-path
> /var/lib/ceph/osd/ceph-32/ --journal-path
> /var/lib/ceph/osd/ssd0/32.journal --pgid 3.3ef --op remove
>
> # Import pg 3.3ef from dump
>
> root@storage2:~# ceph_objectstore_tool --data-path
> /var/lib/ceph/osd/ceph-32/ --journal-path
> /var/lib/ceph/osd/ssd0/32.journal --op import --file
> ~/backup/osd-6.pg-3.3ef.export
> root@storage2:~# service ceph start osd.32
>
>     -1> 2014-09-10 18:53:37.196262 7f13fdd7d780  5 osd.32 pg_epoch:
> 48366 pg[3.3ef(unlocked)] enter Initial
>      0> 2014-09-10 18:53:37.239479 7f13fdd7d780 -1 *** Caught signal
> (Aborted) **
>  in thread 7f13fdd7d780
>
>  ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
>  1: /usr/bin/ceph-osd() [0x8843da]
>  2: (()+0xfcb0) [0x7f13fcfabcb0]
>  3: (gsignal()+0x35) [0x7f13fb98a0d5]
>  4: (abort()+0x17b) [0x7f13fb98d83b]
>  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f13fc2dc69d]
>  6: (()+0xb5846) [0x7f13fc2da846]
>  7: (()+0xb5873) [0x7f13fc2da873]
>  8: (()+0xb596e) [0x7f13fc2da96e]
>  9: /usr/bin/ceph-osd() [0x94b34f]
>  10: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x12c)
> [0x691b6c]
>  11: (PGLog::read_log(ObjectStore*, coll_t, hobject_t, pg_info_t const&,
> std::map<eversion_t, hobject_t, std::less<eversion_t>,
> std::allocator<std::pair<eversion_t const,
> hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
> std::basic_ostringstream<char, std::char_traits<char>,
> std::allocator<char> >&, std::set<std::string, std::less<std::string>,
> std::allocator<std::string> >*)+0x16d4) [0x7d3ef4]
>  12: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2c1) [0x7951b1]
>  13: (OSD::load_pgs()+0x18f3) [0x61e143]
>  14: (OSD::init()+0x1b9a) [0x62726a]
>  15: (main()+0x1e8d) [0x5d2d0d]
>  16: (__libc_start_main()+0xed) [0x7f13fb97576d]
>  17: /usr/bin/ceph-osd()
> [0x5d69d9]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> Fortunately, it was possible to bring osd.32 back into a working state
> simply by removing this pg:
>
> root@storage2:~# ceph_objectstore_tool --data-path
> /var/lib/ceph/osd/ceph-32/ --journal-path
> /var/lib/ceph/osd/ssd0/32.journal --pgid 3.3ef --op remove
>
> Did I miss something in this procedure, or does it mean that this pg is
> definitely lost?
>
> Thanks!
>
> François
>
> On 09. 09. 14 00:23, Gregory Farnum wrote:
>> On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz
>> <francois@ctrlaltdel.ch> wrote:
>>> Hi Greg,
>>>
>>> Thanks for your support!
>>>
>>> On 08. 09. 14 20:20, Gregory Farnum wrote:
>>>
>>>> The first one is not caused by the same thing as the ticket you
>>>> reference (it was fixed well before emperor), so it appears to be some
>>>> kind of disk corruption.
>>>> The second one is definitely corruption of some kind as it's missing
>>>> an OSDMap it thinks it should have. It's possible that you're running
>>>> into bugs in emperor that were fixed after we stopped doing regular
>>>> support releases of it, but I'm more concerned that you've got disk
>>>> corruption in the stores. What kind of crashes did you see previously;
>>>> are there any relevant messages in dmesg, etc?
>>>
>>> Nothing special in dmesg except probably irrelevant XFS warnings:
>>>
>>> XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
>>
>> Hmm, I'm not sure what the outcome of that could be. Googling for the
>> error message returns this as the first result, though:
>> http://comments.gmane.org/gmane.comp.file-systems.xfs.general/58429
>> Which indicates that it's a real deadlock and capable of messing up
>> your OSDs pretty good.
>>
>>>
>>> All logs from before the disaster are still there, do you have any
>>> advice on what would be relevant?
>>>
>>>> Given these issues, you might be best off identifying exactly which
>>>> PGs are missing, carefully copying them to working OSDs (use the osd
>>>> store tool), and killing these OSDs. Do lots of backups at each
>>>> stage...
>>>
>>> This sounds scary, I'll keep my fingers crossed and will do a bunch of
>>> backups. There are 17 pgs with missing objects.
>>>
>>> What do you exactly mean by the osd store tool? Is it the
>>> 'ceph_filestore_tool' binary?
>>
>> Yeah, that one.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
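
[2] recover-rbd-volume.sh -- a rough, untested sketch of the order of
operations I have in mind for this RBD volume, in case anyone can
confirm or correct it. The pool name ("rbd"), the image name
("vm-disk-1"), and the assumption that the image holds a single
filesystem with no partition table are placeholders, not my real setup.

#!/bin/sh
# Sketch only: take full backups before running anything destructive.

# 1. Give up on the unfound objects; 'revert' rolls each object back to
#    its last known version (or forgets it if none exists).
for pg in $(ceph health detail | awk '/unfound$/ { print $2; }'); do
    ceph pg $pg mark_unfound_lost revert
done

# 2. Work on a copy of the image, never on the original, so that a bad
#    fsck run cannot destroy more data.
rbd export rbd/vm-disk-1 /mnt/backup/vm-disk-1.img

# 3. Check the copy through a loop device first, and only repair the
#    live image once the result on the copy looks sane.
losetup /dev/loop0 /mnt/backup/vm-disk-1.img
fsck -f /dev/loop0
losetup -d /dev/loop0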