I am in the process of doing exactly what you are -- this worked for me:
1. mount the first partition of the bluestore drive that holds the missing PGs (if it's not already mounted)
> mkdir /mnt/tmp
> mount /dev/sdb1 /mnt/tmp
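(optional sanity check) if you're not sure which PGs are actually on that drive, ceph-objectstore-tool can list them against the same data path -- this is just how I'd double-check, not a required step:
> ceph-objectstore-tool --data-path /mnt/tmp --op list-pgs | grep 1.24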
2. export the pg to a suitable temporary storage location:
> ceph-objectstore-tool --data-path /mnt/tmp --pgid 1.24 --op export --file /mnt/sdd1/recover.1.24
3. find the acting osd
> ceph health detail |grep incomplete
PG_DEGRADED Degraded data redundancy: 23 pgs unclean, 23 pgs incomplete
pg 1.24 is incomplete, acting [18,13]
pg 4.1f is incomplete, acting [11]
...
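If you only care about one PG, you should also be able to get its up/acting set directly instead of grepping the whole health output:
> ceph pg map 1.24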
4. set noout
> ceph osd set noout
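Don't forget to clear the flag once the recovery is finished and the cluster looks stable again:
> ceph osd unset noout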
5. Find the OSD and log into it -- I used 18 here.
> ceph osd find 18
{
"osd": 18,
"ip": "10.0.15.54:6801/9263",
"crush_location": {
"building": "building-dc",
"chassis": "chassis-dc400f5-10",
"city": "city",
"floor": "floor-dc4",
"host": "stor-vm4",
"rack": "rack-dc400f5",
"region": "cfl",
"room": "room-dc400",
"root": "default",
"row": "row-dc400f"
}
}
> ssh user@10.0.15.54
6. copy the file to somewhere accessible by the new (acting) OSD
> scp user@10.0.14.51:/mnt/sdd1/recover.1.24 /tmp/recover.1.24
7. stop the osd
> service ceph-osd@18 stop
8. import the file using ceph-objectstore-tool
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-18 --op import --file /tmp/recover.1.24
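Before restarting the OSD you can double-check that the PG now shows up on it (same list-pgs sanity check as above; optional):
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-18 --op list-pgs | grep 1.24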
9. start the osd
> service ceph-osd@18 start
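As a final sanity check, the PG should drop out of the incomplete list once the OSD is back up (I'd still verify the actual data separately):
> ceph health detail | grep 1.24
> ceph pg 1.24 query | grep state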
This worked for me -- I'm not sure if it's the best way or whether I took any unnecessary steps, and I have yet to validate that the data is good.
I based this partly on your original email and the guide here: http://ceph.com/geen-categorie/incomplete-pgs-oh-my/
On Sat, Jul 22, 2017 at 4:46 PM, mofta7y <mofta7y@xxxxxxxxx> wrote:
Hi All,
I have a situation here.
I have an EC pool with a cache tier pool in front of it (the cache tier is replicated with size 2).
We had an issue on the pool and the CRUSH map got changed after rebooting some OSDs; in any case, I lost 4 cache tier OSDs.
Those lost OSDs are not really lost -- they look fine to me -- but bluestore throws an exception when they start that I can't get past. (I will open a separate question about that exception as well.)
So now I have 14 incomplete PGs on the cache tier.
I am trying to recover them using ceph-objectstore-tool.
The extraction and import work fine with no issues, but the OSD fails to start afterwards with the same issue as the original OSD.
After importing the PG on the acting OSD, I get the exact same exception I was getting while trying to start the failed OSD.
Removing that import resolves the issue.
So the question is: how can I use ceph-objectstore-tool to import into bluestore? I think I am missing something here.
Here is the procedure and the steps I used:
1- stop the old OSD (it cannot start anyway)
2- use this command to extract the PG I need:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-116 --pgid 15.371 --op export --file /tmp/recover.15.371
that command works
3- check which OSD is the acting OSD for the PG
4- stop the acting OSD
5- delete the current folder with the same PG name
6- use this command:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 --op import --file /tmp/recover.15.371
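As a side note on step 5, I think ceph-objectstore-tool can also remove the existing PG instead of deleting the folder by hand (I have not verified this on 11.2.0, and newer versions may need an extra --force):
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-78 --pgid 15.371 --op remove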
The error I got in both cases is this bluestore error:
Jul 22 16:35:20 alm9 ceph-osd[3799171]: -257> 2017-07-22 16:20:19.544195 7f7157036a40 -1 osd.116 119691 log_to_monitors {default=true}
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 0> 2017-07-22 16:35:20.142143 7f713c597700 -1 /tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: In function 'virtual int BitMapAllocator::reserve(uint64_t)' thread 7f713c597700 time 2017-07-22 16:35:20.139309
Jul 22 16:35:20 alm9 ceph-osd[3799171]: /tmp/buildd/ceph-11.2.0/src/os/bluestore/BitMapAllocator.cc: 82: FAILED assert(!(need % m_block_size))
Jul 22 16:35:20 alm9 ceph-osd[3799171]: ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x80) [0x562b84558380]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 2: (BitMapAllocator::reserve(unsigned long)+0x2ab) [0x562b8437c5cb]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 3: (BlueFS::reclaim_blocks(unsigned int, unsigned long, std::vector<AllocExtent, mempool::pool_allocator<(mempool::pool_index_t)7, AllocExtent> >*)+0x22a) [0x562b8435109a]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 4: (BlueStore::_balance_bluefs_freespace(std::vector<bluestore_pextent_t, std::allocator<bluestore_pextent_t> >*)+0x28e) [0x562b84270dae]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 5: (BlueStore::_kv_sync_thread()+0x164a) [0x562b84273eea]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 6: (BlueStore::KVSyncThread::entry()+0xd) [0x562b842ad9dd]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 7: (()+0x76ba) [0x7f71560c76ba]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: 8: (clone()+0x6d) [0x7f71547953dd]
Jul 22 16:35:20 alm9 ceph-osd[3799171]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
If anyone has any idea how to restore those PGs, please point me in the right direction.
By the way, manually restoring the folder that I deleted in step 5 makes the OSD go up again.
Thanks
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com