Re: unfound objects blocking cluster, need help!

Hi,

Do you understand why removing that osd led to unfound objects? Do you have the ceph.log from yesterday?

Cheers, Dan

On 2 Oct 2016 10:18, "Tomasz Kuzemko" <tomasz@xxxxxxxxxxx> wrote:
>
> Forgot to mention Ceph version - 0.94.5.
>
> I managed to fix this. By chance I found that when an OSD for a blocked PG is starting, there is a few-second time window (after load_pgs) in which it accepts commands related to the blocked PG. So first I managed to capture the output of "ceph pg PGID query" this way. Then I tried to issue "ceph pg PGID mark_unfound_lost delete" and it worked too. After deleting all unfound objects this way the cluster finally unblocked. Before that I had exported all blocked PGs, so hopefully I will be able to recover those 17 objects to a near-latest state.
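>
> In case anyone wants to script this, a rough sketch of how that window can be caught (the OSD id, PG id and init command below are placeholders, adjust to your setup):
>
>     # restart the OSD that is primary for the blocked PG (osd.12 here),
>     # using whatever init system your distro runs
>     /etc/init.d/ceph restart osd.12
>     # keep retrying the blocked command with a short timeout; only the
>     # brief window after load_pgs lets it through
>     while ! timeout 5 ceph pg 1.2f3 query > pg-1.2f3-query.json; do
>         sleep 0.2
>     done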
>
> Hope this helps anyone who might run into the same problem.
>
>
> 2016-10-01 14:27 GMT+02:00 Tomasz Kuzemko <tomasz@xxxxxxxxxxx>:
>>
>> Hi,
>>
>> I have a production cluster on which one OSD on a failing disk was slowing the whole cluster down. I removed the OSD (osd.87) as usual in such cases, but this time it resulted in 17 unfound objects. I no longer have the files from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete" on 10 of those objects.
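>>
>> In case it is useful, the relevant commands look roughly like this (the PG id is just an example):
>>
>>     # list PGs reporting unfound objects
>>     ceph health detail | grep unfound
>>     # give up on the unfound objects in a given PG
>>     ceph pg 1.2f3 mark_unfound_lost delete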
>>
>> On the remaining 7 objects the command blocks. When I try to run "ceph pg PGID query" on this PG, it also blocks. I suspect this is the same reason why mark_unfound_lost blocks.
>>
>> Other client IO to PGs that have unfound objects is also blocked. When trying to query the OSDs that hold the PGs with unfound objects, "ceph tell" blocks as well.
>>
>> I tried to mark the PG as complete using ceph-objectstore-tool, but it did not help; the PG is in fact complete, yet it still blocks for some reason.
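>>
>> That was done with the OSD stopped, roughly like this (data path, journal path and PG id are examples; the tool needs a version that supports the mark-complete op):
>>
>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
>>         --journal-path /var/lib/ceph/osd/ceph-12/journal \
>>         --pgid 1.2f3 --op mark-complete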
>>
>> I also tried recreating an empty osd.87 and importing the PG exported from another replica, but that did not help either.
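>>
>> The export/import was also done with ceph-objectstore-tool, roughly like this (paths and PG id are examples; both OSDs were stopped while the tool ran):
>>
>>     # on an OSD still holding a good copy of the PG
>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
>>         --journal-path /var/lib/ceph/osd/ceph-34/journal \
>>         --pgid 1.2f3 --op export --file /tmp/pg-1.2f3.export
>>     # on the recreated, empty osd.87
>>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-87 \
>>         --journal-path /var/lib/ceph/osd/ceph-87/journal \
>>         --op import --file /tmp/pg-1.2f3.export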
>>
>> Can someone help me please? This is really important.
>>
>> ceph pg dump:
>> https://gist.github.com/anonymous/c0622ef0d8c0ac84e0778e73bad3c1af/raw/206a06e674ed1c870bbb09bb75fe4285a8e20ba4/pg-dump
>>
>> ceph osd dump:
>> https://gist.github.com/anonymous/64e237d85016af6bd7879ef272ca5639/raw/d6fceb9acd206b75c3ce59c60bcd55a47dea7acd/osd-dump
>>
>> ceph health detail:
>> https://gist.github.com/anonymous/ddb27863ecd416748ebd7ebbc036e438/raw/59ef1582960e011f10cbdbd4ccee509419b95d4e/health-detail
>>
>>
>> --
>> Pozdrawiam,
>> Tomasz Kuzemko
>> tomasz@xxxxxxxxxxx
>
>
>
>
> --
> Pozdrawiam,
> Tomasz Kuzemko
> tomasz@xxxxxxxxxxx
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
