Re: Stuck PGs blocked_by non-existent OSDs

Sure thing. N.B. I increased the PG count to see if it would help; alas, it did not. :)

Thanks again!

health_detail
https://gist.github.com/199bab6d3a9fe30fbcae

osd_dump
https://gist.github.com/499178c542fa08cc33bb

osd_tree
https://gist.github.com/02b62b2501cbd684f9b2

Randomly selected queries:
queries/0.19.query
https://gist.github.com/f45fea7c85d6e665edf8
queries/1.a1.query
https://gist.github.com/dd68fbd5e862f94eb3be
queries/7.100.query
https://gist.github.com/d4fd1fb030c6f2b5e678
queries/7.467.query
https://gist.github.com/05dbcdc9ee089bd52d0c
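
In case it helps anyone gathering the same data, here is a minimal sketch of how per-PG query files like the ones above could be collected. It assumes that the plain-text output of `ceph pg dump_stuck` starts each data row with the PG id. The function names and the queries/ output layout are illustrative only; this is not the script mentioned further down the thread.

```shell
#!/usr/bin/env bash
# Sketch: save `ceph pg <pgid> query` output for every stuck PG.
# stuck_pgids and collect_stuck_queries are hypothetical helper names.

# Print the ids (e.g. 7.467) of PGs reported as stuck inactive or unclean.
# Assumption: each data row of `ceph pg dump_stuck` begins with the PG id;
# header and status lines are filtered out by the pattern match.
stuck_pgids() {
    local state
    for state in inactive unclean; do
        ceph pg dump_stuck "$state" 2>/dev/null
    done | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ { print $1 }' | sort -u
}

# Write one <pgid>.query file per stuck PG into the given directory
# (defaults to ./queries).
collect_stuck_queries() {
    local outdir="${1:-queries}" pgid
    mkdir -p "$outdir"
    stuck_pgids | while read -r pgid; do
        ceph pg "$pgid" query > "$outdir/$pgid.query"
    done
}
```

After sourcing the file, running `collect_stuck_queries queries` would leave files such as queries/0.19.query, one per stuck PG, ready to be uploaded alongside the `ceph health detail`, `ceph osd dump` and `ceph osd tree` output.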

On Tue, Mar 10, 2015 at 2:49 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Yeah, get a ceph pg query on one of the stuck ones.
> -Sam
>
> On Tue, 2015-03-10 at 14:41 +0000, joel.merrick@xxxxxxxxx wrote:
>> Stuck unclean and stuck inactive. I can fire up a full query and
>> health dump somewhere useful if you want (full pg query info on ones
>> listed in health detail, tree, osd dump etc). There were blocked_by
>> operations that no longer exist after doing the OSD addition.
>>
>> Side note: I spent some time yesterday writing some bash to do this
>> programmatically (might be useful to others; I will put it on GitHub).
>>
>> On Tue, Mar 10, 2015 at 1:41 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> > What do you mean by "unblocked" but still "stuck"?
>> > -Sam
>> >
>> > On Mon, 2015-03-09 at 22:54 +0000, joel.merrick@xxxxxxxxx wrote:
>> >> On Mon, Mar 9, 2015 at 2:28 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >> > You'll probably have to recreate osds with the same ids (empty ones),
>> >> > let them boot, stop them, and mark them lost.  There is a feature in the
>> >> > tracker to improve this behavior: http://tracker.ceph.com/issues/10976
>> >> > -Sam
>> >>
>> >> Thanks Sam, I've re-added the OSDs; they became unblocked, but there are
>> >> still the same number of PGs stuck. I looked at them in some more
>> >> detail and it seems they all have num_bytes='0'. Tried a repair too,
>> >> for good measure. Still nothing, I'm afraid.
>> >>
>> >> Does this mean some underlying catastrophe has happened and they are
>> >> never going to recover? Following on, would that cause data loss?
>> >> There are no missing objects and I'm hoping there's appropriate
>> >> checksumming / replicas to balance that out, but now I'm not so sure.
>> >>
>> >> Thanks again,
>> >> Joel
>> >
>> >
>>
>>
>>
>
>



-- 
$ echo "kpfmAdpoofdufevq/dp/vl" | perl -pe 's/(.)/chr(ord($1)-1)/ge'



