How many PGs are in the pools? It may be that CRUSH cannot find a suitable OSD.
You can check the tunable choose_total_tries in the CRUSH tunables and try
increasing it, like this:

ceph osd getcrushmap -o crush
crushtool -d crush -o crush.txt
sed -i 's/tunable choose_total_tries 50/tunable choose_total_tries 150/g' crush.txt
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new

Zoltan Arnold Nagy <zoltan@xxxxxxxxxxxxxxxxxx> wrote on Sat, Nov 23, 2019 at 5:26 AM:
>
> On 2019-11-22 21:45, Paul Emmerich wrote:
> > On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy
> > <zoltan@xxxxxxxxxxxxxxxxxx> wrote:
> >
> >> The 2^31-1 in there seems to indicate an overflow somewhere - the way we
> >> were able to figure out where exactly is to query the PG and compare
> >> the "up" and "acting" sets - only _one_ of them had the 2^31-1 number
> >> in place of the correct OSD number. We restarted that and the PG
> >> started doing its job and recovered.
> >
> > no, this value is intentional (and shows up as 'None' in higher-level
> > tools), it means no mapping could be found
>
> thanks, didn't know.
>
> > check your crush map and crush rule
>
> if it were indeed a crush rule or map issue, it would not have been
> resolved by just restarting the primary OSD of the PG, would it?
>
> the crush rule was created by running
>
> ceph osd erasure-code-profile set ec42 k=4 m=2 crush-device-class=nvme
>
> where the default failure domain is host; as I said, we have 12 hosts,
> so I don't see anything wrong here - it's all defaults...
>
> this is why I suspect a bug, just don't have any evidence other than
> that it happened to us :)
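
In case someone wants to confirm they are hitting the same unmapped-slot
situation before and after bumping the tunable, here is a minimal sketch;
the PG id 7.1a, the rule id 1 and num-rep 6 (k=4 + m=2) are placeholders,
so substitute your own values:

# show the current CRUSH tunables, including choose_total_tries
ceph osd crush show-tunables

# list PGs that have an unmapped slot (2147483647, i.e. 2^31-1 / "NONE")
# in their up or acting set
ceph pg dump pgs_brief | grep 2147483647

# inspect a single PG's up and acting sets directly
ceph pg 7.1a query | grep -E -A 8 '"(up|acting)"'

# dry-run the mapping against the CRUSH map extracted above ("crush");
# bad mappings are the ones CRUSH could not fill completely
crushtool -i crush --test --rule 1 --num-rep 6 --show-bad-mappings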