On 2019-11-22 21:45, Paul Emmerich wrote:
> On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy
> <zoltan@xxxxxxxxxxxxxxxxxx> wrote:
>> The 2^31-1 in there seems to indicate an overflow somewhere - the way
>> we were able to figure out where exactly was to query the PG and
>> compare the "up" and "acting" sets - only _one_ of them had the 2^31-1
>> number in place of the correct OSD number. We restarted that OSD and
>> the PG started doing its job and recovered.
> no, this value is intentional (and shows up as 'None' in higher-level
> tools) - it means no mapping could be found
thanks, didn't know.
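
For reference, the check described above boils down to something like
this (<pgid> is a placeholder for the affected PG; 2147483647 is 2^31-1;
jq is only used here for readability):

    # compare the "up" and "acting" sets reported for the PG
    ceph pg <pgid> query | jq '{up: .up, acting: .acting}'
    # an entry of 2147483647 in either set means no OSD mapping was found
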
> check your crush map and crush rule
if it were indeed a crush rule or map issue, it would not have been
resolved by just restarting the primary OSD of the PG, would it?
the crush rule was created by running
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-device-class=nvme
with the default failure domain of host; as I said, we have 12 hosts,
so I don't see anything wrong here - it's all default...
this is why I suspect a bug - I just don't have any evidence other than
that it happened to us :)
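
For completeness, the profile and the rule generated from it can be
inspected with something like this (the rule name is a placeholder -
substitute whatever "ceph osd crush rule ls" shows for the EC pool):

    # show the erasure-code profile (k, m, crush-failure-domain, device class)
    ceph osd erasure-code-profile get ec42
    # list and dump the crush rules to verify the failure domain is host
    ceph osd crush rule ls
    ceph osd crush rule dump <rule-name>
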
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx