On Fri, Nov 22, 2019 at 9:33 PM Zoltan Arnold Nagy <zoltan@xxxxxxxxxxxxxxxxxx> wrote:
> The 2^31-1 in there seems to indicate an overflow somewhere - the way we
> were able to figure out where exactly was to query the PG and compare the
> "up" and "acting" sets - only _one_ of them had the 2^31-1 number in place
> of the correct OSD number. We restarted that OSD and the PG started doing
> its job and recovered.

No, this value is intentional (it shows up as 'None' in higher-level tools):
it means no mapping could be found for that slot. Check your CRUSH map and
CRUSH rule.

Paul

> The issue seems to go back to 2015:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001661.html
> but no solution was posted.
>
> I'm more concerned about the cluster not being able to recover (it's a
> 4+2 EC pool across 12 hosts - plenty of room to heal) than about the
> weird print-out.
>
> The VMs that wanted to access data in any of the affected PGs of course
> died.
>
> Are we missing some settings to let the cluster self-heal even for EC
> pools? First EC pool in production :)
>
> Cheers,
> Zoltan
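
P.S. For reference, 2^31-1 is 2147483647, the placeholder CRUSH returns when
it cannot fill a slot (CRUSH_ITEM_NONE in the source), which higher-level
tools render as 'None'. Below is a minimal sketch of how one could scan for
PGs that still carry such unmapped slots by parsing `ceph pg dump pgs` JSON
output; the exact JSON layout differs between releases, so the field handling
is an assumption, not a definitive implementation.

#!/usr/bin/env python3
# Sketch: list PGs whose "up" or "acting" set contains the "no mapping"
# placeholder (2^31-1). Assumes the `ceph` CLI is available on this host.
import json
import subprocess

NO_MAPPING = 2**31 - 1  # 2147483647, rendered as 'None' by higher-level tools


def pg_stats():
    out = subprocess.check_output(
        ["ceph", "pg", "dump", "pgs", "--format", "json"])
    data = json.loads(out)
    # Depending on the release, the PG list may be the top-level value
    # or nested under a "pg_stats" key, so handle both defensively.
    if isinstance(data, dict):
        return data.get("pg_stats", [])
    return data


def main():
    for pg in pg_stats():
        for field in ("up", "acting"):
            osds = pg.get(field, [])
            if NO_MAPPING in osds:
                print(f"{pg['pgid']}: {field} set has unmapped slot(s): {osds}")


if __name__ == "__main__":
    main()

If this still reports PGs after you have fixed the CRUSH map/rule, the
remaining ones are the PGs to look at more closely.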