Hi,

The disks died, and were removed by:

ceph osd out $osd
ceph osd lost $osd
ceph osd crush remove $osd
ceph auth del $osd
ceph osd rm $osd

When I wrote my earlier mails it was after the 'lost' or 'crush remove'
step, I'm not sure which. But even the last step didn't fix the issue. It
looked like this: http://pastebin.com/UjSjVsJ0

Matyas
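[Editor's note: for reference, a minimal shell sketch of the removal
sequence above. The loop, the osd_ids placeholder (set here to the two down
OSDs mentioned later in the thread), and the --yes-i-really-mean-it flag on
'ceph osd lost' are illustrative additions, not taken from the original
mail; depending on the release, some subcommands expect the numeric id and
others the "osd.<id>" name.]

    #!/bin/sh
    # Remove dead OSDs from the cluster, following the steps listed above.
    # osd_ids is a placeholder; 1 and 22 are the two down OSDs from this thread.
    osd_ids="1 22"

    for osd in $osd_ids; do
        ceph osd out "$osd"                          # stop mapping data to it
        ceph osd lost "$osd" --yes-i-really-mean-it  # give up on its PG copies
        ceph osd crush remove "osd.$osd"             # drop it from the CRUSH map
        ceph auth del "osd.$osd"                     # remove its cephx key
        ceph osd rm "$osd"                           # delete the OSD entry itself
    done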
On Tue, 5 Jul 2016, Sean Redmond wrote:

> Hi,
>
> What happened to the missing 2 OSDs?
>
> 53 osds: 51 up, 51 in
>
> Thanks
>
> On Tue, Jul 5, 2016 at 4:04 PM, Matyas Koszik <koszik@xxxxxx> wrote:
>
> > Should you be interested, the solution to this was
> >
> > ceph pg $pg mark_unfound_lost delete
> >
> > for all pgs that had unfound objects; now the cluster is back in a
> > healthy state.
> >
> > I think this is very counter-intuitive (why should totally unrelated
> > pgs be affected by this?!), but at least the solution was simple.
> >
> > Matyas
> >
> > On Mon, 4 Jul 2016, Oliver Dzombic wrote:
> >
> > > Hi,
> > >
> > > Did you already do something (replacing drives or changing something)?
> > >
> > > You have 11 scrub errors, and ~11 inconsistent pg's.
> > >
> > > The inconsistent pg's, for example:
> > >
> > > pg 4.3a7 is stuck unclean for 629.766502, current state
> > > active+recovery_wait+degraded+inconsistent, last acting [10,21]
> > >
> > > are not on the down OSDs 1 and 22, neither of them. So they should
> > > not be missing. But they are.
> > >
> > > Anyway, I think the next step would be to run a pg repair command
> > > and see where the road goes.
> > >
> > > --
> > > Mit freundlichen Gruessen / Best regards
> > >
> > > Oliver Dzombic
> > > IP-Interactive
> > >
> > > mailto:info@xxxxxxxxxxxxxxxxx
> > >
> > > Anschrift:
> > >
> > > IP Interactive UG (haftungsbeschraenkt)
> > > Zum Sonnenberg 1-3
> > > 63571 Gelnhausen
> > >
> > > HRB 93402 beim Amtsgericht Hanau
> > > Geschäftsführung: Oliver Dzombic
> > >
> > > Steuer Nr.: 35 236 3622 1
> > > UST ID: DE274086107
> > >
> > > On 03.07.2016 at 23:59, Matyas Koszik wrote:
> > > >
> > > > Hi,
> > > >
> > > > I've continued restarting osds in the meantime, and it got somewhat
> > > > better, but it's still very far from optimal.
> > > >
> > > > Here are the details you requested:
> > > >
> > > > http://pastebin.com/Vqgadz24
> > > >
> > > > http://pastebin.com/vCL6BRvC
> > > >
> > > > Matyas
> > > >
> > > > On Sun, 3 Jul 2016, Oliver Dzombic wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> please provide:
> > > >>
> > > >> ceph health detail
> > > >>
> > > >> ceph osd tree
> > > >>
> > > >> --
> > > >> Mit freundlichen Gruessen / Best regards
> > > >>
> > > >> Oliver Dzombic
> > > >> IP-Interactive
> > > >>
> > > >> mailto:info@xxxxxxxxxxxxxxxxx
> > > >>
> > > >> Anschrift:
> > > >>
> > > >> IP Interactive UG (haftungsbeschraenkt)
> > > >> Zum Sonnenberg 1-3
> > > >> 63571 Gelnhausen
> > > >>
> > > >> HRB 93402 beim Amtsgericht Hanau
> > > >> Geschäftsführung: Oliver Dzombic
> > > >>
> > > >> Steuer Nr.: 35 236 3622 1
> > > >> UST ID: DE274086107
> > > >>
> > > >> On 03.07.2016 at 21:36, Matyas Koszik wrote:
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> I recently upgraded to jewel (10.2.2) and now I'm confronted with a
> > > >>> rather strange behavior: recovery does not progress in the way it
> > > >>> should. If I restart the osds on a host, it'll get a bit better (or
> > > >>> worse), like this:
> > > >>>
> > > >>> 50 pgs undersized
> > > >>> recovery 43775/7057285 objects degraded (0.620%)
> > > >>> recovery 87980/7057285 objects misplaced (1.247%)
> > > >>>
> > > >>> [restart osds on node1]
> > > >>>
> > > >>> 44 pgs undersized
> > > >>> recovery 39623/7061519 objects degraded (0.561%)
> > > >>> recovery 92142/7061519 objects misplaced (1.305%)
> > > >>>
> > > >>> [restart osds on node1]
> > > >>>
> > > >>> 43 pgs undersized
> > > >>> 1116 requests are blocked > 32 sec
> > > >>> recovery 38181/7061529 objects degraded (0.541%)
> > > >>> recovery 90617/7061529 objects misplaced (1.283%)
> > > >>>
> > > >>> ...
> > > >>>
> > > >>> The current state is this:
> > > >>>
> > > >>> osdmap e38804: 53 osds: 51 up, 51 in; 66 remapped pgs
> > > >>> pgmap v14797137: 4388 pgs, 8 pools, 13626 GB data, 3434 kobjects
> > > >>> 27474 GB used, 22856 GB / 50330 GB avail
> > > >>> 38172/7061565 objects degraded (0.541%)
> > > >>> 90617/7061565 objects misplaced (1.283%)
> > > >>> 8/3517300 unfound (0.000%)
> > > >>> 4202 active+clean
> > > >>> 109 active+recovery_wait+degraded
> > > >>> 38 active+undersized+degraded+remapped+wait_backfill
> > > >>> 15 active+remapped+wait_backfill
> > > >>> 11 active+clean+inconsistent
> > > >>> 8 active+recovery_wait+degraded+remapped
> > > >>> 3 active+recovering+undersized+degraded+remapped
> > > >>> 2 active+recovery_wait+undersized+degraded+remapped
> > > >>>
> > > >>> All the pools have size=2 min_size=1.
> > > >>>
> > > >>> (All the unfound objects are on undersized pgs, and I don't seem to
> > > >>> be able to fix them without having replicas (?). They exist, but are
> > > >>> outdated, from an earlier problem.)
> > > >>>
> > > >>> Matyas
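[Editor's note: a minimal sketch of the fix Matyas describes further up in
the thread: marking the unfound objects lost on every PG that reports them.
The enumeration via 'ceph health detail' and the awk pattern are assumptions
about the Jewel output format, not taken from the original mails; since
mark_unfound_lost delete discards data, verify the PG list before running it.]

    #!/bin/sh
    # Find every PG that 'ceph health detail' reports as having unfound objects
    # and mark those objects lost so recovery can complete. Lines are assumed
    # to start with "pg <pgid> ..." and to mention "unfound"; adjust the
    # pattern if your release prints them differently.
    for pg in $(ceph health detail | awk '$1 == "pg" && /unfound/ {print $2}'); do
        echo "marking unfound objects lost in pg $pg"
        ceph pg "$pg" mark_unfound_lost delete
    done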