Hi,

did you already do something (replacing drives or changing something)?

You have 11 scrub errors and ~11 inconsistent pgs.

The inconsistent pgs, for example:

pg 4.3a7 is stuck unclean for 629.766502, current state
active+recovery_wait+degraded+inconsistent, last acting [10,21]

are not on the down osds 1 and 22, neither of them, so they should not
be missing. But they are.

Anyway, I think the next step would be to run a pg repair on them and
see where the road goes.
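Roughly like this, using pg 4.3a7 from your health detail as an example
(repeat for each inconsistent pg; the list-inconsistent-obj helper
should be available on 10.2.2):

  # list the inconsistent pgs
  ceph health detail | grep inconsistent

  # see what exactly differs between the copies of one pg
  rados list-inconsistent-obj 4.3a7 --format=json-pretty

  # tell the primary osd to repair that pg
  ceph pg repair 4.3a7

As far as I know, repair will mostly take the primary copy as the
authoritative one, and with size=2 there is not much else to compare
against, so have a look at the inconsistent objects first before
letting it overwrite anything.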
--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 03.07.2016 at 23:59, Matyas Koszik wrote:
>
> Hi,
>
> I've continued restarting osds in the meantime, and it got somewhat
> better, but still very far from optimal.
>
> Here are the details you requested:
>
> http://pastebin.com/Vqgadz24
>
> http://pastebin.com/vCL6BRvC
>
> Matyas
>
>
> On Sun, 3 Jul 2016, Oliver Dzombic wrote:
>
>> Hi,
>>
>> please provide:
>>
>> ceph health detail
>>
>> ceph osd tree
>>
>> On 03.07.2016 at 21:36, Matyas Koszik wrote:
>>>
>>> Hi,
>>>
>>> I recently upgraded to jewel (10.2.2) and now I'm confronted with a rather
>>> strange behavior: recovery does not progress in the way it should. If I
>>> restart the osds on a host, it'll get a bit better (or worse), like this:
>>>
>>> 50 pgs undersized
>>> recovery 43775/7057285 objects degraded (0.620%)
>>> recovery 87980/7057285 objects misplaced (1.247%)
>>>
>>> [restart osds on node1]
>>>
>>> 44 pgs undersized
>>> recovery 39623/7061519 objects degraded (0.561%)
>>> recovery 92142/7061519 objects misplaced (1.305%)
>>>
>>> [restart osds on node1]
>>>
>>> 43 pgs undersized
>>> 1116 requests are blocked > 32 sec
>>> recovery 38181/7061529 objects degraded (0.541%)
>>> recovery 90617/7061529 objects misplaced (1.283%)
>>>
>>> ...
>>>
>>> The current state is this:
>>>
>>> osdmap e38804: 53 osds: 51 up, 51 in; 66 remapped pgs
>>>  pgmap v14797137: 4388 pgs, 8 pools, 13626 GB data, 3434 kobjects
>>>        27474 GB used, 22856 GB / 50330 GB avail
>>>        38172/7061565 objects degraded (0.541%)
>>>        90617/7061565 objects misplaced (1.283%)
>>>        8/3517300 unfound (0.000%)
>>>            4202 active+clean
>>>             109 active+recovery_wait+degraded
>>>              38 active+undersized+degraded+remapped+wait_backfill
>>>              15 active+remapped+wait_backfill
>>>              11 active+clean+inconsistent
>>>               8 active+recovery_wait+degraded+remapped
>>>               3 active+recovering+undersized+degraded+remapped
>>>               2 active+recovery_wait+undersized+degraded+remapped
>>>
>>>
>>> All the pools have size=2 min_size=1.
>>>
>>> (All the unfound blocks are on undersized pgs, and I cannot seem to be
>>> able to fix them without having replicas (?). They exist, but are
>>> outdated, from an earlier problem.)
>>>
>>> Matyas
>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com