Negative number of degraded objects for an extended period of time

Fred Yang <frederic.yang@xxxxxxxxx> · Thu, 13 Nov 2014 10:57:16 -0500

Hi,

The Ceph cluster we are running had a few OSDs approaching 95% full more than a week ago, so I ran a reweight to balance the data out and, in the meantime, instructed the application to purge data that was no longer required. However, after the application issued a large amount of purges (usage on all OSDs dropped below 20%), the cluster fell into this odd state: the "objects degraded" count has remained negative for more than 7 days. I see consistent I/O on the OSDs, but the (negative) number of degraded objects barely changes:

2014-11-13 10:43:07.237292 mon.0 [INF] pgmap v5935301: 44816 pgs: 44713 active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, 27 active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, 33 active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; 1473 GB data, 2985 GB used, 17123 GB / 20109 GB avail; 30172 kB/s wr, 58 op/s; -13582/1468299 objects degraded (-0.925%)
2014-11-13 10:43:08.248232 mon.0 [INF] pgmap v5935302: 44816 pgs: 44713 active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, 27 active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, 33 active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; 1473 GB data, 2985 GB used, 17123 GB / 20109 GB avail; 26459 kB/s wr, 51 op/s; -13582/1468303 objects degraded (-0.925%)
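For what it's worth, the percentage printed in these lines matches the degraded count divided by the total count on the same line, so the negative numerator is what produces the negative percentage. Below is a minimal Python sketch that parses the first pgmap line above and recomputes the ratio. The regular expression and the `degraded_ratio` helper are illustrative assumptions based only on the text layout shown here, not part of any Ceph tool.

```python
import re

# First pgmap line from the log above (10:43:07), pasted verbatim.
LINE = ("2014-11-13 10:43:07.237292 mon.0 [INF] pgmap v5935301: 44816 pgs: "
        "44713 active+clean, 1 active+backfilling, 20 active+remapped+wait_backfill, "
        "27 active+remapped+wait_backfill+backfill_toofull, 11 active+recovery_wait, "
        "33 active+remapped+backfilling, 11 active+wait_backfill+backfill_toofull; "
        "1473 GB data, 2985 GB used, 17123 GB / 20109 GB avail; "
        "30172 kB/s wr, 58 op/s; -13582/1468299 objects degraded (-0.925%)")

# Hypothetical pattern for the "X/Y objects degraded (Z%)" field; X may be negative.
DEGRADED_RE = re.compile(r"(-?\d+)/(\d+) objects degraded \((-?[\d.]+)%\)")


def degraded_ratio(line):
    """Return (degraded, total, reported_pct, recomputed_pct) for one pgmap line."""
    m = DEGRADED_RE.search(line)
    if m is None:
        raise ValueError("no 'objects degraded' field found")
    degraded, total = int(m.group(1)), int(m.group(2))
    reported = float(m.group(3))
    # The recomputed value is simply degraded / total expressed as a percentage.
    recomputed = 100.0 * degraded / total
    return degraded, total, reported, recomputed


degraded, total, reported, recomputed = degraded_ratio(LINE)
print(f"{degraded}/{total} -> {recomputed:.3f}% (log shows {reported}%)")
```

The same check applied to the second line (-13582/1468303) or to the status output below (-13438/1475773) also reproduces the printed percentages.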

Any idea what might be happening here? It looks like the PGs in active+remapped+wait_backfill+backfill_toofull are stuck?

     osdmap e43029: 36 osds: 36 up, 36 in
      pgmap v5935658: 44816 pgs, 32 pools, 1488 GB data, 714 kobjects
            3017 GB used, 17092 GB / 20109 GB avail
            -13438/1475773 objects degraded (-0.911%)
               44713 active+clean
                   1 active+backfilling
                  20 active+remapped+wait_backfill
                  27 active+remapped+wait_backfill+backfill_toofull
                  11 active+recovery_wait
                  33 active+remapped+backfilling
                  11 active+wait_backfill+backfill_toofull
  client io 478 B/s rd, 40170 kB/s wr, 80 op/s
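The PG state lines in this status output can also be tallied mechanically. The following minimal Python sketch assumes the plain-text layout shown above (one "count state" pair per line, other lines ignored). `tally_states` and `pgs_with_flag` are hypothetical helper names, not Ceph tooling. It confirms that the state counts add up to the 44816 PGs reported, that 103 of them are not active+clean, and that 38 carry the backfill_toofull flag.

```python
import re

# Status output pasted from above; non-state lines are skipped by the regex.
STATUS = """\
     osdmap e43029: 36 osds: 36 up, 36 in
      pgmap v5935658: 44816 pgs, 32 pools, 1488 GB data, 714 kobjects
            3017 GB used, 17092 GB / 20109 GB avail
            -13438/1475773 objects degraded (-0.911%)
               44713 active+clean
                   1 active+backfilling
                  20 active+remapped+wait_backfill
                  27 active+remapped+wait_backfill+backfill_toofull
                  11 active+recovery_wait
                  33 active+remapped+backfilling
                  11 active+wait_backfill+backfill_toofull
  client io 478 B/s rd, 40170 kB/s wr, 80 op/s
"""

# A state line: leading count, then '+'-joined lowercase state names.
STATE_RE = re.compile(r"^\s*(\d+)\s+([a-z_]+(?:\+[a-z_]+)*)\s*$")


def tally_states(text):
    """Map each PG state string to its PG count, ignoring non-matching lines."""
    counts = {}
    for line in text.splitlines():
        m = STATE_RE.match(line)
        if m:
            state = m.group(2)
            counts[state] = counts.get(state, 0) + int(m.group(1))
    return counts


def pgs_with_flag(counts, flag):
    """Sum the PGs whose '+'-separated state includes the given flag."""
    return sum(n for state, n in counts.items() if flag in state.split("+"))


counts = tally_states(STATUS)
total = sum(counts.values())
not_clean = total - counts.get("active+clean", 0)
toofull = pgs_with_flag(counts, "backfill_toofull")
print(total, not_clean, toofull)
```

Splitting on "+" rather than doing a substring search keeps a flag such as backfill_toofull from matching an unrelated state name that merely contains it.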

The cluster is running v0.72.2. We are planning to upgrade it to Firefly, but I would like to get the cluster back to a clean state before the upgrade.

Thanks,
Fred
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com