Hello,

On Mon, 22 Feb 2016 07:06:21 +0800 Vlad Blando wrote:

> Hi Guys,
>
> After I adjusted PGs on my volume pool, these processes been running for
> 2 days now. how do I speed things up?
>
> ---
> [root@controller-node ~]# ceph -s
>     cluster 99cb7f6f-3441-4a94-bd4b-828183ecc393
>      health HEALTH_ERR 330 pgs backfill; 41 pgs backfill_toofull; 2 pgs
> backfilling; 3 pgs inconsistent; 1 pgs recovering; 71 pgs recovery_wait;
> 405 pgs stuck unclean; recovery 4835319/22635543 objects degraded
> (21.362%); 11 near full osd(s); 3 scrub errors
>      monmap e2: 3 mons at
> {ceph-node-1=10.107.200.1:6789/0,ceph-node-2=10.107.200.2:6789/0,ceph-node-3=10.107.200.3:6789/0},
> election epoch 496, quorum 0,1,2 ceph-node-1,ceph-node-2,ceph-node-3
>      osdmap e11356: 27 osds: 27 up, 27 in
>       pgmap v42430325: 1536 pgs, 2 pools, 27592 GB data, 5783 kobjects
>             83068 GB used, 17484 GB / 100553 GB avail
>             4835319/22635543 objects degraded (21.362%)
>                    1 active+recovering+remapped
>                   36 active+recovery_wait
>                    1 active+remapped+backfill_toofull
>                 1128 active+clean
>                    2 active+clean+inconsistent
>                  289 active+remapped+wait_backfill
>                    1 active+remapped+inconsistent+wait_backfill
>                    1 active+clean+scrubbing+deep
>                   35 active+recovery_wait+remapped
>                   40 active+remapped+wait_backfill+backfill_toofull
>                    2 active+remapped+backfilling
>   client io 2555 kB/s rd, 73369 B/s wr, 177 op/s
> [root@controller-node ~]#
> ---

IIRC, you were asking about a nearly full cluster earlier.
Pretty much the best and safest thing to do before anything else is to
get out of that state, either by adding more OSDs or by deleting objects.

That said, googling "backfill_toofull" gives several helpful links. One
of the two top hits states the obvious: make space (or raise the full
ratios, if that is possible and sensible; see the PS for a sketch).
The second mentions a bug (in Firefly, though) where PGs remained stuck
in backfill_toofull even after the OSD had dropped back below the
threshold, and restarting the affected OSDs cleared it up.

Either way, those 41 backfill_toofull PGs won't be going anywhere until
something is done about the space situation.

If you do a "watch ceph -s", do you see at least some recovery activity
at this point?

The 3 scrub errors wouldn't fill me with confidence either.

Lastly, I'd turn off scrubs for the duration of the backfill, to improve
the speed (and reduce the performance impact) of the recovery; see the PS
for the exact flags. But of course recovery has to be possible (enough
space) in the first place.

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
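
PS: Since the question was how to speed things up, here is a rough sketch
of the "raise the full ratios" option. This is from memory for a
Firefly/Hammer-era cluster, the 0.97/0.90/0.92 values are purely
illustrative, and raising ratios on an already overfull cluster is a
stopgap, not a fix:

---
# Raise the cluster-wide full/nearfull marks (defaults 0.95/0.85).
ceph pg set_full_ratio 0.97
ceph pg set_nearfull_ratio 0.90
# Allow backfill to target fuller OSDs (osd_backfill_full_ratio
# defaults to 0.85); injectargs changes are runtime-only, not persistent.
ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'
---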
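
Likewise a sketch of the restart workaround for OSDs stuck in
backfill_toofull; the OSD ID (12) is just an example, pick the real ones
from "ceph health detail", and the invocation depends on your
distro/release (sysvinit shown here, systemd would be
"systemctl restart ceph-osd@12"):

---
# Identify the PGs (and thus OSDs) stuck in backfill_toofull.
ceph health detail | grep backfill_toofull
# On the node hosting that OSD, restart just that daemon.
/etc/init.d/ceph restart osd.12
---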
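
And the scrub toggles I meant above; remember to unset them once the
recovery has finished:

---
ceph osd set noscrub
ceph osd set nodeep-scrub
# ...after recovery completes:
ceph osd unset noscrub
ceph osd unset nodeep-scrub
---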