Hi, here is what I did: about twice a day the rebalance process stops (as I
mentioned in my last post), so I have to unset nodown/noup to let the OSDs
flap once and get the data balancing going again (the exact commands are
listed after the rules below).

After 6 days I am in this situation: the rebalance is "almost" complete, but
the 2 new nodes hold *much* more data than the 4 old nodes, and now they are
nearly full! Greg, guys, I don't have that much data to fill up the cluster:
I have 30 TB stored in it, yet as you can see the log reports 50424 GB of data:

2013-04-26 15:06:21.251519 mon.0 [INF] pgmap v1645774: 17280 pgs: 1 active,
16359 active+clean, 13 active+remapped+wait_backfill,
64 active+remapped+wait_backfill+backfill_toofull,
4 active+degraded+wait_backfill+backfill_toofull, 627 peering,
148 active+remapped+backfill_toofull, 18 active+degraded+backfill_toofull,
8 active+degraded+remapped+wait_backfill+backfill_toofull, 14 remapped+peering,
23 active+degraded+remapped+backfill_toofull, 1 active+clean+scrubbing+deep;
50424 GB data, 75622 GB used, 35957 GB / 108 TB avail;
383104/19367085 degraded (1.978%)

So now I've changed the rules to "type host" instead of "type room"; let's
see if the data rebalances correctly.

This is one of the 4 old nodes:

Filesystem    1K-blocks       Used  Available Use% Mounted on
/dev/sda1    1942046216 1317021684  625024532  68% /var/lib/ceph/osd/ceph-00
/dev/sdb1    1942046216 1187226308  754819908  62% /var/lib/ceph/osd/ceph-01
/dev/sdc1    1942046216  937248052 1004798164  49% /var/lib/ceph/osd/ceph-02
/dev/sdd1    1942046216 1023586044  918460172  53% /var/lib/ceph/osd/ceph-03
/dev/sde1    1942046216 1137294252  804751964  59% /var/lib/ceph/osd/ceph-04
/dev/sdf1    1942046216  983870424  958175792  51% /var/lib/ceph/osd/ceph-05
/dev/sdg1    1942046216 1213362844  728683372  63% /var/lib/ceph/osd/ceph-06
/dev/sdh1    1942046216 1017003344  925042872  53% /var/lib/ceph/osd/ceph-07
/dev/sdi1    1942046216 1037107532  904938684  54% /var/lib/ceph/osd/ceph-08
/dev/sdj1    1942046216 1204167564  737878652  63% /var/lib/ceph/osd/ceph-09
/dev/sdk1    1942046216 1159791612  782254604  60% /var/lib/ceph/osd/ceph-10

And this is one of the 2 new nodes:

Filesystem    1K-blocks       Used  Available Use% Mounted on
/dev/sda1    1942046216 1789510708  152535508  93% /var/lib/ceph/osd/ceph-44
/dev/sdb1    1942046216 1746028416  196017800  90% /var/lib/ceph/osd/ceph-45
/dev/sdc1    1942046216 1652933164  289113052  86% /var/lib/ceph/osd/ceph-46
/dev/sdd1    1942046216 1708856824  233189392  88% /var/lib/ceph/osd/ceph-47
/dev/sde1    1942046216 1777007984  165038232  92% /var/lib/ceph/osd/ceph-48
/dev/sdf1    1942046216 1655247564  286798652  86% /var/lib/ceph/osd/ceph-49
/dev/sdg1    1942046216 1143921172  798125044  59% /var/lib/ceph/osd/ceph-50
/dev/sdh1    1942046216 1621846420  320199796  84% /var/lib/ceph/osd/ceph-51
/dev/sdi1    1453908364 1258474780  195433584  87% /var/lib/ceph/osd/ceph-52
/dev/sdj1    1453908364 1257657764  196250600  87% /var/lib/ceph/osd/ceph-53
/dev/sdk1    1942046216 1668087216  273959000  86% /var/lib/ceph/osd/ceph-54

My tree looks like this (individual OSDs omitted):

-1   122.5  root default
-9    57.5    room p1
-3    44        rack r14
-4    22          host s101
-6    22          host s102
-13   13.5      rack r10
-12   13.5        host s103
-10   65      room p2
-7    22        rack r20
-5    22          host s202
-8    22        rack r22
-2    22          host s201
-14   21        rack r21
-11   21          host s203

And the rules:

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type room
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type room
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type room
        step emit
}
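For reference, the way I apply the room-to-host change is roughly the
following standard getcrushmap/setcrushmap workflow (the file names here are
just placeholders):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt and, in each rule, change
#   step chooseleaf firstn 0 type room
# to
#   step chooseleaf firstn 0 type host
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

Injecting the new map starts another round of data movement, so I expect more
backfilling before things settle. And the flag juggling mentioned at the top
is simply:

ceph osd unset noup
ceph osd unset nodown
# wait for the OSDs to flap once and backfill to resume,
# then (in my case) set the flags again
ceph osd set noup
ceph osd set nodown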
2013/4/25 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Tue, Apr 23, 2013 at 12:49 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx> wrote:
>> Hi,
>> this morning I have this situation:
>>
>>    health HEALTH_WARN 1540 pgs backfill; 30 pgs backfill_toofull;
>> 113 pgs backfilling; 43 pgs degraded; 38 pgs peering; 5 pgs recovering;
>> 484 pgs recovery_wait; 38 pgs stuck inactive; 2180 pgs stuck unclean;
>> recovery 2153828/21551430 degraded (9.994%); noup,nodown flag(s) set
>>    monmap e1: 3 mons at
>> {m1=192.168.21.11:6789/0,m2=192.168.21.12:6789/0,m3=192.168.21.13:6789/0},
>> election epoch 50, quorum 0,1,2 m1,m2,m3
>>    osdmap e34624: 62 osds: 62 up, 62 in
>>    pgmap v1496556: 17280 pgs: 15098 active+clean,
>> 1471 active+remapped+wait_backfill, 9 active+degraded+wait_backfill,
>> 30 active+remapped+wait_backfill+backfill_toofull, 462 active+recovery_wait,
>> 18 peering, 109 active+remapped+backfilling, 1 active+clean+scrubbing,
>> 30 active+degraded+remapped+wait_backfill, 22 active+recovery_wait+remapped,
>> 20 remapped+peering, 4 active+degraded+remapped+backfilling,
>> 1 active+clean+scrubbing+deep, 5 active+recovering;
>> 50432 GB data, 76489 GB used, 36942 GB / 110 TB avail;
>> 2153828/21551430 degraded (9.994%)
>>    mdsmap e52: 1/1/1 up {0=m1=up:active}, 2 up:standby
>>
>> No data movement.
>> The cephfs mounts work, but many directories are inaccessible:
>> the clients hang on a simple "ls".
>>
>> ceph -w keeps logging these lines: http://pastebin.com/AN01wgfV
>>
>> What can I do to get better?
>
> As before, you need to get your RADOS cluster healthy. That's a fairly
> unpleasant task once it manages to get full; you basically need to
> carefully order what data moves where, when. Sometimes deleting extra
> copies of known-healthy data can help. But it's not the sort of thing
> we can do over the mailing list; I suggest you read the OSD operations
> docs carefully and then make some careful changes. If you can bring in
> temporary extra capacity that would help too.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com