Re: Cephfs inaccessible

Hi,
here is what I did:
Almost twice a day the rebalance process stops (as I mentioned in my
last post), so I have to unset nodown/noup to let the OSDs flap once
and restart data balancing.
After 6 days I am in this situation: the rebalance is "almost"
complete, but the 2 new nodes have *much* more data than the 4 old
nodes, and now they are nearly full!
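
For reference, clearing and re-applying those flags is done with the
standard ceph CLI, something like:

    ceph osd unset nodown   # let OSDs be marked down again
    ceph osd unset noup     # let OSDs be marked up again
    # ...and once backfill is running again:
    ceph osd set nodown
    ceph osd set noup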

Greg, guys, I don't have that much data to fill up the cluster! I have
30 TB stored in the cluster, but as you can see, the log reports 50424
GB of data:

2013-04-26 15:06:21.251519 mon.0 [INF] pgmap v1645774: 17280 pgs: 1
active, 16359 active+clean, 13 active+remapped+wait_backfill, 64
active+remapped+wait_backfill+backfill_toofull, 4
active+degraded+wait_backfill+backfill_toofull, 627 peering, 148
active+remapped+backfill_toofull, 18 active+degraded+backfill_toofull,
8 active+degraded+remapped+wait_backfill+backfill_toofull, 14
remapped+peering, 23 active+degraded+remapped+backfill_toofull, 1
active+clean+scrubbing+deep; 50424 GB data, 75622 GB used, 35957 GB /
108 TB avail; 383104/19367085 degraded (1.978%)
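
Side note: the backfill_toofull states above mean the target OSDs have
passed the backfill full ratio (osd_backfill_full_ratio, 85% by
default). One stop-gap, not something taken from this thread, is to
raise that ratio a little at runtime so backfill can drain the
overloaded OSDs:

    # bump the backfill full ratio from the default 0.85 to 0.90 on all OSDs
    ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.90'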

So now I've changed the rules and set "type host" instead of "type
room"; let's see if the data rebalances correctly.
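
For anyone following along, one way to make that change, assuming the
usual crushtool workflow, is:

    ceph osd getcrushmap -o crushmap.bin        # export the current map
    crushtool -d crushmap.bin -o crushmap.txt   # decompile to text
    # edit crushmap.txt: change "type room" to "type host" in the rules
    crushtool -c crushmap.txt -o crushmap.new   # recompile
    ceph osd setcrushmap -i crushmap.new        # inject the new map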


This is one of the 4 old nodes:
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/sda1      1942046216 1317021684  625024532  68% /var/lib/ceph/osd/ceph-00
/dev/sdb1      1942046216 1187226308  754819908  62% /var/lib/ceph/osd/ceph-01
/dev/sdc1      1942046216  937248052 1004798164  49% /var/lib/ceph/osd/ceph-02
/dev/sdd1      1942046216 1023586044  918460172  53% /var/lib/ceph/osd/ceph-03
/dev/sde1      1942046216 1137294252  804751964  59% /var/lib/ceph/osd/ceph-04
/dev/sdf1      1942046216  983870424  958175792  51% /var/lib/ceph/osd/ceph-05
/dev/sdg1      1942046216 1213362844  728683372  63% /var/lib/ceph/osd/ceph-06
/dev/sdh1      1942046216 1017003344  925042872  53% /var/lib/ceph/osd/ceph-07
/dev/sdi1      1942046216 1037107532  904938684  54% /var/lib/ceph/osd/ceph-08
/dev/sdj1      1942046216 1204167564  737878652  63% /var/lib/ceph/osd/ceph-09
/dev/sdk1      1942046216 1159791612  782254604  60% /var/lib/ceph/osd/ceph-10

And this is one of the 2 new nodes:
Filesystem      1K-blocks       Used  Available Use% Mounted on
/dev/sda1      1942046216 1789510708 152535508  93% /var/lib/ceph/osd/ceph-44
/dev/sdb1      1942046216 1746028416 196017800  90% /var/lib/ceph/osd/ceph-45
/dev/sdc1      1942046216 1652933164 289113052  86% /var/lib/ceph/osd/ceph-46
/dev/sdd1      1942046216 1708856824 233189392  88% /var/lib/ceph/osd/ceph-47
/dev/sde1      1942046216 1777007984 165038232  92% /var/lib/ceph/osd/ceph-48
/dev/sdf1      1942046216 1655247564 286798652  86% /var/lib/ceph/osd/ceph-49
/dev/sdg1      1942046216 1143921172 798125044  59% /var/lib/ceph/osd/ceph-50
/dev/sdh1      1942046216 1621846420 320199796  84% /var/lib/ceph/osd/ceph-51
/dev/sdi1      1453908364 1258474780 195433584  87% /var/lib/ceph/osd/ceph-52
/dev/sdj1      1453908364 1257657764 196250600  87% /var/lib/ceph/osd/ceph-53
/dev/sdk1      1942046216 1668087216 273959000  86% /var/lib/ceph/osd/ceph-54
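
At 90-93% those OSDs are getting close to the mon full ratio
(mon_osd_full_ratio, 95% by default), at which point the cluster stops
accepting writes; the nearfull warning already fires at 85%. A quick
way to see which OSDs are flagged:

    ceph health detail | grep -i full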


My tree is like this (I've left out the individual OSDs):

-1    122.5    root default
-9    57.5        room p1
-3    44            rack r14
-4    22                host s101
-6    22                host s102

-13    13.5            rack r10
-12    13.5                host s103

-10    65        room p2
-7    22            rack r20
-5    22                host s202

-8    22            rack r22
-2    22                host s201

-14    21            rack r21
-11    21                host s203

And the rules:
rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type room
        step emit
}
rule metadata {
        ruleset 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type room
        step emit
}
rule rbd {
        ruleset 2
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type room
        step emit
}
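
With the change described above, the chooseleaf step in each rule now
uses "type host" instead of "type room"; the data rule, for example,
becomes:

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}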

2013/4/25 Gregory Farnum <greg@xxxxxxxxxxx>:
> On Tue, Apr 23, 2013 at 12:49 AM, Marco Aroldi <marco.aroldi@xxxxxxxxx> wrote:
>> Hi,
>> this morning I have this situation:
>>    health HEALTH_WARN 1540 pgs backfill; 30 pgs backfill_toofull; 113
>> pgs backfilling; 43 pgs degraded; 38 pgs peering; 5 pgs recovering;
>> 484 pgs recovery_wait; 38 pgs stuck inactive; 2180 pgs stuck unclean;
>> recovery 2153828/21551430 degraded (9.994%); noup,nodown flag(s) set
>>    monmap e1: 3 mons at
>> {m1=192.168.21.11:6789/0,m2=192.168.21.12:6789/0,m3=192.168.21.13:6789/0},
>> election epoch 50, quorum 0,1,2 m1,m2,m3
>>    osdmap e34624: 62 osds: 62 up, 62 in
>>    pgmap v1496556: 17280 pgs: 15098 active+clean, 1471
>> active+remapped+wait_backfill, 9 active+degraded+wait_backfill, 30
>> active+remapped+wait_backfill+backfill_toofull, 462
>> active+recovery_wait, 18 peering, 109 active+remapped+backfilling, 1
>> active+clean+scrubbing, 30 active+degraded+remapped+wait_backfill, 22
>> active+recovery_wait+remapped, 20 remapped+peering, 4
>> active+degraded+remapped+backfilling, 1 active+clean+scrubbing+deep, 5
>> active+recovering; 50432 GB data, 76489 GB used, 36942 GB / 110 TB
>> avail; 2153828/21551430 degraded (9.994%)
>>    mdsmap e52: 1/1/1 up {0=m1=up:active}, 2 up:standby
>>
>> No data movement
>> The cephfs mounts works but many many directories are inaccessible:
>> the clients hangs with just a simple "ls"
>>
>> ceph -w repeat to log these lines: http://pastebin.com/AN01wgfV
>>
>> What can I do to get better?
>
> As before, you need to get your RADOS cluster healthy. That's a fairly
> unpleasant task once it manages to get full; you basically need to
> carefully order what data moves where, when. Sometimes deleting extra
> copies of known-healthy data can help. But it's not the sort of thing
> we can do over the mailing list; I suggest you read the OSD operations
> docs carefully and then make some careful changes. If you can bring in
> temporary extra capacity that would help too.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



