Hello Paul,

Could you post your CRUSH map?

crushtool -d <CRUSH_MAP>
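
If it is easier, the compiled map can be pulled straight off the cluster
and decompiled to text in one go, e.g. (the file names here are just
placeholders):

$ ceph osd getcrushmap -o /tmp/crushmap.bin
$ crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt

The interesting part will be the rules section, in particular whichever
rule crush_ruleset 3 (your 'backup' pool) points at.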

caleb

On Mon, Aug 6, 2012 at 1:01 PM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
>
> ---------- Forwarded message ----------
> From: Paul Pettigrew <Paul.Pettigrew@xxxxxxxxxxx>
> Date: Sun, Aug 5, 2012 at 8:08 PM
> Subject: RE: Crush not delivering data uniformly -> HEALTH_ERR full osd
> To: Yehuda Sadeh <yehuda@xxxxxxxxxxx>
> Cc: "ceph-devel@xxxxxxxxxxxxxxx" <ceph-devel@xxxxxxxxxxxxxxx>
>
>
> Hi Yehuda, we have:
>
> root@dsanb1-coy:/mnt/ceph# ceph osd dump | grep ^pool
> pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0 crash_replay_interval 45
> pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0
> pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0
> pool 3 'backup' rep size 1 crush_ruleset 3 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0
>
>
> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Yehuda Sadeh
> Sent: Monday, 6 August 2012 11:16 AM
> To: Paul Pettigrew
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Subject: Re: Crush not delivering data uniformly -> HEALTH_ERR full osd
>
> On Sun, Aug 5, 2012 at 5:16 PM, Paul Pettigrew
> <Paul.Pettigrew@xxxxxxxxxxx> wrote:
> >
> > Hi Ceph community
> >
> > We are at the stage of performance capacity testing, where significant
> > amounts of backup data are being written to Ceph.
> >
> > The issue we have is that the underlying HDDs are not being populated
> > (roughly) uniformly, and our Ceph system hits a brick wall: after a
> > couple of days our 30TB storage system is no longer able to operate,
> > having stored only ~7TB.
> >
> > Basically, despite the HDDs (1:1 ratio between OSD and HDD) all being
> > the same storage size and weighting in the crushmap, we have disks
> > either:
> > a) using 1% space;
> > b) using 48%; or
> > c) using 96%
> >
> > Too precise a split to be an accident. See below for more detail
> > (osd11-22 are not expected to get data, per our crushmap):
> >
> >
> > ceph pg dump
> > <snip>
> > pool 0     2442  0  0  0    10240000000   7302520   7302520
> > pool 1       57  0  0  0      127824767   5603518   5603518
> > pool 2        0  0  0  0              0         0         0
> > pool 3  1808757  0  0  0  7584377697985   1104048   1104048
> > sum     1811256  0  0  0  7594745522752  14010086  14010086
> > osdstat  kbused      kbavail      kb          hb in  hb out
> > 0        930606904   1021178408   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 1        1874428     1949525164   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 2        928811428   1022963676   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 3        929733676   1022051996   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 4        1719124     1949678844   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 5        1853452     1949545892   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 6        930979476   1020807132   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 7        1808968     1949590496   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 8        934035924   1017759100   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 9        1855955384  94927432     1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 10       933572004   1018232340   1953514584  [11,12,13,14,15,16,17,18,19,20,21,22]  []
> > 11       2057096     953060760    957230808   [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]  []
> > 12       2053512     953064656    957230808   [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]  []
> > 13       2148732     972501316    976762584   [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]  []
> > 14       2064640     972585104    976762584   [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]  []
> > 15       1945388     972703468    976762584   [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21]  []
> > 16       2051708     972599412    976762584   [0,1,2,3,4,6,7,8,9,10,17,18,19,20,21]  []
> > 17       2137632     952980216    957230808   [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]  []
> > 18       2000124     953117508    957230808   [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]  []
> > 19       2095124     972554492    976762584   [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]  []
> > 20       1986800     972662640    976762584   [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]  []
> > 21       2035204     972615332    976762584   [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]  []
> > 22       1961412     972687788    976762584   [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]  []
> > sum      7475488140  25609393172  33131684328
> >
> > 2012-08-06 10:03:58.964716 7f06783bb700  0 -- 10.32.0.10:0/15147
> > send_keepalive con 0x223f690, no pipe.
> >
> >
> > root@dsanb1-coy:~# df -h
> > Filesystem                               Size  Used  Avail  Use%  Mounted on
> > /dev/md0                                 462G   12G   446G    3%  /
> > udev                                      12G  4.0K    12G    1%  /dev
> > tmpfs                                    4.8G  448K   4.8G    1%  /run
> > none                                     5.0M     0   5.0M    0%  /run/lock
> > none                                      12G     0    12G    0%  /run/shm
> > /dev/sdc                                 1.9T  888G   974G   48%  /ceph-data/osd.0
> > /dev/sdd                                 1.9T  1.8G   1.9T    1%  /ceph-data/osd.1
> > /dev/sdp                                 1.9T  891G   972G   48%  /ceph-data/osd.10
> > /dev/sde                                 1.9T  886G   976G   48%  /ceph-data/osd.2
> > /dev/sdf                                 1.9T  887G   975G   48%  /ceph-data/osd.3
> > /dev/sdg                                 1.9T  1.7G   1.9T    1%  /ceph-data/osd.4
> > /dev/sdh                                 1.9T  1.8G   1.9T    1%  /ceph-data/osd.5
> > /dev/sdi                                 1.9T  888G   974G   48%  /ceph-data/osd.6
> > /dev/sdm                                 1.9T  1.8G   1.9T    1%  /ceph-data/osd.7
> > /dev/sdn                                 1.9T  891G   971G   48%  /ceph-data/osd.8
> > /dev/sdo                                 1.9T  1.8T    91G   96%  /ceph-data/osd.9
> > 10.32.0.10,10.32.0.25,10.32.0.11:6789:/   31T  7.1T    24T   23%  /mnt/ceph
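
A quick cross-check on the numbers above: counting how often each OSD shows
up in the pg mappings for pool 3 should show the same skew as df. This is
only a rough sketch (it grabs every bracketed OSD list on the pool 3 lines,
so each pg is counted once for its "up" set and once for its "acting" set,
which should be identical with rep size 1, and the column layout of pg dump
can differ between versions):

$ ceph pg dump | grep '^3\.' | grep -o '\[[0-9,]*\]' |
      tr -d '[]' | tr ',' '\n' | sort -n | uniq -c | sort -rn

OSDs that barely appear in that output, yet still sit under the root the
backup rule takes, are the ones to look at once the map is posted.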

> >
> >
> > We are writing via fstab-based cephfs mounts, and the above is going
> > to pool 3, which is a "backup" pool where we are testing a replication
> > level of 1x only. This should not have any effect though? Below will
> > illustrate the layout we are using (the data-writing issue above only
> > involves the first node, per our testing design):
> >
> > root@dsanb1-coy:~# ceph osd tree
> > dumped osdmap tree epoch 136
> > # id    weight  type name               up/down reweight
> > -7      23      zone bak
> > -6      23              rack 1nrack
> > -2      11                      host dsanb1-coy
> > 0       2                              osd.0    up      1
> > 1       2                              osd.1    up      1
> > 10      2                              osd.10   up      1
> > 2       2                              osd.2    up      1
> > 3       2                              osd.3    up      1
> > 4       2                              osd.4    up      1
> > 5       2                              osd.5    up      1
> > 6       2                              osd.6    up      1
> > 7       2                              osd.7    up      1
> > 8       2                              osd.8    up      1
> > 9       2                              osd.9    up      1
> > -1      23      zone default
> > -3      23              rack 2nrack
> > -2      11                      host dsanb1-coy
> > 0       2                              osd.0    up      1
> > 1       2                              osd.1    up      1
> > 10      2                              osd.10   up      1
> > 2       2                              osd.2    up      1
> > 3       2                              osd.3    up      1
> > 4       2                              osd.4    up      1
> > 5       2                              osd.5    up      1
> > 6       2                              osd.6    up      1
> > 7       2                              osd.7    up      1
> > 8       2                              osd.8    up      1
> > 9       2                              osd.9    up      1
> > -4      6                       host dsanb2-coy
> > 11      1                              osd.11   up      1
> > 12      1                              osd.12   up      1
> > 13      1                              osd.13   up      1
> > 14      1                              osd.14   up      1
> > 15      1                              osd.15   up      1
> > 16      1                              osd.16   up      1
> > -5      6                       host dsanb3-coy
> > 17      1                              osd.17   up      1
> > 18      1                              osd.18   up      1
> > 19      1                              osd.19   up      1
> > 20      1                              osd.20   up      1
> > 21      1                              osd.21   up      1
> > 22      1                              osd.22   up      1
> >
> >
> > Has anybody got any suggestions?
> >
>
> How many pgs per pool do you have? Specifically:
> $ ceph osd dump | grep ^pool
>
> Thanks,
> Yehuda
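
P.S. For reference, a decompiled rule looks roughly like the block below.
This one is made up purely for illustration (it is not taken from your
cluster); the things to check against your osd tree are which zone/root the
"step take" line starts from, and whether the final step chooses across
hosts or across individual osds:

# hypothetical example for illustration; not the actual map
rule backup {
        ruleset 3
        type replicated
        min_size 1
        max_size 10
        step take bak
        step chooseleaf firstn 0 type host
        step emit
}

Also worth a look: in the osd tree you posted, the same host dsanb1-coy
(osd.0-10, weight 11) appears under both zone bak and zone default, so
whichever root the backup rule takes, it is worth confirming that the
weights under that root match where you expect the data to land.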