Hi Yehuda, we have:

root@dsanb1-coy:/mnt/ceph# ceph osd dump | grep ^pool
pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0
pool 3 'backup' rep size 1 crush_ruleset 3 object_hash rjenkins pg_num 1472 pgp_num 1472 last_change 1 owner 0

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Yehuda Sadeh
Sent: Monday, 6 August 2012 11:16 AM
To: Paul Pettigrew
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: Re: Crush not deliverying data uniformly -> HEALTH_ERR full osd

On Sun, Aug 5, 2012 at 5:16 PM, Paul Pettigrew <Paul.Pettigrew@xxxxxxxxxxx> wrote:
>
> Hi Ceph community
>
> We are at the stage of performance capacity testing, where significant
> amounts of backup data are being written to Ceph.
>
> The issue we have is that the underlying HDDs are not being populated
> (roughly) uniformly, and our Ceph system hits a brick wall: after a
> couple of days our 30TB storage system is no longer able to operate,
> having stored only ~7TB.
>
> Basically, despite the HDDs (1:1 ratio between OSD and HDD) all being
> the same storage size and weighting in the Crushmap, we have disks either:
> a) using 1% space;
> b) using 48%; or
> c) using 96%
> Too precise a split to be an accident.
> See below for more detail (osd11-22 not expected to get data, per our
> crushmap):
>
> ceph pg dump
> <snip>
> pool 0  2442     0  0  0  10240000000    7302520   7302520
> pool 1  57       0  0  0  127824767      5603518   5603518
> pool 2  0        0  0  0  0              0         0
> pool 3  1808757  0  0  0  7584377697985  1104048   1104048
>  sum    1811256  0  0  0  7594745522752  14010086  14010086
> osdstat  kbused      kbavail      kb           hb in                                         hb out
> 0        930606904   1021178408   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 1        1874428     1949525164   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 2        928811428   1022963676   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 3        929733676   1022051996   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 4        1719124     1949678844   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 5        1853452     1949545892   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 6        930979476   1020807132   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 7        1808968     1949590496   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 8        934035924   1017759100   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 9        1855955384  94927432     1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 10       933572004   1018232340   1953514584   [11,12,13,14,15,16,17,18,19,20,21,22]         []
> 11       2057096     953060760    957230808    [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]    []
> 12       2053512     953064656    957230808    [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]    []
> 13       2148732     972501316    976762584    [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]    []
> 14       2064640     972585104    976762584    [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21,22]    []
> 15       1945388     972703468    976762584    [0,1,2,3,4,5,6,7,8,9,10,17,18,19,20,21]       []
> 16       2051708     972599412    976762584    [0,1,2,3,4,6,7,8,9,10,17,18,19,20,21]         []
> 17       2137632     952980216    957230808    [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]    []
> 18       2000124     953117508    957230808    [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]    []
> 19       2095124     972554492    976762584    [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]    []
> 20       1986800     972662640    976762584    [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]    []
> 21       2035204     972615332    976762584    [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]    []
> 22       1961412     972687788    976762584    [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]    []
>  sum     7475488140  25609393172  33131684328
>
> 2012-08-06 10:03:58.964716 7f06783bb700 0 -- 10.32.0.10:0/15147
> send_keepalive con 0x223f690, no pipe.
>
>
> root@dsanb1-coy:~# df -h
> Filesystem                               Size  Used  Avail  Use%  Mounted on
> /dev/md0                                 462G  12G   446G   3%    /
> udev                                     12G   4.0K  12G    1%    /dev
> tmpfs                                    4.8G  448K  4.8G   1%    /run
> none                                     5.0M  0     5.0M   0%    /run/lock
> none                                     12G   0     12G    0%    /run/shm
> /dev/sdc                                 1.9T  888G  974G   48%   /ceph-data/osd.0
> /dev/sdd                                 1.9T  1.8G  1.9T   1%    /ceph-data/osd.1
> /dev/sdp                                 1.9T  891G  972G   48%   /ceph-data/osd.10
> /dev/sde                                 1.9T  886G  976G   48%   /ceph-data/osd.2
> /dev/sdf                                 1.9T  887G  975G   48%   /ceph-data/osd.3
> /dev/sdg                                 1.9T  1.7G  1.9T   1%    /ceph-data/osd.4
> /dev/sdh                                 1.9T  1.8G  1.9T   1%    /ceph-data/osd.5
> /dev/sdi                                 1.9T  888G  974G   48%   /ceph-data/osd.6
> /dev/sdm                                 1.9T  1.8G  1.9T   1%    /ceph-data/osd.7
> /dev/sdn                                 1.9T  891G  971G   48%   /ceph-data/osd.8
> /dev/sdo                                 1.9T  1.8T  91G    96%   /ceph-data/osd.9
> 10.32.0.10,10.32.0.25,10.32.0.11:6789:/  31T   7.1T  24T    23%   /mnt/ceph
>
>
> We are writing via fstab based cephfs mounts, and the above is going
> to pool3, which is a "backup" pool where we are testing replication
> level of 1x only. This should not have any effect though?
> Below will illustrate the layout we are using (above data writing issue
> is only going to the first node per our testing design):
>
> root@dsanb1-coy:~# ceph osd tree
> dumped osdmap tree epoch 136
> # id  weight  type name                    up/down  reweight
> -7    23      zone bak
> -6    23        rack 1nrack
> -2    11          host dsanb1-coy
> 0     2             osd.0                  up       1
> 1     2             osd.1                  up       1
> 10    2             osd.10                 up       1
> 2     2             osd.2                  up       1
> 3     2             osd.3                  up       1
> 4     2             osd.4                  up       1
> 5     2             osd.5                  up       1
> 6     2             osd.6                  up       1
> 7     2             osd.7                  up       1
> 8     2             osd.8                  up       1
> 9     2             osd.9                  up       1
> -1    23      zone default
> -3    23        rack 2nrack
> -2    11          host dsanb1-coy
> 0     2             osd.0                  up       1
> 1     2             osd.1                  up       1
> 10    2             osd.10                 up       1
> 2     2             osd.2                  up       1
> 3     2             osd.3                  up       1
> 4     2             osd.4                  up       1
> 5     2             osd.5                  up       1
> 6     2             osd.6                  up       1
> 7     2             osd.7                  up       1
> 8     2             osd.8                  up       1
> 9     2             osd.9                  up       1
> -4    6           host dsanb2-coy
> 11    1             osd.11                 up       1
> 12    1             osd.12                 up       1
> 13    1             osd.13                 up       1
> 14    1             osd.14                 up       1
> 15    1             osd.15                 up       1
> 16    1             osd.16                 up       1
> -5    6           host dsanb3-coy
> 17    1             osd.17                 up       1
> 18    1             osd.18                 up       1
> 19    1             osd.19                 up       1
> 20    1             osd.20                 up       1
> 21    1             osd.21                 up       1
> 22    1             osd.22                 up       1
>
>
> Has anybody got any suggestions?
>

How many pgs per pool do you have? Specifically:

$ ceph osd dump | grep ^pool

Thanks,
Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
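
The 1% / 48% / 96% split reported in the thread can be cross-checked against the osdstat figures from ceph pg dump. The following is a minimal sketch in Python, not part of the original exchange: the kbused and kb values for osd.0 to osd.10 are copied by hand from the quoted table, and the helper names and bucket thresholds (10% and 75%) are hypothetical choices made only for illustration.

```python
# kb (total) and kbused for osd.0 .. osd.10, copied by hand from the
# osdstat table quoted in this thread. All eleven disks report the same kb.
KB_TOTAL = 1953514584
KB_USED = {
    0: 930606904, 1: 1874428, 2: 928811428, 3: 929733676,
    4: 1719124, 5: 1853452, 6: 930979476, 7: 1808968,
    8: 934035924, 9: 1855955384, 10: 933572004,
}


def usage_percent(kb_used, kb_total=KB_TOTAL):
    """Return used space as a percentage of total space."""
    return 100.0 * kb_used / kb_total


def bucket(percent):
    """Classify a usage percentage. The 10% and 75% thresholds are
    arbitrary illustrative cut-offs, not anything Ceph defines."""
    if percent < 10:
        return "near-empty"
    if percent < 75:
        return "about half"
    return "nearly full"


def group_by_bucket(kb_used_by_osd):
    """Map each bucket name to the sorted list of OSD ids that fall in it."""
    groups = {}
    for osd_id, kb_used in sorted(kb_used_by_osd.items()):
        groups.setdefault(bucket(usage_percent(kb_used)), []).append(osd_id)
    return groups


if __name__ == "__main__":
    for name, osds in group_by_bucket(KB_USED).items():
        print(name, osds)
```

Note that df rounds Use% up and computes it from used plus available space, so its 1% and 96% figures need not match kbused/kb exactly.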