So just a little update... after replacing the original failed drive, things seem to be progressing a little better. However, I noticed something else odd: looking at 'rados df', the system thinks the data pool holds 32 TB of data, but this is only an 18 TB raw system (quick arithmetic below the output).
pool name      category   KB            objects   clones   degraded   unfound   rd      rd KB     wr         wr KB
data           -          32811540110   894927    0        240445     0         1       0         2720415    4223435021
media_video    -          1             1         0        0          0         2       1         2611361    1177389479
metadata       -          210246        18482     0        4592       1         6970    561296    1253955    19500149
rbd            -          330731965     82018     0        19584      0         26295   1612689   54606042   2127030019
  total used              10915771968   995428
  total avail              6657285104
  total space             17573057072
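For what it's worth, here's the back-of-the-envelope arithmetic I'm going by (assuming the KB column above is kilobytes of logical data before replication, and that total used/space are raw kilobytes on disk):

echo '32811540110 / 1024^3' | bc -l   # data pool   -> ~30.6 TiB reported
echo '10915771968 / 1024^3' | bc -l   # total used  -> ~10.2 TiB actually consumed
echo '17573057072 / 1024^3' | bc -l   # total space -> ~16.4 TiB raw

So the data pool alone is being reported at roughly twice the raw capacity of the whole cluster.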
Any recommendations on how I can sort out why it thinks it has way more data in that pool than it actually does?
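In case it helps frame suggestions, these are the cross-checks I was planning to try next (with clients still disconnected); I'm guessing at the right approach here, so corrections welcome:

rados -p data ls | wc -l    # count the objects actually in the data pool vs. the 894927 reported above
ceph osd dump | grep pool   # double-check pg_num and rep size on each pool
ceph pg dump > pg_dump.new  # fresh per-PG stats to compare against the earlier pastebin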
Thanks in advance.
Berant
On Mon, May 6, 2013 at 4:43 PM, Berant Lemmenes <berant@xxxxxxxxxxxx> wrote:
TL;DR: bobtail Ceph cluster unable to finish rebalance after drive failure, usage increasing even with no clients connected.

I've been running a test bobtail cluster for a couple of months and it's been working great. Last week I had a drive die and rebalance; during that time another OSD crashed. All was still well, but as the second OSD had just crashed I restarted it, made sure it re-entered properly and that rebalancing continued, and then I went to bed.

Waking up in the morning I found 2 OSDs were 100% full and two more were almost full. To get out of the situation I decreased the replication size from 3 to 2, and then also carefully (I believe carefully enough) removed some PGs in order to start things up again (rough commands are below, after the osd tree output).

I got things going again and things appeared to be rebalancing correctly; however, it got to the point where it stopped at 1420 PGs active+clean and the rest were stuck backfilling.

Looking at the PG dump, all of the PGs that were having issues were on osd.1. So I stopped it, verified things were continuing to rebalance after it was down/out, and then formatted osd.1's disk and put it back in.

Since then I've not been able to get the cluster back to HEALTHY, due to a combination of OSDs dying while recovering (not due to disk failure, just crashes) as well as the used space in the cluster increasing abnormally.

Right now I have all the clients disconnected and just the cluster rebalancing, and the usage is increasing to the point where I have 12 TB used when I have only < 3 TB in cephfs and 2 TB in a single RBD image (replication size 2). I've since shut down the cluster so I don't fill it up.

My crushmap is the default; here are the usual suspects. I'm happy to provide additional information.

pg dump: http://pastebin.com/LUyu6Z09

ceph osd tree (osd.8 is the failed drive, which I will be replacing tonight; the reweight on osd.1 and osd.6 was done via reweight-by-utilization):
# id    weight  type name       up/down reweight
-1      19.5    root default
-3      19.5            rack unknownrack
-2      19.5                    host ceph-test
0       1.5                             osd.0   up      1
1       1.5                             osd.1   up      0.6027
2       1.5                             osd.2   up      1
3       1.5                             osd.3   up      1
4       1.5                             osd.4   up      1
5       2                               osd.5   up      1
6       2                               osd.6   up      0.6676
7       2                               osd.7   up      1
8       2                               osd.8   down    0
9       2                               osd.9   up      1
10      2                               osd.10  up      1
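For reference, the replication size change and the reweighting mentioned above were done roughly like this (reconstructed from memory, so the exact invocations may have differed slightly):

ceph osd pool set data size 2
ceph osd pool set metadata size 2
ceph osd pool set rbd size 2
ceph osd pool set media_video size 2
ceph osd reweight-by-utilization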
ceph -s:

health HEALTH_WARN 24 pgs backfill; 85 pgs backfill_toofull; 29 pgs backfilling; 40 pgs degraded; 1 pgs recovery_wait; 121 pgs stuck unclean; recovery 109306/2091318 degraded (5.227%); recovering 3 o/s, 43344KB/s; 2 near full osd(s); noout flag(s) set
monmap e2: 1 mons at {a=10.200.200.21:6789/0}, election epoch 1, quorum 0 a
osdmap e16251: 11 osds: 10 up, 10 in
pgmap v3145187: 1536 pgs: 1414 active+clean, 6 active+remapped+wait_backfill, 10 active+remapped+wait_backfill+backfill_toofull, 4 active+degraded+wait_backfill+backfill_toofull, 22 active+remapped+backfilling, 42 active+remapped+backfill_toofull, 7 active+degraded+backfilling, 17 active+degraded+backfill_toofull, 1 active+recovery_wait+remapped, 4 active+degraded+remapped+wait_backfill+backfill_toofull, 8 active+degraded+remapped+backfill_toofull, 1 active+clean+scrubbing+deep; 31607 GB data, 12251 GB used, 4042 GB / 16293 GB avail; 109306/2091318 degraded (5.227%); recovering 3 o/s, 43344KB/s
mdsmap e3363: 1/1/1 up {0=a=up:active}

rep size:

pool 0 'data' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 897 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 crush_ruleset 1 object_hash rjenkins pg_num 384 pgp_num 384 last_change 13364 owner 0
pool 2 'rbd' rep size 2 crush_ruleset 2 object_hash rjenkins pg_num 384 pgp_num 384 last_change 13208 owner 0
pool 4 'media_video' rep size 2 crush_ruleset 0 object_hash rjenkins pg_num 384 pgp_num 384 last_change 890 owner 0

ceph.conf:

[global]
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min size = 1
osd pool default pg num = 366
osd pool default pgp num = 366

[osd]
osd journal size = 1000
journal_aio = true
#osd recovery max active = 10
osd mkfs type = xfs
osd mkfs options xfs = -f -i size=2048
osd mount options xfs = inode64,noatime

[mon.a]
host = ceph01
mon addr = 10.200.200.21:6789

[osd.0]
# 1.5 TB SATA
host = ceph01
devs = /dev/sdc
weight = 1.5

[osd.1]
# 1.5 TB SATA
host = ceph01
devs = /dev/sdd
weight = 1.5

[osd.2]
# 1.5 TB SATA
host = ceph01
devs = /dev/sdg
weight = 1.5

[osd.3]
# 1.5 TB SATA
host = ceph01
devs = /dev/sdj
weight = 1.5

[osd.4]
# 1.5 TB SATA
host = ceph01
devs = /dev/sdk
weight = 1.5

[osd.5]
# 2 TB SAS
host = ceph01
devs = /dev/sdf
weight = 2

[osd.6]
# 2 TB SAS
host = ceph01
devs = /dev/sdh
weight = 2

[osd.7]
# 2 TB SAS
host = ceph01
devs = /dev/sda
weight = 2

[osd.8]
# 2 TB SAS
host = ceph01
devs = /dev/sdb
weight = 2

[osd.9]
# 2 TB SAS
host = ceph01
devs = /dev/sdi
weight = 2

[osd.10]
# 2 TB SAS
host = ceph01
devs = /dev/sde
weight = 2

[mds.a]
host = ceph01