Hi Christian, thanks for your input.

I don't think the PG count is my issue; if anything, I have too many PGs for the size of my cluster, which consists of just 18 osds spread amongst 2 osd servers and 3 mons:

root@arh-ibstorage1-ib:~# ceph -s
    health HEALTH_WARN
           1 near full osd(s)
           too many PGs per OSD (604 > max 300)
    osdmap e79493: 18 osds: 18 up, 18 in
     pgmap v79839575: 5436 pgs, 18 pools, 15509 GB data, 6019 kobjects
          5436 active+clean

I will take a look at the "dealing with the full osd / help reweight" thread, thanks for pointing it out.

Cheers

Andrei

----- Original Message -----
> From: "Christian Balzer" <chibi@xxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
> Cc: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>
> Sent: Wednesday, 6 April, 2016 04:36:30
> Subject: Re: rebalance near full osd
>
> Hello,
>
> On Wed, 6 Apr 2016 04:18:40 +0100 (BST) Andrei Mikhailovsky wrote:
>
>> Hi
>>
>> I've just had a warning (from ceph -s) that one of the osds is near
>> full. Having investigated the warning, I've found that osd.6 is 86%
>> full. The data distribution is nowhere near equal on my osds, as you
>> can see from the df output below:
>>
> Firstly, read the very recent thread:
> "dealing with the full osd / help reweight"
> from this ML.
>
> You really want monitoring software to keep track of disk utilization
> if you're not doing it manually.
>
>> /dev/sdj1 2.8T 2.4T 413G 86% /var/lib/ceph/osd/ceph-6
>> /dev/sdb1 2.8T 2.1T 625G 78% /var/lib/ceph/osd/ceph-0
>> /dev/sdc1 2.8T 2.0T 824G 71% /var/lib/ceph/osd/ceph-1
>> /dev/sdd1 2.8T 1.5T 1.3T 55% /var/lib/ceph/osd/ceph-2
>> /dev/sde1 2.8T 1.7T 1.1T 63% /var/lib/ceph/osd/ceph-3
>> /dev/sdh1 2.8T 1.7T 1.1T 62% /var/lib/ceph/osd/ceph-4
>> /dev/sdf1 2.8T 1.9T 932G 67% /var/lib/ceph/osd/ceph-8
>> /dev/sdi1 2.8T 1.9T 880G 69% /var/lib/ceph/osd/ceph-5
>> /dev/sdg1 2.8T 2.0T 798G 72% /var/lib/ceph/osd/ceph-7
>>
>> I seem to have a spread of over 30% disk utilisation between the osds,
>> despite all of them having identical weights (ceph osd tree output):
>>
>> -2 24.56999 host arh-ibstorage1-ib
>>  1  2.73000     osd.1   up  1.00000  1.00000
>>  3  2.73000     osd.3   up  1.00000  1.00000
>>  5  2.73000     osd.5   up  1.00000  1.00000
>>  6  2.73000     osd.6   up  1.00000  1.00000
>>  7  2.73000     osd.7   up  1.00000  1.00000
>>  8  2.73000     osd.8   up  1.00000  1.00000
>>  4  2.73000     osd.4   up  1.00000  1.00000
>>  0  2.73000     osd.0   up  1.00000  1.00000
>>  2  2.73000     osd.2   up  1.00000  1.00000
>>
> This is just one host; are they all like that?
> Please post the full osd tree and, even more importantly, a "ceph -s" output.
> Ceph isn't particularly good at creating an even distribution, but if you
> have too few PGs it gets worse, which would be my first suspicion here.
>
>> What would be the best way to correct the issue without a significant
>> impact on the cluster IO?
>>
> Again, read the thread above.
> Increasing the PG count (if that is part of your problem) will have a
> massive impact, but it needs to be done at some point.
> Re-weighting (CRUSH, permanently) OSDs in small increments/decrements (you
> want to keep the host weight more or less the same) of course also causes
> data movement, but done right (see the thread) the impact can be minimized.
>
> Christian
>
>> Many thanks
>>
>> Andrei
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
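
A side note on where the "604 > max 300" figure comes from: it is simply the total number of PG copies across all pools divided by the number of OSDs. A rough sketch of how to check this, using standard ceph CLI commands; "rbd" is only a placeholder pool name, and the replication size of 2 below is an inference from the numbers shown above, not something stated in the thread:

    ceph osd pool get rbd size      # replication size of a given pool
    ceph osd pool get rbd pg_num    # PG count of that pool
    ceph osd df                     # per-OSD utilisation and PG count (Hammer and later)

    # PGs per OSD ~= sum over pools of (pg_num * size) / number of OSDs
    # Here: 5436 PGs * 2 copies / 18 OSDs ~= 604, which matches the warning
    # and suggests the pools are 2x replicated.

Since pg_num could not be decreased on Ceph releases of that era, bringing the per-OSD count down would mean recreating some of the 18 pools with fewer PGs.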
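On the reweighting approach Christian describes (small, permanent CRUSH weight changes, keeping the host weight roughly constant), a minimal sketch with standard ceph commands; the weight values are purely illustrative, not recommendations for this cluster:

    # Nudge the most-utilised OSD down a little (permanent CRUSH weight change),
    # then watch "ceph -s" / "ceph osd df" and repeat in small steps.
    ceph osd crush reweight osd.6 2.65   # was 2.73000 in the tree above

    # Optionally bump an under-utilised OSD on the same host up slightly,
    # so the host weight stays more or less the same.
    ceph osd crush reweight osd.2 2.81

    # Alternatives: the temporary 0..1 override, or the automatic variant;
    # neither changes the CRUSH weights.
    ceph osd reweight 6 0.95
    ceph osd reweight-by-utilization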