resizing the OSD

Hello,

On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote:

> Thanks Christian.  Replies inline.
> On Sep 6, 2014, at 8:04 AM, Christian Balzer <chibi at gol.com> wrote:
> 
> > 
> > Hello,
> > 
> > On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote:
> > 
> >> Hello Cephers,
> >> 
> >> We created a Ceph cluster with 100 OSDs, 5 MONs and 1 MDS and most of
> >> the stuff seems to be working fine, but we are seeing some degradation
> >> on the OSDs due to lack of space on them. 
> > 
> > Please elaborate on that degradation.
> 
> The degradation happened on a few OSDs because they got quickly filled up.
> They were not the same size as the other OSDs. Now I want to remove
> these OSDs and re-add them with the correct size to match the others.

Alright, that's a good idea; uniformity helps. ^^

> > 
> >> Is there a way to resize the
> >> OSD without bringing the cluster down?
> >> 
> > 
> > Define both "resize" and "cluster down".
> 
> Basically I want to remove the OSDs with the incorrect size and re-add
> them with a size matching the other OSDs. 
> > 
> > As in, resizing how? 
> > Are your current OSDs on disks/LVMs that are not fully used and thus
> > could be grown?
> > What is the size of your current OSDs?
> 
> The size of the current OSDs is 20GB and we do have more unused space on
> the disk, so we can make the LVM bigger and increase the size of the
> OSDs. I agree that we need to have all the disks the same size and I am
> working towards that. Thanks.
> > 
OK, so your OSDs are backed by LVM. 
A curious choice; any particular reason for it?

Either way, in theory you could grow things in place, obviously first the
LVM and then the underlying filesystem. Both ext4 and xfs support online
growing, so the OSD can keep running the whole time.
If you're unfamiliar with these things, play with them on a test machine
first. 
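A minimal sketch of the in-place grow, assuming the OSD's data sits on a hypothetical LV /dev/ceph-vg/osd-12 mounted at /var/lib/ceph/osd/ceph-12 (names and sizes are illustrative, not from your setup):

```shell
# Grow the LV by 20G (VG/LV name is hypothetical)
lvextend -L +20G /dev/ceph-vg/osd-12

# Then grow the filesystem online; use whichever matches your OSD's FS:
xfs_growfs /var/lib/ceph/osd/ceph-12   # XFS takes the mount point
# resize2fs /dev/ceph-vg/osd-12        # ext4 takes the device instead
```

The OSD daemon can stay up throughout; "df" on the mount point should show the new size immediately afterwards.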

Now for the next step we will really need to know how you deployed Ceph
and the output of "ceph osd tree" (not all 100 OSDs are needed; a sample of
a "small" and a "big" OSD is sufficient).

Depending on the results (it will probably show varying weights depending
on size, and a reweight value of 1 for all) you will need to adjust the
weight of each grown OSD accordingly with "ceph osd crush
reweight". 
That step will incur data movement, so do it one OSD at a time.
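As a sketch, after growing osd.0 from 20GB to, say, 500GB, the adjustment would be (OSD id and values are illustrative):

```shell
# Match the CRUSH weight to the new size, using the size-in-TB convention
ceph osd crush reweight osd.0 0.5

# Watch the resulting backfill; wait for HEALTH_OK before the next OSD
ceph -w
```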

> > The normal way of growing a cluster is to add more OSDs.
> > Preferably of the same size and same performance disks.
> > This will not only simplify things immensely but also make them a lot
> > more predictable.
> > This of course depends on your use case and usage patterns, but often
> > when running out of space you're also running out of other resources
> > like CPU, memory or IOPS of the disks involved. So adding more instead
> > of growing them is most likely the way forward.
> > 
> > If you were to replace actual disks with larger ones, take them (the
> > OSDs) out one at a time and re-add them. If you're using ceph-deploy,
> > it will use the disk size as the basic weight; if you're doing things
> > manually, make sure to specify that size/weight accordingly.
> > Again, you do want to do this for all disks to keep things uniform.
> > 
> > If your cluster (pools, really) is set to a replica size of at least 2
> > (risky!) or 3 (the Firefly default), taking a single OSD out would
> > of course never bring the cluster down.
> > However taking an OSD out and/or adding a new one will cause data
> > movement that might impact your cluster's performance.
> > 
> 
> We have a current replica size of 2 with 100 OSDs. How many can I lose
> without affecting performance? I understand the impact of data
> movement.
> 
Unless your LVMs are in turn living on a RAID, a replica size of 2 with 100
OSDs is begging Murphy for a double disk failure. I'm also curious how
many actual physical disks those OSDs live on and how many physical hosts
are in your cluster.
So again, you can't lose more than one OSD at a time without losing data.
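For completeness, the replica count is a per-pool setting that is easy to check and raise; note that going from 2 to 3 will trigger substantial data movement (the pool name "rbd" is just an example):

```shell
# Show the current replica count of a pool
ceph osd pool get rbd size

# Raise it to 3 (the Firefly default) while still serving I/O with 2 copies
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2
```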

The performance impact of losing a single OSD out of 100 should be small,
especially given the size of your OSDs. However, without knowing your
actual cluster (hardware and otherwise), don't expect anybody here to make
accurate predictions. 
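The take-out/re-add sequence quoted above, done one OSD at a time, would go roughly like this (osd.0, the hostname and the device are placeholders; wait for recovery to finish between the steps):

```shell
# Drain the OSD and let the data migrate off it
ceph osd out 0
# ... wait until "ceph -s" reports HEALTH_OK again ...

# Stop the daemon and remove the OSD from CRUSH, auth and the OSD map
service ceph stop osd.0        # init syntax varies by distro
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0

# Re-add on the replacement disk, e.g. with ceph-deploy (host:disk illustrative)
ceph-deploy osd create node01:/dev/sdb
```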

Christian

> --Jiten
> 
> 
> 
> 
> 
> > Regards,
> > 
> > Christian
> > -- 
> > Christian Balzer        Network/Systems Engineer                
> > chibi at gol.com   	Global OnLine Japan/Fusion Communications
> > http://www.gol.com/
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi at gol.com   	Global OnLine Japan/Fusion Communications
http://www.gol.com/

