resizing the OSD

On Sep 6, 2014, at 8:22 PM, Christian Balzer <chibi at gol.com> wrote:

> 
> Hello,
> 
> On Sat, 06 Sep 2014 10:28:19 -0700 JIten Shah wrote:
> 
>> Thanks Christian.  Replies inline.
>> On Sep 6, 2014, at 8:04 AM, Christian Balzer <chibi at gol.com> wrote:
>> 
>>> 
>>> Hello,
>>> 
>>> On Fri, 05 Sep 2014 15:31:01 -0700 JIten Shah wrote:
>>> 
>>>> Hello Cephers,
>>>> 
>>>> We created a Ceph cluster with 100 OSDs, 5 MONs and 1 MDS, and most
>>>> of it seems to be working fine, but we are seeing some degradation
>>>> on the OSDs due to lack of space on them. 
>>> 
>>> Please elaborate on that degradation.
>> 
>> The degradation happened on a few OSDs because they got filled up
>> quickly. They were not the same size as the other OSDs. Now I want to
>> remove these OSDs and re-add them with the correct size to match the others.
> 
> Alright, that's a good idea, uniformity helps. ^^
> 
>>> 
>>>> Is there a way to resize the
>>>> OSD without bringing the cluster down?
>>>> 
>>> 
>>> Define both "resize" and "cluster down".
>> 
>> Basically I want to remove the OSDs with the incorrect size and re-add
>> them with a size matching the other OSDs. 
>>> 
>>> As in, resizing how? 
>>> Are your current OSDs on disks/LVMs that are not fully used and thus
>>> could be grown?
>>> What is the size of your current OSDs?
>> 
>> The current OSDs are 20 GB each, and we do have unused space on the
>> disk, so we can make the LVM bigger and increase the size of the
>> OSDs. I agree that we need to have all the disks be the same size and
>> I am working towards that. Thanks.
>>> 
> OK, so your OSDs are backed by LVM. 
> A curious choice, any particular reason to do so?

We already had LVMs carved out for some other project and were not using them, so we decided to put the OSDs on those LVMs.

> 
> Either way, in theory you could grow things in place, obviously first the
> LVM and then the underlying filesystem. Both ext4 and xfs support online
> growing, so the OSD can keep running the whole time.
> If you're unfamiliar with these things, play with them on a test machine
> first. 
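> 
> A minimal sketch of what that could look like (assuming an LV named
> "osd0" in a VG named "vg0" and the default XFS mount point at
> /var/lib/ceph/osd/ceph-0; adjust names and sizes to your setup):
> 
>   # grow the logical volume by 30G while the OSD stays online
>   lvextend -L +30G /dev/vg0/osd0
>   # grow the XFS filesystem to fill the new LV size (also online)
>   xfs_growfs /var/lib/ceph/osd/ceph-0
>   # (for ext4 you would use: resize2fs /dev/vg0/osd0)
> 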
> 
> Now for the next step we will really need to know how you deployed ceph
> and the result of "ceph osd tree" (not all 100 OSDs are needed, a sample of
> a "small" and "big" OSD is sufficient).

Fixed all the sizes, so all of them are weighted as 1:
[jshah at pv11p04si-mzk001 ~]$ ceph osd tree
# id	weight	type name	up/down	reweight
-1	99	root default
-2	1		host pv11p04si-mslave0005
0	1			osd.0	up	1	
-3	1		host pv11p04si-mslave0006
1	1			osd.1	up	1	
-4	1		host pv11p04si-mslave0007
2	1			osd.2	up	1	
-5	1		host pv11p04si-mslave0008
3	1			osd.3	up	1	
-6	1		host pv11p04si-mslave0009
4	1			osd.4	up	1	
-7	1		host pv11p04si-mslave0010
5	1			osd.5	up	1	
> 
> Depending on the results (it will probably have varying weights depending
> on the size and a reweight value of 1 for all) you will need to adjust the
> weight of the grown OSD in question accordingly with "ceph osd crush
> reweight". 
> That step will incur data movement, so do it one OSD at a time.
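> 
> For example (osd.3 and the weight 1.0 are just placeholders, use the id
> and the weight that match the newly grown OSD):
> 
>   ceph osd crush reweight osd.3 1.0
> 
> Then wait for the cluster to settle back to HEALTH_OK (watch "ceph -s"
> or "ceph -w") before reweighting the next one.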
> 
>>> The normal way of growing a cluster is to add more OSDs.
>>> Preferably of the same size and same performance disks.
>>> This will not only simplify things immensely but also make them a lot
>>> more predictable.
>>> This of course depends on your use case and usage patterns, but often
>>> when running out of space you're also running out of other resources
>>> like CPU, memory or IOPS of the disks involved. So adding more instead
>>> of growing them is most likely the way forward.
>>> 
>>> If you were to replace actual disks with larger ones, take them (the
>>> OSDs) out one at a time and re-add them. If you're using ceph-deploy, it
>>> will use the disk size as the basic weight; if you're doing things
>>> manually, make sure to specify that size/weight accordingly.
>>> Again, you do want to do this for all disks to keep things uniform.
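>>> 
>>> Roughly, the removal side of that per OSD would look something like
>>> this (osd.0 is just an example id, adapt to your deployment):
>>> 
>>>   ceph osd out 0                 # let the data drain off the OSD
>>>   # stop the ceph-osd daemon on its host once rebalancing is done
>>>   ceph osd crush remove osd.0    # remove it from the CRUSH map
>>>   ceph auth del osd.0            # delete its cephx key
>>>   ceph osd rm 0                  # remove it from the OSD map
>>> 
>>> and then re-create it on the resized disk/LV, e.g. with ceph-deploy or
>>> your usual manual procedure.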
>>> 
>>> If your cluster (pools, really) is set to a replica size of at least 2
>>> (risky!) or 3 (the Firefly default), taking a single OSD out would
>>> of course never bring the cluster down.
>>> However taking an OSD out and/or adding a new one will cause data
>>> movement that might impact your cluster's performance.
>>> 
>> 
>> We have a current replica size of 2 with 100 OSDs. How many can I lose
>> without affecting performance? I understand the impact of data
>> movement.
>> 
> Unless your LVMs are in turn living on a RAID, a replica of 2 with 100
> OSDs is begging Murphy for a double disk failure. I'm also curious how
> many actual physical disks those OSDs live on and how many physical
> hosts are in your cluster.

We have 1 physical disk on each host and 1 OSD per host. So we have 100 physical hosts for OSDs and 5 physical hosts for MON + MDS.

> So again, you can't lose more than one OSD at a time w/o losing data.
> 
> The performance impact of losing a single OSD out of 100 should be small,
> especially given the size of your OSDs. However w/o knowing your actual
> cluster (hardware and otherwise) don't expect anybody here to make
> accurate predictions. 
> 
> Christian
> 
>> --Jiten
>> 
>> 
>> 
>> 
>> 
>>> Regards,
>>> 
>>> Christian
>>> -- 
>>> Christian Balzer        Network/Systems Engineer                
>>> chibi at gol.com   	Global OnLine Japan/Fusion Communications
>>> http://www.gol.com/
>> 
>> 
> 
> 
> -- 
> Christian Balzer        Network/Systems Engineer                
> chibi at gol.com   	Global OnLine Japan/Fusion Communications
> http://www.gol.com/


