Re: CEPH I/O Performance with OpenStack

Thanks, Robert, for your response. I'm considering giving 600G 15K SAS disks a try before moving to SSD; they should give ~175 IOPS per disk.

Do you think the performance will be better if I go with the following setup?
4x OSD nodes
2x SSD - RAID 1 for OS and journal
10x 600G SAS 15K - no RAID
Replication factor 2

Regarding the IOPS calculation you did for the 4TB disks: please clarify whether the 1100 IOPS is per node, so that cluster IOPS = $number_of_nodes x $IOPS_per_node?

If that formula is correct, then my current 4TB setup should give 2200 IOPS in total and the new SAS setup should give 3500 IOPS?

Please correct me if I understand this wrong.
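
To make my assumption explicit, this is the arithmetic I'm working from (just a sketch using the numbers above, so please correct it if the formula itself is wrong):

# Sketch of the formula I am assuming: cluster IOPS = $number_of_nodes x $IOPS_per_node
def cluster_iops(nodes, iops_per_node):
    return nodes * iops_per_node

# Current 4TB setup: 2 OSD nodes, assuming the 1100 IOPS figure is per node.
print(cluster_iops(2, 1100))             # 2200

# New SAS setup: 4 nodes x 10 disks x ~175 IOPS / 2 for replication = 875 per node.
print(cluster_iops(4, 10 * 175 // 2))    # 3500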

Thanks in advance,

On Tue, Jan 27, 2015 at 3:30 PM, Robert van Leeuwen <Robert.vanLeeuwen@xxxxxxxxxxxxx> wrote:
> I have two ceph nodes with the following specifications
> 2x CEPH - OSD - 2 Replication factor
> Model : SuperMicro X8DT3
> CPU : Dual intel E5620
> RAM : 32G
> HDD : 2x 480GB SSD RAID-1 ( OS and Journal )
>      22x 4TB SATA RAID-10 ( OSD )
>
> 3x Controllers - CEPH Monitor
> Model : ProLiant DL180 G6
> CPU : Dual intel E5620
> RAM : 24G
>
>
> If it's a hardware issue, please help me find answers to the following 5 questions.

4TB spinners do not give a lot of IOPS, roughly 100 random IOPS per disk.
In total that is just 1100 IOPS: 44 disks times 100 IOPS, divided by 2 for RAID-10 and divided by 2 for the replication factor.
There might be a bit of caching on the RAID controller and SSD journals, but in the worst case you will get just 1100 IOPS.
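
Spelled out as a quick sketch (rough numbers, worst case, ignoring any caching):

# Worst-case IOPS estimate for the current cluster, rough numbers only.
disks = 2 * 22            # 2 OSD nodes x 22 spinners
iops_per_disk = 100       # random IOPS of a 4TB spinner, roughly
raid_penalty = 2          # RAID-10 halves the effective IOPS
replication = 2           # every write lands on 2 OSDs

print(disks * iops_per_disk // raid_penalty // replication)   # 1100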

> I need around 20TB of storage; a SuperMicro SC846TQ can take 24 hard disks.
> I may attach 24x 960G SSD - no RAID - with 3x SuperMicro servers - replication factor 3.
>
> Or is it better to scale out and put fewer disks on more servers, such as the HP DL380p G8 (2x Intel Xeon E5-2650), which can hold 12 hard disks,
> and attach 12x 960G SSD - no RAID - 6x OSD nodes - replication factor 3?

An OSD for an SSD can easily eat a whole CPU core, so 24 SSDs per node would be too much.
More, smaller nodes also have the upside of a smaller impact when a node breaks.
You could also look at the Supermicro 2U twin chassis, which puts 2 servers with 12 disks each in 2U.
Note that you will not get near the theoretical native performance of those combined SSDs (100,000+ IOPS), but performance will be good nonetheless.
There have been a few threads about that here before, so look back in the list archives to find out more.
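
As a rough back-of-the-envelope comparison of the two layouts (the one-core-per-SSD-OSD figure is only a rule of thumb, not a hard number):

# Rule-of-thumb comparison of the two proposed layouts; numbers are rough.
layouts = {
    "3 nodes x 24 SSDs": (3, 24),
    "6 nodes x 12 SSDs": (6, 12),
}
for name, (nodes, ssds_per_node) in layouts.items():
    cores_per_node = ssds_per_node        # ~1 CPU core per SSD-backed OSD
    lost_on_failure = 100.0 / nodes       # % of OSDs offline if one node dies
    print("%s: ~%d cores/node for OSDs, %.0f%% of OSDs down per node failure"
          % (name, cores_per_node, lost_on_failure))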

> 2. I'm using Mirantis/Fuel 5 for provisioning and deployment of nodes.
> When I attach the new Ceph OSD nodes to the environment, will the data be replicated automatically
> from my current old SuperMicro OSD nodes to the new servers after the deployment completes?
I don't know the specifics of Fuel and how it manages the CRUSH map.
Some of the data will end up on the new nodes, but not a copy of all the data, unless you put the new servers in a separate failure domain in the CRUSH map.

> 3. I will use 2x 960G SSD RAID 1 for the OS.
> Is it recommended to put the SSD journal as a separate partition on the same disk as the OS?
If you run with SSDs only, I would put the journals together with the data SSDs.
It makes a lot of sense to have them on separate SSDs when your data disks are spinners
(because of the speed difference and the bad random-IOPS performance of spinners).

> 4. Is it safe to remove the OLD Ceph nodes after adding the new hardware nodes, given that I'm currently using a replication factor of 2?
It is probably not safe to just turn them off (as mentioned above, it depends on the CRUSH map failure-domain layout).
The safe way would be to follow the documentation on how to remove an OSD: http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
This will make sure the data is relocated before the OSD is removed.
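
If you want to script it, something along these lines could work (only a sketch; it assumes the standard ceph admin CLI is available on the host running it, and the documentation above remains the authoritative procedure):

# Sketch of the documented "remove an OSD" steps, driven via the ceph CLI.
import subprocess
import time

def ceph(*args):
    return subprocess.check_output(("ceph",) + args).decode()

def remove_osd(osd_id):
    ceph("osd", "out", str(osd_id))            # start migrating data off this OSD
    while "HEALTH_OK" not in ceph("health"):   # wait until rebalancing is done
        time.sleep(60)
    # Stop the OSD daemon on its host (via the init system there) before removing it.
    ceph("osd", "crush", "remove", "osd.%d" % osd_id)
    ceph("auth", "del", "osd.%d" % osd_id)
    ceph("osd", "rm", str(osd_id))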

> 5. Do I need RAID 1 for the journal disks? And if not, what will happen if one of the journal disks fails?
No, it is not required. Both have trade-offs.
The disks that are "behind" a journal will become unavailable when that journal fails.
RAID 1 makes a single SSD failure a bit easier to deal with, but is useless if both SSDs fail at the same time (e.g. due to wear).
JBOD will reduce the write load and wear per SSD, plus it has less impact when a journal does fail.

> 6. Should I use a RAID level for the drives on the OSD nodes, or is it better to go without RAID?
Going without RAID usually makes for better performance; benchmark your specific workload to be sure.
In general I would go for 3 replicas and no RAID.
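
For what it's worth, a quick usable-capacity check against the ~20TB requirement, with 3 replicas and no RAID (raw disk sizes only, ignoring filesystem and near-full overhead):

# Usable capacity with 3 replicas and no RAID, raw sizes only.
def usable_tb(nodes, disks_per_node, disk_tb, replicas):
    return nodes * disks_per_node * disk_tb / replicas

print(usable_tb(3, 24, 0.96, 3))   # ~23 TB for 3 nodes x 24x 960G SSD
print(usable_tb(6, 12, 0.96, 3))   # ~23 TB for 6 nodes x 12x 960G SSD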

Cheers,
Robert van Leeuwen


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
