Re: HW Raid vs. Multiple OSD

Hi Brady,

For me it is very difficult to build a PoC because the servers are very expensive.

So, do I understand correctly that your advice is one RAID 0 per 4 TB of capacity? For a balanced configuration...

1 OSD x 1 disk of 4 TB
1 OSD x 2 disks of 2 TB
1 OSD x 4 disks of 1 TB

Is that right?

Thanks a lot



On 13 Nov 2017 at 18:40, "Brady Deetz" <bdeetz@xxxxxxxxx> wrote:


On Nov 13, 2017 11:17 AM, "Oscar Segarra" <oscar.segarra@xxxxxxxxx> wrote:
Hi Brady, 

Thanks a lot again for your comments and experience.

This is a departure from what I've seen people do here. I agree that 100 VMs on 24 cores would be potentially over consolidating. But, when it comes to your storage, you probably don't want to lose the data and shouldn't skimp. Could you lower VMs per host to 75-80? 
--> Yes, that's the reason I'm asking this... If I create a RAID5 or RAID0 with 8 disks, I will have just a single OSD process and can therefore leave 31 cores for my 100 VDIs, which I think should be enough.

Also, I notice you have no ssd storage. Are these VMs expected to be performant at all? 100 VMs accessing 8 spinners could cause some serious latency. 
--> I'm planning to use all-SSD storage in my infrastructure in order to avoid I/O issues, so this should not be a problem.

My mistake, I read 8x 8TB, not 1TB. There are some decent sizing conversations on the list regarding all-SSD deployments. If I were doing this and forced to squeeze a few more cores out of each host, I would run tests in different configurations. My guess is that 4x RAID 0 per host would be a nice compromise between overhead, performance, and consolidation ratio. But again, this is not a commonly advised configuration. No matter what, before I took this into production, I'd purchase enough hardware to do a proof of concept with a minimal configuration of 3 hosts, then run benchmarks with 1x RAID 6, 1x RAID 0, 4x RAID 0, and no RAID + OSD processes pinned 2-to-1 to cores.

If none of that works, it's back to the drawing board for you. 
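
A minimal sketch of driving that benchmark matrix from Python, shelling out to the standard "rados bench" tool. The pool name "bench", the 60-second runs, and the 16 concurrent ops are placeholders, not recommendations; each host layout (1x RAID 6, 1x RAID 0, 4x RAID 0, plain JBOD) would be rebuilt by hand between runs.

import subprocess

POOL, SECONDS, THREADS = "bench", "60", "16"   # hypothetical test pool and run length

def bench(*args):
    # rados bench -p <pool> <seconds> write|seq|rand [options]
    subprocess.run(["rados", "bench", "-p", POOL, SECONDS, *args], check=True)

bench("write", "-t", THREADS, "--no-cleanup")  # keep the objects so the read tests have data
bench("seq", "-t", THREADS)                    # sequential reads of those objects
bench("rand", "-t", THREADS)                   # random reads
subprocess.run(["rados", "-p", POOL, "cleanup"], check=True)  # drop the benchmark objects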


Minimum cluster size should be 3 because you are making 3 replicas with min_size 2. If you lose 1 host in a cluster of 2, you will likely lose access to data because 2 replicas existed on the host that went down. You will have a bad time if you run a cluster with 2 replicas. 
--> Yes, it depends on the number of VDI nodes, starting from 3.
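
For reference, a small sketch of applying the pool settings behind that "3 replicas, min_size 2" advice, shelled out from Python to the standard Ceph CLI; the pool name "vdi" is only a placeholder.

import subprocess

POOL = "vdi"  # placeholder pool name
for key, value in (("size", "3"), ("min_size", "2")):
    # equivalent to: ceph osd pool set vdi size 3 / ceph osd pool set vdi min_size 2
    subprocess.run(["ceph", "osd", "pool", "set", POOL, key, value], check=True)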

Thanks a lot in advance for your help!



2017-11-13 18:06 GMT+01:00 Brady Deetz <bdeetz@xxxxxxxxx>:


On Nov 13, 2017 10:44 AM, "Oscar Segarra" <oscar.segarra@xxxxxxxxx> wrote:
Hi Brady, 

Thanks a lot for your comments.

I can't think of a reason to use raid 5 and ceph together, even in a vdi instance. You're going to want throughput for this use case. What you can do is set the affinity of those osd processes to cores not in use by the VMs. I do think it will need to be more than 1 core. It is recommended that you dedicate 1 core per osd, but you could maybe get away with collocating the processes. You'd just have to experiment.
What we really need to help you is more information.
--> If my host has 32 cores and 8 disks (8 OSDs) and I have to pin each OSD process to a core, I will have just 24 cores left for all of my host and Windows guest load.
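
As a rough illustration of that pinning (not tested against a live cluster), each ceph-osd process could be tied to one of the last 8 cores of a 32-core host, leaving cores 0-23 for the guests. The core numbers and the 8-OSD assumption are only examples; in practice systemd's CPUAffinity= or taskset is more common, and affinity set this way does not survive a daemon restart.

import os
import subprocess

OSD_CORES = range(24, 32)  # cores reserved for the 8 OSD daemons (illustrative)

# PIDs of the ceph-osd daemons currently running on this host
pids = subprocess.run(["pgrep", "-x", "ceph-osd"],
                      capture_output=True, text=True, check=True).stdout.split()

# Pin each OSD process to its own dedicated core.
# Note: this sets the affinity of the main thread; systemd's CPUAffinity=
# covers all threads of the daemon and is the more robust option.
for pid, core in zip(pids, OSD_CORES):
    os.sched_setaffinity(int(pid), {core})
    print(f"pinned ceph-osd pid {pid} to core {core}")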

What hardware are you planning to use? 
--> I'm planning to use a standard server such as a ProLiant. In my configuration, each ProLiant will act as both compute for 100 VDIs and a storage node. Each ProLiant will have 32 cores, 384 GB RAM, and a RAID 1 for the OS.

This is a departure from what I've seen people do here. I agree that 100 VMs on 24 cores would be potentially over consolidating. But, when it comes to your storage, you probably don't want to lose the data and shouldn't skimp. Could you lower VMs per host to 75-80? 
Also, I notice you have no ssd storage. Are these VMs expected to be performant at all? 100 VMs accessing 8 spinners could cause some serious latency. 



How many osd nodes do you plan to deploy?
--> It depends on the number of VDIs to deploy. If a customer wants to deploy 100 VDIs, then 2 OSD nodes will be deployed.

Minimum cluster size should be 3 because you are making 3 replicas with min_size 2. If you lose 1 host in a cluster of 2, you will likely lose access to data because 2 replicas existed on the host that went down. You will have a bad time if you run a cluster with 2 replicas. 


What will the network look like?
--> I'm planning to use 10G. I don't know whether 1Gb would be enough.

For the sake of latency alone, you want 10Gbps SFP+.


Are you sure Ceph is the right solution for you?
--> Yes, I have tested others such as Gluster, but it looks like Ceph is the one that fits my solution best.

Have you read and do you understand the architecture docs for Ceph? 
--> Absolutely. 

Thanks a lot!


2017-11-13 17:27 GMT+01:00 Brady Deetz <bdeetz@xxxxxxxxx>:
I can't think of a reason to use raid 5 and ceph together, even in a vdi instance. You're going to want throughput for this use case. What you can do is set the affinity of those osd processes to cores not in use by the VMs. I do think it will need to be more than 1 core. It is recommended that you dedicate 1 core per osd, but you could maybe get away with collocating the processes. You'd just have to experiment.
What we really need to help you is more information.

What hardware are you planning to use? How many osd nodes do you plan to deploy?

What will the network look like?

Are you sure Ceph is the right solution for you?

Have you read and do you understand the architecture docs for Ceph? 

On Nov 13, 2017 5:26 AM, "Oscar Segarra" <oscar.segarra@xxxxxxxxx> wrote:
Hi, 

I'm designing my infrastructure. I want to provide 8 TB (8 disks x 1 TB each) of data per host, just for Microsoft Windows 10 VDI. On each host I will have both storage (Ceph OSD) and compute (KVM).

I'd like to hear your opinion about these two configurations:

1.- RAID5 with 8 disks (I will have 7TB, but for me that is enough) + 1 OSD daemon
2.- 8 OSD daemons, one per disk (see the rough capacity sketch below)
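
Just as back-of-the-envelope arithmetic for those two options, assuming the 3x replication discussed elsewhere in the thread; this only shows the raw/usable trade-off, not performance.

# Rough capacity per host for the two options (illustrative only).
disks, disk_tb, replicas = 8, 1, 3

raid5_raw = (disks - 1) * disk_tb   # RAID 5 loses one disk to parity -> 7 TB raw
jbod_raw = disks * disk_tb          # 8 individual OSDs -> 8 TB raw

# With 3x replication, usable cluster capacity is roughly total raw / 3.
print(f"RAID5 + 1 OSD : {raid5_raw} TB raw per host, ~{raid5_raw / replicas:.1f} TB usable contribution")
print(f"8 OSDs (JBOD) : {jbod_raw} TB raw per host, ~{jbod_raw / replicas:.1f} TB usable contribution")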

I'm a little bit worried that 8 OSD daemons could affect performance because of all the jobs running and the scrubbing.

Another question is the procedure for replacing a failed disk. With a big RAID, replacement is straightforward. With many OSDs, the procedure is a bit trickier.


What is your advice?

Thanks a lot to everybody in advance...







_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
