Re: Large storage nodes - best practices


 



On 06/08/2013 02:57, James Harper wrote:

In the previous email, you are forgetting that RAID1 has a write penalty of 2 because it mirrors every write, and now we are talking about different types of RAID rather than anything specific to Ceph. One of the main advantages of Ceph is that it already replicates data, so you don't need RAID to provide that level of redundancy. I'm sure the math can be worked out, but a larger number of smaller nodes gives better fail-over behaviour than a few large nodes. If you are competing for CPU resources you can use RAID0, which has a minimal write penalty (never thought I'd suggest RAID0, haha). You may not max out the drive speed because of the CPU, but that is the cost of putting a storage workload on a machine that was not intended for it. It would be useful to know the limits of what such a machine can do with Ceph, so please do share if you run some tests.

Overall, from my understanding, it is generally better to move to the ideal node size for Ceph and slowly deprecate the larger nodes. Fundamentally, since replication is done at a higher level than the individual spinners, the case for doing RAID underneath falls away.
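As a rough back-of-the-envelope sketch (the replica count and per-layout penalties below are assumed for illustration, not taken from this thread), the penalties multiply roughly like this:

# Illustrative only: assumed Ceph replica count and RAID write penalties.
ceph_replicas = 2        # assumed Ceph pool size
raid_penalty = {
    "jbod":  1,          # one OSD per raw disk, no local RAID
    "raid0": 1,          # striping: one physical write per logical write
    "raid1": 2,          # mirroring: every write hits both disks
}

for layout, penalty in raid_penalty.items():
    print(f"{layout:>5}: {ceph_replicas * penalty} physical writes per client write")

# jbod / raid0 with 2 Ceph replicas -> 2 physical writes per client write
# raid1        with 2 Ceph replicas -> 4 physical writes per client write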


The reason for suggesting RAID1 was to ease the job of disk replacement, and also to minimise the chance of crashing. With 1 or 2 OSDs per node and many nodes, it doesn't really matter if a screwy disk brings down the node. With 24 OSDs per node, bringing down a node is much more of a big deal. Or maybe there is no chance of a failing disk causing an XFS oops these days? (I know it has in the past.)

Also, I think there won't be sufficient network bandwidth to saturate 24 disks, so bringing it down to 12 RAID1 sets isn't such a problem. The RAID1 write penalty isn't significant for throughput, as the writes are done in parallel, and I think you can get increased performance on reads.
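A quick sanity check of that point, using assumed round numbers (about 150 MB/s per spinner and a 10GbE link) rather than measurements:

# Assumed figures for illustration: ~150 MB/s per spinner, one 10GbE NIC.
disks_per_node = 24
disk_mb_s = 150
nic_mb_s = 10 * 1000 / 8                      # 10GbE line rate ~= 1250 MB/s

print(f"aggregate disk throughput: ~{disks_per_node * disk_mb_s} MB/s")
print(f"10GbE line rate:           ~{nic_mb_s:.0f} MB/s")
print(f"disks one NIC can feed:    ~{nic_mb_s / disk_mb_s:.0f}")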

I don't use RAID1 on my setup, but then I don't have 24-36 disks per node!

Hello,

Just an idea: BTRFS can do RAID.
I know it is not production-ready yet, but when it is, it will certainly be the easiest way to get local redundancy (against disk failure) with easy management, while Ceph handles remote replication against node/rack/datacenter failure.
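For what it's worth, here is a rough capacity comparison between the two approaches (the disk count, disk size and replica counts are assumptions for illustration, not recommendations):

# Assumed: 24 x 4 TB disks per node; compare local RAID1 pairs + Ceph size=2
# against one OSD per disk + Ceph size=3.
raw_tb = 24 * 4

raid1_plus_size2 = raw_tb / 2 / 2             # halve for RAID1, halve for 2 replicas
jbod_plus_size3 = raw_tb / 3                  # one third usable with 3 replicas

print(f"raw capacity:                {raw_tb} TB")
print(f"RAID1 + Ceph size=2 usable:  {raid1_plus_size2:.0f} TB")
print(f"JBOD  + Ceph size=3 usable:  {jbod_plus_size3:.0f} TB")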


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




