Re: Real world benefit from SSD Journals for a more read than write cluster

"Wang, Warren" <Warren_Wang@xxxxxxxxxxxxxxxxx> · Thu, 9 Jul 2015 14:58:58 +0000

Without SSD journals you'll take a noticeable hit on write latency. Whether that is tolerable is up to you and depends on the workload you have to capture. Large file operations still get efficient throughput without an SSD journal, as long as you have enough spindles.

As for the Intel P3700, you will only need one to keep up with 12 SATA drives. The 400 GB model is probably okay if you keep the journal sizes small, but the 800 GB model is probably safer if you plan on leaving these in production for a few years. It depends on the turnover of data on the servers.
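
A rough sketch of the arithmetic behind that sizing advice, in Python. The journal size uses the filestore rule of thumb from the Ceph documentation (2 x expected throughput x filestore max sync interval). Everything else is an assumption made for illustration and is not taken from this thread: the 150 MB/s per SATA disk, the 2000 GB written per node per day, and the endurance rating of 10 drive writes per day over 5 years, which should be replaced with the figures from the datasheet of the drive you actually buy.

```python
# Back-of-the-envelope journal sizing for one node.
# All inputs are illustrative assumptions, not measurements.

def journal_size_mb(throughput_mb_s, sync_interval_s=5.0):
    """Filestore rule of thumb: 2 * expected throughput * max sync interval."""
    return 2.0 * throughput_mb_s * sync_interval_s


def years_of_endurance(capacity_gb, dwpd, rated_years, daily_writes_gb):
    """Years until the rated write endurance of the SSD is used up.

    dwpd and rated_years describe the vendor rating (drive writes per day
    over the rated period); daily_writes_gb is what the node really writes.
    With journals on the SSD, every byte written to the OSDs passes through
    the SSD first.
    """
    rated_total_gb = capacity_gb * dwpd * 365.0 * rated_years
    return rated_total_gb / (daily_writes_gb * 365.0)


OSDS_PER_SSD = 12  # one SSD in front of 12 SATA OSDs

per_osd_mb = journal_size_mb(throughput_mb_s=150.0)  # assumed disk speed
total_gb = OSDS_PER_SSD * per_osd_mb / 1024.0
print(f"journal per OSD: {per_osd_mb:.0f} MB, all 12 journals: {total_gb:.1f} GB")

for capacity_gb in (400, 800):
    years = years_of_endurance(
        capacity_gb,
        dwpd=10.0,               # assumed rating, check the datasheet
        rated_years=5.0,         # assumed rating, check the datasheet
        daily_writes_gb=2000.0,  # assumed write load per node
    )
    print(f"{capacity_gb} GB SSD: about {years:.0f} years of rated endurance")
```

Under these assumptions the journals themselves take up only a small part of either model, so the choice between 400 GB and 800 GB comes down to how much data the node writes per day over the life of the hardware.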

The dual disk failure comment points out that you are more exposed to data loss with 2 copies. You need to understand that two drives can fail either simultaneously, or one after the other before the cluster has finished repairing itself. As usual, you have to decide whether that risk is acceptable. We have many clusters; some use 2 copies and others use 3. If your data resides nowhere else, then 3 copies is the safe thing to do. That's getting harder and harder to justify, though, when the price of other storage solutions using erasure coding continues to plummet.
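
To get a feeling for the difference between 2 and 3 copies, here is a crude Python sketch. It is not the script from the thread Christian mentions. It assumes independent drive failures at a constant rate, and it pessimistically assumes that any set of failed drives as large as the replica count holds all copies of at least some data. The 72 drives (6 nodes with 12 OSDs each), the 4% annual failure rate and the 8 hour recovery window are made-up inputs, so treat the result as an order-of-magnitude upper bound at best.

```python
import math

HOURS_PER_YEAR = 365 * 24


def failure_rate_per_hour(afr):
    """Convert an annual failure rate (e.g. 0.04) to a constant hourly rate."""
    return -math.log(1.0 - afr) / HOURS_PER_YEAR


def loss_events_per_year(n_drives, afr, recovery_hours, copies):
    """Crude upper bound on data loss events per year.

    A loss event is counted when, after a first drive failure, enough
    further drives fail inside the recovery window to take out all copies.
    """
    rate = failure_rate_per_hour(afr)
    # Expected number of first failures per year across the cluster.
    events = n_drives * rate * HOURS_PER_YEAR
    # Each additional copy needs one more failure among the remaining
    # drives before recovery has finished.
    for k in range(1, copies):
        remaining = n_drives - k
        events *= 1.0 - math.exp(-remaining * rate * recovery_hours)
    return events


for copies in (2, 3):
    events = loss_events_per_year(
        n_drives=72,         # assumed: 6 nodes x 12 OSDs
        afr=0.04,            # assumed annual failure rate per drive
        recovery_hours=8.0,  # assumed time to re-replicate a lost OSD
        copies=copies,
    )
    print(f"size = {copies}: roughly {events:.2e} loss events per year")
```

The recovery window matters as much as the copy count: slower recovery (for example without SSD journals) lengthens the time during which a second failure is fatal.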

Warren

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Götz Reinicke - IT Koordinator
Sent: Thursday, July 09, 2015 4:47 AM
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  Real world benefit from SSD Journals for a more read than write cluster

Hi Christian,
On 09.07.15 at 09:36, Christian Balzer wrote:
> 
> Hello,
> 
> On Thu, 09 Jul 2015 08:57:27 +0200 Götz Reinicke - IT Koordinator wrote:
> 
>> Hi again,
>>
>> time is passing, and so is my budget :-/ so I have to recheck the
>> options for a "starter" cluster. An expansion next year is possible,
>> maybe for an OpenStack installation or for more performance if demand
>> rises. The "starter" could always be reused as a test cluster or a slow dark archive.
>>
>> At the beginning I was at 16 SATA OSDs with 4 SSDs for journals per
>> node, but now I'm looking at 12 SATA OSDs without SSD journals. Less
>> performance and less capacity, I know. But that's OK!
>>
> Leave the space to upgrade these nodes with SSDs in the future.
> If your cluster grows large enough (more than 20 nodes) even a single
> P3700 might do the trick and will need only a PCIe slot.

If I understand you correctly, the 12-disk setup is not a bad idea, and if SSD journals turn out to be needed I can add a PCIe P3700 later.

For the 12-OSD setup I would get 2 P3700s, one per 6 OSDs.

Good or bad idea?

> 
>> There should be 6 nodes, or maybe 8 with the 12-OSD layout, with a replication size of 2.
>>
> Danger, Will Robinson.
> This is essentially a RAID5 and you're plain asking for a double disk 
> failure to happen.

Maybe I do not understand that. I think size = 2 is more like RAID1 ... ? And why am I asking for a double disk failure?

Is it because of too few nodes, too few OSDs, or because of size = 2?

> 
> See this recent thread:
> "calculating maximum number of disk and node failure that can be 
> handled by cluster with out data loss"
> for some discussion and a Python script which you will need to modify
> for 2 disk replication.
> 
> With a RAID5 failure calculator you're at 1 data loss event per 3.5 
> years...
> 

Thanks for pointing me to that thread, but I don't see what to take away from it for my case.

I see that calculating the reliability is some sort of complex math ...

>> The workload I expect is more writes, of maybe a few GB of Office
>> files per day and a few TB of larger video files from a few users per week.
>>
>> By the end of this year we expect to have roughly 60 to 80 TB of larger
>> video files in that cluster, which are accessed from time to time.
>>
>> Any thoughts on dropping the SSD journals?
>>
> You will miss them when the cluster does write, be it from clients or 
> when re-balancing a lost OSD.

I can imagine that I might miss the SSD journals, but if I can add the
P3700 later I feel comfortable with that for now, for budget and evaluation reasons.

	Thanks for your helpful input and feedback. /Götz

--
Götz Reinicke
IT Coordinator

Tel. +49 7141 969 82420
E-Mail goetz.reinicke@xxxxxxxxxxxxxxx

Filmakademie Baden-Württemberg GmbH
Akademiehof 10
71638 Ludwigsburg
www.filmakademie.de

Registered at Amtsgericht Stuttgart (district court), HRB 205016

Chairman of the Supervisory Board: Jürgen Walter MdL, State Secretary in the Baden-Württemberg Ministry of Science, Research and the Arts

Managing Director: Prof. Thomas Schadt

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com