Re: journal or cache tier on SSDs ?

Hello,

On Wed, 11 May 2016 11:27:28 -0400 Jonathan D. Proulx wrote:

> On Tue, May 10, 2016 at 10:40:08AM +0200, Yoann Moulin wrote:
> 
> :RadosGW (S3 and maybe Swift for Hadoop/Spark) will be the main usage.
> :Most of the access will be in read-only mode. Write access will only
> :be done by the admin to update the datasets.
> 
> No one seems to have pointed this out, but if your write workload isn't
> performance sensitive there's no point in using SSD for journals.
>
I most certainly did. ^_-
 
> Whether you can/should repurpose as a cache tier is another issue. I
> don't have any experience with that so cannot comment.
> 
> But I think you should not use them as journals because each SSD
> becomes a single point of failure for multiple OSDs. 

He's talking about 24 storage nodes; any Ceph cluster (and certainly one
of that size) should be able to take the loss of a node in its stride,
and losing 6 OSDs due to an SSD journal failure would be half of that
impact.

So while not ideal (I try to keep my SSD journal to OSD ratio at 4:1 or
below), it's something that should be fine in his case.
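
For anyone weighing that ratio: with ceph-disk, each "prepare" call
carves another journal partition off the shared SSD, so the fan-out is
visible right in the deployment step. A minimal sketch, with made-up
device names (/dev/sdb onwards as spinners, /dev/sdk as the journal SSD)
and the journal size taken from ceph.conf:

    # ceph.conf: [osd] osd journal size = 5120   # MB per journal partition
    ceph-disk prepare /dev/sdb /dev/sdk   # OSD 1, journal partition 1 on the SSD
    ceph-disk prepare /dev/sdc /dev/sdk   # OSD 2, journal partition 2 on the SSD
    # ...repeat; at 6 OSDs per SSD, one dead /dev/sdk takes all 6 with it

Losing /dev/sdk then means re-creating those OSDs and backfilling, which
is exactly the impact being weighed above.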

> I'm using
> mirrored 3600 series SSDs for journaling but they're the same
> generation and subject to identical write loads so I'm suspicious
> about whether this is useful or just twice as expensive.
> 
Both, really.
As in, you're insuring against an SSD failure that's not caused by
wear-out (like a SATA channel failure), but you're not protecting against
the most likely failure mode (reaching its wear-out limit), while spending
money and space (a disk bay) on it.

If you have the money, bully for you.
You may want to swap single SSDs around (from another node) at some point,
once they have sufficiently disparate utilization, so that you wind up
with RAIDs where one SSD's wear-out differs from the other's by at least
10%.
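
If you do that, the wear gap is easy to track; Intel's datacenter SSDs
expose a SMART attribute for it. A quick check (device name is a
placeholder, attribute names vary by vendor and model):

    smartctl -A /dev/sdk | egrep 'Media_Wearout_Indicator|Total_LBAs_Written'

Media_Wearout_Indicator counts down from 100 towards 0 as the rated write
endurance is consumed, so shuffle drives between nodes once the values
have drifted 10% or so apart.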

> There's also additional complexity in deploy and management when you
> split off the journals just because it's a more complex system.  This
> part isn't too bad and can mostly be automated away, but if you don't
> need the performance why pay for it.
> 
> I too work in an academic research lab, so if you need to keep the
> donor happy, by all means decide whichever way works better for the
> system.
> Leaving them as journals if cache doesn't fit isn't likely to cause
> much harm so long as you're replicating your data and can survive an
> SSD loss, but you should do that to survive a spinning disk loss or
> storage node loss anyway.
> 
Exactly what I said; however, his SSDs (3500-class OEM models) are very
much unsuited for journal use.
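
To put rough numbers on that (spec-sheet values from memory, so check the
exact models): a 480GB DC S3500 is rated for about 275TB written, roughly
0.3 drive writes per day, versus the 3-10 DWPD of the S3610/S3700 class.
A journal SSD absorbs every byte written to its OSDs first, so even a
modest sustained 50MB/s of writes to the node would chew through 275TB in
275e12 / 50e6 = 5.5e6 seconds, i.e. about two months.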

> But if I were you my choice would be between caching and moving them
> to a non-ceph use.
> 
A readforward or readonly cache-tier with very strict promotion rules is
probably the best fit for those SSDs, though still not ideal.
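
For reference, wiring that up looks roughly like this (pool names are
placeholders, and the hit_set/recency values, which are what make
promotion "strict", are illustrative):

    ceph osd tier add rgw-data rgw-cache
    ceph osd tier cache-mode rgw-cache readonly
    # track object hotness with bloom-filter hit sets
    ceph osd pool set rgw-cache hit_set_type bloom
    ceph osd pool set rgw-cache hit_set_count 12
    ceph osd pool set rgw-cache hit_set_period 14400
    # only promote objects seen in several recent hit sets
    ceph osd pool set rgw-cache min_read_recency_for_promote 3
    # cap the cache so eviction starts before the SSDs fill up
    ceph osd pool set rgw-cache target_max_bytes 800000000000

The usual caveat applies that readonly mode gives no consistency
guarantees if objects in the base pool are modified, which is why it only
makes sense for his mostly read-only datasets.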

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


