HW recommendations for OSD journals?

On 07/16/2014 09:58 AM, Riccardo Murri wrote:
> Hello,
>
> I am new to Ceph; the group I'm working in is currently evaluating it
> for our new large-scale storage.
>
> Is there any recommendation for the OSD journals?  E.g., does it make
> sense to keep them on SSDs?  Would it make sense to host the journal
> on a RAID-1 array for added safety? (IOW: what happens if the journal
> device fails and the journal is lost?)
>
> Thanks for any explanation and suggestion!

Hi,

There are a couple of common configurations that make sense imho:

1) Leave journals on the same disks as the data (best to have them in
their own partition).  This is a fairly safe option since each OSD
relies on only a single disk (i.e. it minimizes potential points of
failure).  It can be slow, but that depends on the controller you use
and possibly the IO scheduler.  A controller with writeback cache often
seems to help avoid seek contention during writes, but you will
currently lose about half your disk throughput to journal writes during
sequential write IO, since every write lands on the same disk twice
(once in the journal, once in the data partition).

2) Put journals on SSDs.  In this scenario you want to match the SSD
throughput available per journal to the disk speed.  I.e. if you have an
SSD that can do 400MB/s and disks that can do ~125MB/s of sequential
writes, you probably want to put somewhere around 3-5 journals on the
SSD depending on how much sequential write throughput matters to you.
OSDs are now dependent on both the spinning disk and the SSD not to
fail, and one SSD failure will take down multiple OSDs.  You gain speed
though and may not need more expensive controllers with WB cache (though
they may still be useful to protect against power failure).
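
To make the sizing in option 2 concrete, here is a rough back-of-the-envelope sketch in Python.  The function names are made up for illustration, and it assumes the SSD's sequential write bandwidth is shared evenly between the journals on it, which is a simplification:

```python
def balanced_journal_count(ssd_mb_s: float, disk_mb_s: float) -> float:
    """Journals per SSD at which the SSD's write rate equals the summed disk rate."""
    if ssd_mb_s <= 0 or disk_mb_s <= 0:
        raise ValueError("throughput values must be positive")
    return ssd_mb_s / disk_mb_s


def per_osd_write_ceiling(ssd_mb_s: float, disk_mb_s: float, journals: int) -> float:
    """Sequential write ceiling for one OSD, in MB/s.

    Assumption: the SSD bandwidth is split evenly across its journals,
    and an OSD can go no faster than the slower of its disk and its
    share of the SSD.
    """
    if journals < 1:
        raise ValueError("need at least one journal per SSD")
    return min(disk_mb_s, ssd_mb_s / journals)


if __name__ == "__main__":
    ssd, disk = 400.0, 125.0  # figures from the example above
    print("balance point:", balanced_journal_count(ssd, disk), "journals per SSD")
    for n in (3, 4, 5):
        print(n, "journals ->", per_osd_write_ceiling(ssd, disk, n), "MB/s per OSD")
```

With these numbers the balance point is 3.2 journals per SSD.  At 3 journals the disks are the limit; at 4 or 5 the SSD becomes the bottleneck for sequential writes, which is the trade-off behind the 3-5 range.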

Some folks have used RAID-1 LUNs for the journals and it works fine, but
I'm not really a fan of it, especially with SSDs.  You are causing
double the writes to the SSDs, and SSDs tend to fail in clumps based on
the number of writes.  If the choice is between 6 journals per SSD
RAID-1 or 3 journals per SSD JBOD, I'd choose the latter.  I'd want to
keep my overall OSD count high though to minimize the fallout from 3
OSDs going down at once.
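
A small Python sketch of that comparison, for anyone who wants to plug in their own numbers.  The helper names and the 100MB/s per-OSD write rate are hypothetical, and it assumes every journal sees the same write load:

```python
def per_ssd_journal_writes(journals_on_device: int, per_osd_mb_s: float) -> float:
    """Write load on one SSD, in MB/s.

    In a RAID-1 pair each member receives every write for all journals
    on the mirror, so pass the full journal count of the mirror.
    """
    return journals_on_device * per_osd_mb_s


def osds_down_on_ssd_failure(journals_on_device: int, mirrored: bool) -> int:
    """OSDs that go down immediately when a single SSD fails."""
    return 0 if mirrored else journals_on_device


def cluster_fraction_down(osds_down: int, total_osds: int) -> float:
    """Share of the cluster's OSDs affected by the failure."""
    if total_osds <= 0:
        raise ValueError("total_osds must be positive")
    return osds_down / total_osds


if __name__ == "__main__":
    rate = 100.0  # hypothetical sustained write rate per OSD, MB/s
    print("RAID-1, 6 journals:", per_ssd_journal_writes(6, rate), "MB/s per SSD")
    print("JBOD, 3 journals:  ", per_ssd_journal_writes(3, rate), "MB/s per SSD")
    down = osds_down_on_ssd_failure(3, mirrored=False)
    for total in (30, 300):
        print(total, "OSDs ->", cluster_fraction_down(down, total), "of cluster down")
```

Under these assumptions each SSD in the RAID-1 layout absorbs twice the writes of a JBOD SSD, while a JBOD SSD failure takes out 3 OSDs: 10% of a 30-OSD cluster but only 1% of a 300-OSD one.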

Arguably, if you go with RAID-1, can swap failed SSDs quickly, and
anticipate that the remaining SSD is likely to die soon after the
first, maybe RAID-1 is worth it.  The disadvantages seem pretty steep
to me though.

Mark

>
> Riccardo
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


