On Fri, 26 Oct 2012, Wido den Hollander wrote:
> On 10/25/2012 03:30 PM, Stephen Perkins wrote:
> > Hi all,
> >
> > In looking at the design of a storage brick (just OSDs), I have found a
> > dual-power hardware solution that allows for 10 hot-swap drives and has a
> > motherboard with 2 SATA III 6G ports (for the SSDs) and 8 SATA II 3G ports
> > (for the physical drives). No RAID card. This seems a good match to me
> > given my needs. This system also supports 10G Ethernet via an add-in card,
> > so please assume that for the questions. I'm also assuming 2TB or 3TB
> > drives for the 8 hot-swap bays. My workload is throughput-intensive
> > (mainly writes) and not IOPS-heavy.
> >
> > I have 2 questions and would love to hear from the group.
> >
> > Question 1: What is the most appropriate configuration for the journal
> > SSDs?
> >
> > I'm not entirely sure what happens when you lose a journal drive. If the
> > whole brick goes offline (i.e. all OSDs stop communicating with Ceph),
> > does it make sense to configure the SSDs into RAID1?
>
> When you lose the journal, these OSDs will commit suicide, and in this case
> you'd lose 8 OSDs.

One small correction here: it depends. If you use ext4 or XFS, then yes:
losing the journal means the data disk is lost too. However, if you use
btrfs, the data disk has consistent point-in-time checkpoints it can roll
back to. That is not useful from the perspective of a specific IO request
(i.e., if client X wrote to 3 replicas and then all 3 replicas lost their
journals, that write may have been lost, along with the other 0.01% of
writes that happened in the last several seconds). On the other hand, a
single OSD that loses its journal can initialize a new one, rejoin the
cluster, and still have 99.99% of its data in place, making
reintegration/recovery quick and cheap.

sage
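[For reference, the "one journal partition per OSD, no RAID1" layout from
Question 1 can be expressed directly in ceph.conf. This is a minimal sketch
only: the partition labels, OSD IDs, and journal size below are assumptions,
not taken from the original thread.]

    [osd]
        ; journal size in MB; sized to absorb a few seconds of writes
        ; at 10GbE rates (assumed value, tune for your workload)
        osd journal size = 10240

    [osd.0]
        ; hypothetical GPT partition label for a journal on the first SSD
        osd journal = /dev/disk/by-partlabel/journal-0

    [osd.4]
        ; hypothetical partition for a journal on the second SSD
        osd journal = /dev/disk/by-partlabel/journal-4

With four journal partitions per SSD, losing one SSD takes out four OSDs
rather than all eight, which is the trade-off being weighed against RAID1.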
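[The single-OSD recovery path described above looks roughly like the
following sketch. The OSD ID (3) and the init-script invocation are
assumptions for illustration; ceph-osd --mkjournal is the standard way to
write a fresh journal for the path named by 'osd journal' in ceph.conf.]

    # stop the affected OSD (ID 3 is an example)
    /etc/init.d/ceph stop osd.3
    # create a new, empty journal for osd.3 on the replacement device
    ceph-osd -i 3 --mkjournal
    # bring the OSD back; its peers backfill the small window of writes
    # that existed only in the lost journal
    /etc/init.d/ceph start osd.3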