PCI-E SSD Journal for SSD-OSD Disks

Excellent choice. We put 2 of them (350GB) in a Dell R720xd with
24x 10K 1.2TB drives, and the performance has been excellent. We now need
to test it on low-latency switches (we are currently on regular switches,
2x10Gb per server).

I know that for that many OSDs we would need 3 cards to max out the IO, but
the network is still our bottleneck right now.
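
As a rough back-of-envelope check on that (a sketch; the per-drive
throughput and link efficiency below are assumptions, not measurements):

    # assumed ~160 MB/s sequential per 10K SAS spinner, ~80% usable
    # payload on the bonded 2x10Gb links -- both are rough guesses
    drives = 24
    disk_bw_mb_s = drives * 160              # ~3840 MB/s aggregate
    net_bw_mb_s = 2 * 10_000 / 8 * 0.8       # ~2000 MB/s usable
    print(disk_bw_mb_s, net_bw_mb_s)
    # -> the spinners alone can stream well past what 2x10Gb can carry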


Steph.

On 14-05-15 02:19 PM, Tyler Wilson wrote:
> Hey All,
>
> Thanks for the quick responses! I have chosen the Micron PCI-e card
> due to its benchmark results at
> http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review .
> Per the vendor the card has a 25PB life expectancy, so I'm not terribly
> worried about it failing on me too soon :)
>
> Christian Balzer <chibi at ...> writes:
>
> >
> > On Wed, 14 May 2014 19:28:17 -0500 Mark Nelson wrote:
> >
> > > On 05/14/2014 06:36 PM, Tyler Wilson wrote:
> > > > Hey All,
> > >
> > > Hi!
> > >
> > > >
> > > > I am setting up a new storage cluster that absolutely must have the
> > > > best read/write sequential speed @ 128k and the highest IOPS at 4k
> > > > read/write as possible.
> > >
> > > I assume random?
> > >
> > > >
> > > > My current specs for each storage node are currently;
> > > > CPU: 2x E5-2670V2
> > > > Motherboard: SM X9DRD-EF
> > > > OSD Disks: 20-30 Samsung 840 1TB
> > > > OSD Journal(s): 1-2 Micron RealSSD P320h
> > > > Network: 4x 10gb, Bridged
> > I assume you mean 2x10Gb bonded for the public network and 2x10Gb for the
> > cluster network?
> >
> > The SSDs you specified would read at about 500MB/s, meaning that only 4 of
> > them would already saturate your network uplink.
> > For writes (assuming journal on the SSDs, see below) you reach that point
> > with just 8 SSDs.
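
To make that arithmetic explicit, a minimal sketch (the per-SSD figure is
the ~500MB/s mentioned above; the ~80% link efficiency is an assumption):

    ssd_mb_s = 500
    link_mb_s = 2 * 10_000 / 8 * 0.8      # ~2000 MB/s usable on a 2x10Gb bond
    reads_to_saturate = link_mb_s / ssd_mb_s            # ~4 SSDs
    # With the journal co-located on the data SSDs every client write is
    # written twice (journal + filestore), halving usable write bandwidth.
    writes_to_saturate = link_mb_s / (ssd_mb_s / 2)     # ~8 SSDs
    print(round(reads_to_saturate), round(writes_to_saturate))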
> >
>
> The 4x 10Gb will carry Ceph storage traffic only, with public and
> management traffic on the on-board interfaces.
> This is expandable to 80Gbps if needed.
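
For reference, the public/cluster split being asked about is only a couple
of lines in ceph.conf; a minimal sketch with made-up subnets:

    [global]
        public network  = 10.0.1.0/24    # client / OpenStack traffic
        cluster network = 10.0.2.0/24    # replication traffic on the 4x10Gb bond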
>
>
> > > > Memory: 32-96GB depending on need
> > RAM is pretty cheap these days and a large pagecache on the storage nodes
> > is always quite helpful.
> >
>
> Noted; I wasn't sure how Ceph used the Linux page cache or whether it
> would benefit us.
>
> > > >
> >
> > How many of these nodes are you planning to deploy initially?
> > As always and especially when going for performance, more and smaller
> > nodes tend to be better, also less impact if one goes down.
> > And in your case it is easier to balance storage and network bandwidth,
> > see above.
> >
>
> 2 storage nodes per location at the start. These are serving OpenStack
> VMs, so we will add more whenever utilization warrants it.
>
> > > > Does anyone see any potential bottlenecks in the above specs? What kind
> > > > of improvements or configurations can we make on the OSD config side?
> > > > We are looking to run this with 2x replication.
> > >
> > > Likely you'll run into latency due to context switching and lock
> > > contention in the OSDs and maybe even some kernel slowness. Potentially
> > > you could end up CPU limited too, even with E5-2670s given how fast all
> > > of those SSDs are.  I'd suggest considering a chassis without an
> > > expander backplane and using multiple controllers with the drives
> > > directly attached.
> > >
> >
> > Indeed, I'd be worried about that as well, same with the
> > chassis/controller bit.
> >
>
>
> Thanks for the advice on the controller card; we will look into
> different chassis options with the LSI cards recommended in the Inktank
> docs.
> Would running a different distribution affect this at all? Our target
> was CentOS 6, however if a more recent kernel would make a difference we
> could switch.
>
> > > There's work going into improving things on the Ceph side but I don't
> > > know how much of it has even hit our wip branches on GitHub yet. So for
> > > now YMMV, but there's a lot of work going on in this area as it's
> > > something that lots of folks are interested in.
> > >
> > If you look at the current "Slow IOPS on RBD compared to journal and
> > backing devices" thread and the Inktank document referenced in it
> >
> > https://objects.dreamhost.com/inktankweb/Inktank_Hardware_Configuration_Guide.pdf
> >
> > you should probably assume no more than 800 random write IOPS and 4000
> > random read IOPS per OSD (4KB block size).
> > That latter number I can also reproduce with my cluster.
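
Turning those per-OSD figures into a rough per-node ceiling (a sketch, not
a benchmark; the OSD count and replica handling are assumptions):

    osds_per_node = 25                       # middle of the 20-30 range
    node_write_iops = osds_per_node * 800    # ~20,000 random 4K writes
    node_read_iops  = osds_per_node * 4000   # ~100,000 random 4K reads
    # with size=2 pools each client write lands on two OSDs, so the
    # cluster-wide client-visible write IOPS is roughly halved
    cluster_client_write_iops = 2 * node_write_iops / 2
    print(node_write_iops, node_read_iops, cluster_client_write_iops)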
> >
> > Now I expect those numbers to go up as Ceph is improved, but for the time
> > being those limits might influence your choice of hardware.
> >
> > > I'd also suggest testing whether or not putting all of the journals on
> > > the RealSSD cards actually helps you that much over just putting your
> > > journals on the other SSDs.  The advantage here is that by putting
> > > journals on the 2.5" SSDs, you don't lose a pile of OSDs if one of those
> > > PCIe cards fails.
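
One way to run that comparison, as a sketch (pool name, run length and
thread count are placeholders): set up the node once with journals on the
P320h and once with journals co-located on the 840s, then benchmark both
layouts with identical parameters, e.g.:

    rados bench -p journaltest 60 write -t 16 -b 4194304 --no-cleanup   # 4M sequential writes
    rados bench -p journaltest 60 write -t 16 -b 4096 --no-cleanup      # 4K small writes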
> > >
> > More than seconded. I could only find READ values on the Micron site, which
> > makes me very suspicious, as the journal's main role is to be able to
> > WRITE as fast as possible. Also, all journals combined ought to be faster
> > than your final storage.
> > Lastly, there was no endurance data on the Micron site either, and with ALL
> > your writes having to go through those devices I'd be dead scared to deploy
> > them.
> >
> > I'd spend that money on the case and controllers as mentioned above and
> > better storage SSDs.
> >
> > I was going to pipe up about the Samsungs, but Mark Kirkwood did beat me
> > to it.
> > Unless you can be 100% certain that your workload per storage SSD
> > doesn't exceed 40GB/day I'd stay very clear of them.
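
A quick way to sanity-check that 40GB/day figure against an expected load,
as a sketch with made-up client numbers (plug in your own):

    client_writes_gb_day = 500     # assumed aggregate client writes per day
    replicas = 2                   # size=2 pools
    journal_amp = 2                # journal + filestore on the same SSD
    ssds = 2 * 25                  # 2 nodes x ~25 OSD SSDs
    per_ssd_gb_day = client_writes_gb_day * replicas * journal_amp / ssds
    print(per_ssd_gb_day)          # compare against the ~40 GB/day comfort limit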
> >
> > Christian
> >
>
> Would it be possible to have redundant journals in this case? Per
> http://www.storagereview.com/micron_realssd_p320h_enterprise_pcie_review
> the 350GB model has a 25PB write-endurance expectancy. On a pure IOPS
> level, from benchmarking with 4K writes the Micron is 25x faster than the
> Samsung 840s we tested with, hence the move to PCI-e journals.
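
When comparing the cards against the 840s for journal duty, it may also be
worth testing the pattern the journal actually sees (small synchronous,
O_DIRECT sequential writes) rather than plain 4K random writes; a possible
fio invocation, with the device path left as a placeholder:

    # WARNING: writes to the raw device -- point it at a scratch disk
    fio --name=journal-test --filename=/dev/XXX --direct=1 --sync=1 \
        --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 \
        --time_based --group_reporting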
>
>
> > > The only other thing I would be careful about is making sure that your
> > > SSDs are good about dealing with power failure during writes.  Not all
> > > SSDs behave as you would expect.
> > >
> > > >
> > > > Thanks for your assistance with this, guys.
> > >
> > > np, good luck!
> > >
> >
>
> Thanks again for the responses!
>
>
>
