I am looking for a way to run Ceph with a ramdisk, just for a test environment; I do not have enough SSDs to give each OSD its own journal device. What I do not know is how to move an OSD journal onto a tmpfs or ramdisk. I hope someone can give me some guidance.
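
Concretely, this is the kind of procedure I had in mind, written down so it is explicit what I would be doing. It is only a sketch for a throwaway test cluster: the OSD id, the journal symlink location, the tmpfs size and the sysvinit-style service commands are assumptions about my own boxes, not something I have verified, and I fully expect to lose the OSD if the machine reboots.

#!/usr/bin/env python
# Test-only sketch: put the journal of a single OSD on a tmpfs ramdisk.
# Assumptions (mine, unverified): OSD id 0, the journal is the usual
# symlink at /var/lib/ceph/osd/ceph-0/journal, sysvinit-style
# "service ceph ..." control, and losing this OSD's data is acceptable.
import os
import subprocess

OSD_ID = "0"
OSD_DIR = "/var/lib/ceph/osd/ceph-%s" % OSD_ID
JOURNAL = os.path.join(OSD_DIR, "journal")
RAMDISK = "/mnt/ceph-ramjournal-%s" % OSD_ID
SIZE = "2G"  # must be at least "osd journal size" from ceph.conf

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Stop the OSD and flush its current journal so the filestore is clean.
run(["service", "ceph", "stop", "osd.%s" % OSD_ID])
run(["ceph-osd", "-i", OSD_ID, "--flush-journal"])

# 2. Mount a tmpfs and point the journal symlink at a file on it.
if not os.path.isdir(RAMDISK):
    os.makedirs(RAMDISK)
run(["mount", "-t", "tmpfs", "-o", "size=" + SIZE, "tmpfs", RAMDISK])
if os.path.islink(JOURNAL) or os.path.exists(JOURNAL):
    os.remove(JOURNAL)
os.symlink(os.path.join(RAMDISK, "journal"), JOURNAL)

# 3. Create a fresh journal on the ramdisk and start the OSD again.
run(["ceph-osd", "-i", OSD_ID, "--mkjournal"])
run(["service", "ceph", "start", "osd.%s" % OSD_ID])

Does that look sane, or is there a better/standard way to do this for a test setup?
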
2014-07-31 8:58 GMT+07:00 Christian Balzer <chibi at gol.com>:

> On Wed, 30 Jul 2014 18:17:16 +0200 Josef Johansson wrote:
>
> > Hi,
> >
> > Just chipping in,
> > As RAM is pretty cheap right now, it could be an idea to fill all the
> > memory slots in the OSDs; a bigger chance then that the data you've
> > requested is actually in RAM already.
> >
> While that is very, VERY true, it won't help his perceived bad read speeds
> much, as they're not really caused by the OSDs per se.
>
> > You should go with the DC S3700 400GB for the journals at least..
> >
> That's probably going overboard in the other direction.
> While on paper this would be the first model to handle the sequential
> write speeds of 3 HDDs, that kind of scenario is pretty unrealistic.
> Even with just one client writing they will never reach those speeds due
> to FS overhead, parallel writes caused by replication and so forth.
>
> The only scenario where this makes some sense is one with short, very high
> write spikes that can be handled by the journal (both in size and Ceph
> settings like filestore max/min sync interval), followed by long enough
> pauses to scribble the data to the HDDs.
>
> In the end, for nearly all use cases, obsessing over high write speeds is a
> fallacy; one is much more likely to run out of steam due to IOPS caused by
> much smaller transactions.
>
> What would worry me about the small DC S3500 is the fact that it is only
> rated for about 38GB of writes/day over 5 years. Now this could very well
> be within the deployment parameters, but we don't know.
>
> A 200GB DC S3700 should be fine here: higher endurance, about 3 times the
> speed of the DC S3500 120GB for sequential writes and 8 times for write
> IOPS.
>
> Christian
>
> > Cheers,
> > Josef
> >
> > On 30/07/14 17:12, Christian Balzer wrote:
> > > On Wed, 30 Jul 2014 10:50:02 -0400 German Anders wrote:
> > >
> > >> Hi Christian,
> > >> How are you? Thanks a lot for the answers; mine are in red.
> > >>
> > > Most certainly not in red on my mail client...
> > >
> > >> --- Original message ---
> > >>> Subject: Re: [ceph-users] Using Ramdisk wi
> > >>> From: Christian Balzer <chibi at gol.com>
> > >>> To: <ceph-users at lists.ceph.com>
> > >>> Cc: German Anders <ganders at despegar.com>
> > >>> Date: Wednesday, 30/07/2014 11:42
> > >>>
> > >>> Hello,
> > >>>
> > >>> On Wed, 30 Jul 2014 09:55:49 -0400 German Anders wrote:
> > >>>
> > >>>> Hi Wido,
> > >>>>
> > >>>> How are you? Thanks a lot for the quick response. I know that
> > >>>> using a ramdisk has a heavy cost, but I also want to try it to
> > >>>> see if I could get better performance, since I'm using a 10GbE
> > >>>> network with the following configuration and I can't achieve
> > >>>> more than 300MB/s of throughput on rbd:
> > >>>>
> > >>> Testing the limits of Ceph with a ramdisk based journal to see what
> > >>> is possible in terms of speed (and you will find that it is
> > >>> CPU/protocol bound) is fine.
> > >>> Anything resembling production is a big no-no.
> > >> Got it. Did you try flashcache from Facebook, or dm-cache?
> > > No.
> > >
> > >>>> MON Servers (3):
> > >>>>    2x Intel Xeon E3-1270v3 @3.5GHz (8C)
> > >>>>    32GB RAM
> > >>>>    2x SSD Intel 120G in RAID1 for OS
> > >>>>    1x 10GbE port
> > >>>>
> > >>>> OSD Servers (4):
> > >>>>    2x Intel Xeon E5-2609v2 @2.5GHz (8C)
> > >>>>    64GB RAM
> > >>>>    2x SSD Intel 120G in RAID1 for OS
> > >>>>    3x SSD Intel 120G for Journals (3 SAS disks : 1 SSD Journal)
> > >>> You're not telling us WHICH actual Intel SSDs you're using.
> > >>> If those are DC S3500 ones, then 300MB/s total isn't a big surprise
> > >>> at all, as they are capable of 135MB/s writes at most.
> > >> The SSD model is Intel SSDSC2BB120G4, firmware D2010370.
> > > That's not really an answer, but then again Intel could have chosen
> > > model numbers that resemble their product names.
> > >
> > > That is indeed a DC S3500, so my argument stands.
> > > With those SSDs for your journals, much more than 300MB/s per node is
> > > simply not possible, never mind how fast or slow the HDDs perform.
> > >
> > >>>> 9x SAS 3TB 6G for OSD
> > >>> That would be somewhere over 1GB/s in theory, but given file system
> > >>> and other overheads (what is your replication level?) that's a very
> > >>> theoretical value indeed.
> > >> The RF is 2, so perf should be much better; also notice that read
> > >> perf is really poor, around 62MB/s...
> > >>
> > > A replication factor of 2 means that each write is amplified by 2,
> > > so half of your theoretical performance is gone already.
> > >
> > > Do your tests with atop or iostat running on all storage nodes.
> > > Determine where the bottleneck is: the journal SSDs, the HDDs, or
> > > (unlikely) something else.
> > >
> > > Read performance sucks balls with RBD (at least individually); it can
> > > be improved by fondling the readahead value. See:
> > >
> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/8817
> > >
> > > This is something the Ceph developers are aware of and hopefully will
> > > address in the future:
> > > https://wiki.ceph.com/Planning/Blueprints/Emperor/Kernel_client_read_ahead_optimization
> > >
> > > Christian
> > >
> > >>> Christian
> > >>>
> > >>>> 2x 10GbE port (1 for Cluster Network, 1 for Public Network)
> > >>>>
> > >>>> - 10GbE Switches (1 for Cluster interconnect and 1 for Public network)
> > >>>> - Using Ceph Firefly version 0.80.4.
> > >>>>
> > >>>> The thing is that with the fio, rados bench and vdbench tools we
> > >>>> only see 300MB/s on writes (rand and seq) with a bs of 4m and 16
> > >>>> threads, which is pretty low actually. Yesterday I was talking in
> > >>>> the Ceph IRC channel and came across the presentation that someone
> > >>>> from Fujitsu gave in Frankfurt, and also some mails about a similar
> > >>>> 10GbE config where he achieved almost 795MB/s and more... I would
> > >>>> like to know, if possible, how to implement that so we could
> > >>>> improve our Ceph cluster a little bit more. I already set the
> > >>>> scheduler on the SSD disks, both OS and Journal, to [noop] but
> > >>>> still didn't notice any improvement. That's why we would like to
> > >>>> try a RAMDISK for the Journals; I noticed that he implemented that
> > >>>> on their Ceph cluster.
> > >>>>
> > >>>> I will really appreciate the help on this. Also, if you need me to
> > >>>> send you some more information about the Ceph scheme, please let
> > >>>> me know. Also, if someone could share some detailed conf info it
> > >>>> would really help!
> > >>>>
> > >>>> Thanks a lot,
> > >>>>
> > >>>> German Anders
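
Side note from me on the readahead suggestion above: this is how I understand it would be applied on the client, as a minimal sketch. The device name (rbd0) and the 4096KB value are assumptions for my own test client, not recommendations.

#!/usr/bin/env python
# Sketch: raise the readahead of a kernel-mapped RBD device via sysfs,
# per the read_ahead_kb tuning referenced in the links above.
# "rbd0" and 4096 KB are assumptions for my test client.
RBD_DEV = "rbd0"
READ_AHEAD_KB = 4096

path = "/sys/block/%s/queue/read_ahead_kb" % RBD_DEV
with open(path) as f:
    print("current read_ahead_kb on %s: %s" % (RBD_DEV, f.read().strip()))
with open(path, "w") as f:
    f.write(str(READ_AHEAD_KB))
print("read_ahead_kb on %s set to %d" % (RBD_DEV, READ_AHEAD_KB))

(As far as I understand this only affects the kernel RBD client on that machine, and it does not persist across reboots.)
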
> > >>>>>
> > >>>>> --- Original message ---
> > >>>>> Subject: Re: [ceph-users] Using Ramdisk wi
> > >>>>> From: Wido den Hollander <wido at 42on.com>
> > >>>>> To: <ceph-users at lists.ceph.com>
> > >>>>> Date: Wednesday, 30/07/2014 10:34
> > >>>>>
> > >>>>> On 07/30/2014 03:28 PM, German Anders wrote:
> > >>>>>>
> > >>>>>> Hi Everyone,
> > >>>>>>
> > >>>>>> Is anybody using a ramdisk to put the Journal on? If so, could
> > >>>>>> you please share the commands to implement that? I'm having
> > >>>>>> some issues with it and want to test it out to see if I could
> > >>>>>> get better performance.
> > >>>>> Don't do this. When you lose the journal, you lose the OSD. So a
> > >>>>> reboot of the machine effectively trashes the data on that OSD.
> > >>>>>
> > >>>>> Wido
> > >>>>>
> > >>>>>> Thanks in advance,
> > >>>>>>
> > >>>>>> German Anders
> > >>>>>
> > >>>>> --
> > >>>>> Wido den Hollander
> > >>>>> 42on B.V.
> > >>>>> Ceph trainer and consultant
> > >>>>>
> > >>>>> Phone: +31 (0)20 700 9902
> > >>>>> Skype: contact42on
> >
> --
> Christian Balzer           Network/Systems Engineer
> chibi at gol.com            Global OnLine Japan/Fusion Communications
> http://www.gol.com/
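
One last note from me: Christian's advice above to run atop/iostat on every storage node and find out whether the journal SSDs or the HDDs are the bottleneck seems like the right first step before playing with ramdisks at all. In case it is useful, this is the sort of quick-and-dirty sampler I would run on an OSD node when iostat isn't installed; it is only a sketch, and the "sd*" device filter and the 10-second interval are assumptions about my own nodes.

#!/usr/bin/env python
# Rough per-device throughput/utilisation sampler based on /proc/diskstats,
# as a stand-in for "watch atop/iostat on every storage node".
# Assumptions: Linux, whole disks named sd* (journal SSDs and HDDs), 10s window.
import time

SECTOR_BYTES = 512      # /proc/diskstats counts 512-byte sectors
INTERVAL = 10.0         # seconds

def snapshot():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            name = fields[2]
            if not name.startswith("sd") or name[-1].isdigit():
                continue   # whole sd* disks only, skip partitions/loop/rbd
            # fields: [5] sectors read, [9] sectors written, [12] ms doing I/O
            stats[name] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats

before = snapshot()
time.sleep(INTERVAL)
after = snapshot()

print("device    read MB/s   write MB/s   util%")
for dev in sorted(after):
    if dev not in before:
        continue
    rd = (after[dev][0] - before[dev][0]) * SECTOR_BYTES / INTERVAL / 1e6
    wr = (after[dev][1] - before[dev][1]) * SECTOR_BYTES / INTERVAL / 1e6
    util = (after[dev][2] - before[dev][2]) / (INTERVAL * 1000.0) * 100.0
    print("%-9s %9.1f %12.1f %7.1f" % (dev, rd, wr, util))

If the journal SSDs sit near 100% utilisation while the HDDs stay mostly idle (or the other way around), that should show where a throughput ceiling like the 300MB/s discussed above is coming from.
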