I am looking for a way to run Ceph with a ramdisk, just for a test environment; I do not have enough SSDs to give each OSD its own journal device. What I do not know is how to move an OSD journal onto a tmpfs or ramdisk. I hope someone can give me some guidance.
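
Concretely, this is the kind of procedure I had in mind, written down so it is explicit what I would be doing. It is only a sketch for a throwaway test cluster: the OSD id, the journal symlink location, the tmpfs size and the sysvinit-style service commands are assumptions about my own boxes, not something I have verified, and I fully expect to lose the OSD if the machine reboots.

#!/usr/bin/env python
# Test-only sketch: put the journal of a single OSD on a tmpfs ramdisk.
# Assumptions (mine, unverified): OSD id 0, the journal is the usual
# symlink at /var/lib/ceph/osd/ceph-0/journal, sysvinit-style
# "service ceph ..." control, and losing this OSD's data is acceptable.
import os
import subprocess

OSD_ID = "0"
OSD_DIR = "/var/lib/ceph/osd/ceph-%s" % OSD_ID
JOURNAL = os.path.join(OSD_DIR, "journal")
RAMDISK = "/mnt/ceph-ramjournal-%s" % OSD_ID
SIZE = "2G"  # must be at least "osd journal size" from ceph.conf

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Stop the OSD and flush its current journal so the filestore is clean.
run(["service", "ceph", "stop", "osd.%s" % OSD_ID])
run(["ceph-osd", "-i", OSD_ID, "--flush-journal"])

# 2. Mount a tmpfs and point the journal symlink at a file on it.
if not os.path.isdir(RAMDISK):
    os.makedirs(RAMDISK)
run(["mount", "-t", "tmpfs", "-o", "size=" + SIZE, "tmpfs", RAMDISK])
if os.path.islink(JOURNAL) or os.path.exists(JOURNAL):
    os.remove(JOURNAL)
os.symlink(os.path.join(RAMDISK, "journal"), JOURNAL)

# 3. Create a fresh journal on the ramdisk and start the OSD again.
run(["ceph-osd", "-i", OSD_ID, "--mkjournal"])
run(["service", "ceph", "start", "osd.%s" % OSD_ID])

Does that look sane, or is there a better/standard way to do this for a test setup?
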
2014-07-31 8:58 GMT+07:00 Christian Balzer <chibi at gol.com>:

> On Wed, 30 Jul 2014 18:17:16 +0200 Josef Johansson wrote:
>
> > Hi,
> >
> > Just chipping in,
> > As RAM is pretty cheap right now, it could be an idea to fill all the
> > memory slots in the OSDs; a bigger chance then that the data you've
> > requested is actually in RAM already.
> >
> While that is very, VERY true, it won't help his perceived bad read speeds
> much, as they're not really caused by the OSDs per se.
>
> > You should go with the DC S3700 400GB for the journals at least..
> >
> That's probably going overboard in the other direction.
> While on paper this would be the first model to handle the sequential
> write speeds of 3 HDDs, that kind of scenario is pretty unrealistic.
> Even with just one client writing they will never reach those speeds due
> to FS overhead, parallel writes caused by replication and so forth.
>
> The only scenario where this makes some sense is one with short, very high
> write spikes that can be handled by the journal (both in size and Ceph
> settings like filestore max/min sync interval), followed by long enough
> pauses to scribble the data to the HDDs.
>
> In the end, for nearly all use cases, obsessing over high write speeds is a
> fallacy; one is much more likely to run out of steam due to IOPS caused by
> much smaller transactions.
>
> What would worry me about the small DC S3500 is the fact that it is only
> rated for about 38GB of writes/day over 5 years. Now this could very well
> be within the deployment parameters, but we don't know.
>
> A 200GB DC S3700 should be fine here: higher endurance, about 3 times the
> speed of the DC S3500 120GB for sequential writes and 8 times for write
> IOPS.
>
> Christian
>
> > Cheers,
> > Josef
> >
> > On 30/07/14 17:12, Christian Balzer wrote:
> > > On Wed, 30 Jul 2014 10:50:02 -0400 German Anders wrote:
> > >
> > >> Hi Christian,
> > >> How are you? Thanks a lot for the answers; mine are in red.
> > >>
> > > Most certainly not in red on my mail client...
> > >
> > >> --- Original message ---
> > >>> Subject: Re: [ceph-users] Using Ramdisk wi
> > >>> From: Christian Balzer <chibi at gol.com>
> > >>> To: <ceph-users at lists.ceph.com>
> > >>> Cc: German Anders <ganders at despegar.com>
> > >>> Date: Wednesday, 30/07/2014 11:42
> > >>>
> > >>> Hello,
> > >>>
> > >>> On Wed, 30 Jul 2014 09:55:49 -0400 German Anders wrote:
> > >>>
> > >>>> Hi Wido,
> > >>>>
> > >>>> How are you? Thanks a lot for the quick response. I know that
> > >>>> using a ramdisk has a heavy cost, but I also want to try it to
> > >>>> see if I could get better performance, since I'm using a 10GbE
> > >>>> network with the following configuration and I can't achieve
> > >>>> more than 300MB/s of throughput on rbd:
> > >>>>
> > >>> Testing the limits of Ceph with a ramdisk based journal to see what
> > >>> is possible in terms of speed (and you will find that it is
> > >>> CPU/protocol bound) is fine.
> > >>> Anything resembling production is a big no-no.
> > >> Got it. Did you try flashcache from Facebook, or dm-cache?
> > > No.
> > >
> > >>>> MON Servers (3):
> > >>>>    2x Intel Xeon E3-1270v3 @3.5GHz (8C)
> > >>>>    32GB RAM
> > >>>>    2x SSD Intel 120G in RAID1 for OS
> > >>>>    1x 10GbE port
> > >>>>
> > >>>> OSD Servers (4):
> > >>>>    2x Intel Xeon E5-2609v2 @2.5GHz (8C)
> > >>>>    64GB RAM
> > >>>>    2x SSD Intel 120G in RAID1 for OS
> > >>>>    3x SSD Intel 120G for Journals (3 SAS disks : 1 SSD Journal)
> > >>> You're not telling us WHICH actual Intel SSDs you're using.
> > >>> If those are DC S3500 ones, then 300MB/s total isn't a big surprise
> > >>> at all, as they are capable of 135MB/s writes at most.
> > >> The SSD model is Intel SSDSC2BB120G4, firmware D2010370.
> > > That's not really an answer, but then again Intel could have chosen
> > > model numbers that resemble their product names.
> > >
> > > That is indeed a DC S3500, so my argument stands.
> > > With those SSDs for your journals, much more than 300MB/s per node is
> > > simply not possible, never mind how fast or slow the HDDs perform.
> > >
> > >>>> 9x SAS 3TB 6G for OSD
> > >>> That would be somewhere over 1GB/s in theory, but given file system
> > >>> and other overheads (what is your replication level?) that's a very
> > >>> theoretical value indeed.
> > >> The RF is 2, so perf should be much better; also notice that read
> > >> perf is really poor, around 62MB/s...
> > >>
> > > A replication factor of 2 means that each write is amplified by 2,
> > > so half of your theoretical performance is gone already.
> > >
> > > Do your tests with atop or iostat running on all storage nodes.
> > > Determine where the bottleneck is: the journal SSDs, the HDDs, or
> > > (unlikely) something else.
> > >
> > > Read performance sucks balls with RBD (at least individually); it can
> > > be improved by fondling the readahead value. See:
> > >
> > > http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/8817
> > >
> > > This is something the Ceph developers are aware of and hopefully will
> > > address in the future:
> > > https://wiki.ceph.com/Planning/Blueprints/Emperor/Kernel_client_read_ahead_optimization
> > >
> > > Christian
> > >
> > >>> Christian
> > >>>
> > >>>> 2x 10GbE port (1 for Cluster Network, 1 for Public Network)
> > >>>>
> > >>>> - 10GbE Switches (1 for Cluster interconnect and 1 for Public network)
> > >>>> - Using Ceph Firefly version 0.80.4.
> > >>>>
> > >>>> The thing is that with the fio, rados bench and vdbench tools we
> > >>>> only see 300MB/s on writes (rand and seq) with a bs of 4m and 16
> > >>>> threads, which is pretty low actually. Yesterday I was talking in
> > >>>> the Ceph IRC channel and came across the presentation that someone
> > >>>> from Fujitsu gave in Frankfurt, and also some mails about a similar
> > >>>> 10GbE config where he achieved almost 795MB/s and more... I would
> > >>>> like to know, if possible, how to implement that so we could
> > >>>> improve our Ceph cluster a little bit more. I already set the
> > >>>> scheduler on the SSD disks, both OS and Journal, to [noop] but
> > >>>> still didn't notice any improvement. That's why we would like to
> > >>>> try a RAMDISK for the Journals; I noticed that he implemented that
> > >>>> on their Ceph cluster.
> > >>>>
> > >>>> I will really appreciate the help on this. Also, if you need me to
> > >>>> send you some more information about the Ceph scheme, please let
> > >>>> me know. Also, if someone could share some detailed conf info it
> > >>>> would really help!
> > >>>>
> > >>>> Thanks a lot,
> > >>>>
> > >>>> German Anders
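
Side note from me on the readahead suggestion above: this is how I understand it would be applied on the client, as a minimal sketch. The device name (rbd0) and the 4096KB value are assumptions for my own test client, not recommendations.

#!/usr/bin/env python
# Sketch: raise the readahead of a kernel-mapped RBD device via sysfs,
# per the read_ahead_kb tuning referenced in the links above.
# "rbd0" and 4096 KB are assumptions for my test client.
RBD_DEV = "rbd0"
READ_AHEAD_KB = 4096

path = "/sys/block/%s/queue/read_ahead_kb" % RBD_DEV
with open(path) as f:
    print("current read_ahead_kb on %s: %s" % (RBD_DEV, f.read().strip()))
with open(path, "w") as f:
    f.write(str(READ_AHEAD_KB))
print("read_ahead_kb on %s set to %d" % (RBD_DEV, READ_AHEAD_KB))

(As far as I understand this only affects the kernel RBD client on that machine, and it does not persist across reboots.)
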
> > >>>>>
> > >>>>> --- Original message ---
> > >>>>> Subject: Re: [ceph-users] Using Ramdisk wi
> > >>>>> From: Wido den Hollander <wido at 42on.com>
> > >>>>> To: <ceph-users at lists.ceph.com>
> > >>>>> Date: Wednesday, 30/07/2014 10:34
> > >>>>>
> > >>>>> On 07/30/2014 03:28 PM, German Anders wrote:
> > >>>>>>
> > >>>>>> Hi Everyone,
> > >>>>>>
> > >>>>>> Is anybody using a ramdisk to put the Journal on? If so, could
> > >>>>>> you please share the commands to implement that? I'm having
> > >>>>>> some issues with it and want to test it out to see if I could
> > >>>>>> get better performance.
> > >>>>> Don't do this. When you lose the journal, you lose the OSD. So a
> > >>>>> reboot of the machine effectively trashes the data on that OSD.
> > >>>>>
> > >>>>> Wido
> > >>>>>
> > >>>>>> Thanks in advance,
> > >>>>>>
> > >>>>>> German Anders
> > >>>>>
> > >>>>> --
> > >>>>> Wido den Hollander
> > >>>>> 42on B.V.
> > >>>>> Ceph trainer and consultant
> > >>>>>
> > >>>>> Phone: +31 (0)20 700 9902
> > >>>>> Skype: contact42on
> >
> --
> Christian Balzer           Network/Systems Engineer
> chibi at gol.com            Global OnLine Japan/Fusion Communications
> http://www.gol.com/
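
One last note from me: Christian's advice above to run atop/iostat on every storage node and find out whether the journal SSDs or the HDDs are the bottleneck seems like the right first step before playing with ramdisks at all. In case it is useful, this is the sort of quick-and-dirty sampler I would run on an OSD node when iostat isn't installed; it is only a sketch, and the "sd*" device filter and the 10-second interval are assumptions about my own nodes.

#!/usr/bin/env python
# Rough per-device throughput/utilisation sampler based on /proc/diskstats,
# as a stand-in for "watch atop/iostat on every storage node".
# Assumptions: Linux, whole disks named sd* (journal SSDs and HDDs), 10s window.
import time

SECTOR_BYTES = 512      # /proc/diskstats counts 512-byte sectors
INTERVAL = 10.0         # seconds

def snapshot():
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            name = fields[2]
            if not name.startswith("sd") or name[-1].isdigit():
                continue   # whole sd* disks only, skip partitions/loop/rbd
            # fields: [5] sectors read, [9] sectors written, [12] ms doing I/O
            stats[name] = (int(fields[5]), int(fields[9]), int(fields[12]))
    return stats

before = snapshot()
time.sleep(INTERVAL)
after = snapshot()

print("device    read MB/s   write MB/s   util%")
for dev in sorted(after):
    if dev not in before:
        continue
    rd = (after[dev][0] - before[dev][0]) * SECTOR_BYTES / INTERVAL / 1e6
    wr = (after[dev][1] - before[dev][1]) * SECTOR_BYTES / INTERVAL / 1e6
    util = (after[dev][2] - before[dev][2]) / (INTERVAL * 1000.0) * 100.0
    print("%-9s %9.1f %12.1f %7.1f" % (dev, rd, wr, util))

If the journal SSDs sit near 100% utilisation while the HDDs stay mostly idle (or the other way around), that should show where a throughput ceiling like the 300MB/s discussed above is coming from.
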