Re: Designing a cluster guide

Another great thing that should be mentioned is
https://github.com/facebook/flashcache/. It gives really huge
performance improvements for reads/writes (especially on FusionIO
drives), even without using librbd caching :-)



On Sat, May 19, 2012 at 6:15 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
> Hi,
>
> For your journal, if you have money, you can use the STEC ZeusRAM SSD
> drive (around 2000€ / 8GB / 100,000 IOPS read/write with 4k blocks).
> I'm using them with a ZFS SAN; they rock as journal devices.
> http://www.stec-inc.com/product/zeusram.php
>
> Another interesting product is the DDRdrive:
> http://www.ddrdrive.com/
>
> ----- Original message -----
>
> From: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>
> To: "Gregory Farnum" <greg@xxxxxxxxxxx>
> Cc: ceph-devel@xxxxxxxxxxxxxxx
> Sent: Saturday, 19 May 2012 10:37:01
> Subject: Re: Designing a cluster guide
>
> Hi Greg,
>
> On 17.05.2012 23:27, Gregory Farnum wrote:
>>> It mentions, for example, "Fast CPU" for the MDS system. What does fast
>>> mean? Just the speed of one core? Or is Ceph designed to use multiple cores?
>>> Are more cores or more clock speed important?
>> Right now, it's primarily the speed of a single core. The MDS is
>> highly threaded but doing most things requires grabbing a big lock.
>> "How fast" is a qualitative rather than quantitative assessment at this
>> point, though.
> So would you recommend a fast (higher-GHz) Core i3 instead of a single
> Xeon for this system? (The price per GHz is better.)
>
>> It depends on what your nodes look like, and what sort of cluster
>> you're running. The monitors are pretty lightweight, but they will add
>> *some* load. More important is their disk access patterns — they have
>> to do a lot of syncs. So if they're sharing a machine with some other
>> daemon you want them to have an independent disk and to be running a
>> new kernel & glibc so that they can use syncfs rather than sync. (The
>> only distribution I know for sure does this is Ubuntu 12.04.)
> Which kernel and which glibc version support this? I have searched
> Google but haven't found an exact version. We're using Debian Lenny/Squeeze
> with a custom kernel.
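
If I remember right, the syncfs() syscall was added in kernel 2.6.39 and the
glibc wrapper in glibc 2.14, so anything at least that new should work. A rough
way to check a box is to ask the installed libc directly; a minimal Python/ctypes
sketch (just a probe, nothing Ceph-specific):

    #!/usr/bin/env python
    # Probe whether the installed glibc exposes syncfs() (glibc >= 2.14,
    # kernel >= 2.6.39). If not, daemons fall back to plain sync().
    import ctypes, os

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    try:
        syncfs = libc.syncfs
    except AttributeError:
        print("no syncfs() in this glibc, only sync() is available")
    else:
        fd = os.open("/", os.O_RDONLY)   # any fd on the filesystem to sync
        if syncfs(fd) == 0:
            print("syncfs() works on this kernel/glibc")
        else:
            print("syncfs() present but failed: %s"
                  % os.strerror(ctypes.get_errno()))
        os.close(fd)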
>
>>> Regarding the OSDs: is it fine to use an SSD RAID 1 for the journal and
>>> perhaps 22x SATA disks in a RAID 10 for the FS, or is this quite absurd
>>> and should you go for 22x SSD disks in a RAID 6?
>> You'll need to do your own failure calculations on this one, I'm
>> afraid. Just take note that you'll presumably be limited to the speed
>> of your journaling device here.
> Yeah, that's why I wanted to use a RAID 1 of SSDs for the journaling. Or
> is this still too slow? Another idea was to use only a ramdisk for the
> journal, back it up to disk during shutdown and restore it after boot.
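
To put a number on the "limited to the speed of your journaling device" point,
a back-of-the-envelope sketch in Python (all throughput figures below are my
assumptions, not measurements; a RAID 1 pair writes at roughly single-device
speed):

    # Rough estimate of how a shared journal device caps a node's writes.
    sata_disks = 22
    sata_write_mb_s = 100          # assumed per-disk sequential write speed
    journal_write_mb_s = 250       # assumed write speed of the SSD RAID 1

    disk_capacity = sata_disks * sata_write_mb_s
    node_write_limit = min(disk_capacity, journal_write_mb_s)
    print("disks could absorb ~%d MB/s, journal caps the node at ~%d MB/s"
          % (disk_capacity, node_write_limit))

A ramdisk journal removes that cap, but on power loss the journal (and with it
the OSD's un-flushed writes) is gone, which the backup-on-shutdown idea above
would not protect against.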
>
>> Given that Ceph is going to be doing its own replication, though, I
>> wouldn't want to add in another whole layer of replication with raid10
>> — do you really want to multiply your storage requirements by another
>> factor of two?
> OK, you're right, bad idea.
>
>>> Is it more useful to use a RAID 6 HW controller or btrfs RAID?
>> I would use the hardware controller over btrfs RAID for now; it allows
>> more flexibility in e.g. switching to XFS. :)
> OK, but overall you would recommend running one OSD per disk, right? So
> instead of using a RAID 6 with, for example, 10 disks you would run 6 OSDs
> on this machine?
>
>>> Use a single-socket Xeon for the OSDs or a dual-socket one?
>> Dual socket servers will be overkill given the setup you're
>> describing. Our WAG rule of thumb is 1GHz of modern CPU per OSD
>> daemon. You might consider it if you decided you wanted to do an OSD
>> per disk instead (that's a more common configuration, but it requires
>> more CPU and RAM per disk and we don't know yet which is the better
>> choice).
> Is there also a rule of thumb for the memory?
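
For what it's worth, here is the sizing arithmetic I would use for a 22-disk
box with one OSD per disk, based on the ~1 GHz-per-daemon figure above; the
RAM number is only my own working assumption, not an official recommendation:

    # Rough per-node budget for one OSD daemon per disk.
    osd_daemons = 22
    ghz_per_osd = 1.0      # Greg's rule of thumb above
    ram_gb_per_osd = 1.0   # my assumption; leaves headroom for recovery

    print("CPU budget: ~%.0f GHz of modern cores" % (osd_daemons * ghz_per_osd))
    print("RAM budget: ~%.0f GB" % (osd_daemons * ram_gb_per_osd))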
>
> My biggest problem with Ceph right now is the awfully slow speed when
> doing random reads and writes.
>
> Sequential reads and writes are at 200 MB/s (that's pretty good for bonded
> dual Gbit/s). But random reads and writes are only at 0.8 - 1.5 MB/s,
> which is definitely too slow.
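
For reference, a minimal Python probe along these lines is enough to reproduce
that kind of random I/O number (the test file path is a placeholder for a file
on the Ceph-backed mount, and the page cache is not bypassed, so treat the
result as a rough indication only):

    #!/usr/bin/env python
    # Tiny random-read probe: N random 4k reads from an existing file.
    import os, random, time

    TEST_FILE = "/mnt/ceph/testfile"   # placeholder, adjust to your setup
    BLOCK = 4096
    OPS = 1000

    fd = os.open(TEST_FILE, os.O_RDONLY)
    size = os.fstat(fd).st_size
    start = time.time()
    for _ in range(OPS):
        os.lseek(fd, random.randrange(0, max(1, size - BLOCK)), os.SEEK_SET)
        os.read(fd, BLOCK)
    elapsed = time.time() - start
    os.close(fd)
    print("random 4k reads: %.2f MB/s (%.0f IOPS)"
          % (OPS * BLOCK / elapsed / 1e6, OPS / elapsed))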
>
> Stefan
>
> --
>
> Alexandre Derumier
> Systems Engineer
> Phone: 03 20 68 88 90
> Fax: 03 20 68 90 81
> 45 Bvd du Général Leclerc 59100 Roubaix - France
> 12 rue Marivaux 75002 Paris - France
>



-- 
Tomasz Paszkowski
SS7, Asterisk, SAN, Datacenter, Cloud Computing
+48500166299
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

