Re: Designing a cluster guide

Maybe a good option for the journal would be two cheap MLC Intel drives on
SandForce (320/520), 120GB or 240GB, with the HPA reduced to only 20-30GB and
used as separate journaling partitions in hardware RAID1.

I'd like to test a setup like this, but maybe someone has some real-life data?
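A rough sketch of what I mean, in case it helps (device names and the sector
count are only placeholders, the HPA change made with hdparm -N is persistent,
and a HW controller mirror would replace the mdadm step shown here):

    # shrink the visible capacity of each SSD to ~24GB (example sector count),
    # leaving the remainder as spare area for the controller
    hdparm -N p46875000 --yes-i-know-what-i-am-doing /dev/sda
    hdparm -N p46875000 --yes-i-know-what-i-am-doing /dev/sdb

    # mirror the two shrunken SSDs (software RAID1 as a stand-in for HW RAID1)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

    # then point the OSD journal at a partition on the mirror in ceph.conf:
    [osd.0]
        osd journal = /dev/md0p1
        osd journal size = 10240    ; MB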

On Mon, May 21, 2012 at 5:07 PM, Tomasz Paszkowski <ss7pro@xxxxxxxxx> wrote:
> Another great thing that should be mentioned is:
> https://github.com/facebook/flashcache/. It gives really huge
> performance improvements for reads/writes (especially on FusionIO
> drives) even without using librbd caching :-)
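If anyone wants to try it, the basic flow is roughly the following (device
names and the cache name are just placeholders; the write-back mode shown
keeps dirty blocks on the SSD, so that SSD should itself be mirrored):

    # build a write-back flashcache device from an SSD and the OSD data disk
    flashcache_create -p back osd0cache /dev/ssd /dev/sdb

    # the combined device appears under /dev/mapper and gets mounted where
    # the OSD data directory lives
    mount /dev/mapper/osd0cache /var/lib/ceph/osd/ceph-0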
>
>
>
> On Sat, May 19, 2012 at 6:15 PM, Alexandre DERUMIER <aderumier@xxxxxxxxx> wrote:
>> Hi,
>>
>> For your journal, if you have money, you can use the
>> STEC ZeusRAM SSD drive (around 2000€ / 8GB / 100,000 IOPS read/write with 4k blocks).
>> I'm using them with a ZFS SAN; they rock for journals.
>> http://www.stec-inc.com/product/zeusram.php
>>
>> Another interesting product is the DDRdrive:
>> http://www.ddrdrive.com/
>>
>> ----- Original Message -----
>>
>> From: "Stefan Priebe" <s.priebe@xxxxxxxxxxxx>
>> To: "Gregory Farnum" <greg@xxxxxxxxxxx>
>> Cc: ceph-devel@xxxxxxxxxxxxxxx
>> Sent: Saturday, 19 May 2012 10:37:01
>> Subject: Re: Designing a cluster guide
>>
>> Hi Greg,
>>
>> On 17.05.2012 23:27, Gregory Farnum wrote:
>>>> It mentions, for example, "Fast CPU" for the MDS system. What does fast
>>>> mean? Just the speed of one core? Or is Ceph designed to use multiple cores?
>>>> Is multi-core or higher clock speed more important?
>>> Right now, it's primarily the speed of a single core. The MDS is
>>> highly threaded but doing most things requires grabbing a big lock.
>>> How fast is a qualitative rather than quantitative assessment at this
>>> point, though.
>> So would you recommend a fast (higher GHz) Core i3 instead of a single
>> Xeon for this system? (The price per GHz is better.)
>>
>>> It depends on what your nodes look like, and what sort of cluster
>>> you're running. The monitors are pretty lightweight, but they will add
>>> *some* load. More important are their disk access patterns — they have
>>> to do a lot of syncs. So if they're sharing a machine with some other
>>> daemon, you want them to have an independent disk and to be running a
>>> new kernel & glibc so that they can use syncfs rather than sync. (The
>>> only distribution I know for sure does this is Ubuntu 12.04.)
>> Which kernel and which glibc version support this? I have searched
>> Google but haven't found an exact version. We're using Debian Lenny/Squeeze
>> with a custom kernel.
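For what it's worth, syncfs() went into the mainline kernel in 2.6.39 and got
its glibc wrapper in glibc 2.14; a tiny test program like this (the path is
just a placeholder) shows whether a given kernel/glibc combination has it:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* open any file or directory on the filesystem you want to sync */
        int fd = open("/var/lib/ceph", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* unlike sync(), syncfs() flushes only the filesystem fd lives on */
        if (syncfs(fd) == 0)
            printf("syncfs works here\n");
        else
            perror("syncfs (needs kernel >= 2.6.39 and glibc >= 2.14)");

        close(fd);
        return 0;
    }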
>>
>>>> Regarding the OSDs, is it fine to use an SSD RAID 1 for the journal and
>>>> perhaps 22x SATA disks in a RAID 10 for the FS, or is this quite absurd
>>>> and should you go for 22x SSD disks in a RAID 6?
>>> You'll need to do your own failure calculations on this one, I'm
>>> afraid. Just take note that you'll presumably be limited to the speed
>>> of your journaling device here.
>> Yeah, that's why I wanted to use a RAID 1 of SSDs for the journaling. Or
>> is this still too slow? Another idea was to use only a ramdisk for the
>> journal, back its contents up to disk during shutdown, and restore
>> them after boot.
>>
>>> Given that Ceph is going to be doing its own replication, though, I
>>> wouldn't want to add in another whole layer of replication with RAID 10
>>> — do you really want to multiply your storage requirements by another
>>> factor of two?
>> OK, you're right, bad idea.
>>
>>>> Is it more useful to use a RAID 6 HW controller or btrfs RAID?
>>> I would use the hardware controller over btrfs RAID for now; it allows
>>> more flexibility in, e.g., switching to XFS. :)
>> OK, but overall you would recommend running one OSD per disk, right? So
>> instead of using a RAID 6 with, for example, 10 disks you would run 6 OSDs
>> on this machine?
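For reference, the one-OSD-per-disk layout in ceph.conf looks roughly like
this (hostnames, paths and journal devices are only an illustration of the
pattern, not a sizing recommendation):

    [osd.0]
        host = storage01
        osd data = /var/lib/ceph/osd/ceph-0    ; filesystem on /dev/sdb
        osd journal = /dev/md0p1               ; partition on the SSD mirror

    [osd.1]
        host = storage01
        osd data = /var/lib/ceph/osd/ceph-1    ; filesystem on /dev/sdc
        osd journal = /dev/md0p2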
>>
>>>> Use single socket Xeon for the OSDs or Dual Socket?
>>> Dual socket servers will be overkill given the setup you're
>>> describing. Our WAG rule of thumb is 1GHz of modern CPU per OSD
>>> daemon. You might consider it if you decided you wanted to do an OSD
>>> per disk instead (that's a more common configuration, but it requires
>>> more CPU and RAM per disk and we don't know yet which is the better
>>> choice).
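Taking that rule of thumb at face value for the 22-disk box mentioned above:
one OSD per disk would want on the order of 22 GHz of aggregate CPU, i.e.
something like dual hex-cores around 2 GHz, while a few RAID-backed OSDs fit
comfortably in a single socket. (That's just arithmetic on the stated rule,
not a tested sizing.)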
>> Is there also a rule of thumb for the memory?
>>
>> My biggest problem with Ceph right now is the awfully slow speed for
>> random reads and writes.
>>
>> Sequential reads and writes are at 200 MB/s (that's pretty good for bonded
>> dual Gbit/s). But random reads and writes are only at 0.8-1.5 MB/s,
>> which is definitely too slow.
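When comparing numbers like these it helps to say how they were measured; a
run along these lines (purely an example invocation, path is a placeholder)
gives comparable 4k random-write figures against a mounted RBD or a VM disk:

    # 60s of 4k random writes, direct I/O, 16 outstanding requests
    fio --name=randwrite --rw=randwrite --bs=4k --size=1G \
        --ioengine=libaio --direct=1 --iodepth=16 --runtime=60 \
        --filename=/mnt/rbd/fio.test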
>>
>> Stefan
>>
>> --
>> Alexandre Derumier
>> Systems Engineer
>> Phone: 03 20 68 88 90
>> Fax: 03 20 68 90 81
>> 45 Bvd du Général Leclerc 59100 Roubaix - France
>> 12 rue Marivaux 75002 Paris - France
>>
>
>
>
> --
> Tomasz Paszkowski
> SS7, Asterisk, SAN, Datacenter, Cloud Computing
> +48500166299



-- 
-----
Regards

Sławek "sZiBis" Skowron