Re: deployment architecture practices / new ideas?


 



On 06.11.2013 15:05, Gautam Saxena wrote:
> We're looking to deploy Ceph on about 8 Dell servers to start, each of
> which typically contains 6 to 8 hard disks with PERC RAID controllers
> that support write-back cache (~512 MB usually). Most machines have
> between 32 and 128 GB of RAM. Our questions are as follows. Please feel
> free to comment on even just one of the questions below if that's the
> area of your expertise/interest.
> 
> 
>    1. Various "best practice" guides suggest putting the OS on a separate
>    disk. But we thought that would not be good, because we'd sacrifice a
>    whole disk on each machine (~3 TB), or even two whole disks (~6 TB) if
>    we did a hardware RAID 1 on it. So, do people normally just sacrifice
>    one whole disk? Specifically, we came up with this idea:
>       1. We set up all hard disks as "pass-through" in the RAID
>       controller, so that the RAID controller's cache is still in effect,
>       but the OS sees just a bunch of disks (6 to 8 in our case).
>       2. We then do a SOFTWARE-based RAID 1 (using CentOS 6.4) for the OS
>       across all 6 to 8 hard disks.
>       3. We then do a SOFTWARE-based RAID 0 (using CentOS 6.4) for the
>       SWAP space.
>       4. *Does anyone see any flaws in our idea above? We think that
>       RAID 1 is not computationally expensive for the machines to
>       compute, and most of the time the OS should be in RAM. Similarly,
>       we think RAID 0 should be easy for the CPU to compute, and
>       hopefully we won't hit much SWAP if we have enough RAM. And this
>       way, we don't sacrifice 1 or 2 whole disks for just the OS.*

Why not simply use smaller disks for the system?

That is what we do: use e.g. 500 GB 2.5" disks (e.g. WD VelociRaptor) for
the root system, and if needed put these disks into a hardware RAID 1. I
would always prefer some form of RAID, whether hardware or software, for
the system disks.
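
If you go the software route for the system disks instead (as proposed
above), a minimal mdadm sketch could look like this -- device names and
partition layout are hypothetical, adjust them to your hardware:

    # two small system disks sda/sdb, partition 1 for /, partition 2 for swap
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    mkfs.ext4 /dev/md0                      # root filesystem on the mirror
    mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sda2 /dev/sdb2
    mkswap /dev/md1 && swapon /dev/md1      # striped swap, as in your idea

The same commands work across small partitions on all 6 to 8 data disks
if you stick with your original layout; just raise --raid-devices.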

You could also use the spare space on these disks for the OSD journals if
you use 10,000 RPM HDDs.
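
For example, one way to point a new OSD's journal at a spare partition on
such a system disk is to name the journal device when preparing the OSD --
a sketch only, with hypothetical host and device names, so check it
against the ceph-deploy version you actually run:

    # data disk sdc, journal on a spare partition of the system disk
    ceph-deploy osd prepare node1:sdc:/dev/sda5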

>    2. Based on the performance benchmark blog of Mark Nelson (
>    http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/),
>    has anything substantially changed since then? Specifically, it
>    suggests that SSDs may not really be necessary if one has RAID
>    controllers with write-back cache. Is this still true even though the
>    article was written with a version of Ceph that is over a year old?
>    (Mark suggests that things may change with newer versions of Ceph.)
>    3. Based on our understanding, it would seem that Ceph can deliver
>    very high throughput (especially for reads) if dozens and dozens of
>    hard disks are being accessed simultaneously across multiple machines.
>    So we could have several GB/s of throughput, right? (Ceph never
>    advertises the advantage of read throughput with a distributed
>    architecture, so I'm wondering if I'm missing something.) If so, then
>    is it reasonable to assume that one common bottleneck is the Ethernet?
>    So if we only use one NIC at 1 Gb/s, that'll be a major bottleneck? If
>    so, we're thinking of trying to "bond" multiple 1 Gb/s Ethernet cards
>    to make a "bonded" Ethernet connection of 4 Gb/s (4 * 1 Gb/s).

Even those 4 Gb/s could easily be your bottleneck. That depends on your
workload, especially if you use separate networks for the clients and for
the cluster/OSD backend (replication, backfill, recovery).
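
As for the bonding itself, Ceph doesn't do it for you; it's configured at
the OS level. A minimal sketch for CentOS 6.4 (interface names, IP and
bonding mode are examples, not a recommendation):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    BONDING_OPTS="mode=802.3ad miimon=100"   # LACP; needs switch support
    IPADDR=192.168.1.10
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1..eth3)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

Note that mode=802.3ad (LACP) needs a switch that supports it, while e.g.
mode=balance-alb does not; either way a single TCP connection is still
limited to one link's 1 Gb/s, it's the aggregate over many clients and
OSDs that benefits.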


>    But we didn't see anyone discuss this strategy. Are there any holes in
>    it? Or does Ceph "automatically" take advantage of multiple NIC cards
>    without us having to deal with the complexity (and expense of buying a
>    new switch which supports bonding) of doing bonding? That is, is it
>    possible and a good idea to have Ceph OSDs be set up to use specific
>    NICs, so that we spread the load? (We read through the recommendation
>    of having different NICs for front-end traffic vs back-end traffic,
>    but we're not worried about network attacks -- so we're thinking that
>    just creating a "big" fat Ethernet pipe gives us the most
>    flexibility.)

Depending on your budget, it may make sense to use 10G cards instead.

Separating the traffic onto different networks isn't only about DDoS; it's
also to make sure your replication traffic doesn't affect your client
traffic, and vice versa.
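
Ceph itself doesn't spread load across NICs for you; it just binds the
daemons to whatever networks you configure, so the split is done in
ceph.conf. A minimal sketch (the subnets are examples, use your own):

    [global]
        public network  = 192.168.1.0/24   # client-facing traffic
        cluster network = 192.168.2.0/24   # replication, backfill, recovery

With that in place the OSDs talk to clients and monitors on the public
network and replicate over the cluster network, which is exactly the
front-end/back-end split mentioned above.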

Danny
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



