Re: Replicate/AFR Using Broadcast/Multicast?

On 10/13/2010 01:22 PM, Beat Rubischon wrote:
Hi Gordan!

Quoting<gordan@xxxxxxxxxx>  (13.10.10 10:06):

What sort of a cluster are you running with that many nodes? RHCS?
Heartbeat? Something else entirely? In what arrangement?

High performance clusters. The main target Gluster was made for :-)

I'm curious about your use case. I'm guessing it is mostly dependent on throughput and not particularly sensitive to I/O latency.

Even the most expensive GigE switch chassis could be killed by 125+ MBytes/s
of traffic, which is almost nothing :-)
Sounds like a typical example of cost not being a good measure of
quality and performance. :)

It's simply a technical limit. Think about what broadcast is and how it
passes a switch.

I'm fully aware of that, but if your switching fabric can't handle the full rated bandwidth of the switch, that's pretty poor. Then again, I expect specmanship* everywhere these days and don't believe any figures until I've tested them myself.

In Infiniband...
Sure, but historically in the networking space, non-ethernet
technologies have always been niche, cost ineffective in terms of
price/performance and only had a temporary performance advantage.

Right.  You'll be surprised but the price per port is much lower in the
Infiniband world compared to the 10GigE world. When using GlusterFS inside a
datacenter Infiniband could be a good choice.

Maybe this year. Unlikely to be the case next year.

Right now more storage nodes means slower storage, and that should
really be addressed.

Wrong. Assuming you have a "distribute" concept. 10 clients talk to 5
servers. Storing a file means the client writes the file to one of the
servers. Reading works the same way. So the bandwidth of each server is
accumulated. With GigE this means you'll have about 600MBytes/s network
bandwidth. Additional servers will add additional bandwidth - as long as you
scale not only servers but also clients. One small exception: The lookup of a
file must be directed to all servers. One of the reasons why GlusterFS is
"better" for a smaller number of large files than for a large number of
smaller files.
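
As a rough sketch of the arithmetic above: the ~120 MBytes/s usable per GigE link and the function name are assumptions for illustration, not measurements or anything from GlusterFS itself. It models the point that aggregate bandwidth grows with servers only while there are enough clients to drive them.

```python
# Back-of-the-envelope model of "distribute" bandwidth.
# Assumption: ~120 MBytes/s usable per GigE link (illustrative, not measured).
GIGE_MBYTES_PER_S = 120


def distribute_bandwidth(servers, clients, link=GIGE_MBYTES_PER_S):
    """Aggregate throughput is capped by whichever side has fewer links."""
    return min(servers, clients) * link


print(distribute_bandwidth(servers=5, clients=10))   # 600
print(distribute_bandwidth(servers=10, clients=10))  # 1200
print(distribute_bandwidth(servers=10, clients=5))   # 600, clients are the limit
```

This ignores the lookup traffic that goes to every server, so it is an upper bound rather than a prediction.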

Multiple lookups cause latency, and latency is already a serious issue on Gluster. I'm talking about the straight replicate case, where throughput is inversely proportional to the number of replicas.

Right when you use a "replicate" concept. Your client has to write to both
members of the replica.

I usually run with server-side replication specifically for that reason - I can have a dedicated VLAN for storage servers with as much network bandwidth as I can throw at it. Then I can have the servers sort out the replication overheads between them, rather than needing a multiple of the bandwidth to the clients as well.

Additional replicas will consume additional
bandwidth. But hey - who needs more than two replicas? BTW: The servers will
never talk to each other. It's always the client who transfers the data.

Unless you use server-side replicate, which is much more manageable and controllable in terms of bandwidth requirements. And trust me, more than two replicas are useful. I have seen both disks in a RAID1 mirror fail more than once.
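
To put numbers on the client-side versus server-side difference, here is a minimal sketch. The single ~120 MBytes/s client link and both function names are assumptions made for illustration, and the server-side case assumes the storage VLAN is never the bottleneck.

```python
# Assumption: one GigE link (~120 MBytes/s usable) per client; illustrative only.
LINK_MBYTES_PER_S = 120


def client_side_write_rate(replicas, link=LINK_MBYTES_PER_S):
    """Client-side replicate: the client sends one copy per replica
    over its own link, so the link is shared between the copies."""
    return link / replicas


def server_side_write_rate(link=LINK_MBYTES_PER_S):
    """Server-side replicate: the client sends one copy and the servers
    fan it out on their own storage network (assumed not to be a bottleneck)."""
    return link


for n in (2, 3, 4):
    print(n, "replicas:", client_side_write_rate(n), "vs", server_side_write_rate())
```

With client-side replicate the effective write rate drops as 1/replicas; with server-side replicate the client link stays at its full rate and the cost moves to the storage network.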

The perfect solution is probably a "distribute" over a "replicate". Mirror
the files over two bricks. Use your mirrors to build a large filesystem with
distribute. Your performance will scale with the number of bricks but you'll
keep the stability of a fully redundant setup.
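
The "distribute over replicate" layout can be sketched roughly as follows. The brick names, helper names and the CRC32 hash are all hypothetical stand-ins chosen for illustration; this is not GlusterFS's actual placement algorithm.

```python
import zlib


def mirror_pairs(bricks):
    """Group bricks two at a time into replica (mirror) pairs."""
    if len(bricks) % 2:
        raise ValueError("need an even number of bricks")
    return [tuple(bricks[i:i + 2]) for i in range(0, len(bricks), 2)]


def place(path, pairs):
    """Pick one mirror pair for a file.
    CRC32 is a stand-in hash, not the real GlusterFS algorithm."""
    return pairs[zlib.crc32(path.encode()) % len(pairs)]


bricks = ["brick1", "brick2", "brick3", "brick4", "brick5", "brick6"]
pairs = mirror_pairs(bricks)
for name in ("a.dat", "b.dat", "c.dat"):
    # Each file lands on exactly one pair and is written to both of its bricks.
    print(name, "->", place(name, pairs))
```

Each file costs two writes regardless of how many pairs exist, so adding pairs adds capacity and bandwidth without adding replication overhead per file.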

Depends on your use case. Sometimes it is more useful to have all the data locally available for read performance. But in that case write performance goes through the floor with that many replicas. Broadcasting the writes only once would solve it in one fell swoop.
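
A minimal sketch of why broadcasting helps, in terms of bytes leaving the writer. Both functions are hypothetical and idealised: the broadcast case ignores acknowledgements, retransmits and the load on the switch, and it describes the idea being proposed here, not an existing GlusterFS feature.

```python
def bytes_sent_unicast(payload_bytes, replicas):
    """Unicast replicate: the writer transmits one full copy per replica."""
    return payload_bytes * replicas


def bytes_sent_broadcast(payload_bytes, replicas):
    """Idealised broadcast/multicast: one transmission reaches every replica,
    so the replica count does not affect what the writer sends."""
    return payload_bytes


payload = 100 * 1024 * 1024  # a 100 MiB write
for n in (2, 10, 50):
    print(n, "replicas:", bytes_sent_unicast(payload, n), "vs",
          bytes_sent_broadcast(payload, n))
```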

Gordan

*specmanship, n: The art of misrepresenting the capabilities of a device for marketing purposes, typically by saying it will do X and Y when it cannot in fact do X and Y at the same time.


