Giving up [ was: Re: read-subvolume]

joe at julianfamily.org (Joe Julian) · Wed, 10 Jul 2013 11:36:08 -0700

My minimal donation:

On 07/10/2013 04:01 AM, Allan Latham wrote:
> There seems to be a problem with the way gluster is going.
> For me it would be an ideal solution if it actually worked.
Actually working is always the ideal. Actually working for all possible 
use cases... may be a little more difficult (though still ideal).
> I have a simple scenario and it just simply doesn't work. Reading over
> the network when the file is available locally is plainly wrong. Our
> application cannot take the performance hit nor the extra network traffic.
It's not "wrong", just not the way you envision it.

Typically, in a scaled scenario where clustered storage has the 
strongest advantage, you'll have a limited number of storage servers and 
a much greater number of application servers. The likelihood that any of 
those application servers is going to have the file they want locally, 
even if they're shared-use, is pretty slim. Engineering for that 
probability is the "correct" solution in that use case.
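
To put a rough number on "pretty slim", here is a back-of-the-envelope
sketch. It assumes a shared-use pool where every server is both storage
and application server, and that replicas are spread uniformly across
the pool. Both are simplifying assumptions of mine, and the function
name is made up for illustration; it is not a gluster API.

```python
from fractions import Fraction


def local_read_chance(replica_count: int, server_count: int) -> Fraction:
    """Chance that a given server holds a replica of a given file.

    Assumes every server is both a storage and an application server and
    that replicas are placed uniformly at random (illustrative only).
    """
    if not 0 < replica_count <= server_count:
        raise ValueError("need 0 < replica_count <= server_count")
    return Fraction(replica_count, server_count)


if __name__ == "__main__":
    # Two replicas, growing pool: the chance of a local copy shrinks fast.
    for servers in (2, 10, 100):
        print(servers, float(local_read_chance(2, servers)))
```

With dedicated storage servers and separate application servers, the
chance is simply zero, since no application server holds a copy.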
> I would suggest:
>
> 1. get a simple minimalist configuration working - 2 hosts and
> replication only.
> 2. make it bomb-proof.
> 2a. it must cope with network failures, random reboots etc.
> 2b. if it stops it has to auto-recover quickly.
So far, all done within reasonable parameters. "bomb proof" is an 
obvious exaggeration and is unattainable. If you literally blow up all 
your servers, you're going to lose data.
> 2c. if it can't it needs thorough documentation and adequate logs so a
> reasonable sysop can rescue it.
Define "reasonable sysop". Recovering from any failure that isn't 
handled automatically is going to require a certain amount of 
understanding about clustering, split-brain, and split-brain recovery. 
That's not your typical first-tier sysop, IMHO.
> 2d. it needs a fast validation scanner which verifies that data is where
> it should be and is identical everywhere (md5sum).
md5sum isn't the fastest checksum algorithm.
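
As a rough sketch of what such a scanner could look like (this is not
anything gluster ships, and every function name here is made up for
illustration), the following compares two directory trees file by file
with a pluggable checksum. CRC32 is the default because it is typically
cheaper to compute than MD5. It only catches accidental differences and
is not collision-resistant, so swap in md5_of if that matters to you.

```python
import hashlib
import os
import zlib

CHUNK = 1 << 20  # read files 1 MiB at a time


def md5_of(path):
    """MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crc32_of(path):
    """CRC32 of a file as 8 hex digits, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            crc = zlib.crc32(chunk, crc)
    return format(crc & 0xFFFFFFFF, "08x")


def relative_files(root):
    """Set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(root_a, root_b, checksum=crc32_of):
    """Return (paths present on one side only, paths whose checksums differ)."""
    files_a = relative_files(root_a)
    files_b = relative_files(root_b)
    missing = sorted(files_a ^ files_b)
    differing = sorted(
        rel
        for rel in files_a & files_b
        if checksum(os.path.join(root_a, rel)) != checksum(os.path.join(root_b, rel))
    )
    return missing, differing
```

Calling compare_trees(dir_a, dir_b, checksum=md5_of) runs the same scan
with MD5 instead.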
> 3. make it efficient (read local whenever possible - use rsync
> techniques - remove scalability obstacles so it doesn't get
> exponentially slower as more files are replicated)
See the earlier point about scaled systems. Also, it does not get 
"exponentially slower as more files are replicated". That would be silly.
> 4. when that works expand to multiple hosts and clever distribution
> techniques.
> (repeat items 2 and 3 in the more complex environment)
>
> If it doesn't work rock solid in a simple scenario it will never work in
> a large scale cluster.
Not necessarily true. That's like comparing Apples to Orchards 
<http://joejulian.name/blog/dont-get-stuck-micro-engineering-for-scale/>.
>
> Until point 3 is reached I cannot use it - which is a great
> disappointment for me as well as the good guys doing the development.
Consider expanding your thinking to bits you have more control over. 
Network latency is probably the biggest. Consider using low-latency 
10Gig cards(1) and switches(2), or InfiniBand.
>
> Good luck and thanks again
>
> Allan
1) http://www.solarflare.com makes sub-microsecond latency adapters that 
can utilize a userspace driver pinned to the CPU doing the request, 
eliminating a context switch.
2) http://www.aristanetworks.com/en/products/7100t is a 2.5 microsecond 
switch.