Giving up [was: Re: read-subvolume]

jdarcy at redhat.com (Jeff Darcy) · Wed, 10 Jul 2013 08:58:10 -0400

On 07/10/2013 07:01 AM, Allan Latham wrote:
> I have a simple scenario and it just simply doesn't work. Reading over
> the network when the file is available locally is plainly wrong. Our
> application cannot take the performance hit nor the extra network traffic.

Another victim of our release process.  :(  Code to choose the local subvolume
whenever possible was added in *June 2012* (commit 0baa65b6).  Further fixes
and related changes, including a user-submitted patch to force this choice for
sites with more complex needs, have gone in since then.  None of them have made
it into a release yet, because 3.4 is still in beta and the changes have not
been backported into 3.3.anything (including 3.3.1, which I see you were
using).  All I can offer is an apology.

> 1. get a simple minimalist configuration working - 2 hosts and
> replication only.
> 2. make it bomb-proof.
> 2a. it must cope with network failures, random reboots etc.
> 2b. if it stops it has to auto-recover quickly.
> 2c. if it can't it needs thorough documentation and adequate logs so a
> reasonable sysop can rescue it.

This is one of my own pet peeves.  I will personally be working on the 
internals documentation soon, so users will at least have a chance of 
understanding what the often-cryptic log messages really mean.  Improvements to 
logging, event reporting, and so on are also ongoing, albeit slowly and not 
under my direct purview.

> 2d. it needs a fast validation scanner which verifies that data is where
> it should be and is identical everywhere (md5sum).

How fast is fast?  What would be an acceptable time for such a scan on a volume 
containing (let's say) ten million files?
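
To make the cost concrete, here is a minimal sketch of the kind of scan being
asked for.  It assumes the two replicas are visible as ordinary local
directories; the function names are hypothetical, and this is not an existing
GlusterFS tool.  It reads every byte of every file on both sides, which is why
the time budget matters.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_replica(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = md5_of_file(full)
    return digests


def compare_replicas(root_a, root_b):
    """Return (missing_from_b, missing_from_a, mismatched) as sorted lists."""
    a, b = scan_replica(root_a), scan_replica(root_b)
    missing_from_b = sorted(set(a) - set(b))
    missing_from_a = sorted(set(b) - set(a))
    mismatched = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing_from_b, missing_from_a, mismatched
```

Calling compare_replicas() on the two directories returns three lists: files
present only on the first side, files present only on the second, and files
whose contents differ.  A real scanner would also have to skip internal
metadata and cope with files that change during the scan.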

> 3. make it efficient (read local whenever possible - use rsync
> techniques - remove scalability obstacles so it doesn't get
> exponentially slower as more files are replicated)

Can you explain "exponentially"?  The time for a full scan should increase 
*linearly* with the number of files.  That's bad enough, and it's why we're 
starting to get away from reliance on full scans in favor of logging or 
journaling approaches, but if you're seeing exponential behavior then something 
is amiss.
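
To put the linear case in numbers, here is a back-of-the-envelope sketch.  The
rate of 1,000 files per second is purely an assumed figure for illustration,
not a measurement of any real system.

```python
def full_scan_seconds(num_files, files_per_second):
    """Estimated full-scan time when cost grows linearly with file count."""
    return num_files / files_per_second


# Assumed scan rate, chosen only for illustration.
ASSUMED_RATE = 1000

for n in (1_000_000, 10_000_000, 100_000_000):
    hours = full_scan_seconds(n, ASSUMED_RATE) / 3600
    print(f"{n:>11,} files: {hours:6.2f} hours")
```

Under that assumption, ten times as many files means ten times the scan time.
Growth noticeably steeper than that is the behavior I would want to hear about.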

> 4. when that works expand to multiple hosts and clever distribution
> techniques.

That would be a fine sentiment for a new project, but it's not really an option 
when there are already thousands of users relying on the "clever distribution 
techniques" and many other features in production.  We do have to fix their 
bugs too, so we can't devote all of our resources to improving or 
reimplementing replication.  Believe me, I wish we could.

Thank you for your constructive feedback.  I hope that we can use it to make 
things better for everyone.