Gordan Bobic wrote:
I suspect this isn't a problem that can be solved without having a
proper journal of metadata per directory, so that upon connection, the
whole journal can be replayed.
You could sort of bodge it and use timestamps as the primary version and
the xattr version as secondary, but that is no less dangerous - it only
takes one machine to be out of sync, and we are again looking at massive
scope for data loss.
You could bodge the bodge further to work around this by ensuring that
the nodes are heartbeating current times to sync between them and
no data exchange takes place without the sync. But that then complicates
things: what do you do when a node connects whose clock is out of sync,
but ahead of the cluster? Who wins on time sync? Who has the latest
authoritative copy?
I think the most sane way of addressing this is to have a fully logged
directory metadata journal. But then we are back to the journalling for
fast updates issue with a journal shadow volume, which is non-trivial to
implement.
Unless there is some kind of a major mitigating circumstance, it seems
that between this and the race condition that Martin is talking about on
the other thread, GlusterFS in it's current is just too dangerous to use
in most environments that I can think of. And unlike Gareth a few days
ago, I'm not talking about performance issues - I'm talking about scope
for data loss in very valid and very common use cases. :'(
Hmm, what about trusted.glusterfs.createtime (epoch time) as a major
version number, and trusted.glusterfs.version as the minor version
number. Couple that with a glusterfs master time node (defaults to lock
node) and you should have a fairly consistent cluster, right?
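A minimal sketch of the comparison rule being proposed, assuming the two
xattrs above hold an epoch createtime and a per-write counter (the dict
keys and function name here are illustrative, not actual GlusterFS APIs):

```python
def newer(a, b):
    """Pick the replica with the higher (createtime, version) pair.

    createtime (epoch seconds) acts as the major version, so a
    recreated file beats any stale copy of the old one; the per-write
    version counter only breaks ties between copies of the same file.
    Python compares tuples lexicographically, which gives exactly this
    major/minor ordering.
    """
    key = lambda r: (r["createtime"], r["version"])
    return a if key(a) >= key(b) else b

node_a = {"createtime": 1199145600, "version": 7}
node_b = {"createtime": 1199145600, "version": 9}  # same file, more writes
node_c = {"createtime": 1199200000, "version": 1}  # file was recreated

assert newer(node_a, node_b) is node_b  # minor version breaks the tie
assert newer(node_b, node_c) is node_c  # createtime (major) dominates
```

Of course, this still leans on createtime being trustworthy, which is
exactly where the clock-skew concerns above come back in.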
I seem to remember toying with this idea a few months ago on this list,
for a different problem. I can't recall whether it was shot full of
holes at that time (and I guess I'm too lazy to search the archives
before posting ;)
--
-Kevan Benson
-A-1 Networks