Hi Martin, I will respond to this email later today after reading
the entire thread.
I really want to understand the issue and help you out. We
have heated discussions in our labs all the time, and we
only take them positively :) Your feedback is very valuable to us.
Thanks and Regards,
--
Anand Babu Periasamy
GPG Key ID: 0x62E15A31
Blog [http://ab.freeshell.org]
The GNU Operating System [http://www.gnu.org]
Z RESEARCH Inc [http://www.zresearch.com]
Martin Fick wrote:
--- Anand Babu Periasamy <ab@xxxxxxxxxx> wrote:
If an application doesn't use locking in multi-user
mode, data can be corrupted with or without AFR.
With AFR in place, corruption can also result in a
disparate set of data, beyond losing the order
of writes. No file system can guarantee integrity
if applications do not synchronize their writes in
multi-user mode.
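(To make "synchronize writes" concrete: in POSIX terms it
means something like the advisory-locking sketch below,
where every writer takes a write lock before touching the
file. The path is purely illustrative, not anything from
this thread.)

#include <fcntl.h>
#include <unistd.h>

/* Serialize multi-writer updates with a POSIX advisory
 * record lock. This only helps if every writer follows
 * the same protocol. */
int locked_write(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };  /* whole file */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* block until granted */
        close(fd);
        return -1;
    }

    ssize_t n = write(fd, buf, len);

    fl.l_type = F_UNLCK;                  /* release the lock */
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

int main(void)
{
    /* "/mnt/gluster/shared.dat" is a hypothetical mount path. */
    return locked_write("/mnt/gluster/shared.dat", "hello\n", 6) ? 1 : 0;
}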
No other (non-buggy) POSIX filesystem would ever
return two different results for the same read without
a write in between (and then potentially do the same
again without a write!). It simply violates POSIX
(and most other filesystems') semantics. This is not a
case of corruption. I do not want to belabor the
point, but I am not sure that you are talking about
the same situation as I am, so I will repost the
details. Please don't take this the wrong way, but
sometimes details get overlooked in these long threads.
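(The POSIX expectation being appealed to here can be
written as a tiny test: with no write in between, two
reads of the same region must return identical bytes.
The mount path is hypothetical; the claim in this thread
is that AFR can serve the two reads from replicas that
have silently diverged, making the assertion fail.)

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char a[64], b[64];
    int fd = open("/mnt/gluster/file.dat", O_RDONLY);
    if (fd < 0)
        return 1;

    /* Two reads of the same region, no write in between... */
    ssize_t n1 = pread(fd, a, sizeof a, 0);
    ssize_t n2 = pread(fd, b, sizeof b, 0);
    close(fd);
    if (n1 < 0 || n2 < 0)
        return 1;

    /* ...must agree on any POSIX filesystem. */
    assert(n1 == n2 && memcmp(a, b, (size_t)n1) == 0);
    return 0;
}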
In other words, what prevents conflicts when
clients A & B both write to the same file? Could
A's write to subvolume A succeed before B's write
to subvolume A, and at the same time B's write to
subvolume B succeed before A's write to subvolume
B?
The answer I got was a 'yes'. This means that version
73 of a file on subvolume A may now be completely
different from version 73 of the same file on
subvolume B, without either of the nodes having failed.
In fact, I imagine this is possible even while running
AFR on a single node, with both subvolumes on the same
node as AFR, if the glusterfsd daemon is running
multiple threads! That may sound unlikely, but it might
in fact be more likely, since a thread could block
right after writing to the first subvolume, giving the
second thread plenty of room to start a new write to
both subvolumes.
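(Spelled out, that schedule looks like the deterministic
sketch below. The replica arrays and version counters are
stand-ins for AFR's subvolumes and change-tracking, not
actual GlusterFS code.)

#include <stdio.h>
#include <string.h>

/* Two replicas of one file, each with AFR-style version
 * metadata. */
static char subvol_a[16], subvol_b[16];
static int  ver_a, ver_b;

static void write_to(char *replica, int *ver, const char *data)
{
    strcpy(replica, data);
    (*ver)++;          /* a version bump accompanies each write */
}

int main(void)
{
    /* The schedule from the thread: A's write lands first on
     * subvolume A, but B's write lands first on subvolume B. */
    write_to(subvol_a, &ver_a, "client-A");  /* A -> subvol A */
    write_to(subvol_b, &ver_b, "client-B");  /* B -> subvol B */
    write_to(subvol_a, &ver_a, "client-B");  /* B -> subvol A */
    write_to(subvol_b, &ver_b, "client-A");  /* A -> subvol B */

    /* Same version number, different contents: silent split
     * brain with no node failure anywhere. */
    printf("subvol A v%d: %s\nsubvol B v%d: %s\n",
           ver_a, subvol_a, ver_b, subvol_b);
    return 0;
}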
I think that many (but probably not enough) people
using AFR understand that split-brain situations are
possible when node subvolumes go down. However, I
imagine that most people using AFR think that if they
have fancy resilient hardware with high uptimes and
reliable, possibly even multi-path, networking devices
in use with glusterfs, they are not going to
experience a split-brain situation unless a node
and/or router/switch goes down. What I am describing
is exactly that: split brain under ordinary,
non-hardware-failure conditions. That is certainly not
POSIX behavior, and, contrary to your claim, it is not
something that could happen with every other filesystem.
Even if we introduce atomic writes within AFR,
Again, atomicity is not the issue.
it still doesn't fix an application's bugs. It will
only slow down writes for well-behaved
applications.
I understand that any solution to this is likely to
hurt performance, although I suggested one that I
believe might actually not. I am curious whether you
think my "quick-heal" approach would hurt performance?
And, of course, sacrificing certain behaviors for
performance is a common tradeoff that many are willing
to, and should be able to, make; but who would
sacrifice reliability if it could be kept without
hurting performance?
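(For what it's worth, the most naive way to make the pair
of subvolume writes atomic is sketched below: hold one
lock across both replica writes, so concurrent writers
serialize. This is not GlusterFS's actual mechanism, just
an illustration of where the performance cost of "atomic
writes within AFR" would come from; in a real deployment
the lock would be a network lock held across round trips
to both servers.)

#include <pthread.h>
#include <string.h>

static char subvol_a[16], subvol_b[16];
static pthread_mutex_t afr_lock = PTHREAD_MUTEX_INITIALIZER;

/* Every writer takes one shared lock across BOTH replica
 * writes, so the write order is identical on each replica,
 * but every write now waits on every other. */
static void atomic_afr_write(const char *data)
{
    pthread_mutex_lock(&afr_lock);
    strcpy(subvol_a, data);  /* in real AFR these would be */
    strcpy(subvol_b, data);  /* network calls, so the lock */
    pthread_mutex_unlock(&afr_lock);  /* spans round trips */
}

int main(void)
{
    atomic_afr_write("client-A");
    atomic_afr_write("client-B");
    return 0;  /* replicas can no longer diverge in order */
}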
While I personally hope for a solution to this, I
certainly don't "expect" one, but I really think that
it is important that people are informed about and
understand this potential problem.
Cheers,
-Martin