On Tue, 15 Sep 2009 20:14:57 -0400 Mark Mielke <mark@xxxxxxxxxxxxxx> wrote:
> On 09/15/2009 07:45 PM, Michael Cassaniti wrote:
> > Don't try bypassing the mountpoint to perform file operations
> > _period_. You can always have a replicate mountpoint configured on
> > the server (i.e. a client for replicate), as well as the server side.
> > NFS should run on top of this replicate mountpoint. This (poor)
> > graphic may help. Note that everything is running on the same
> > machine:
>
> Agree. The main feature of GlusterFS in this regard has to do with
> read - not write. It is very cool that if GlusterFS with
> cluster/replicate completely fails, the backing store is still
> accessible to recover from. However, this is not a license to write or
> re-write the backing store as we see fit. It should be treated as
> read-only if it is used at all.
>
> Note that even read-only does not work if one starts to use the
> BerkeleyDB storage method, distribute, or stripe.
>
> In any case - dropping extended attributes for a system that relies on
> extended attributes is a lossy backup, and should be expected to be
> invalid if restored into place. Even if the extended attributes are
> kept in the backup - I think it only decreases the risk, it does not
> eliminate it. Writes really should not bypass the mount point.
>
> Cheers,
> mark

Well, Michael, Mark, maybe we should talk more about real setups and less about theory. In theory everything you say makes sense, and clearly your "don't do that" approach is clean. Unfortunately the real world is not that clean and can hardly ever be bent to be. Fortunately, the setups theory worries about are rare in the real world.

Let's take a trivial setup: lots of data for webservers, plus some ftp servers for feeding in new data and deleting old. The first thing in sight: compared to the reads there are very few writes, mostly sequential logfiles. And another thing: most of the data does not get read or written all day long. This is a pretty common example, I would say. Since very few changes are going on compared to the total amount of stored data, you may call the situation pseudo-static.

What would you expect in that setup? Let's say the bad boys (the ftp servers) are local feeds and do not go over glusterfs, for whatever reason. What do they really do to the data? They delete (the data is gone afterwards, so there is no problem at all), and they write new files. It should be very simple for glusterfs to detect a locally fed new file, because it has no xattribs at all (assuming every glusterfs-fed file has some (*)). So basically all you have to do is try to write-lock the file on the backend store, create its default xattribs, unlock, and do a stat to self-heal the other subvolumes - let's call such a thing "import". Does that really sound unsolvable? (For simplicity we assume such local feeds happen only on the first subvolume, and that the cluster is replicate.)

(*) If not every glusterfs file has xattribs, then "import" is even simpler and can be done by just stat'ing. That case sounds like it would happen pretty automagically on first touching the new file over the glusterfs mountpoint.
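To make the "import" idea a bit more concrete, here is a rough sketch (Python 3 on Linux, untested, the paths are made up, and I am only assuming that replicate keeps its bookkeeping in trusted.* xattribs such as trusted.gfid and trusted.afr.* - check the exact names for your version). It shows the simple variant from the footnote, where a plain stat over the mountpoint is supposed to do the rest, and it would have to run as root on the first subvolume because the trusted.* namespace is only visible to root:

import os

BACKEND = "/data/export"    # made-up backend export of the first subvolume
MOUNT = "/mnt/glusterfs"    # made-up glusterfs (replicate) mountpoint

def is_local_feed(path):
    # A file created through glusterfs carries trusted.* xattribs
    # (trusted.gfid, trusted.afr.*, ...); a locally fed file has none.
    return not any(a.startswith("trusted.") for a in os.listxattr(path))

def import_file(rel):
    if is_local_feed(os.path.join(BACKEND, rel)):
        # Stat the file over the mountpoint so glusterfs looks it up;
        # with replicate that should be enough to create the default
        # xattribs and self-heal the other subvolumes.
        os.stat(os.path.join(MOUNT, rel))

# Walk the backend and "import" everything that was fed in locally.
for root, dirs, files in os.walk(BACKEND):
    for name in files:
        import_file(os.path.relpath(os.path.join(root, name), BACKEND))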
Another story: the backup. I am pretty astonished that you all talk about backing up the xattribs. According to your own clean philosophy there should be no problem with backups taken without xattribs, as long as they are read from the glusterfs mountpoint. Since other applications do not see the xattribs either, that can only mean such a backup is a complete snapshot without them.

A backup with xattribs, in this sense, can only be useful if it is read locally from the backend store, in order to be able to recover that backend later on - including the information hidden in the xattribs. But since you would not want anyone to touch the local data at all, by your own philosophy this should be no backup method either.

Even from my bad-boy position I would not back up the xattribs via the local path. The reason lies in the restore. If I local-restore a file without xattribs, I give glusterfs a realistic chance to notice that this is a locally fed file, which should probably be handled as discussed above ("import"). But if I local-restore a file with xattribs, it is likely that they describe a state that is no longer valid. My guess is that this will harm glusterfs more than not having xattribs for the file at all, because there is probably no good way to detect the invalid state.

So how, in your opinion, are xattribs and backups linked?

--
Regards,
Stephan
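P.S.: And to make the restore point concrete, a matching sketch (same caveats: Python 3 on Linux, untested, made-up paths, run as root) that copies a file from a backup tree back into the backend store and then strips whatever trusted.* xattribs came along, so the file looks like a plain local feed instead of carrying stale replicate state:

import os
import shutil

BACKUP = "/backup/export"   # made-up backup tree taken from the backend
BACKEND = "/data/export"    # made-up backend store to restore into

def restore(rel):
    src = os.path.join(BACKUP, rel)
    dst = os.path.join(BACKEND, rel)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(src, dst)  # copies data, times and mode (and xattribs on Linux)
    # Drop any trusted.* state the backup still carries, so glusterfs
    # does not see stale replicate bookkeeping on the restored file.
    for attr in os.listxattr(dst):
        if attr.startswith("trusted."):
            os.removexattr(dst, attr)

restore("htdocs/index.html")    # made-up example path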