[Gluster-devel] Fencing FOPs on data-split-brained files

jdarcy at redhat.com (Jeff Darcy) · Tue, 19 Nov 2013 08:49:02 -0500

On 11/16/2013 04:20 AM, Emmanuel Dreyfus wrote:
> Anand Avati <avati at gluster.org> wrote:
>
>> Regarding your concern about complications while healing - we
>> should change our "manual fixing" instructions to:
>>
>> - go to backend, access through gfid path or normal path
>> - rmxattr the afr changelogs
>> - truncate the file to 0 bytes (like "> filename")
>
> What about adding a gluster command like this?
>
>     gluster rm path [-r] [-b brick|-a]
>       -r  recursive
>       -b  specify brick, use relative path to find if not specified
>       -a  remove on all other bricks
>
> That could even help removing files in non split-brain scenarios,
> where it takes ages because rm -rf goes through several FOPs for each
> file.

Actually, it might be even better if we had a command to bless/promote a
single copy of the file and delete/truncate *all others* instead of
having to do them one at a time.  I submitted a patch a while ago to do
something like this.

	http://review.gluster.org/#/c/4132/

This is going to be strictly necessary some day, as we stop recommending
hard-to-support direct modification of brick contents in favor of
in-line mechanisms.  Eventually we might even use SELinux or similar to
preclude back-end twiddling altogether.
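
To make the bless/promote idea concrete, here is a minimal sketch of its core step: keep one chosen copy and truncate every other copy to 0 bytes in a single call. It is not the code from the patch above. The function name bless_copy is hypothetical, and plain local files stand in for the per-brick copies. Removing the afr changelog xattrs, which the manual instructions also require, is left out because it needs root access to the trusted.* namespace on the bricks.

```python
import os


def bless_copy(replica_paths, blessed):
    """Keep `blessed` untouched and truncate all other copies to 0 bytes.

    Hypothetical helper: `replica_paths` are ordinary local paths standing
    in for the copies of one file on each brick. Clearing the afr changelog
    xattrs is deliberately not done here.
    """
    if blessed not in replica_paths:
        raise ValueError("blessed copy must be one of the replicas")

    truncated = []
    for path in replica_paths:
        if path == blessed:
            continue  # the promoted copy is the heal source, leave it alone
        os.truncate(path, 0)  # same effect as '> filename' on that copy
        truncated.append(path)
    return truncated
```

The point of the single call is that the admin names the good copy once, instead of visiting each bad copy on each brick by hand.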

The one thing that's a bit awkward about both your suggestion and mine
is how bricks are identified.  Identifying bricks by their internal
translator names is *totally* wrong, but identifying them by server:port
is barely any better since those relationships can change.  Ideally, a
brick would always be identified by a unique ID, with its current
location as an attribute.  This includes not just CLI and logs, but also
things like xattr names used by AFR (and DHT if/when some other issues
with how we store layout information are addressed).  IMO fixing brick
identification is necessary before we can provide a proper heal/rm function.
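
As a rough illustration of "unique ID, with current location as an attribute", the sketch below keeps a small in-memory registry keyed by brick ID. Everything in it is an assumption made for illustration: the Brick and BrickRegistry names, the use of a random UUID as the ID, and the xattr naming scheme are hypothetical and do not describe how GlusterFS stores this today.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Brick:
    brick_id: str  # stable identity, never changes after creation
    server: str    # current location, may change over time
    port: int


class BrickRegistry:
    """Hypothetical registry: bricks are looked up by ID, not by location."""

    def __init__(self):
        self._bricks = {}

    def add(self, server, port):
        brick_id = str(uuid.uuid4())
        self._bricks[brick_id] = Brick(brick_id, server, port)
        return brick_id

    def relocate(self, brick_id, server, port):
        # Only the location attribute changes; the ID stays the same.
        brick = self._bricks[brick_id]
        brick.server = server
        brick.port = port

    def locate(self, brick_id):
        brick = self._bricks[brick_id]
        return "%s:%d" % (brick.server, brick.port)

    def xattr_name(self, brick_id):
        # Assumed naming scheme: key the changelog xattr on the ID so it
        # survives a move to another server or port.
        if brick_id not in self._bricks:
            raise KeyError(brick_id)
        return "trusted.afr." + brick_id
```

With this shape, a CLI or log line can print the ID plus the result of locate(), while anything persisted on disk refers only to the ID and so stays valid after relocate().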