Re: Fencing FOPs on data-split-brained files


 



Hi,

Based on an internal discussion we had, I am putting forward some points on the proposed changes:


lookup: For files in data split-brain (DSB), allow lookup to succeed and return the inode attributes (struct iatt) from the file which has the bigger size.

        For files in metadata split-brain (MSB), allow lookup to succeed and use the resolution below:

        Mismatching attribute         Resolution
        ----------------------------  ------------------------------------------------
        time (a_time/m_time/c_time)   return the one which has the newest m_time
        uid/gid                       return the uid/gid as root:root, so that further
                                      FOPs will fail due to lack of permission
        nlink                         return the bigger of the values
        file permission (st_mode)     return the AND of the file permissions

        For files in entry split-brain (ESB), lookup has to fail.
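
To make the lookup rules above concrete, here is a rough Python sketch of the merge. It is only an illustration, not AFR code: the Iatt class is a hypothetical stand-in for the relevant fields of struct iatt, and the function names are made up. Two assumptions that the rules above do not spell out: uid and gid are both reported as root when either of them mismatches, and the size for MSB is taken from the first copy (sizes should agree in a pure metadata split-brain).

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Iatt:
    """Hypothetical stand-in for the struct iatt fields used below."""
    size: int
    atime: int
    mtime: int
    ctime: int
    uid: int
    gid: int
    nlink: int
    mode: int


def resolve_dsb(copies):
    """Data split-brain: report the attributes of the bigger copy."""
    return max(copies, key=lambda c: c.size)


def resolve_msb(copies):
    """Metadata split-brain: merge attributes per the table above."""
    # Times come from the copy with the newest m_time.
    newest = max(copies, key=lambda c: c.mtime)
    # Assumption: any owner mismatch reports root:root for both fields.
    owners_agree = len({(c.uid, c.gid) for c in copies}) == 1
    uid, gid = (copies[0].uid, copies[0].gid) if owners_agree else (0, 0)
    return Iatt(
        size=copies[0].size,  # assumption: sizes agree in a pure MSB
        atime=newest.atime,
        mtime=newest.mtime,
        ctime=newest.ctime,
        uid=uid,
        gid=gid,
        nlink=max(c.nlink for c in copies),
        # AND of st_mode keeps only the bits set on every copy.
        mode=reduce(lambda x, y: x & y, (c.mode for c in copies)),
    )
```

With one copy at mode 777 and the other at 744, the merged mode is 744, so only the permissions granted on every brick survive.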

     
Note that if lookup gets called before the other FOPs, then the above is the expected behaviour. If it doesn't (due to caching, or the split-brain occurring after the lookup, etc.), then we need to define what happens on each FOP:

stat: If the file is in split-brain, send stat to all subvolumes, and perform the same steps as done in lookup (i.e. perform the same checks as above).

write: Allow writes to go through irrespective of the type of split-brain. This is a marked difference from the current behaviour, where we disallow writes to DSB files. The rationale is that the write could include a truncate to zero, which is a valid way of resolving the split-brained file if the user wishes to do so.

read: Do not allow reads irrespective of the type of split-brain. This would serve as an indication to the user that the file is in split-brain.

get(f)attr: For DSB, allow it.
            For MSB, don't allow it.
          
set(f)attr: For DSB and MSB, allow it.
                  
touch (create), hardlink, softlink, rename, chown, chmod, unlink: Allow the operation for all types of split-brain.
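
The per-FOP proposal above can be summarised as a lookup table. The Python sketch below is just one way of writing it down; the names (POLICY, decide) are hypothetical and not from the AFR code. Where the proposal does not say what happens for a given type of split-brain (for example get(f)attr on ESB), the sketch returns "unspecified" rather than guessing.

```python
ALLOW, DENY, UNSPECIFIED = "allow", "deny", "unspecified"

# Entry-type operations the proposal allows for every type of split-brain.
ENTRY_OPS = ("create", "hardlink", "softlink", "rename",
             "chown", "chmod", "unlink")

# FOP -> {split-brain type -> decision}, transcribed from the proposal.
POLICY = {
    "lookup": {"DSB": ALLOW, "MSB": ALLOW, "ESB": DENY},
    "stat": {"DSB": ALLOW, "MSB": ALLOW, "ESB": DENY},  # same checks as lookup
    "write": {"DSB": ALLOW, "MSB": ALLOW, "ESB": ALLOW},
    "read": {"DSB": DENY, "MSB": DENY, "ESB": DENY},
    "getxattr": {"DSB": ALLOW, "MSB": DENY},
    "setxattr": {"DSB": ALLOW, "MSB": ALLOW},
}
for op in ENTRY_OPS:
    POLICY[op] = {"DSB": ALLOW, "MSB": ALLOW, "ESB": ALLOW}


def decide(fop, sb_type):
    """Return the proposed decision, or UNSPECIFIED if the mail is silent."""
    return POLICY.get(fop, {}).get(sb_type, UNSPECIFIED)
```

For example, decide("read", "MSB") gives "deny" and decide("getxattr", "ESB") gives "unspecified".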

Forcing lookups to occur for readdirp: If a directory is in split-brain and a readdirp is issued, then after getting the entries, AFR needs to check them for split-brain. For those entries which are in split-brain, it needs to set the inode to null before unwinding the reply to the parent xlator. What we are essentially doing here is downgrading a readdirp to a readdir for those entries, thereby ensuring that a lookup is always triggered if the file is accessed again.
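
A small Python sketch of the readdirp idea, for illustration only. DirEntry and is_split_brained are hypothetical; in AFR this would operate on the readdirp reply entries before unwinding, and the split-brain check would come from the changelog state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DirEntry:
    """Hypothetical stand-in for one entry of a readdirp reply."""
    name: str
    inode: Optional[object]  # None means "no inode attached to this entry"


def strip_split_brained_inodes(
    entries: List[DirEntry],
    is_split_brained: Callable[[DirEntry], bool],
) -> List[DirEntry]:
    """Null the inode of every split-brained entry.

    An entry without an inode carries only its name, as in a plain
    readdir, so a later access has to go through a fresh lookup.
    """
    for entry in entries:
        if entry.inode is not None and is_split_brained(entry):
            entry.inode = None
    return entries
```

Healthy entries keep their inode, so only the split-brained ones pay for the extra lookup.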

Thanks,
Ravi





On 12/27/2013 04:40 PM, Ravishankar N wrote:



-------- Original Message --------
Subject: Re: Fencing FOPs on data-split-brained files
Date: Tue, 19 Nov 2013 16:03:14 +0530
From: Ravishankar N <ravishankar@xxxxxxxxxx>
To: Anand Avati <avati@xxxxxxxxxxx>
CC: Gluster Devel <gluster-devel@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>


On 11/16/2013 01:42 AM, Anand Avati wrote:
Ravi,
We should not mix up the data and entry operation domains. If a file is in data split-brain, that should not stop a user from doing rename/link/unlink operations on the file.

Regarding your concern about complications while healing - we should change our "manual fixing" instructions to:

- go to backend, access through gfid path or normal path
- rmxattr the afr changelogs
- truncate the file to 0 bytes (like "> filename")

Accessing the path through gfid and truncating to 0 bytes addresses your concerns about hardlinks/renames.
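
A minimal Python sketch of the manual steps above, meant to be run as root on the brick that holds the copy chosen as bad. It assumes the AFR changelogs are the extended attributes whose names start with "trusted.afr."; the function names are hypothetical. os.listxattr and os.removexattr are Linux-only.

```python
import os

# Assumption: AFR changelog xattrs live under this name prefix.
AFR_PREFIX = "trusted.afr."


def afr_changelog_keys(xattr_names):
    """Pick the AFR changelog xattrs out of a list of xattr names."""
    return [name for name in xattr_names if name.startswith(AFR_PREFIX)]


def reset_bad_copy(backend_path):
    """Remove the AFR changelogs from the bad copy and truncate it to 0 bytes.

    backend_path is the file on the brick, reached either through the
    gfid path or through the normal path.
    """
    for key in afr_changelog_keys(os.listxattr(backend_path)):
        os.removexattr(backend_path, key)
    os.truncate(backend_path, 0)  # same effect as "> filename"
```

Because the gfid path and every hardlink refer to the same inode on the brick, one call covers all names of the file.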

Avati



Resending the mail as there was no response.
-Ravi

All,

I have tabulated which operations must or mustn't be permitted for the different types of split-brain. Some of the cells are '?' as I am not sure what the expected behaviour should be. Could we have this validated?


File operation       Data SB   Metadata SB          Entry SB:           Entry SB:
permitted                                           same entry,         different
                                                    gfid mismatch       entries
-------------------  --------  -------------------  ------------------  ----------
write                No        Yes (currently no)   No                  Yes
read                 No        Yes (currently no)   No                  Yes
getfattr             Yes       No                   No                  Yes
lookup               ?         ?                    No                  Yes
stat/fstat           ?         ?                    No                  Yes
setfattr             Yes       No                   No                  Yes
touch                Yes       Yes                  No                  Yes
hard link creation   Yes       Yes                  No                  Yes
soft link creation   Yes       Yes                  Yes                 Yes
rename               Yes       Yes                  No                  Yes
chown                Yes       Yes                  Currently No        Yes
chmod                Yes       Yes                  Currently No        Yes
unlink               Yes       Yes                  Currently No        Yes
readdir              N/A       N/A                  ?                   ?

- stat() also reports the file size. If a data split-brained file has different sizes on the bricks, should stat succeed?
- Likewise, if the metadata split-brain is due to different access permissions, say one brick has the file chmod'ed to 777 and the other brick has it at 744, should we allow read/write if the corresponding permission bits are *not* conflicting? (As of today they aren't allowed.)

Also, in the table above, entry split-brain has 2 cases:
i) where the same entry has different gfids
ii) where each brick has different entries for the same directory (which can cause deleted files to reappear in case of a conservative merge).
Should we allow readdir in either case?
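
To illustrate case ii), here is a toy Python sketch that treats a conservative merge as taking the union of the entries found on each brick. That is a simplification made for this example, and the function name is hypothetical.

```python
def conservative_merge(brick_listings):
    """Union of the directory entries seen on each brick (toy model)."""
    merged = set()
    for listing in brick_listings:
        merged |= set(listing)
    return sorted(merged)


# "b" was deleted while the first brick was down, so only the second
# brick recorded the unlink. The union brings "b" back.
brick1 = ["a", "b"]
brick2 = ["a"]
merged = conservative_merge([brick1, brick2])
```

Here merged is ["a", "b"], which is the "deleted files reappear" effect mentioned above.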

Thanks,
Ravi

On Wed, Nov 13, 2013 at 3:01 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:
Hi,

Currently in glusterfs, when there is a data split-brain (only) on a file, we disallow the following operations from the mount point by returning EIO to the application:
- Writes to the file (truncate, dd, echo, cp etc)
- Reads to the file (cat)
- Reading extended attributes (getfattr) [1]

However we do permit the following operations:
-creating hardlinks
-creating symlinks
-mv
-setattr
-chmod
-chown
-touch
-ls
-stat

While it makes sense to allow `ls` and `stat`, is it okay to add checks in the FOPs to disallow the other operations? Allowing the creation of links and changes to file attributes only seems to complicate things before the admin can go to the backend bricks and resolve the split-brain (by deleting all but the healthy copy of the file, including hardlinks). More so if the file is renamed before the split-brain is addressed.
Please share your thoughts.

Thanks,
Ravi

[1] http://review.gluster.org/#/c/5988/
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users




