Hi,
Based on an internal discussion we had, I am putting
forward some points on the proposed changes:
lookup: For files in data split-brain (DSB), allow
lookup to succeed and return the inode attributes (struct
iatt) from the copy that has the larger size.
For files in metadata split-brain (MSB), allow
lookup to succeed and use the following resolution:

Mismatching attribute          | Resolution
-------------------------------+--------------------------------------------
time (a_time/m_time/c_time)    | return the one which has the newest m_time
uid/gid                        | return the uid/gid as root:root so that
                               | further FOPs will fail due to lack of
                               | permission
nlink                          | return the bigger of the values
file permission (st_mode)      | return the AND of the file permissions

For files in entry split-brain (ESB), lookup has
to fail.
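To make the lookup rules above concrete, here is a minimal C sketch for the two-replica case. It is only an illustration under stated assumptions: `struct sb_iatt`, `enum sb_type`, `merge_msb()` and `resolve_lookup()` are hypothetical names (a trimmed stand-in for struct iatt, not AFR code), times are whole seconds, and returning -EIO for ESB is an assumed choice, since the proposal only says that lookup has to fail.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct iatt (not the real layout). */
struct sb_iatt {
    uint64_t size;
    int64_t  atime, mtime, ctime;   /* whole seconds, for brevity */
    uint32_t uid, gid;
    uint32_t nlink;
    uint32_t mode;                  /* permission bits */
};

enum sb_type { SB_NONE, SB_DATA, SB_METADATA, SB_ENTRY };

/* MSB: merge two copies following the resolution table. */
static struct sb_iatt merge_msb(const struct sb_iatt *a, const struct sb_iatt *b)
{
    struct sb_iatt out = *a;

    /* Times: take all three from the copy with the newest m_time. */
    const struct sb_iatt *newer = (b->mtime > a->mtime) ? b : a;
    out.atime = newer->atime;
    out.mtime = newer->mtime;
    out.ctime = newer->ctime;

    /* uid/gid: report a mismatching id as root (0) so that further
     * FOPs fail due to lack of permission. */
    if (a->uid != b->uid)
        out.uid = 0;
    if (a->gid != b->gid)
        out.gid = 0;

    /* nlink: the bigger value; st_mode: AND of both permission sets. */
    out.nlink = (a->nlink > b->nlink) ? a->nlink : b->nlink;
    out.mode  = a->mode & b->mode;
    return out;
}

/* Returns 0 and fills *out, or -EIO (assumed error code) for ESB. */
static int resolve_lookup(enum sb_type type, const struct sb_iatt *a,
                          const struct sb_iatt *b, struct sb_iatt *out)
{
    switch (type) {
    case SB_DATA:           /* DSB: attributes of the larger copy */
        *out = (b->size > a->size) ? *b : *a;
        return 0;
    case SB_METADATA:       /* MSB: per-attribute resolution */
        *out = merge_msb(a, b);
        return 0;
    case SB_ENTRY:          /* ESB: lookup has to fail */
        return -EIO;
    case SB_NONE:
    default:
        *out = *a;
        return 0;
    }
}
```

The same resolution would apply to stat when it is sent to all subvolumes for a split-brained file.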
Note that the above is the expected behaviour if lookup is
called before the other FOPs. If it is not (for example, due
to caching, or because the split-brain occurs after the lookup
has happened), then we need to define what happens on each FOP:
stat: If the file is in split-brain, send stat to all
subvolumes and perform the same steps as done in lookup
(i.e. perform the same checks as above).
write: Allow writes to go through irrespective of
the type of split-brain. This is in marked contrast to the
current behaviour, where we disallow writes to DSB
files. The rationale is that the write could include a
truncate to zero, which is a valid way of resolving
the split-brained file if the user wishes to do so.
read: Do not allow reads irrespective of the type
of split-brain. This would serve as an indication to the user
that the file is in split-brain.
get(f)attr: For DSB, allow it.
For MSB, don't allow it.
set(f)attr: For DSB and MSB, allow it.
touch (create), hardlink, softlink, rename, chown,
chmod, unlink: Allow the operation for all types of
split-brain.
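The proposed per-FOP behaviour can be summarised as a small decision function. This is a sketch, not AFR code: `enum sb_fop`, `enum sb_verdict` and `sb_fop_allowed()` are hypothetical, the FOP labels mirror the list above, and combinations the proposal does not cover (get(f)attr and set(f)attr on ESB files) are reported as `SB_UNSPECIFIED` rather than guessed. lookup and stat are left out because they resolve attributes instead of simply being allowed or refused.

```c
enum sb_type { SB_NONE, SB_DATA, SB_METADATA, SB_ENTRY };

/* Hypothetical FOP labels mirroring the list above. */
enum sb_fop {
    FOP_WRITE, FOP_READ, FOP_GETFATTR, FOP_SETFATTR,
    FOP_CREATE, FOP_LINK, FOP_SYMLINK, FOP_RENAME,
    FOP_CHOWN, FOP_CHMOD, FOP_UNLINK
};

enum sb_verdict { SB_DENY, SB_ALLOW, SB_UNSPECIFIED };

static enum sb_verdict sb_fop_allowed(enum sb_fop fop, enum sb_type type)
{
    if (type == SB_NONE)
        return SB_ALLOW;        /* not in split-brain: rules do not apply */

    switch (fop) {
    case FOP_WRITE:             /* any SB: a truncate to zero may resolve it */
        return SB_ALLOW;
    case FOP_READ:              /* any SB: refusal signals the split-brain */
        return SB_DENY;
    case FOP_GETFATTR:          /* DSB: allow, MSB: deny, ESB: not covered */
        if (type == SB_DATA)
            return SB_ALLOW;
        return (type == SB_METADATA) ? SB_DENY : SB_UNSPECIFIED;
    case FOP_SETFATTR:          /* DSB and MSB: allow, ESB: not covered */
        return (type == SB_ENTRY) ? SB_UNSPECIFIED : SB_ALLOW;
    default:                    /* create, link, symlink, rename, chown, chmod, unlink */
        return SB_ALLOW;
    }
}
```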
Forcing lookups to occur for readdirp: If a directory is
in split-brain and a readdirp is issued, then after
getting the entries, AFR needs to check them for
split-brain, and for those entries which are in
split-brain, it needs to set the inode to NULL before
unwinding the reply to the parent xlator. What we are
essentially doing here is downgrading a readdirp to a
readdir, thereby ensuring that a lookup is always
triggered if that file is accessed again.
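A minimal sketch of the readdirp downgrade. It assumes a simplified array of entries that already carries the result of the per-entry split-brain check (how that check is done is not shown); `struct sb_dirent` and `sb_downgrade_readdirp()` are hypothetical names, not the xlator's actual dirent list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory entry as returned by readdirp. */
struct sb_dirent {
    const char *name;
    void       *inode;        /* linked inode; NULL forces a fresh lookup */
    bool        split_brain;  /* result of the per-entry split-brain check */
};

/* Before unwinding a readdirp reply, drop the inode of every entry that is
 * in split-brain, so the parent xlator sees it like a plain readdir entry
 * and a lookup is triggered on next access. Returns how many entries were
 * downgraded. */
static size_t sb_downgrade_readdirp(struct sb_dirent *entries, size_t count)
{
    size_t downgraded = 0;

    for (size_t i = 0; i < count; i++) {
        if (entries[i].split_brain && entries[i].inode != NULL) {
            entries[i].inode = NULL;
            downgraded++;
        }
    }
    return downgraded;
}
```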
Thanks,
Ravi
On 12/27/2013 04:40 PM, Ravishankar N wrote:
-------- Original Message --------
On 11/16/2013 01:42 AM, Anand Avati wrote:
Ravi,
We should not mix up the data and entry operation domains:
if a file is in data split-brain, that should not stop a
user from performing rename/link/unlink operations on the file.
Regarding your concern about complications while
healing, we should change our "manual fixing"
instructions to:
- go to the backend and access the file through its gfid path
  or its normal path
- rmxattr the AFR changelogs
- truncate the file to 0 bytes (like "> filename")
Accessing the path through gfid and truncating to 0
bytes addresses your concerns about hardlinks/renames.
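A minimal C sketch of the per-copy part of these steps, to be run as root against a single brick-side copy of the file. It assumes the AFR changelogs are the extended attributes whose names start with "trusted.afr."; `sb_reset_backend_copy()` is a hypothetical name, the fixed 4 KiB name buffer is a simplification, and choosing which copy to reset is left to the administrator.

```c
#define _XOPEN_SOURCE 700   /* for ftruncate() */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Assumed name prefix of the AFR changelog xattrs on a brick. */
#define AFR_XATTR_PREFIX "trusted.afr."

/* Reset one brick-side copy (gfid path or normal path): remove its AFR
 * changelog xattrs, then truncate it to 0 bytes like "> filename".
 * Must run as root: trusted.* names are not listed for other users.
 * Returns 0 on success or -errno on the first failure. */
static int sb_reset_backend_copy(const char *path)
{
    char names[4096];          /* simplification: fixed-size name list */
    int ret = 0;
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    ssize_t len = flistxattr(fd, names, sizeof(names));
    if (len < 0) {
        if (errno != ENOTSUP) {
            ret = -errno;
            goto out;
        }
        len = 0;               /* no xattr support: nothing to remove */
    }

    /* The list is a sequence of NUL-terminated names. */
    for (ssize_t off = 0; off < len; off += (ssize_t)strlen(names + off) + 1) {
        const char *name = names + off;
        if (strncmp(name, AFR_XATTR_PREFIX, sizeof(AFR_XATTR_PREFIX) - 1) == 0 &&
            fremovexattr(fd, name) != 0) {
            ret = -errno;
            goto out;
        }
    }

    if (ftruncate(fd, 0) != 0)
        ret = -errno;
out:
    close(fd);
    return ret;
}
```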
Avati
Resending the mail as there was no response.
-Ravi
All,
I have tabulated which operations must/mustn't be permitted in
the case of the different split-brains. Some of the cells are '?'
as I am not sure what the expected behaviour should be. Could we
have this validated?

                    | Type of split-brain
Operation permitted | Data SB | Metadata SB        | Entry SB:      | Entry SB:
                    |         |                    | gfid mismatch  | different
                    |         |                    | on same entry  | entries
--------------------+---------+--------------------+----------------+-----------
write               | No      | Yes (currently no) | No             | Yes
read                | No      | Yes (currently no) | No             | Yes
getfattr            | Yes     | No                 | No             | Yes
lookup              | ?       | ?                  | No             | Yes
stat/fstat          | ?       | ?                  | No             | Yes
setfattr            | Yes     | No                 | No             | Yes
touch               | Yes     | Yes                | No             | Yes
hard link creation  | Yes     | Yes                | No             | Yes
soft link creation  | Yes     | Yes                | Yes            | Yes
rename              | Yes     | Yes                | No             | Yes
chown               | Yes     | Yes                | Currently no   | Yes
chmod               | Yes     | Yes                | Currently no   | Yes
unlink              | Yes     | Yes                | Currently no   | Yes
readdir             | N/A     | N/A                | ?              | ?

- stat() also reports the file size. If a data split-brained
file has different sizes on the bricks, should stat succeed?
- Likewise, if a metadata split-brain is due to different
access permissions (say one brick has the file chmod'ed to 777
and the other brick has it at 744), should we allow
read/write if the corresponding permission bits are *not*
conflicting? (As of today they aren't allowed.)
Also, in the table above, entry split-brain has 2 cases:
i) the same entry has different gfids
ii) each brick has different entries for the same directory
(which can cause deleted files to reappear in case of a
conservative merge).
Should we allow readdir in either case?
Thanks,
Ravi