Oops, you are right. For entry operations the current version of the
parent directory is not checked, precisely to avoid this problem.
This means that mkdir will be sent to all alive subvolumes. However, EC
still selects the group of answers whose size is equal to or greater
than #bricks - redundancy. So it should still be valid.
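For illustration, a minimal sketch of that quorum rule in Python
(hypothetical names, not the actual ec translator code):

    def has_quorum(matching_answers, bricks, redundancy):
        """A group of matching answers is accepted only if it has at
        least #bricks - redundancy members."""
        return len(matching_answers) >= bricks - redundancy

    # Example for a 4+2 disperse volume (6 bricks, redundancy 2):
    print(has_quorum(["ok"] * 4, bricks=6, redundancy=2))  # True  -> answer accepted
    print(has_quorum(["ok"] * 3, bricks=6, redundancy=2))  # False -> EIO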
Xavi
On 01/06/16 06:51, Pranith Kumar Karampuri wrote:
Xavi,
But if we keep winding only to good subvolumes, isn't there a case
where bad subvolumes will never catch up? i.e. if we keep creating
files in the same directory, then every time self-heal completes there
are more entries that the mounts would have created on the good
subvolumes alone. I think I must have missed this in the reviews if
this is the current behavior. It was not like this in earlier releases,
right?
Pranith
On Tue, May 31, 2016 at 2:17 PM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:
On Tue, May 31, 2016 at 12:37 PM, Xavier Hernandez <xhernandez@xxxxxxxxxx> wrote:
Hi,
On 31/05/16 07:05, Raghavendra Gowdappa wrote:
+gluster-devel, +Xavi
Hi all,
The context is [1], where bricks do pre-operation checks
before doing a fop and proceed with the fop only if the pre-op check
is successful.
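For reference, a minimal sketch of that pre-op check pattern in Python
(hypothetical names and errno choice; the actual check in [1] relies on
the parent directory's layout xattrs):

    import errno

    def mkdir_with_preop_check(layout_sent_by_client, layout_on_brick, do_mkdir):
        """Proceed with the fop only if the client's view of the parent
        layout still matches what the brick has on disk."""
        if layout_sent_by_client != layout_on_brick:
            # Pre-op check failed: the cached parent layout is stale, so
            # the fop is refused (ESTALE is only an illustrative errno).
            return -errno.ESTALE
        return do_mkdir()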
@Xavi,
We need your inputs on behavior of EC subvolumes as well.
If I understand correctly, EC shouldn't have any problems here.
EC sends the mkdir request to all subvolumes that are currently
considered "good" and tries to combine the answers. Answers that
match in return code, errno (if necessary) and xdata contents
(except for some special xattrs that are ignored for combination
purposes) are grouped.
Then it takes the group with the most members/answers. If that group
has a minimum size of #bricks - redundancy, it is considered the
good answer. Otherwise, EIO is returned because the bricks are in an
inconsistent state.
If there's any answer in another group, it's considered bad and
gets marked so that self-heal will repair it using the good
information from the majority of bricks.
xdata is combined and returned even if return code is -1.
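A rough sketch of that combination step in Python, assuming each answer
is an (op_ret, op_errno, xdata) tuple (hypothetical names, not the ec
translator code):

    from collections import defaultdict
    import errno

    def combine_answers(answers, bricks, redundancy):
        """answers: mapping of subvolume -> (op_ret, op_errno, xdata).
        Returns (op_ret, subvolumes_to_heal)."""
        groups = defaultdict(list)
        for subvol, answer in answers.items():
            # xdata is assumed hashable here; the real code compares the
            # contents and ignores some special xattrs while grouping.
            groups[answer].append(subvol)

        # Take the group with the most matching answers.
        best_answer, good_subvols = max(groups.items(), key=lambda kv: len(kv[1]))

        if len(good_subvols) < bricks - redundancy:
            # No group reaches quorum: bricks are in an inconsistent state.
            return -errno.EIO, []

        # Anything outside the winning group is marked for self-heal.
        bad_subvols = [s for s in answers if s not in good_subvols]
        return best_answer[0], bad_subvols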
Is that enough to cover the needed behavior?
Thanks Xavi. That's sufficient for the feature in question. One of
the main cases I was interested in was what the behaviour would be
if mkdir succeeds on a "bad" subvolume and fails on a "good" subvolume.
Since you never wind mkdir to the "bad" subvolume(s), this situation
never arises.
Xavi
[1] http://review.gluster.org/13885
regards,
Raghavendra
----- Original Message -----
From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx
<mailto:pkarampu@xxxxxxxxxx>>
To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx
<mailto:rgowdapp@xxxxxxxxxx>>
Cc: "team-quine-afr" <team-quine-afr@xxxxxxxxxx
<mailto:team-quine-afr@xxxxxxxxxx>>, "rhs-zteam"
<rhs-zteam@xxxxxxxxxx <mailto:rhs-zteam@xxxxxxxxxx>>
Sent: Tuesday, May 31, 2016 10:22:49 AM
Subject: Re: dht mkdir preop check, afr and
(non-)readable afr subvols
I think you should start a discussion on gluster-devel
so that Xavi gets a chance to respond to the mails as well.
On Tue, May 31, 2016 at 10:21 AM, Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:
Also note that we have plans to extend this pre-op
check to all dentry operations, which also depend on the parent
layout. So, the discussion needs to cover all dentry operations like:
1. create
2. mkdir
3. rmdir
4. mknod
5. symlink
6. unlink
7. rename
We also plan to have similar checks in the lock codepath
for directories too (planning to use the hashed subvolume as the
lock subvolume for directories). So, more fops :)
8. lk (posix locks)
9. inodelk
10. entrylk
regards,
Raghavendra
----- Original Message -----
From: "Raghavendra Gowdappa"
<rgowdapp@xxxxxxxxxx <mailto:rgowdapp@xxxxxxxxxx>>
To: "team-quine-afr" <team-quine-afr@xxxxxxxxxx
<mailto:team-quine-afr@xxxxxxxxxx>>
Cc: "rhs-zteam" <rhs-zteam@xxxxxxxxxx
<mailto:rhs-zteam@xxxxxxxxxx>>
Sent: Tuesday, May 31, 2016 10:15:04 AM
Subject: dht mkdir preop check, afr and
(non-)readable afr subvols
Hi all,
I have some queries related to the behavior of
afr_mkdir with respect to
readable subvols.
1. While winding mkdir to subvols, does afr check whether the
subvolume is good/readable? Or does it wind to all subvols
irrespective of whether a subvol is good/bad? In the latter case,
what if:
   a. mkdir succeeds on a non-readable subvolume
   b. mkdir fails on a readable subvolume
What is the result reported to higher layers in the above scenario?
If the mkdir is failed overall, is it cleaned up on the non-readable
subvolume where it succeeded?

I am interested in this case as the dht pre-op check relies on
layout xattrs, and I assume layout xattrs in particular (and all
xattrs in general) are guaranteed to be correct only on a readable
subvolume of afr. So, in essence, we shouldn't be winding down mkdir
on non-readable subvols, as whatever decision the brick makes as part
of the pre-op check is inherently flawed.
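As a sketch of the restriction argued for here (hypothetical helper
names, not actual afr code), the idea is simply to filter the targets
before winding:

    def wind_entry_fop(subvolumes, readable, wind_to):
        """Send the entry fop only to subvolumes currently considered
        readable, so the brick-side pre-op check always sees layout
        xattrs that are known to be correct."""
        targets = [s for s in subvolumes if s in readable]
        for subvol in targets:
            wind_to(subvol)
        return targets

    # Example with dummy subvolume names:
    used = wind_entry_fop(["afr-0", "afr-1", "afr-2"],
                          readable={"afr-0", "afr-2"},
                          wind_to=lambda s: print("mkdir wound to", s))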
regards,
Raghavendra
--
Pranith
--
Raghavendra G
--
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel