Sorry, missing lines from the attachment.
On 05/04/2017 03:24 PM, Miklós Fokin wrote:
Hello,
I seem to have discovered what caused half of the problem.
I did update the bug report with a more detailed description, but the short version is that the attached diff solves the issue where we get an fstat with a size of 0 after killing a brick (by not letting the first stat update on fsync come from an arbiter).
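To illustrate the idea, here is a minimal sketch (not the actual AFR code; the types and the is_arbiter flag are made up for illustration): the arbiter brick stores no file data, so any stat it returns has a size of 0, and the callback should only adopt the stat from the first successful non-arbiter reply, which is what the attached diff does via AFR_IS_ARBITER_BRICK.

/* Minimal sketch, not the actual GlusterFS code: only adopt the stat
 * from the first successful reply that did NOT come from the arbiter,
 * since the arbiter holds no data and always reports a size of 0. */
#include <stddef.h>
#include <sys/stat.h>

struct brick_reply {
        int         op_ret;      /* 0 on success */
        int         is_arbiter;  /* hypothetical flag: brick is the arbiter */
        struct stat buf;         /* stat returned by this brick */
};

static const struct stat *
pick_good_stat (const struct brick_reply *replies, size_t count)
{
        for (size_t i = 0; i < count; i++) {
                /* Skip failed replies and the arbiter's reply. */
                if (replies[i].op_ret == 0 && !replies[i].is_arbiter)
                        return &replies[i].buf;
        }
        return NULL; /* no usable reply */
}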
My question is: should I submit it for review, or should the further changes that are needed be investigated first?
Best regards,
Miklós
On 04/26/2017 12:58 PM, Miklós Fokin wrote:
Thanks for the response.
We didn't have the options set that the first two reviews were about.
The third was about changes to performance.readdir-ahead. I turned that feature off today on my machine, with prefetch turned on, and the bug still appeared, so I would not expect that commit to fix it either.
Best regards,
Miklós
On 04/25/2017 01:26 PM, Raghavendra Gowdappa wrote:
We recently worked on some patches to ensure correct stats are returned:
https://review.gluster.org/15759
https://review.gluster.org/15659
https://review.gluster.org/16419
Referring to these patches and the bugs associated with them might give you some insight into the nature of the problem. The major culprit was the interaction between readdir-ahead and stat-prefetch. So, the issue you are seeing might be addressed by these patches.
----- Original Message -----
From: "Miklós Fokin"
<miklos.fokin@xxxxxxxxxxxx>
To: gluster-devel@xxxxxxxxxxx
Sent: Tuesday, April 25, 2017 3:42:52 PM
Subject: fstat problems when killing with
stat prefetch turned on
Hello,
I tried reproducing the problem that Mateusz Slupny was experiencing before (stat returning a bad st_size value during self-healing) on my own computer with only 3 bricks (one of them an arbiter) on 3.10.0.
The result with such a small setup was that the bug appeared both on killing a brick and during the self-healing process, but only rarely (once in hundreds of tries) and only with performance.stat-prefetch turned on.
This might be a completely different issue, as on the setup Matt was using he could reproduce it with the mentioned option turned off; there it always happened, but only during recovery, not after killing.
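For reference, the check I run while a brick is killed (and later healed) in the background is essentially the following; this is only a sketch, and the mount path and file name are placeholders:

/* Sketch of the reproduction check: append a known amount of data,
 * fsync, then verify that fstat reports the expected size. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main (void)
{
        const char buf[4096] = {0};
        off_t expected = 0;
        int fd = open ("/mnt/glustervol/testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror ("open"); return 1; }

        for (int i = 0; i < 1000; i++) {
                if (write (fd, buf, sizeof (buf)) != (ssize_t) sizeof (buf)) { perror ("write"); return 1; }
                expected += sizeof (buf);
                if (fsync (fd) != 0) { perror ("fsync"); return 1; }

                struct stat st;
                if (fstat (fd, &st) != 0) { perror ("fstat"); return 1; }
                if (st.st_size != expected)
                        fprintf (stderr, "bad st_size: got %lld, expected %lld\n",
                                 (long long) st.st_size, (long long) expected);
        }
        close (fd);
        return 0;
}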
I did submit a bug report about this:
https://bugzilla.redhat.com/show_bug.cgi?id=1444892.
The problem, as Matt wrote, is that this causes data corruption if one uses the returned size when writing.
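Concretely, an application that appends at the offset reported by fstat would overwrite the start of the file whenever st_size is wrongly reported as 0. A sketch (append_record is a hypothetical helper; the descriptor is assumed to be open on the Gluster mount):

/* If fstat wrongly reports a size of 0, this pwrite lands at offset 0
 * and overwrites existing data instead of appending. */
#include <sys/stat.h>
#include <unistd.h>

ssize_t append_record (int fd, const void *rec, size_t len)
{
        struct stat st;
        if (fstat (fd, &st) != 0)
                return -1;
        /* st.st_size == 0 here (incorrectly) => data corruption */
        return pwrite (fd, rec, len, st.st_size);
}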
Could I get some pointers as to what parts of the Gluster code I should be looking at to figure out what the problem might be?
Thanks in advance,
Miklós
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index ac834e9..d6185ca 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -3318,6 +3318,7 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         int child_index = (long) cookie;
         int read_subvol = 0;
         call_stub_t *stub = NULL;
+        afr_private_t *private = this->private;
 
         local = frame->local;
 
@@ -3327,7 +3328,8 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
 
         LOCK (&frame->lock);
         {
                 if (op_ret == 0) {
-                        if (local->op_ret == -1) {
+                        if (local->op_ret == -1 && this->private &&
+                            !AFR_IS_ARBITER_BRICK (private, child_index)) {
                                 local->op_ret = 0;
                                 local->cont.inode_wfop.prebuf = *prebuf;
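(Assuming the macro behaves as its name suggests, AFR_IS_ARBITER_BRICK (private, child_index) is true only when child_index refers to the configured arbiter brick; with this change the arbiter's prebuf/postbuf, whose ia_size is 0 since the arbiter stores no file data, are never the ones copied into local->cont.inode_wfop, and a data brick's reply supplies the stat instead.)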
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel