Sorry, missing lines from the attachment.
On 05/04/2017 03:24 PM, Miklós Fokin wrote:
Hello,
I seem to have discovered what caused half of the problem.
I did update the bug report with a more detailed description, but the short version is that the attached diff solves the issue where we get an fstat with a size of 0 after killing a brick (by not letting the first stat update on fsync come from an arbiter).
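To illustrate the idea, here is a minimal sketch (not the actual AFR code; the types and the is_arbiter flag are made up for illustration): the arbiter brick stores no file data, so any stat it returns has a size of 0, and the callback should only adopt the stat from the first successful non-arbiter reply, which is what the attached diff does via AFR_IS_ARBITER_BRICK.

/* Minimal sketch, not the actual GlusterFS code: only adopt the stat
 * from the first successful reply that did NOT come from the arbiter,
 * since the arbiter holds no data and always reports a size of 0. */
#include <stddef.h>
#include <sys/stat.h>

struct brick_reply {
        int         op_ret;      /* 0 on success */
        int         is_arbiter;  /* hypothetical flag: brick is the arbiter */
        struct stat buf;         /* stat returned by this brick */
};

static const struct stat *
pick_good_stat (const struct brick_reply *replies, size_t count)
{
        for (size_t i = 0; i < count; i++) {
                /* Skip failed replies and the arbiter's reply. */
                if (replies[i].op_ret == 0 && !replies[i].is_arbiter)
                        return &replies[i].buf;
        }
        return NULL; /* no usable reply */
}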
My question is: should I submit it for review, or should the further changes that are needed be investigated first?
Best regards,
Miklós
On 04/26/2017 12:58 PM, Miklós Fokin wrote:
Thanks for the response.
We didn't have the options set that the first two reviews were about.
The third was about changes to performance.readdir-ahead. I turned that feature off today on my machine, with prefetch turned on, and the bug still appeared, so I would not expect that commit to fix it either.
Best regards,
Miklós
On 04/25/2017 01:26 PM, Raghavendra Gowdappa wrote:
We recently worked on some patches to ensure correct stats are returned:
https://review.gluster.org/15759
https://review.gluster.org/15659
https://review.gluster.org/16419
Referring to these patches and the bugs associated with them might give you some insight into the nature of the problem. The major culprit was the interaction between readdir-ahead and stat-prefetch. So, the issue you are seeing might be addressed by these patches.
----- Original Message -----
From: "Miklós Fokin"
<miklos.fokin@xxxxxxxxxxxx>
To: gluster-devel@xxxxxxxxxxx
Sent: Tuesday, April 25, 2017 3:42:52 PM
Subject: fstat problems when killing with
stat prefetch turned on
Hello,
I tried reproducing the problem that Mateusz Slupny was experiencing before (stat returning a bad st_size value during self-healing) on my own computer with only 3 bricks (one of them an arbiter) on 3.10.0.
The result with such a small setup was that the bug appeared both on killing a brick and during the self-healing process, but only rarely (once in hundreds of tries) and only with performance.stat-prefetch turned on.
This might be a completely different issue, as on the setup Matt was using he could reproduce it with the mentioned option turned off; there it always happened, but only during recovery, not after killing.
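For reference, the check I run while a brick is killed (and later healed) in the background is essentially the following; this is only a sketch, and the mount path and file name are placeholders:

/* Sketch of the reproduction check: append a known amount of data,
 * fsync, then verify that fstat reports the expected size. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main (void)
{
        const char buf[4096] = {0};
        off_t expected = 0;
        int fd = open ("/mnt/glustervol/testfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0) { perror ("open"); return 1; }

        for (int i = 0; i < 1000; i++) {
                if (write (fd, buf, sizeof (buf)) != (ssize_t) sizeof (buf)) { perror ("write"); return 1; }
                expected += sizeof (buf);
                if (fsync (fd) != 0) { perror ("fsync"); return 1; }

                struct stat st;
                if (fstat (fd, &st) != 0) { perror ("fstat"); return 1; }
                if (st.st_size != expected)
                        fprintf (stderr, "bad st_size: got %lld, expected %lld\n",
                                 (long long) st.st_size, (long long) expected);
        }
        close (fd);
        return 0;
}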
I did submit a bug report about this:
https://bugzilla.redhat.com/show_bug.cgi?id=1444892.
The problem, as Matt wrote, is that this causes data corruption if one uses the returned size when writing.
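Concretely, an application that appends at the offset reported by fstat would overwrite the start of the file whenever st_size is wrongly reported as 0. A sketch (append_record is a hypothetical helper; the descriptor is assumed to be open on the Gluster mount):

/* If fstat wrongly reports a size of 0, this pwrite lands at offset 0
 * and overwrites existing data instead of appending. */
#include <sys/stat.h>
#include <unistd.h>

ssize_t append_record (int fd, const void *rec, size_t len)
{
        struct stat st;
        if (fstat (fd, &st) != 0)
                return -1;
        /* st.st_size == 0 here (incorrectly) => data corruption */
        return pwrite (fd, rec, len, st.st_size);
}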
Could I get some pointers as to what parts of the Gluster code I should be looking at to figure out what the problem might be?
Thanks in advance,
Miklós
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel
diff --git a/xlators/cluster/afr/src/afr-common.c b/xlators/cluster/afr/src/afr-common.c
index ac834e9..d6185ca 100644
--- a/xlators/cluster/afr/src/afr-common.c
+++ b/xlators/cluster/afr/src/afr-common.c
@@ -3318,6 +3318,7 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
         int child_index = (long) cookie;
         int read_subvol = 0;
         call_stub_t *stub = NULL;
+        afr_private_t *private = this->private;
 
         local = frame->local;
 
@@ -3327,7 +3328,8 @@ afr_fsync_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
 
         LOCK (&frame->lock);
         {
                 if (op_ret == 0) {
-                        if (local->op_ret == -1) {
+                        if (local->op_ret == -1 && this->private &&
+                            !AFR_IS_ARBITER_BRICK (private, child_index)) {
                                 local->op_ret = 0;
                                 local->cont.inode_wfop.prebuf = *prebuf;
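(Assuming the macro behaves as its name suggests, AFR_IS_ARBITER_BRICK (private, child_index) is true only when child_index refers to the configured arbiter brick; with this change the arbiter's prebuf/postbuf, whose ia_size is 0 since the arbiter stores no file data, are never the ones copied into local->cont.inode_wfop, and a data brick's reply supplies the stat instead.)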
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel