Re: gluster 3.4 self-heal

Ravishankar N <ravishankar@xxxxxxxxxx> · Tue, 27 May 2014 23:05:42 +0530



    On 05/27/2014 08:47 PM, Ivano Talamo wrote:

    
      Dear all,
      

      we have a replicated volume (2 servers with 1 brick each) on
      Scientific Linux 6.2 with gluster 3.4.
      

      Everything was running fine until we shut down one of the two and
      kept it down for 2 months.
      

      When it came up again the volume could be healed and we have the
      following symptoms
      

      (call #1 the always-up server, #2 the server that was kept down):
      

      -I/O on the volume performs very badly (impossible to keep VM
      images on it)
      

    A replica's bricks are not supposed to be intentionally kept down
    even for hours, let alone months :-( . If you do, then when it does
    come back up there will be tons of stuff to heal, so a performance
    hit is expected.

      -on #1 there are 3997354 files in .glusterfs/indices/xattrop/ and
      the number doesn't go down
      

    When #2 was down, did the I/O involve directory renames? (Check
    whether there are entries in .glusterfs/landfill on #2.) If yes,
    then this is a known issue and a fix is in progress:
    http://review.gluster.org/#/c/7879/
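
A minimal sketch for checking both directories mentioned above on a brick. The BRICK variable, its default /export/brick1, and the count_entries helper are placeholders for illustration, not names from this thread; substitute the real brick directory. find is used rather than ls because ls sorts its output, which gets slow with millions of entries.

```shell
# BRICK is a hypothetical brick path; point it at the real brick directory.
BRICK="${BRICK:-/export/brick1}"

# count_entries DIR: print the number of entries directly under DIR,
# or 0 if DIR does not exist.
count_entries() {
    if [ -d "$1" ]; then
        find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
    else
        echo 0
    fi
}

echo "xattrop index entries: $(count_entries "$BRICK/.glusterfs/indices/xattrop")"
echo "landfill entries:      $(count_entries "$BRICK/.glusterfs/landfill")"
```

Running this on #1 and #2 at intervals should show whether the xattrop count is actually shrinking and whether landfill on #2 is non-empty.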

    
      -on #1, gluster volume heal vol1 info takes a long time to finish
      the first time and doesn't show anything.
      

    This is fixed in glusterfs 3.5, where heal info is much more
    responsive.

      After that it prints "Another transaction is in progress. Please
      try again after sometime."
      

      Furthermore, on #1 glustershd.log is full of messages like this:
      

      [2014-05-27 15:07:44.145326] W
      [client-rpc-fops.c:1538:client3_3_inodelk_cbk] 0-vol1-client-0:
      remote operation failed: No such file or directory
      

      [2014-05-27 15:07:44.145880] W
      [client-rpc-fops.c:1640:client3_3_entrylk_cbk] 0-vol1-client-0:
      remote operation failed: No such file or directory
      

      [2014-05-27 15:07:44.146070] E
      [afr-self-heal-entry.c:2296:afr_sh_post_nonblocking_entry_cbk]
      0-vol1-replicate-0: Non Blocking entrylks failed for
      <gfid:bfbe65db-7426-4ca0-bf0b-7d1a28de2052>.
      

      [2014-05-27 15:13:34.772856] E
      [afr-self-heal-data.c:1270:afr_sh_data_open_cbk]
      0-vol1-replicate-0: open of
      <gfid:18a358e0-23d3-4f56-8d74-f5cc38a0d0ea> failed on child
      vol1-client-0 (No such file or directory)
      

      On #2's brick I see some updates, i.e. new filenames appearing, and
      .glusterfs/indices/xattrop/ is usually empty.
      

      Do you know what's happening? How can we fix this?
      

    You could try a `gluster volume heal vol1 full` to see if the bricks
    get synced.
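
After the full heal has been triggered, one rough way to tell whether it is making progress is to count how many distinct gfids still show up in error lines of the self-heal daemon log. This is only a sketch: the failed_gfids helper is hypothetical, the LOG default assumes the usual /var/log/glusterfs location, and the pattern is based solely on the log lines quoted above.

```shell
# LOG is assumed to be the self-heal daemon log in its usual location.
LOG="${LOG:-/var/log/glusterfs/glustershd.log}"

# failed_gfids LOGFILE: print each distinct gfid that appears on an
# error-level (" E [") line of LOGFILE, one per line.
failed_gfids() {
    grep ' E \[' "$1" | grep -o 'gfid:[0-9a-f-]\{36\}' | sort -u
}

if [ -r "$LOG" ]; then
    echo "distinct gfids with heal errors: $(failed_gfids "$LOG" | wc -l | tr -d ' ')"
fi
```

If that number keeps growing, or the same gfids keep reappearing, the heal is probably stuck on those entries rather than just slow.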

    
    Regards,

    Ravi

    
      thank you,
      

      Ivano
      

      _______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users