It seems to me that what you need to do is replace the failed brick,
or simply rebuild the filesystem and let gluster attempt to restore
the data onto a *clean* filesystem. I haven't seen anything that
allows gluster to actually change the replica count on a live
cluster, which is what you seem to be requesting.
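
For what it's worth, the rebuild route would look something like this
- untested on my end, with the pool, volume and host names simply
lifted from your messages below:

    # on gluster3: recreate a clean filesystem for the failed brick
    # (this destroys the bad copy; the good copy lives on gluster4)
    zpool destroy zfs-test
    zpool create zfs-test /dev/sda6

    # bring the brick back up and let the healthy replica repopulate it
    gluster volume start gtest force
    gluster volume heal gtest full

    # watch the heal progress
    gluster volume heal gtest info

If the brick then refuses to start because the fresh filesystem lacks
the trusted.glusterfs.volume-id xattr that gluster stamps on its
bricks, swapping in a brick at a new path should get around that
(zfs-test-new is just a made-up name for a second pool):

    gluster volume replace-brick gtest gluster3:/zfs-test \
        gluster3:/zfs-test-new commit force
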
On 01/09/2013 07:57 AM, Liang Ma wrote:
> Todd,
>
> Thanks for your reply. But how can I take this brick offline? Since
> the gluster volume has replica count 2, it won't allow me to remove
> one brick. Is there a command which can take one replica brick
> offline?
>
> Many thanks.
>
> Liang
>
> On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <pfaff at rhpcs.mcmaster.ca
> <mailto:pfaff at rhpcs.mcmaster.ca>> wrote:
>
>     Liang,
>
>     I don't claim to know the answer to your question, and my
>     knowledge of zfs is minimal at best, so I may be way off base
>     here, but it seems to me that your attempted random corruption
>     with this command:
>
>         dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>
>     is likely going to corrupt the underlying zfs filesystem
>     metadata, not just file data, and I wouldn't expect gluster to
>     be able to fix a brick's corrupted filesystem. Perhaps you now
>     have to take the brick offline, fix any zfs filesystem errors if
>     possible, bring the brick back online and see what then happens
>     with self-heal.
>
>     --
>     Todd Pfaff <pfaff at mcmaster.ca <mailto:pfaff at mcmaster.ca>>
>     http://www.rhpcs.mcmaster.ca/
>
>     On Tue, 8 Jan 2013, Liang Ma wrote:
>
>         Hi There,
>
>         I'd like to test and understand the self-heal feature of
>         glusterfs. This is what I did with 3.3.1-ubuntu1~precise4 on
>         Ubuntu 12.04.1 LTS.
>
>             gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
>
>         where zfs-test is a zfs pool on partition /dev/sda6 on both
>         nodes.
>
>         To simulate a random corruption on node gluster3:
>
>             dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>
>         Now zfs detected the corrupted files:
>
>               pool: zfs-test
>              state: ONLINE
>             status: One or more devices has experienced an error
>                     resulting in data corruption. Applications may
>                     be affected.
>             action: Restore the file in question if possible.
>                     Otherwise restore the entire pool from backup.
>                see: http://zfsonlinux.org/msg/ZFS-8000-8A
>              scan: none requested
>             config:
>
>                 NAME        STATE     READ WRITE CKSUM
>                 zfs-test    ONLINE       0     0 2.29K
>                   sda6      ONLINE       0     0 4.59K
>
>             errors: Permanent errors have been detected in the
>             following files:
>
>                 /zfs-test/<xattrdir>/trusted.gfid
>                 /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
>                 /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
>                 /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
>
>         Now the gluster log file shows the self-heal can't fix the
>         corruption:
>
>             [2013-01-08 12:46:03.371214] W
>             [afr-common.c:1196:afr_detect_self_heal_by_iatt]
>             2-gtest-replicate-0: /K.iso: gfid different on subvolume
>             [2013-01-08 12:46:03.373539] E
>             [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk]
>             2-gtest-replicate-0: Missing Gfids for /K.iso
>             [2013-01-08 12:46:03.385701] E
>             [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk]
>             2-gtest-replicate-0: background gfid self-heal failed on
>             /K.iso
>             [2013-01-08 12:46:03.385760] W
>             [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse:
>             11901: LOOKUP() /K.iso => -1 (No data available)
>
>         where K.iso is one of the sample files affected by the dd
>         command.
>
>         So could anyone tell me what is the best way to repair the
>         simulated corruption?
>
>         Thank you.
>
>         Liang
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

--
Daniel Taylor
VP Operations
Vocal Laboratories, Inc
dtaylor at vocalabs.com
612-235-5711