self-heal failed

Hi Daniel,

OK, if gluster can't self-heal from this situation, I hope I can at least
manually restore the volume by using the good brick that is still available.
So would you please tell me how I can "simply rebuild the filesystem and let
gluster attempt to restore it from a *clean* filesystem"?

Many thanks.

Liang


On Wed, Jan 9, 2013 at 9:04 AM, Daniel Taylor <dtaylor at vocalabs.com> wrote:

> It seems to me that what you need to do is replace the failed brick, or
> simply rebuild the filesystem and let gluster attempt to restore it from a
> *clean* filesystem.
>
> I haven't seen anywhere that allows gluster to actually change the
> replication count on a live cluster, which is what you seem to be
> requesting.
>
>
> On 01/09/2013 07:57 AM, Liang Ma wrote:
>
>> Todd,
>>
>> Thanks for your reply. But how can I take this brick offline? Since the
>> gluster volume has replica count 2, it won't allow me to remove one
>> brick. Is there a command that can take one replica brick offline?
>>
>> Many thanks.
>>
>> Liang
>>
>>
>> On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:
>>
>>     Liang,
>>
>>     I don't claim to know the answer to your question, and my knowledge
>>     of zfs is minimal at best so I may be way off base here, but it seems
>>     to me that your attempted random corruption with this command:
>>
>>
>>       dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>>
>>     is likely going to corrupt the underlying zfs filesystem metadata, not
>>     just file data, and I wouldn't expect gluster to be able to fix a
>>     brick's corrupted filesystem.  Perhaps you now have to take the brick
>>     offline, fix any zfs filesystem errors if possible, bring the brick
>>     back online and see what then happens with self-heal.
>>
>>     --
>>     Todd Pfaff <pfaff at mcmaster.ca>
>>
>>     http://www.rhpcs.mcmaster.ca/
>>
>>
>>     On Tue, 8 Jan 2013, Liang Ma wrote:
>>
>>         Hi There,
>>
>>         I'd like to test and understand the self heal feature of
>>         glusterfs. This is what I did with 3.3.1-ubuntu1~precise4 on
>>         Ubuntu 12.04.1 LTS.
>>
>>         gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
>>
>>         where zfs-test is a zfs pool on partition /dev/sda6 on both nodes.
>>
>>         To simulate a random corruption on node gluster3
>>
>>         dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>>
>>         Now zfs detected the corrupted files
>>
>>           pool: zfs-test
>>          state: ONLINE
>>         status: One or more devices has experienced an error resulting in data
>>                 corruption.  Applications may be affected.
>>         action: Restore the file in question if possible.  Otherwise restore the
>>                 entire pool from backup.
>>            see: http://zfsonlinux.org/msg/ZFS-8000-8A
>>           scan: none requested
>>         config:
>>
>>                 NAME        STATE     READ WRITE CKSUM
>>                 zfs-test   ONLINE       0     0 2.29K
>>                   sda6     ONLINE       0     0 4.59K
>>
>>         errors: Permanent errors have been detected in the following files:
>>
>>                 /zfs-test/<xattrdir>/trusted.gfid
>>                 /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
>>                 /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
>>                 /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
>>
>>         Now the gluster log file shows the self heal can't fix the corruption:
>>
>>         [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
>>         [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
>>         [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background  gfid self-heal failed on /K.iso
>>         [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
>>
>>         where K.iso is one of the sample files affected by the dd command.
>>
>>         So could anyone tell me what is the best way to repair the
>>         simulated corruption?
>>
>>         Thank you.
>>
>>         Liang
>>
>>
>>
>>
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
> --
> Daniel Taylor             VP Operations       Vocal Laboratories, Inc
> dtaylor at vocalabs.com                                     612-235-5711
>
>
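
The log excerpt quoted above names the file whose gfid self-heal failed. As a
rough aid for checking a larger log, here is a minimal Python sketch that
collects those entries. It assumes the message wording of the 3.3.1 client log
shown in this thread, with each entry on a single line as in the real log
file. The function name failed_gfid_heals and the SAMPLE_LOG constant are
illustrative only and not part of gluster.

```python
import re

# Pattern for the "background  gfid self-heal failed on <path>" entries seen in
# the 3.3.1 client log quoted in this thread. Other releases may word the
# message differently, so treat this pattern as an assumption.
FAILED_HEAL = re.compile(
    r"(\S+-replicate-\d+):\s+background\s+gfid self-heal failed on (.+?)\s*$",
    re.MULTILINE,
)


def failed_gfid_heals(log_text):
    """Return sorted, de-duplicated (subvolume, path) pairs for failed gfid heals."""
    return sorted(set(FAILED_HEAL.findall(log_text)))


# Two entries copied from the thread, each joined back onto a single line.
SAMPLE_LOG = (
    "[2013-01-08 12:46:03.373539] E "
    "[afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] "
    "2-gtest-replicate-0: Missing Gfids for /K.iso\n"
    "[2013-01-08 12:46:03.385701] E "
    "[afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] "
    "2-gtest-replicate-0: background  gfid self-heal failed on /K.iso\n"
)

if __name__ == "__main__":
    for subvolume, path in failed_gfid_heals(SAMPLE_LOG):
        print(subvolume, path)
```

This only reports which paths the replicate translator gave up on. It does not
repair anything, and the affected files still have to be restored from the
good brick.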