Todd,

Thanks for your reply. But how can I take this brick offline? Since the
gluster volume has replica count 2, it won't allow me to remove one
brick. Is there a command that can take one replica brick offline?

Many thanks.

Liang

On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:
> Liang,
>
> I don't claim to know the answer to your question, and my knowledge of zfs
> is minimal at best so I may be way off base here, but it seems to me that
> your attempted random corruption with this command:
>
>     dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>
> is likely going to corrupt the underlying zfs filesystem metadata, not
> just file data, and I wouldn't expect gluster to be able to fix a
> brick's corrupted filesystem. Perhaps you now have to take the brick
> offline, fix any zfs filesystem errors if possible, bring the brick back
> online and see what then happens with self-heal.
>
> --
> Todd Pfaff <pfaff at mcmaster.ca>
> http://www.rhpcs.mcmaster.ca/
>
>
> On Tue, 8 Jan 2013, Liang Ma wrote:
>
>> Hi There,
>>
>> I'd like to test and understand the self-heal feature of glusterfs.
>> This is what I did with 3.3.1-ubuntu1~precise4 on Ubuntu 12.04.1 LTS.
>>
>>     gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
>>
>> where zfs-test is a zfs pool on partition /dev/sda6 on both nodes.
>>
>> To simulate a random corruption on node gluster3:
>>
>>     dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>>
>> Now zfs detected the corrupted files:
>>
>>       pool: zfs-test
>>      state: ONLINE
>>     status: One or more devices has experienced an error resulting in
>>             data corruption. Applications may be affected.
>>     action: Restore the file in question if possible. Otherwise restore
>>             the entire pool from backup.
>>        see: http://zfsonlinux.org/msg/ZFS-8000-8A
>>       scan: none requested
>>     config:
>>
>>             NAME        STATE     READ WRITE CKSUM
>>             zfs-test    ONLINE       0     0 2.29K
>>               sda6      ONLINE       0     0 4.59K
>>
>>     errors: Permanent errors have been detected in the following files:
>>
>>             /zfs-test/<xattrdir>/trusted.gfid
>>             /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
>>             /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
>>             /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
>>
>> Now the gluster log file shows the self-heal can't fix the corruption:
>>
>>     [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
>>     [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
>>     [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
>>     [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
>>
>> where K.iso is one of the sample files affected by the dd command.
>>
>> So could anyone tell me what is the best way to repair the simulated
>> corruption?
>>
>> Thank you.
>>
>> Liang
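
In GlusterFS 3.3 a replica brick can be taken offline without removing
it from the volume by stopping its brick process; the surviving replica
keeps serving clients in the meantime. A rough sketch, assuming the
volume is named gtest and the commands run on gluster3 (the <brick-pid>
placeholder is whatever PID the status command reports for the brick):

    # find the PID of the glusterfsd process serving this brick
    gluster volume status gtest

    # stop just that brick; the other replica stays online
    kill <brick-pid>

    # ... repair the brick's filesystem here ...

    # "force" restarts only the bricks that are down
    gluster volume start gtest force

    # ask the self-heal daemon for a full sweep of the volume
    gluster volume heal gtest full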
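While the brick is down, ZFS's own tools can at least report the damage.
With a single-device pool like this one (no mirror or raidz), a scrub
can detect errors but not repair them, so files flagged as permanent
errors have to be deleted or restored from the good replica. A sketch,
using the pool name from the zpool status output above:

    zpool scrub zfs-test          # re-read and verify every block
    zpool status -v zfs-test      # list files with permanent errors
    zpool clear zfs-test          # reset error counters after cleanup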
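For the "gfid different on subvolume" and "background gfid self-heal
failed" errors on /K.iso, a commonly used recovery is to delete the bad
copy and its .glusterfs hard link from the corrupted brick, then look
the file up through a client mount so AFR re-replicates it from the
healthy brick. A sketch, run on gluster3, assuming the
b01ec17c-14cc-4999-938b-b4a71e358b46 gfid from the zpool output belongs
to K.iso (verify first) and that the volume is mounted at the
hypothetical client path /mnt/gtest:

    # confirm which gfid K.iso carries on this brick
    getfattr -n trusted.gfid -e hex /zfs-test/K.iso

    # remove the corrupted copy and its .glusterfs hard link
    rm /zfs-test/K.iso
    rm /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46

    # from a client, trigger a lookup so self-heal recreates the file
    stat /mnt/gtest/K.iso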
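Finally, on Todd's point about the simulation itself: writing to the raw
partition corrupts ZFS metadata as well as data. To corrupt only one
file's contents, overwrite bytes inside that file on the brick instead,
for example:

    # overwrite 1 KiB inside K.iso in place, without truncating the file
    dd if=/dev/urandom of=/zfs-test/K.iso bs=1024 count=1 conv=notrunc

One caveat: because this write goes through ZFS, the block checksums are
updated and ZFS will not flag it, and because it bypasses gluster, the
change is not recorded in the AFR changelog xattrs, so self-heal may not
notice it either.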