Hi Daniel,

OK, if gluster can't self-heal from this situation, I hope I can at least
manually restore the volume by using the good brick that is still available.
So would you please tell me how I can "simply rebuild the filesystem and let
gluster attempt to restore it from a *clean* filesystem"?

Many thanks.

Liang

On Wed, Jan 9, 2013 at 9:04 AM, Daniel Taylor <dtaylor at vocalabs.com> wrote:

> It seems to me that what you need to do is replace the failed brick, or
> simply rebuild the filesystem and let gluster attempt to restore it from a
> *clean* filesystem.
>
> I haven't seen anywhere that allows gluster to actually change the
> replication count on a live cluster, which is what you seem to be
> requesting.
>
> On 01/09/2013 07:57 AM, Liang Ma wrote:
>
>> Todd,
>>
>> Thanks for your reply. But how can I take this brick offline? Since the
>> gluster volume has replicate count 2, it won't allow me to remove one
>> brick. Is there a command which can take one replicate brick offline?
>>
>> Many thanks.
>>
>> Liang
>>
>> On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:
>>
>>     Liang,
>>
>>     I don't claim to know the answer to your question, and my knowledge
>>     of zfs is minimal at best so I may be way off base here, but it seems
>>     to me that your attempted random corruption with this command:
>>
>>     dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>>
>>     is likely going to corrupt the underlying zfs filesystem metadata, not
>>     just file data, and I wouldn't expect gluster to be able to fix a
>>     brick's corrupted filesystem. Perhaps you now have to take the brick
>>     offline, fix any zfs filesystem errors if possible, bring the brick
>>     back online and see what then happens with self-heal.
>>
>>     --
>>     Todd Pfaff <pfaff at mcmaster.ca>
>>     http://www.rhpcs.mcmaster.ca/
>>
>>     On Tue, 8 Jan 2013, Liang Ma wrote:
>>
>>         Hi There,
>>
>>         I'd like to test and understand the self heal feature of
>>         glusterfs. This is what I did with 3.3.1-ubuntu1~precise4 on
>>         Ubuntu 12.04.1 LTS.
>>
>>         gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
>>
>>         where zfs-test is a zfs pool on partition /dev/sda6 in both nodes.
>>
>>         To simulate a random corruption on node gluster3
>>
>>         dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>>
>>         Now zfs detected the corrupted files
>>
>>           pool: zfs-test
>>          state: ONLINE
>>         status: One or more devices has experienced an error resulting in data
>>                 corruption. Applications may be affected.
>>         action: Restore the file in question if possible. Otherwise restore the
>>                 entire pool from backup.
>>            see: http://zfsonlinux.org/msg/ZFS-8000-8A
>>           scan: none requested
>>         config:
>>
>>                 NAME        STATE     READ WRITE CKSUM
>>                 zfs-test    ONLINE       0     0 2.29K
>>                   sda6      ONLINE       0     0 4.59K
>>
>>         errors: Permanent errors have been detected in the following files:
>>
>>                 /zfs-test/<xattrdir>/trusted.gfid
>>                 /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
>>                 /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
>>                 /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
>>
>>         Now the gluster log file shows the self heal can't fix the
>>         corruption
>>
>>         [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
>>         [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
>>         [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
>>         [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
>>
>>         where K.iso is one of the sample files affected by the dd command.
>>
>>         So could anyone tell me what is the best way to repair the
>>         simulated corruption?
>>
>>         Thank you.
>>
>>         Liang
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
> --
> Daniel Taylor             VP Operations       Vocal Laboratories, Inc
> dtaylor at vocalabs.com                                     612-235-5711
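
For anyone following along: when checking which files a brick could not heal, it may help to pull the failures out of the client log mechanically instead of reading it by eye. Below is a minimal Python sketch that does this for the log excerpt quoted above. The pattern is based only on the "background gfid self-heal failed on" wording seen in this thread; other gluster versions may word the message differently, and the function name failed_heals and the SAMPLE text are just illustrative, not part of any gluster tool.

```python
import re

# Assumption: the message format is taken from the log excerpt in this thread, e.g.
# "2-gtest-replicate-0: background gfid self-heal failed on /K.iso".
FAILED_HEAL = re.compile(
    r"(\S+-replicate-\d+): background (.+?) self-heal failed on (\S+)"
)


def failed_heals(log_text):
    """Return sorted, de-duplicated (subvolume, heal_type, path) tuples."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return sorted(set(FAILED_HEAL.findall(flat)))


# Two entries copied from the quoted log; the second one is wrapped
# across two lines, as it was in the original mail.
SAMPLE = """\
[2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
[2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk]
2-gtest-replicate-0: background gfid self-heal failed on /K.iso
"""

if __name__ == "__main__":
    for subvol, kind, path in failed_heals(SAMPLE):
        print(f"{subvol}: {kind} self-heal failed on {path}")
```

To use it on a real log, read the file contents into a string and pass that to failed_heals() in place of SAMPLE. It only reports what the log already says; it does not repair anything.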