Liang,

I suppose my choice of words was misleading.  What I mean is:

- unmount the corrupted brick filesystem
- try to check and repair the brick filesystem
- if repair fails, re-create the filesystem
- remount the brick filesystem

but, as I said, I'm not very familiar with zfs.  Based on my quick glance
at some zfs documentation, it sounds to me like an online zfs
check-and-repair may be possible (this is Oracle zfs documentation and I
have no idea how the Linux zfs implementation compares):

http://docs.oracle.com/cd/E23823_01/html/819-5461/gbbwa.html

but since you're a zfs user you likely already know much more about zfs
than I do.  (A rough, untested sketch of the relevant commands is at the
bottom of this message.)

Todd

On Wed, 9 Jan 2013, Liang Ma wrote:

> Todd,
>
> Thanks for your reply. But how can I take this brick offline? Since the
> gluster volume has replica count 2, it won't allow me to remove one
> brick. Is there a command which can take one replica brick offline?
>
> Many thanks.
>
> Liang
>
> On Tue, Jan 8, 2013 at 3:02 PM, Todd Pfaff <pfaff at rhpcs.mcmaster.ca> wrote:
>
>       Liang,
>
>       I don't claim to know the answer to your question, and my knowledge
>       of zfs is minimal at best, so I may be way off base here, but it
>       seems to me that your attempted random corruption with this command:
>
>             dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>
>       is likely going to corrupt the underlying zfs filesystem metadata,
>       not just file data, and I wouldn't expect gluster to be able to fix
>       a brick's corrupted filesystem.  Perhaps you now have to take the
>       brick offline, fix any zfs filesystem errors if possible, bring the
>       brick back online and see what then happens with self-heal.
>
>       --
>       Todd Pfaff <pfaff at mcmaster.ca>
>       http://www.rhpcs.mcmaster.ca/
>
>       On Tue, 8 Jan 2013, Liang Ma wrote:
>
>             Hi There,
>
>             I'd like to test and understand the self-heal feature of
>             glusterfs. This is what I did with 3.3.1-ubuntu1~precise4 on
>             Ubuntu 12.04.1 LTS.
>
>             gluster volume create gtest replica 2 gluster3:/zfs-test gluster4:/zfs-test
>
>             where zfs-test is a zfs pool on partition /dev/sda6 on both
>             nodes.
>
>             To simulate random corruption on node gluster3:
>
>             dd if=/dev/urandom of=/dev/sda6 bs=1024 count=20480
>
>             Now zfs detected the corrupted files:
>
>               pool: zfs-test
>              state: ONLINE
>             status: One or more devices has experienced an error resulting in data
>                     corruption.  Applications may be affected.
>             action: Restore the file in question if possible.  Otherwise restore the
>                     entire pool from backup.
>                see: http://zfsonlinux.org/msg/ZFS-8000-8A
>               scan: none requested
>             config:
>
>                     NAME        STATE     READ WRITE CKSUM
>                     zfs-test    ONLINE       0     0 2.29K
>                       sda6      ONLINE       0     0 4.59K
>
>             errors: Permanent errors have been detected in the following files:
>
>                     /zfs-test/<xattrdir>/trusted.gfid
>                     /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46
>                     /zfs-test/.glusterfs/b0/1e/b01ec17c-14cc-4999-938b-b4a71e358b46/<xattrdir>/trusted.gfid
>                     /zfs-test/.glusterfs/dd/8c/dd8c6797-18c3-4f3b-b1ca-86def2b578c5/<xattrdir>/trusted.gfid
>
>             Now the gluster log file shows that self-heal can't fix the corruption:
>
>             [2013-01-08 12:46:03.371214] W [afr-common.c:1196:afr_detect_self_heal_by_iatt] 2-gtest-replicate-0: /K.iso: gfid different on subvolume
>             [2013-01-08 12:46:03.373539] E [afr-self-heal-common.c:1419:afr_sh_common_lookup_cbk] 2-gtest-replicate-0: Missing Gfids for /K.iso
>             [2013-01-08 12:46:03.385701] E [afr-self-heal-common.c:2160:afr_self_heal_completion_cbk] 2-gtest-replicate-0: background gfid self-heal failed on /K.iso
>             [2013-01-08 12:46:03.385760] W [fuse-bridge.c:292:fuse_entry_cbk] 0-glusterfs-fuse: 11901: LOOKUP() /K.iso => -1 (No data available)
>
>             where K.iso is one of the sample files affected by the dd
>             command.
>
>             So could anyone tell me what is the best way to repair the
>             simulated corruption?
>
>             Thank you.
>
>             Liang
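A rough, untested sketch of the online check-and-repair path described
above, using the pool name zfs-test and volume name gtest from the output
quoted in this thread.  Note that on a single-device pool a scrub can
detect but generally not repair data errors, so badly damaged files may
still have to be removed from the affected brick and re-synced from the
healthy replica:

    # show pool health and list files flagged with permanent errors
    zpool status -v zfs-test

    # online check-and-repair; the pool stays mounted while the scrub runs
    zpool scrub zfs-test

    # once the scrub has finished, reset the pool's error counters
    zpool clear zfs-test

    # then ask gluster to re-sync any damaged files from the healthy replica
    gluster volume heal gtest full
    gluster volume heal gtest info

If the scrub cannot repair the damage, the "re-create the filesystem"
fallback presumably means destroying and re-creating the pool on /dev/sda6
and then letting a full self-heal repopulate the empty brick.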