> Self heal happens whenever a lookup happens on an inconsistent file.
> The commands ls -laR and find do a lookup on all the files recursively
> under the directory we specify.

Let's take an example:

- replica 2 cluster (2 peers) with 500K files
- during the weekend the peer we call '1' disconnects for a short time (say 30 minutes); when the connection comes up again, about 10K files were modified or created
- on Monday the administrator has no knowledge of the network glitch (let's suppose he didn't implement any sort of network logging system)
- after 3 days, 1K of the 10K files modified during the network glitch are still unaccessed; in the afternoon peer '2' hard crashes due to a total hardware failure (motherboard replacement needed)

Now we have 1K files inaccessible or obsolete!

I think that when a peer comes back, self-healing should start automatically. Of course we could write a shell script that tests the network and issues an 'ls -laR' command when needed, but this is a sort of dirty solution.
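Just to sketch what I mean (the mount point, state file and cron usage
here are only placeholders, and I'm assuming 'gluster peer status'
reports "Disconnected" for a peer that is down), such a workaround could
look roughly like this:

#!/bin/sh
# Placeholder paths: adjust to the real mount point of the volume.
MOUNT=/mnt/gluster
STATE=/var/run/gluster-heal-needed

# If any peer is currently down, just remember that a heal will be needed.
if gluster peer status | grep -q 'Disconnected'; then
    touch "$STATE"
    exit 0
fi

# All peers are reachable again and a glitch was recorded earlier:
# walk the whole volume so every file gets a lookup (and therefore a heal).
if [ -f "$STATE" ]; then
    find "$MOUNT" -print0 | xargs --null stat >/dev/null
    rm -f "$STATE"
fi

Run from cron on one of the clients this would at least close the window
automatically, but I'd still prefer glusterd to handle it by itself.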
Raf

> Pranith.
>
> ----- Original Message -----
> From: "Mohit Anchlia" <mohitanchlia at gmail.com>
> To: "Pranith Kumar. Karampuri" <pranithk at gluster.com>, gluster-users at gluster.org
> Sent: Wednesday, March 16, 2011 3:19:13 AM
> Subject: Re: Best practices after a peer failure?
>
> I thought self-healing is possible only after we run "ls -alR or find
> ..". It looks like self-healing is supposed to work when a dead node is
> brought up, is that true?
>
> On Tue, Mar 15, 2011 at 6:07 AM, Pranith Kumar. Karampuri
> <pranithk at gluster.com> wrote:
>> hi R.C.,
>>     Could you please give the exact steps when you log the bug. Please
>> also give the output of gluster peer status on both the machines after
>> the restart, and zip the files under /usr/local/var/log/glusterfs/ and
>> /etc/glusterd on both the machines when this issue happens. This should
>> help us debug the issue.
>>
>> Thanks
>> Pranith.
>>
>> ----- Original Message -----
>> From: "R.C." <milanraf at gmail.com>
>> To: gluster-users at gluster.org
>> Sent: Tuesday, March 15, 2011 4:14:24 PM
>> Subject: Re: Best practices after a peer failure?
>>
>> I've figured out the problem.
>>
>> If you mount the GlusterFS volume with the native client on a peer and
>> another peer crashes, that peer doesn't self-heal after its reboot.
>>
>> Should I put this issue in the bug tracker?
>>
>> Bye
>>
>> Raf
>>
>> ----- Original Message -----
>> From: "R.C." <milanraf at gmail.com>
>> To: <gluster-users at gluster.org>
>> Sent: Monday, March 14, 2011 11:41 PM
>> Subject: Best practices after a peer failure?
>>
>>> Hello to the list.
>>>
>>> I'm practicing GlusterFS in various topologies by means of multiple
>>> VirtualBox VMs.
>>>
>>> As a standard system administrator, I'm mainly interested in disaster
>>> recovery scenarios. The first is a replica 2 configuration with one
>>> peer crashing (actually, the VM being stopped abruptly) while data is
>>> being written to the volume.
>>> After rebooting the stopped VM and relaunching the gluster daemon
>>> (service glusterd start), the cluster doesn't start healing by itself.
>>> I've also tried the suggested commands:
>>> find <gluster-mount> -print0 | xargs --null stat >/dev/null
>>> and
>>> find <gluster-mount> -type f -exec dd if='{}' of=/dev/null bs=1M \; > /dev/null 2>&1
>>> without success.
>>> A rebalance command recreates the replicas but, when accessing the
>>> cluster, the always-alive client is the only one committing data to
>>> disk.
>>>
>>> Where am I misoperating?
>>>
>>> Thank you for your support.
>>>
>>> Raf
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users