I've run replace-brick on missing bricks before; it should still work. On the other hand, data corruption is the worst-case failure mode. The one time I hit data corruption on a node, my final answer ended up being to rebuild the cluster from scratch and restore the best copy of the data I had (a mix of backups and live data).

On 01/10/2013 11:12 AM, Liang Ma wrote:
>
> Thank you, Daniel, for your further comments.
>
> Now I can remove the damaged zfs brick after rebooting the system. But
> then what can I do to rejoin a new brick? I can't run gluster volume
> replace-brick because the old brick is gone. I can't even remove the
> old brick because the volume's replica count is 2. So what is the
> right procedure to replace a failed brick in a replicated gluster volume?
>
> Liang
>
> On Thu, Jan 10, 2013 at 11:57 AM, Daniel Taylor <dtaylor at vocalabs.com> wrote:
>
>     I'm not familiar with zfs in particular, but it should have given
>     you a message saying why it won't unmount.
>
>     In the worst case you can indeed remove the mount point from
>     /etc/fstab and reboot. A hard reboot may be necessary in a case
>     like this.
>
>     On 01/10/2013 10:43 AM, Liang Ma wrote:
>
>         Yes, I stopped the glusterfs service on the damaged system, but
>         zfs still won't allow me to umount the filesystem. Maybe I
>         should try to shut down the entire system.
>
>         On Wed, Jan 9, 2013 at 10:28 AM, Daniel Taylor
>         <dtaylor at vocalabs.com> wrote:
>
>             On 01/09/2013 08:31 AM, Liang Ma wrote:
>
>                 Hi Daniel,
>
>                 OK, if gluster can't self-heal from this situation, I
>                 hope at least I can manually restore the volume by using
>                 the good brick available. So would you please tell me how
>                 I can "simply rebuild the filesystem and let gluster
>                 attempt to restore it from a *clean* filesystem"?

Trimmed for space.
>             You could do as Tom Pfaff suggests, but given the odds of
>             data corruption carrying forward, I'd do the following:
>
>             1. Shut down gluster on the damaged system.
>             2. Unmount the damaged filesystem.
>             3. Reformat the damaged filesystem as new (throwing away any
>                potential corruption that might not get caught on a rebuild).
>             4. Mount the new filesystem at the original mount point.
>             5. Restart gluster.
>
>             In the event of corruption due to hardware failure, you'd be
>             doing this on replacement hardware. The key is that you have
>             to have a functional filesystem for gluster to work with.

-- 
Daniel Taylor
VP Operations, Vocal Laboratories, Inc
dtaylor at vocalabs.com
612-235-5711

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
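For the archives, the rebuild procedure quoted above might look roughly like this on the damaged node. This is only a sketch: the volume name (gv0), zfs pool (tank), and brick path (/tank/brick) are made-up examples, and the exact commands vary by gluster and zfs version.

```shell
# 1. Stop gluster on the damaged node
service glusterd stop

# 2-3. Unmount and reformat: destroy the damaged zfs dataset and
#      recreate it empty, discarding any lurking corruption.
#      (Pool "tank" and dataset "tank/brick" are hypothetical names.)
zfs destroy tank/brick
zfs create -o mountpoint=/tank/brick tank/brick

# 4-5. The new dataset mounts itself at the original mount point;
#      restart gluster so it sees the clean brick.
service glusterd start

# Ask gluster to resync the empty brick from the good replica.
gluster volume heal gv0 full
```

Note that some gluster versions refuse to use a brick directory that is missing its trusted.glusterfs.volume-id extended attribute; in that case you may also need to restore that xattr on the new brick root (or use a replace-brick with commit force) before the heal will run. Check the behavior of your installed version before relying on this.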