Joe,
Thanks for your reply.
I grepped the logs for the name of one of the files that had become unreachable over NFS after the resync (I/O error). It comes up in <volumename>.log and nfs.log on the node that had stayed online.
The relevant logs are here:
https://gist.github.com/nicolasochem/f9d24a2bf57b0d40bb7d
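For reference, a search along these lines is enough to surface the relevant entries (this assumes the stock /var/log/glusterfs/ log directory; the file name is a placeholder):

    # run on the node that stayed online
    grep 'name-of-unreachable-file' /var/log/glusterfs/<volumename>.log
    grep 'name-of-unreachable-file' /var/log/glusterfs/nfs.log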
One important piece of information: the node that was taken offline had previously filled up its root filesystem because of a memory/southbridge issue that flooded /var/log/messages. When the machine was restored, glusterd did not come up because one file in /var/lib/glusterd/peers was empty.
The issue is described here: https://bugzilla.redhat.com/show_bug.cgi?id=858732
I removed the empty peer file, glusterd started, and then I started getting the I/O errors described in my original mail.
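That recovery step, roughly sketched (assuming the stock CentOS 6 init script; the zero-byte file name under /var/lib/glusterd/peers is a placeholder):

    # find the peer file truncated when the root filesystem filled up
    find /var/lib/glusterd/peers -type f -size 0
    # move it aside rather than deleting it outright, then restart glusterd
    mv /var/lib/glusterd/peers/<empty-peer-uuid> /root/
    service glusterd start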
The key log line is, IMO: "background meta-data data missing-entry self-heal failed on"
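For anyone hitting the same thing, the heal state that produces those messages can be checked from either node with the 3.4-era CLI (volume name is a placeholder):

    gluster volume heal <volumename> info              # entries still pending heal
    gluster volume heal <volumename> info heal-failed  # entries the self-heal daemon gave up on
    gluster volume heal <volumename> info split-brain  # entries with conflicting copies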
On Fri, Mar 28, 2014 at 8:13 AM, Joe Julian <joe@xxxxxxxxxxxxxxxx> wrote:
Although the self-heal daemon can take time to heal all the files, accessing a file that needs healing does trigger the heal to be performed immediately by the client (the NFS server is the client in this case).
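In other words, a heal of a specific file can be forced without waiting for the self-heal daemon; a quick sketch, with placeholder mount point and volume name:

    # stat the file through any client mount (FUSE or NFS) to trigger an
    # immediate heal of that file
    stat /mnt/<volumename>/path/to/critical-file
    # or ask the self-heal daemon to crawl the whole volume
    gluster volume heal <volumename> full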
On March 27, 2014 11:08:03 PM PDT, Nicolas Ochem <nicolas.ochem@xxxxxxxxx> wrote:
>Hi list,
>I would like to describe an issue I had today with Gluster and ask for
>opinion:
>
>I have a replicated volume with 2 replicas. There is about 1TB of
>production data in there, in around 100,000 files. The bricks sit on
>2x Supermicro X9DR3-LN4F machines with an 18TB RAID array each, 64GB
>of RAM and 2x Xeon CPUs, as recommended in the Red Hat hardware
>guidelines for storage servers. They have a 10Gb link between each
>other. I am running Gluster 3.4.2 on CentOS 6.5.
>
>This storage is NFS-mounted on a lot of production servers. Only a
>small part of this data is actually useful; the rest is legacy.
>
>Due to an unrelated issue with one of the Supermicro servers (faulty
>memory), I had to take that node offline for 3 days.
>
>When I brought it back up, some files and directories ended up in
>heal-failed state (but no split-brain). Unfortunately, those were the
>critical files that had been edited during the last 3 days. On the
>NFS mounts, attempts to read these files resulted in I/O errors.
>
>I was able to fix a few of these files by manually removing them from
>each brick and then copying them to the mounted volume again. But I
>did not know what to do when entire directories were unreachable
>because of "heal failed".
>
>I later read that healing can take time and that heal-failed may be a
>transient state (is that correct?
>http://stackoverflow.com/questions/19257054/is-it-normal-to-get-a-lot-of-heal-failed-entries-in-a-gluster-mount),
>but at the time I thought the data was beyond recovery, so I
>proceeded to destroy the gluster volume. Then, on one of the
>replicas, I moved the content of the brick to another directory,
>created a new volume with the same name, and copied the saved content
>back onto the mounted volume. This took around 2 hours. Then I had to
>reboot all my NFS-mounted machines, which were stuck in "stale NFS
>file handle" state.
>
>A few questions:
>- I realize that I cannot expect 1TB of data to heal instantly, but
>is there any way for me to know whether the system would eventually
>have recovered despite being shown as "heal failed"?
>- If yes, how many files, and of what total size, would I have to
>clean up from the volume to bring that time under 10 minutes?
>- Would native gluster mounts instead of NFS have helped here?
>- Would any other course of action have resulted in a faster
>recovery?
>- Is there a way, in such a situation, to make one replica
>authoritative for the correct state of the filesystem?
>
>Thanks in advance for your replies.
>
>
As with pretty much all errors in GlusterFS, you would have had to look in the logs to find out why something as vague as "heal failed" happened.
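One way to see why a given file failed to heal, and which copy is the good one, is to compare the AFR changelog xattrs on the two bricks; a sketch, with placeholder brick paths:

    # run on each node against the same file under its brick directory
    getfattr -d -m . -e hex /export/brick1/path/to/file
    # non-zero trusted.afr.<volumename>-client-* values mark pending changes
    # that the corresponding brick has not yet seen, i.e. which copy still
    # needs healing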