Re: GFS2 filesystem consistency error

Bob Peterson <rpeterso@xxxxxxxxxx> · Tue, 23 Feb 2016 13:00:31 -0500 (EST)

----- Original Message -----
> Bob Peterson <rpeterso@xxxxxxxxxx> writes:
> 
> 
> [...]
> 
> > Hi Daniel,
> >
> > I'm downloading the metadata now. I'll let you know what I find.
> > It may take a while because my storage is a bit in flux at the moment.
> 
> Ok, thanks a lot for looking at our problems.
> 
> Regards.
> --
> Daniel Dehennin
> Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> Fingerprint: 3E69 014E 5C23 50E8 9ED6  2AAD CC1E 9E5B 7A6F E2DF

Hi Daniel,

I took a look at the metadata you sent me, but I didn't find any evidence
relating to the problem you posted. Either the corruption happened long
before you saved the metadata, or the metadata was saved after fsck.gfs2
had fixed (or attempted to fix) the problem.

One thing's for sure: I don't see any evidence of wild file system corruption;
certainly nothing that can account for those errors.

You said the problem seemed to revolve around a gfs2_grow operation, right?
Can you make sure the lvm2 volume group has the clustered bit set?
Please run the "vgs" command and check whether that volume group has "c"
listed in its Attr flags. If not, that could have caused problems for the
gfs2_grow.
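
In case it helps to check several volume groups at once, here is a minimal
sketch of that test. It assumes the clustered flag is the sixth character of
the vg_attr field printed by "vgs --noheadings -o vg_name,vg_attr". The
helper names and the sample volume group names are made up for illustration.

```python
import subprocess


def is_clustered(vg_attr: str) -> bool:
    """Return True if the sixth vg_attr character is 'c' (clustered)."""
    attr = vg_attr.strip()
    return len(attr) >= 6 and attr[5] == "c"


def clustered_flags(vgs_output: str) -> dict:
    """Map each VG name to its clustered flag, given 'name attr' lines."""
    flags = {}
    for line in vgs_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            flags[fields[0]] = is_clustered(fields[1])
    return flags


def query_vgs() -> dict:
    """Run vgs (needs LVM2, usually as root) and parse its output."""
    out = subprocess.run(
        ["vgs", "--noheadings", "-o", "vg_name,vg_attr"],
        check=True, capture_output=True, text=True,
    ).stdout
    return clustered_flags(out)


# Sample output lines; the VG names are hypothetical.
sample = "  vg_cluster wz--nc\n  vg_local   wz--n-\n"
print(clustered_flags(sample))  # {'vg_cluster': True, 'vg_local': False}
```

On a cluster node, calling query_vgs() should give the same kind of mapping
for the real volume groups.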

I've seen problems like this only very rarely. One was a legitimate bug in
GFS2 that we fixed in RHEL5, but I assume your kernel is newer than that.
The other we were never able to solve because there was no evidence
of what went wrong.

My only working theory is this:

This might be related to the transition of dinodes from "unlinked" to
"free". After a file is deleted, its dinode goes to "unlinked" and then has
to be transitioned to "free". This sometimes goes wrong because the node
doing the transition has to check what other nodes in the cluster are doing.

For example, if you have three nodes and a file was unlinked on node 1, the
internode communication might have gotten confused, so that nodes 2 and 3
both tried to transition it from Unlinked to Free. That is only a theory,
and there is absolutely no proof. However, I have a set of experimental
patches, not even in the upstream kernel yet (hopefully soon!), that try to
tighten up and fix problems like this. The much more common failure is the
opposite one: multiple nodes try to transition from Unlinked to Free and
they all fail, leaving the file in an "Unlinked" state.
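
To make the double-transition theory concrete, here is a toy model of it.
This is not GFS2 code and does not reflect how the kernel actually
coordinates nodes. The class and function names are invented, and the
"stale view" is only an assumption about how two nodes could both believe
the block is still Unlinked.

```python
from enum import Enum


class BlockState(Enum):
    UNLINKED = "unlinked"
    FREE = "free"


class ConsistencyError(Exception):
    """Raised when a node tries to free a block that is already free."""


class ToyBitmap:
    """One shared block-state entry, standing in for on-disk metadata."""

    def __init__(self):
        self.state = BlockState.UNLINKED

    def free_from_stale_view(self, node, seen):
        # The node acts on an earlier observation without re-checking it.
        if seen is not BlockState.UNLINKED:
            return f"{node}: nothing to do"
        if self.state is BlockState.FREE:
            raise ConsistencyError(f"{node}: block already free")
        self.state = BlockState.FREE
        return f"{node}: unlinked -> free"


def simulate():
    bitmap = ToyBitmap()
    # Both nodes observe "unlinked" before either one changes the state.
    view2 = bitmap.state
    view3 = bitmap.state
    events = [bitmap.free_from_stale_view("node2", view2)]
    try:
        events.append(bitmap.free_from_stale_view("node3", view3))
    except ConsistencyError as err:
        events.append(f"error: {err}")
    return events


for event in simulate():
    print(event)
# node2: unlinked -> free
# error: node3: block already free
```

The second node only trips over the problem because it trusted what it saw
earlier; if it re-read the state while holding whatever lock protects it,
it would find the block already free and do nothing.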

Regards,

Bob Peterson
Red Hat File Systems
