Bob Peterson <rpeterso@xxxxxxxxxx> writes:

> Hi Daniel,

Hello,

> I took a look at that metadata you sent me, but I didn't find any evidence
> relating to the problem you posted. Either the corruption happened a long
> time prior to your saving of the metadata, or else the metadata was saved
> after an fsck.gfs2 fixed (or attempted to fix) the problem?

- When I first encountered the problem, I ran an fsck on the filesystem
  with version 3.1.6 from Ubuntu.
- Several days later, the same “dirty_inode: glock -5” messages started
  showing on the same node as the first time.
- I ran an fsck with 3.1.8 built from git.
- A few days later, the same node had the “dirty_inode” messages again. I
  shut down that node and then ran “gfs2_edit savemeta”.

All nodes have the same hardware and the same OS/kernel/pacemaker versions.

> One thing's for sure: I don't see any evidence of wild file system corruption;
> certainly nothing that can account for those errors.
>
> You said the problem seemed to revolve around a gfs2_grow operation,
> right?

Not exactly. I grew the fs live 6 months ago and ran into some trouble; I
ran an fsck at that time and the fs ran fine for months.

Then the “dirty_inode” troubles started on Feb 9.

> Can you make sure the lvm2 volume group has the clustered bit set?
> Please do the "vgs" command and see if that volume has "c" listed in its
> flags. If not, it could have caused problems for the gfs2_grow.

Yes, it has the clustered flag.

> I've seen problems like this very rarely. Once was a legitimate bug in
> GFS2 that we fixed in RHEL5, but I assume your kernel is newer than
> that.

We have 3.13.0-78-generic from Ubuntu.

[...]

> My only working theory is this:
>
> This might be related to the transition between "unlinked" dinodes and
> "free". After a file is deleted, it goes to "unlinked" and has to be
> transitioned to "free". This sometimes goes wrong because of the way
> it needs to check what other nodes in the cluster are doing.
>
> Maybe: If you have three nodes, and a file was unlinked on node 1, then
> maybe the internode communication got confused and nodes 2 and 3 both
> tried to transition it from Unlinked to Free. That is only a theory, and
> there is absolutely no proof. However, I have a set of patches that are
> experimental, and not even in the upstream kernel yet (hopefully soon!)
> that try to tighten up and fix problems like this. It's much more common
> for multiple nodes to try to transition from Unlinked to Free, and they
> all fail, leaving the file in an "Unlinked" state.

Thanks for the explanations. I will try to re-add the down node to the
cluster and see what happens.

Regards.
-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
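For anyone repeating the clustered-bit check discussed in this message, here is a minimal sketch. It assumes the usual LVM2 layout, where the sixth character of the vg_attr field is "c" for a clustered volume group. The helper names is_clustered and report_clustered are hypothetical and are not part of LVM2 or gfs2-utils.

```shell
#!/bin/sh
# Sketch: decide whether a vg_attr string (as printed by `vgs -o vg_attr`)
# has the clustered bit set. Assumption: the sixth character is "c" for a
# clustered volume group.
is_clustered() {
    case "$1" in
        ?????c*) return 0 ;;
        *)       return 1 ;;
    esac
}

# Read "name attr" pairs from stdin and report each volume group.
# Leading whitespace (as emitted by vgs) is stripped by `read`.
report_clustered() {
    while read -r vg attr; do
        [ -n "$vg" ] || continue
        if is_clustered "$attr"; then
            echo "$vg: clustered"
        else
            echo "$vg: NOT clustered"
        fi
    done
}

# Usage on a real node (requires LVM2 and sufficient privileges):
#   vgs --noheadings -o vg_name,vg_attr | report_clustered
```

The parsing is kept separate from the vgs call so it can be tried on sample attribute strings before running it on a cluster node.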
-- 
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster