Hi,

On Friday 25 of April 2014 12:42:59 Steven Whitehouse wrote:
> Hi,
>
> On 24/04/14 17:29, Alan Brown wrote:
> > On 30/03/14 12:34, Steven Whitehouse wrote:
> >> Well that is not entirely true. We have done a great deal of
> >> investigation into this issue. We do test quotas (among many other
> >> things) on each release to ensure that they are working. Our tests have
> >> all passed correctly, and to date you have provided the only report of
> >> this particular issue via our support team. So it is certainly not
> >> something that lots of people are hitting.
> >
> > Someone else reported it on this list (on centos), so we're not an
> > isolated case.
> >
> >> We do now have a good idea of where the issue is. However it is clear
> >> that simply exceeding quotas is not enough to trigger it. Instead quotas
> >> need to be exceeded in a particular way.
> >
> > My suspicion is that it's some kind of interaction between quotas and
> > NFS, but it'd be good if you could provide a fuller explanation.
>
> Yes, thats what we thought to start with... however that turned out to
> be a bit of a red herring. Or at least the issue has nothing
> specifically to do with NFS. The problem was related to when quota was
> exceeded, and specifically what operation was in progress. You could
> write to files as often as you wanted to, and exceeding quota would be
> handled correctly. The problem was a specific code path within the inode
> creation code, if it didn't result in quota being exceeded on that one
> specific code path, then everything would work as expected.

could you please provide a (somewhat reliable) test case to reproduce this
bug?
I have looked at the patch, and found nothing obviously related to quotas
(it seems the patch only changes the fail-path of the posix_acl_create()
call, which doesn't appear to have anything to do with quotas).

I have been facing a possibly quota-related oops in GFS2 for some time, which
I am unable to reproduce without switching my cluster to production use
(which means potentially facing the anger of my users, which I'd rather not
do without at least a chance of the issue being fixed).

sadly, I don't have a RedHat support subscription (nor do I use RHEL or its
derivatives), my kernel is mostly upstream.

thanks
Pavel Herrmann

>
> Also, quite often when the problem did appear, it did not actually
> trigger a problem until later, making it difficult to track down.
>
> You are correct that someone else reported the issue on the list,
> however I'm not aware of any other reports beyond yours and theirs.
> Also, this was specific to certain versions of GFS2, and not something
> that relates to all versions.
>
> The upstream patch is here:
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/gfs2?id=059788039f1e6343f34f46d202f8d9f2158c2783
>
> It should be available in RHEL shortly - please ping support via the
> ticket for updates,
>
> Steve.
>
> >> Returning to the original point however, it is certainly not recommended
> >> to have mixed RHEL or CentOS versions running in the same cluster. It is
> >> much better to keep everything the same, even though the GFS2 on-disk
> >> format has not changed between the versions.
> >
> > More specifically (for those who are curious): Whilst the on-disk
> > format has not changed between EL5 and EL6, the way that RH cluster
> > members communicate with each other has.
> >
> > I ran a quick test some time back and the 2 different OS cluster
> > versions didn't see each other for LAN heartbeating.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster