Alright, more testing today. I reconfigured the GFS nodes to export the
filesystems using NFS4 and adjusted the client mounts accordingly and
BAM! Everything worked, even Firefox, which was also killing me. So it
seems this is a GFS2+NFS3 thing. I did have to, as previously noted,
shrink my wsize down to 8k.

This doesn't fix my problem, since I have RHEL3 machines which need to
be able to run KDE apps etc. However, it's a data point and paves the
way forward.

I tried to configure the cluster-based exports, but there is little in
the way of documentation for exporting things in an NFS4 way. The only
way I could see to do it was to export from all nodes and just float the
IPs around. It works, but it's not what I'd call ideal. If someone has
the instructions or pointers to them, I'd greatly appreciate it. As
things are right now, I can have both NFS3 and NFS4 clients mount the
exported filesystems due to the nature of the exports and the
mount --bind options.

I also noticed that when failing an IP, and thus the client mount point,
from say server1 to server2 and then back to server1 in a short time
(less than a minute), the client locked up for a few minutes. This has
had a Bugzilla entry, but I seem to have lost the link. Is there work
being done to fix that? Or is the accepted workaround just to avoid such
silly behaviour?

All in all, I still need to find a fix for this, but I wanted to point
out that NFS4 seems to work. Any information is appreciated, and sorry
for the seemingly ranting nature of the previous post.

On Sat, Feb 21, 2009 at 7:47 AM, Corey Kovacs <corey.kovacs@xxxxxxxxx> wrote:
> Yeah, I was the one who suggested lowering the wsize. That's how I
> "fixed" this last time I had the problem. I am convinced this is a
> GFS+NFS thing, as we currently have a Tru64 cluster doing this job and
> it works just fine. The first time I set up my workstation as a client
> to the new cluster running 5.2 and gfs2 eval code, it was broken. I
> dropped back to gfs1 and things worked again.
> Some time later (about two months ago), I converted all the
> filesystems on that cluster to gfs2 again and things broke. That's
> when I changed the write size to 8192, as I noticed that Firefox 2 was
> not working correctly. It would freeze until some other NFS-based I/O
> had occurred. Lowering the wsize allowed Firefox (and KDE, it seemed)
> to carry out their locking operations. I have since upgraded to
> RHEL5.3 and gfs2, changing again from 32 to 64 bit, and things are
> broken again.
>
> It's very easy to point over at the KDE people, but the fact is, this
> normally works and some combination of GFS+NFS3+RHEL5 has broken this.
> It may be similar to the situation in which newer compilers "break"
> old code by exposing obscure bugs in bad coding, or it might be
> related to efforts in making NFS fail over nicely in a cluster. I'd
> believe the latter way before I'd believe the former.
>
> Tru64 has done this great for years. Is it really that hard of a
> problem? I know that RHEL2.1 actually patched the kernel to help this
> along and Red Hat moved away from that approach in order to stay
> mainline with the vger kernel. Failover was "fixed" in RHEL3 by
> maintaining a "latest" copy of the rmtab files to prevent stale
> handles. And now things have changed again. Maybe it's time someone
> got on the horn to HP/COMPAQ/DEC or whatever they want to call
> themselves this year and ask them a question or two. They just donated
> advfs (which has a lot of ZFS-like capabilities and has for years) to
> the world. Maybe they'd be interested in helping to get things like
> this to work properly.
>
> It shouldn't require dropping a complete subset of
> software/applications in order to upgrade to the latest supported
> offering from a vendor. I realize that Red Hat has always seemed to
> hold KDE in some kind of contempt, but it's not just going to affect
> the desktop environment. In my eyes Red Hat has shipped something
> broken.
> If testing didn't show this, then the testing was poor. If it
> did, then that's just bad business.
>
> Believe me when I say I'd much rather it be something simple I am
> missing, but I don't believe that to be the case. Especially since
> the "fixes" I've had to make keep breaking after every release/kernel
> upgrade.
>
> To make sure this is a GFS+NFS problem, I'll export an ext3 filesystem
> and try again. Somehow I expect it to work just fine.
>
> Sorry if this reads like a rant, but this is really annoying. I am the
> local Red Hat/Linux poster boy at work and this just looks bad all
> around.
>
> Thanks for your suggestions, Stewart.
>
> Corey
>
> On Sat, Feb 21, 2009 at 2:57 AM, Stewart Walters <spods@xxxxxxxxxxxx> wrote:
>> I forgot there were other factors.
>>
>> One site where I had seen this working had the NFS client options
>> set to rsize=32768,wsize=32768.
>>
>> They had increased the read size and write size, but I'm not sure
>> what their motivation for it was (it was done on the site before my
>> time).
>>
>> I seem to remember a post to the cluster list within the last two
>> months about KDE and NFS where the suggestion was to set rsize=8192.
>> So perhaps KDE needs a read size of at least 8 KB before it can
>> properly establish a lock.
>>
>> The post I saw from the KDE devs about the need for locking was that
>> they really do require a lock on certain files, by design. So there
>> is no way the KDE devs will ever change this.
>>
>> Also, KDE has a tendency to create symbolic links inside the user's
>> home directory to files that exist in /tmp.
>>
>> Check that these files in /tmp are being created, and that the users
>> have appropriate write access to the files. Also check that each
>> symbolic link in the user's /home points to an actual file.
>>
>> Hope that helps,
>>
>> Stewart
>>
>> On Sat Feb 21 5:13 , Corey Kovacs <corey.kovacs@xxxxxxxxx> sent:
>>
>>> Well, I tried setting the mount to soft but that didn't change a
>>> thing.
>>> I am using NFS v3 if that matters. I didn't have time to check
>>> v4. Problem is that I have machines that are using RHEL3, so I am
>>> stuck with v3 for now. It will be a good data point though.
>>>
>>> Is no one else having this issue? Hard to believe that the KDE devs
>>> don't do something to "fix" the KDE over NFS issues that I've been
>>> reading about for the last few days. That said, I'd not had a
>>> problem until I started testing gfs1/gfs2 exported by NFS. That and
>>> Firefox 3.x due to an SQLite problem.
>>>
>>> Anyone have any ideas? I am stumped.
>>>
>>> -C
>>>
>>> On Fri, Feb 20, 2009 at 7:19 AM, Corey Kovacs <corey.kovacs@xxxxxxxxx> wrote:
>>>> Thanks Stewart, I'll try that.
>>>>
>>>> -C
>>>>
>>>> On Fri, Feb 20, 2009 at 4:52 AM, Stewart Walters <spods@xxxxxxxxxxxx> wrote:
>>>>> I've found KDE + NFS home directory problems were cleared up when
>>>>> the NFS client options were changed from hard mounts to soft
>>>>> mounts.
>>>>>
>>>>> At this point KDE worked better (i.e. worked at all).
>>>>>
>>>>> Not sure about Firefox though.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Stewart
>>>>>
>>>>> On Fri Feb 20 6:04 , Corey Kovacs sent:
>>>>>
>>>>>> Up until I upgraded my cluster nodes to RHEL5.3, I was able to
>>>>>> use KDE with NFS-based home dirs off of a GFS1 export. I have
>>>>>> never been able to get Firefox 3.x to run acceptably at all.
>>>>>> After the upgrade, KDE stopped working altogether and Firefox
>>>>>> still doesn't work. They both work with local users just fine.
>>>>>> KDE, it seems, requires that its lock can be guaranteed, and
>>>>>> Firefox 3.x has issues with things going across NFS due to
>>>>>> SQLite issues. The SQLite issues were supposed to be fixed, but
>>>>>> it still doesn't work in my setup.
>>>>>>
>>>>>> My question is simply what can I do, if anything, to improve
>>>>>> this. Are others seeing the same issues?
>>>>>> I am trying to get this cluster into production, but if it means
>>>>>> that half of the apps that people use are not going to work,
>>>>>> then there is little point in inflicting this on the users.
>>>>>>
>>>>>> Any help is appreciated.
>>>>>>
>>>>>> --
>>>>>> Linux-cluster mailing list
>>>>>> Linux-cluster@xxxxxxxxxx
>>>>>> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster