Hi,

On Mon, 2009-09-28 at 12:13 +0200, Libor Tomsik wrote:
> Hi,
>
> >Hi,
> >
> >On Sat, 2009-09-26 at 18:29 +0200, Libor Tomsik wrote:
> >> Hi all,
> >>
> >> I'm having a strange issue with a two-node cluster based on Xen
> >> virtual hosts with a shared disk on clvm. The servers are running
> >> Apache and one is considered a hot backup. On that node awstats are
> >> generated from the Apache custom logs stored on the shared device.
> >> Web data, logs, configs and awstats results are in different
> >> directories within the same GFS2 volume.
> >>
> >> Everything works fine, but sometimes (in the production environment,
> >> damn) the directory with the logs gets frozen for the spare node
> >> running awstats. All commands like ls, cd and mc on that directory
> >> end up in D state. On the other node everything works fine, and
> >> other directories seem unaffected too.
> >>
> >> I cannot umount the fs, nor remount it ro and back rw, since there
> >> are "running" processes stuck in D state.
> >>
> >> Can someone give me some advice on how to prevent this problem, and
> >> how to recover from it? It is a production system with an SLA :(
> >> Next time I'll try to make a lockdump on both nodes.
> >>
> >> Kernel is 2.6.18-128.1.10.el5xen, gfs2-utils-0.1.53-1.el5_3.2,
> >> kmod-gfs2-xen-1.92-1.1.el5_2.2
> >>
> >> Regards
> >>
> >> Libor
> >>
> >That sounds to me like there is a lot of activity from both nodes
> >relating to the same directory. Can you split the logs of the two
> >nodes into two different directories? That will probably solve the
> >problem.
> >
> Actually there is just one Apache writing on one server, although in
> many threads. Maybe this is the problem? I have about 40 sites hosted
> there, so 2x40 separate log files.
> The second node is just periodically reading this directory.
>
That can still cause a problem. The second node will require a shared
lock on the directory, so if there is any file creation going on, it
will be dramatically slowed down by that. Is it possible to stop the
second node's I/O to check that?

There shouldn't really be a big issue with lots of threads, provided
they are all on the same node, as is the case here.

Steve.

> >This kind of problem is tricky to debug, since the glock dumps will
> >tell you what state the glocks are currently in, and not what has
> >been happening in the past.
> >
> >In the upstream code we've now got GFS2 tracepoints which will help
> >in tracking down issues like this, but those are not in RHEL yet.
> >
> >Steve.
>
> Regards
>
> Libor.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
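
For reference, the lockdump Libor mentions: GFS2 normally exports its glock
state through a file in debugfs, so a capture along the lines below should
work on both nodes while the directory is hung. This is only a sketch: it
assumes the RHEL 5 kernel build in question already exposes the glocks
debugfs file, and <clustername>:<fsname> is a placeholder for the lock table
name given to mkfs.gfs2 -t.

  # Which processes are stuck in D state, and which kernel function
  # they are sleeping in (run on the node that sees the hang):
  ps -eo pid,stat,wchan:32,args | awk '$2 ~ /^D/'

  # debugfs is usually not mounted by default on RHEL 5:
  mount -t debugfs none /sys/kernel/debug

  # Capture the glock dump on BOTH nodes at roughly the same time.
  # The directory name under gfs2/ is the lock table name of the
  # affected filesystem, i.e. <clustername>:<fsname>.
  cat /sys/kernel/debug/gfs2/<clustername>:<fsname>/glocks \
      > /tmp/glocks.$(hostname).$(date +%s)

  # Optionally dump all task stacks to the kernel log as well; the
  # traces of the D-state tasks show where they are blocked.
  echo t > /proc/sysrq-trigger
  dmesg > /tmp/sysrq-t.$(hostname).txt

If you can get the log directory's inode number from the node that still
responds (ls -id on the directory), grepping both dumps for that number
should show which node is holding the directory glock and which one is
queued behind it; depending on the GFS2 version the dump may print glock
numbers in hex rather than decimal, so check both forms.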