Hi Bartolomé,

this can happen if you are using PHP session IDs. The httpd processes
can run into a deadlock situation when two httpd processes try to
flock() the same session ID file. In that case they stay in the "D"
state, which means they are waiting for I/O.

To prevent this, we wrote a patch for PHP with the following features:

- transaction-based file locking on session ID files
- PHP no longer flock()s session ID files (this is the NFS-like
  behaviour, as NFS does not support flocks)

You can download the patch at http://open-sharedroot.org
A minimal sketch of the idea is appended at the end of this mail.

Have fun,
Mark

On Wednesday 18 January 2006 11:42, Bartolomé Rodríguez wrote:
> Hi list,
>
> Release: Red Hat Enterprise Linux ES release 4 (Nahant Update 2)
> GFS-6.1.2-0
> GFS-kernel-smp-2.6.9-42.2
> Kernel running: 2.6.9-22.0.1.ELsmp
> Apache: httpd-2.0.52-19.ent
>
> I have 4 servers with GFS. These servers have a shared disk with GFS
> (the disks are in a SAN, attached via Fibre Channel). This setup is in
> production. Twice in two weeks the httpd processes have gone into
> state "D". We think GFS is blocking something (a file or an inode... I
> don't know), and when httpd tries to read or write there, it goes into
> state "D":
>
> apache  14383  1.5  0.3 24004 13300 ?  D  17:36  3:57 /usr/sbin/httpd
> apache  14503  1.6  0.3 23996 13296 ?  D  17:49  3:53 /usr/sbin/httpd
> apache  14516  1.6  0.3 23988 13288 ?  D  17:50  3:58 /usr/sbin/httpd
> apache  14522  1.6  0.3 24044 13340 ?  D  17:50  3:54 /usr/sbin/httpd
> apache  14635  1.5  0.3 24092 13392 ?  D  18:00  3:23 /usr/sbin/httpd
> apache  14661  1.6  0.3 24048 13344 ?  D  18:03  3:33 /usr/sbin/httpd
> apache  14662  1.7  0.3 24096 13392 ?  D  18:03  3:48 /usr/sbin/httpd
> apache  14671  1.4  0.3 24044 13340 ?  D  18:03  3:12 /usr/sbin/httpd
> apache  14687  1.6  0.3 24020 13320 ?  D  18:03  3:42 /usr/sbin/httpd
> apache  17329  0.8  0.3 24356 13640 ?  D  19:42  1:04 /usr/sbin/httpd
> apache  17331  0.9  0.3 24000 13284 ?  D  19:42  1:05 /usr/sbin/httpd
> apache  17332  1.2  0.3 24156 13456 ?  D  19:43  1:29 /usr/sbin/httpd
> apache  17335  0.9  0.3 24128 13412 ?  D  19:43  1:06 /usr/sbin/httpd
> apache  17345  0.9  0.3 23908 13120 ?  D  19:43  1:07 /usr/sbin/httpd
> apache  17347  0.9  0.3 23608 12896 ?  D  19:43  1:07 /usr/sbin/httpd
> apache  17385  0.8  0.3 24120 13416 ?  D  19:45  1:01 /usr/sbin/httpd
> apache  17386  1.2  0.3 23932 13228 ?  D  19:46  1:26 /usr/sbin/httpd
> apache  17387  1.1  0.3 23904 13200 ?  D  19:46  1:17 /usr/sbin/httpd
> apache  17398  0.9  0.3 24064 13360 ?  D  19:49  1:04 /usr/sbin/httpd
> apache  17599  1.3  0.3 24392 13676 ?  D  20:41  0:52 /usr/sbin/httpd
> apache  17606  0.3  0.3 23472 12632 ?  D  20:42  0:11 /usr/sbin/httpd
> apache  17607  0.6  0.3 24024 13312 ?  D  20:42  0:23 /usr/sbin/httpd
> apache  17608  1.1  0.3 23880 13164 ?  D  20:42  0:44 /usr/sbin/httpd
> apache  17609  0.3  0.3 23556 12736 ?  D  20:42  0:11 /usr/sbin/httpd
> apache  17620  0.5  0.3 23928 13204 ?  D  20:43  0:21 /usr/sbin/httpd
> apache  17632  0.0  0.2 23168 12268 ?  D  20:46  0:03 /usr/sbin/httpd
> apache  17633  0.6  0.3 23448 12704 ?  D  20:47  0:20 /usr/sbin/httpd
> apache  17635  0.0  0.3 23464 12592 ?  D  20:48  0:01 /usr/sbin/httpd
> apache  17636  0.0  0.3 23448 12584 ?  D  20:49  0:02 /usr/sbin/httpd
> apache  17637  0.2  0.3 23472 12640 ?  D  20:49  0:07 /usr/sbin/httpd
> apache  17638  0.2  0.3 23508 12772 ?  D  20:49  0:07 /usr/sbin/httpd
> apache  17639  0.2  0.3 23488 12672 ?  D  20:49  0:07 /usr/sbin/httpd
> apache  17643  0.1  0.3 23860 13104 ?  D  20:50  0:03 /usr/sbin/httpd
> apache  17644  0.0  0.2 23156 12376 ?  D  20:51  0:02 /usr/sbin/httpd
> apache  17645  1.2  0.3 23472 12680 ?  D  20:51  0:39 /usr/sbin/httpd
> apache  17647  0.6  0.2 23184 12428 ?  D  20:52  0:19 /usr/sbin/httpd
> apache  17648  0.5  0.3 23496 12760 ?  D  20:52  0:18 /usr/sbin/httpd
> apache  17665  0.4  0.3 23524 12700 ?  D  20:54  0:11 /usr/sbin/httpd
> apache  17667  0.7  0.3 23972 13240 ?  D  20:54  0:20 /usr/sbin/httpd
> apache  17668  0.1  0.3 23500 12652 ?  D  20:54  0:03 /usr/sbin/httpd
> apache  17669  0.2  0.3 23488 12652 ?  D  20:54  0:06 /usr/sbin/httpd
> apache  17671  0.8  0.3 23696 12888 ?  D  20:55  0:25 /usr/sbin/httpd
> apache  17732  0.1  0.2 23188 12264 ?  D  21:01  0:03 /usr/sbin/httpd
> apache  17755  0.4  0.3 23588 12768 ?  D  21:02  0:11 /usr/sbin/httpd
>
> These httpd processes appear on all 4 servers. The first time we
> rebooted all the cluster's members, but the servers could not unmount
> the GFS filesystems, so the only solution was a physical power-off
> (power button). The second time we saw the first httpd process in
> state "D" and rebooted only that server. The reboot command did not
> work this time either, and a physical power-off was required. But then
> the other 3 servers recovered without a reboot: on those 3 servers the
> httpd processes in state "D" went back to state "S", and all was OK.
>
> I had read this mail:
> https://www.redhat.com/archives/linux-cluster/2005-December/msg00054.html
> I think it is very similar, but "stuck in gfs" does not appear
> explicitly in my logs, and I don't know whether I have the maximum
> logging for GFS. I only see the unmount and mount attempts in my logs:
>
> Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: withdrawing from cluster at user's request
> Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: about to withdraw from the cluster
> Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: waiting for outstanding I/O
> Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: telling LM to withdraw
> Jan 16 21:53:55 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: withdrawn
> Jan 16 21:58:45 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:02:01 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:02:13 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:02:25 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:02:37 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:02:49 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:03:01 sf-03 gfs: Unmounting GFS filesystems: failed
> Jan 16 22:03:14 sf-03 vgchange: Can't deactivate volume group "vg_gfs1" with 2 open logical volume(s)
> Jan 16 22:03:14 sf-03 clvmd: Deactivating VG vg_gfs1: failed
> Jan 16 22:03:14 sf-03 vgchange: Can't deactivate volume group "vg_gfs2" with 1 open logical volume(s)
> Jan 16 22:03:14 sf-03 clvmd: Deactivating VG vg_gfs2: failed
> Jan 16 22:10:36 sf-03 ccsd[3229]: cluster.conf (cluster name = gfs_cluster, version = 2) found.
> Jan 16 22:11:59 sf-03 vgchange: 3 logical volume(s) in volume group "vg_gfs1" now active
> Jan 16 22:11:59 sf-03 vgchange: 3 logical volume(s) in volume group "vg_gfs2" now active
> Jan 16 22:12:20 sf-03 kernel: GFS 2.6.9-42.2 (built Oct 21 2005 11:57:26) installed
> Jan 16 22:12:20 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:gfs_raiz"
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Joined cluster. Now mounting FS...
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: jid=2: Trying to acquire journal lock...
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: jid=2: Looking at journal...
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: jid=2: Done
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Scanning for log elements...
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Found 0 unlinked inodes
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Found quota changes for 0 IDs
> Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Done
> Jan 16 22:12:22 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:gfs_cache"
> Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Joined cluster. Now mounting FS...
> Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: jid=2: Trying to acquire journal lock...
> Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: jid=2: Looking at journal...
> Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: jid=2: Done
> Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Scanning for log elements...
> Jan 16 22:12:26 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Found 3 unlinked inodes
> Jan 16 22:12:26 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Found quota changes for 0 IDs
> Jan 16 22:12:26 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Done
> Jan 16 22:12:26 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:gfs_htdocs"
> Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: Joined cluster. Now mounting FS...
> Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: jid=2: Trying to acquire journal lock...
> Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: jid=2: Looking at journal...
> Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: jid=2: Done
> Jan 16 22:12:28 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:lv_webcalle20"
> Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: Joined cluster. Now mounting FS...
> Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: jid=2: Trying to acquire journal lock...
> Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: jid=2: Looking at journal...
> Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: jid=2: Done
> Jan 16 22:12:30 sf-03 gfs: Mounting GFS filesystems: succeeded
>
> In the end we saw an interesting gfs_tool option, "lockdump", and ran
> it before the reboot. Below is some of its output for my mountpoint.
> If you need more information about the configuration or anything else,
> please tell me.
>
> gfs_tool lockdump /mountpoint:
>
> Glock (5, 2114727)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 119206)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 119206/119206
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 228692)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 228692/228692
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 735842)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6
> Glock (5, 1418402)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 1885491)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 1885491/1885491
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 399729)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 399729/399729
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 1386646)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (5, 241672)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (5, 207713)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 946688)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 946688/946688
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 30184)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 8340)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 8340/8340
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 2308745)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 2308745/2308745
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 949390)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 949390/949390
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 548987)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 1437881)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 1437881/1437881
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 139108)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 261765)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 261765/261765
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 2530374)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 2530374/2530374
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 33091)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 863848)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 863848/863848
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 208549)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 51708)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 51708/51708
>     type = 2
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (5, 1887878)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (5, 864369)
>   gl_flags =
>   gl_count = 2
>   gl_state = 3
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = no
>   ail_bufs = no
>   Holder
>     owner = -1
>     gh_state = 3
>     gh_flags = 5 7
>     error = 0
>     gh_iflags = 1 6 7
> Glock (2, 1436746)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 1436746/1436746
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 608211)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 608211/608211
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 609430)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 609430/609430
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
> Glock (2, 893532)
>   gl_flags =
>   gl_count = 2
>   gl_state = 0
>   req_gh = no
>   req_bh = no
>   lvb_count = 0
>   object = yes
>   new_le = no
>   incore_le = no
>   reclaim = no
>   aspace = 0
>   ail_bufs = no
>   Inode:
>     num = 893532/893532
>     type = 1
>     i_count = 1
>     i_flags =
>     vnode = yes
>
> Thanks in advance.

--
Gruss / Regards,

Dipl.-Ing. Mark Hlawatschek
Phone: +49-89 121 409-55
http://www.atix.de/

**
ATIX - Ges. fuer Informationstechnologie und Consulting mbH
Einsteinstr. 10 - 85716 Unterschleissheim - Germany
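PS: For illustration, here is a minimal userland sketch of the idea in
PHP. This is NOT the actual patch (which presumably changes PHP's
built-in "files" session handler at the C level); the paths and function
names below are made up for the example. Instead of writing the session
file under a held flock(), it writes to a temporary file and rename()s
it into place. rename() is atomic within one filesystem, so two httpd
processes can never block each other on a lock for the same session
file:

<?php
// Sketch only: a session save handler that never calls flock().
// The "transaction" is a write to a temp file in the same directory
// followed by an atomic rename(). Error handling is kept minimal.

$sess_path = '/var/www/sessions';   // hypothetical shared GFS directory

function sess_open($save_path, $session_name) { return true; }
function sess_close() { return true; }

function sess_read($id) {
    global $sess_path;
    $file = $sess_path . '/sess_' . $id;
    // Plain read without flock(): the atomic rename() in sess_write()
    // guarantees a reader never sees a half-written file.
    return file_exists($file) ? (string) file_get_contents($file) : '';
}

function sess_write($id, $data) {
    global $sess_path;
    $tmp = tempnam($sess_path, '.sess');
    if ($tmp === false) {
        return false;
    }
    $fp = fopen($tmp, 'wb');
    if ($fp === false) {
        return false;
    }
    fwrite($fp, $data);
    fclose($fp);
    // Atomic replace instead of writing under a held flock().
    return rename($tmp, $sess_path . '/sess_' . $id);
}

function sess_destroy($id) {
    global $sess_path;
    $file = $sess_path . '/sess_' . $id;
    return !file_exists($file) || unlink($file);
}

function sess_gc($maxlifetime) {
    global $sess_path;
    foreach (glob($sess_path . '/sess_*') as $file) {
        if (filemtime($file) + $maxlifetime < time()) {
            @unlink($file);
        }
    }
    return true;
}

session_set_save_handler('sess_open', 'sess_close', 'sess_read',
                         'sess_write', 'sess_destroy', 'sess_gc');
session_start();
?>

The trade-off is last-writer-wins semantics when two requests update the
same session concurrently, but nothing can get stuck in "D" waiting for
a lock on a session file.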