Hi list,
Release: Red Hat Enterprise Linux ES release 4 (Nahant Update 2)
GFS-6.1.2-0
GFS-kernel-smp-2.6.9-42.2
Kernel running: 2.6.9-22.0.1.ELsmp
Apache: httpd-2.0.52-19.ent
I have 4 servers running GFS. They share GFS filesystems on SAN disks
attached over Fibre Channel, and this setup is in production. Twice in
two weeks the httpd processes have gone into state "D". We think GFS is
blocking on something (a file or an inode, I don't know exactly), and
when httpd tries to read or write there it ends up in state "D":
apache 14383 1.5 0.3 24004 13300 ? D 17:36 3:57 /usr/sbin/httpd
apache 14503 1.6 0.3 23996 13296 ? D 17:49 3:53 /usr/sbin/httpd
apache 14516 1.6 0.3 23988 13288 ? D 17:50 3:58 /usr/sbin/httpd
apache 14522 1.6 0.3 24044 13340 ? D 17:50 3:54 /usr/sbin/httpd
apache 14635 1.5 0.3 24092 13392 ? D 18:00 3:23 /usr/sbin/httpd
apache 14661 1.6 0.3 24048 13344 ? D 18:03 3:33 /usr/sbin/httpd
apache 14662 1.7 0.3 24096 13392 ? D 18:03 3:48 /usr/sbin/httpd
apache 14671 1.4 0.3 24044 13340 ? D 18:03 3:12 /usr/sbin/httpd
apache 14687 1.6 0.3 24020 13320 ? D 18:03 3:42 /usr/sbin/httpd
apache 17329 0.8 0.3 24356 13640 ? D 19:42 1:04 /usr/sbin/httpd
apache 17331 0.9 0.3 24000 13284 ? D 19:42 1:05 /usr/sbin/httpd
apache 17332 1.2 0.3 24156 13456 ? D 19:43 1:29 /usr/sbin/httpd
apache 17335 0.9 0.3 24128 13412 ? D 19:43 1:06 /usr/sbin/httpd
apache 17345 0.9 0.3 23908 13120 ? D 19:43 1:07 /usr/sbin/httpd
apache 17347 0.9 0.3 23608 12896 ? D 19:43 1:07 /usr/sbin/httpd
apache 17385 0.8 0.3 24120 13416 ? D 19:45 1:01 /usr/sbin/httpd
apache 17386 1.2 0.3 23932 13228 ? D 19:46 1:26 /usr/sbin/httpd
apache 17387 1.1 0.3 23904 13200 ? D 19:46 1:17 /usr/sbin/httpd
apache 17398 0.9 0.3 24064 13360 ? D 19:49 1:04 /usr/sbin/httpd
apache 17599 1.3 0.3 24392 13676 ? D 20:41 0:52 /usr/sbin/httpd
apache 17606 0.3 0.3 23472 12632 ? D 20:42 0:11 /usr/sbin/httpd
apache 17607 0.6 0.3 24024 13312 ? D 20:42 0:23 /usr/sbin/httpd
apache 17608 1.1 0.3 23880 13164 ? D 20:42 0:44 /usr/sbin/httpd
apache 17609 0.3 0.3 23556 12736 ? D 20:42 0:11 /usr/sbin/httpd
apache 17620 0.5 0.3 23928 13204 ? D 20:43 0:21 /usr/sbin/httpd
apache 17632 0.0 0.2 23168 12268 ? D 20:46 0:03 /usr/sbin/httpd
apache 17633 0.6 0.3 23448 12704 ? D 20:47 0:20 /usr/sbin/httpd
apache 17635 0.0 0.3 23464 12592 ? D 20:48 0:01 /usr/sbin/httpd
apache 17636 0.0 0.3 23448 12584 ? D 20:49 0:02 /usr/sbin/httpd
apache 17637 0.2 0.3 23472 12640 ? D 20:49 0:07 /usr/sbin/httpd
apache 17638 0.2 0.3 23508 12772 ? D 20:49 0:07 /usr/sbin/httpd
apache 17639 0.2 0.3 23488 12672 ? D 20:49 0:07 /usr/sbin/httpd
apache 17643 0.1 0.3 23860 13104 ? D 20:50 0:03 /usr/sbin/httpd
apache 17644 0.0 0.2 23156 12376 ? D 20:51 0:02 /usr/sbin/httpd
apache 17645 1.2 0.3 23472 12680 ? D 20:51 0:39 /usr/sbin/httpd
apache 17647 0.6 0.2 23184 12428 ? D 20:52 0:19 /usr/sbin/httpd
apache 17648 0.5 0.3 23496 12760 ? D 20:52 0:18 /usr/sbin/httpd
apache 17665 0.4 0.3 23524 12700 ? D 20:54 0:11 /usr/sbin/httpd
apache 17667 0.7 0.3 23972 13240 ? D 20:54 0:20 /usr/sbin/httpd
apache 17668 0.1 0.3 23500 12652 ? D 20:54 0:03 /usr/sbin/httpd
apache 17669 0.2 0.3 23488 12652 ? D 20:54 0:06 /usr/sbin/httpd
apache 17671 0.8 0.3 23696 12888 ? D 20:55 0:25 /usr/sbin/httpd
apache 17732 0.1 0.2 23188 12264 ? D 21:01 0:03 /usr/sbin/httpd
apache 17755 0.4 0.3 23588 12768 ? D 21:02 0:11 /usr/sbin/httpd
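Next time this happens I will also record where these D-state processes
are sleeping in the kernel before doing anything else. A minimal sketch of
what I plan to run (the wchan symbol names depend on the kernel build, so
take this only as an idea):

# list D-state processes together with their kernel wait channel
ps axo stat,pid,wchan:32,comm | grep '^D'

If the wait channel points into the GFS or DLM lock code, that should
confirm where they are blocked.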
These httpd processes appear on all 4 servers. The first time, we
rebooted all the cluster members, but the servers could not unmount the
GFS filesystems, so the only solution was a physical power-off (power
button). The second time, we saw the first httpd process in state "D" and
rebooted only that server. The reboot command did not work this time
either and a physical power-off was required. But then the other 3
servers recovered without a reboot: on those 3 servers, the httpd
processes in state "D" went back to state "S" and everything was OK.
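For the next forced power-off I want to at least try the magic SysRq
interface before pressing the button, so the machine syncs what it can
first. A minimal sketch, assuming SysRq is available on these kernels:

# enable the magic SysRq interface if it is off
echo 1 > /proc/sys/kernel/sysrq
# emergency sync, then immediate reboot without unmounting
echo s > /proc/sysrq-trigger
echo b > /proc/sysrq-trigger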
I have read this mail:
https://www.redhat.com/archives/linux-cluster/2005-December/msg00054.html
I think it is very similar, but "stuck in gfs" does not appear explicitly
in my logs. I don't know if I have the maximum logging level for GFS; the
only thing I see in my logs is the unmount and mount attempts (see also
the SysRq note after this excerpt):
Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: withdrawing from cluster at user's request
Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: about to withdraw from the cluster
Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: waiting for outstanding I/O
Jan 16 21:53:54 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: telling LM to withdraw
Jan 16 21:53:55 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: withdrawn
Jan 16 21:58:45 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:02:01 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:02:13 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:02:25 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:02:37 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:02:49 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:03:01 sf-03 gfs: Unmounting GFS filesystems: failed
Jan 16 22:03:14 sf-03 vgchange: Can't deactivate volume group "vg_gfs1" with 2 open logical volume(s)
Jan 16 22:03:14 sf-03 clvmd: Deactivating VG vg_gfs1: failed
Jan 16 22:03:14 sf-03 vgchange: Can't deactivate volume group "vg_gfs2" with 1 open logical volume(s)
Jan 16 22:03:14 sf-03 clvmd: Deactivating VG vg_gfs2: failed
Jan 16 22:10:36 sf-03 ccsd[3229]: cluster.conf (cluster name = gfs_cluster, version = 2) found.
Jan 16 22:11:59 sf-03 vgchange: 3 logical volume(s) in volume group "vg_gfs1" now active
Jan 16 22:11:59 sf-03 vgchange: 3 logical volume(s) in volume group "vg_gfs2" now active
Jan 16 22:12:20 sf-03 kernel: GFS 2.6.9-42.2 (built Oct 21 2005 11:57:26) installed
Jan 16 22:12:20 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:gfs_raiz"
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Joined cluster. Now mounting FS...
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: jid=2: Trying to acquire journal lock...
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: jid=2: Looking at journal...
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: jid=2: Done
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Scanning for log elements...
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Found 0 unlinked inodes
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Found quota changes for 0 IDs
Jan 16 22:12:22 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_raiz.2: Done
Jan 16 22:12:22 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:gfs_cache"
Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Joined cluster. Now mounting FS...
Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: jid=2: Trying to acquire journal lock...
Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: jid=2: Looking at journal...
Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: jid=2: Done
Jan 16 22:12:25 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Scanning for log elements...
Jan 16 22:12:26 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Found 3 unlinked inodes
Jan 16 22:12:26 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Found quota changes for 0 IDs
Jan 16 22:12:26 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_cache.2: Done
Jan 16 22:12:26 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:gfs_htdocs"
Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: Joined cluster. Now mounting FS...
Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: jid=2: Trying to acquire journal lock...
Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: jid=2: Looking at journal...
Jan 16 22:12:28 sf-03 kernel: GFS: fsid=gfs_cluster:gfs_htdocs.2: jid=2: Done
Jan 16 22:12:28 sf-03 kernel: GFS: Trying to join cluster "lock_dlm", "gfs_cluster:lv_webcalle20"
Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: Joined cluster. Now mounting FS...
Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: jid=2: Trying to acquire journal lock...
Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: jid=2: Looking at journal...
Jan 16 22:12:30 sf-03 kernel: GFS: fsid=gfs_cluster:lv_webcalle20.2: jid=2: Done
Jan 16 22:12:30 sf-03 gfs: Mounting GFS filesystems: succeeded
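To see whether our processes are really "stuck in gfs" like in the
archived mail, next time I will dump the kernel stacks of all tasks
before rebooting (again assuming SysRq is enabled). The traces should
show whether httpd is sleeping inside the GFS/DLM locking code:

# dump the state and kernel stack of every task to the kernel log
echo t > /proc/sysrq-trigger
# then pull the httpd traces out of the ring buffer
dmesg | grep -A 10 httpd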
In the end we found an interesting gfs_tool option, "lockdump", and ran
it before rebooting. Below is some of the output for my mountpoint (see
my note after the dump). If you need more information about the
configuration or anything else, please tell me.
gfs_tool lockdump /mountpoint:
Glock (5, 2114727)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 119206)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 119206/119206
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 228692)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 228692/228692
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 735842)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6
Glock (5, 1418402)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 1885491)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 1885491/1885491
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 399729)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 399729/399729
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 1386646)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (5, 241672)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (5, 207713)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 946688)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 946688/946688
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 30184)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 8340)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 8340/8340
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 2308745)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 2308745/2308745
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 949390)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 949390/949390
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 548987)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 1437881)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 1437881/1437881
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 139108)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 261765)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 261765/261765
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 2530374)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 2530374/2530374
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 33091)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 863848)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 863848/863848
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (5, 208549)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 51708)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 51708/51708
type = 2
i_count = 1
i_flags =
vnode = yes
Glock (5, 1887878)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (5, 864369)
gl_flags =
gl_count = 2
gl_state = 3
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = no
ail_bufs = no
Holder
owner = -1
gh_state = 3
gh_flags = 5 7
error = 0
gh_iflags = 1 6 7
Glock (2, 1436746)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 1436746/1436746
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 608211)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 608211/608211
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 609430)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 609430/609430
type = 1
i_count = 1
i_flags =
vnode = yes
Glock (2, 893532)
gl_flags =
gl_count = 2
gl_state = 0
req_gh = no
req_bh = no
lvb_count = 0
object = yes
new_le = no
incore_le = no
reclaim = no
aspace = 0
ail_bufs = no
Inode:
num = 893532/893532
type = 1
i_count = 1
i_flags =
vnode = yes
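A note on the dump: if I read the format right, each entry is
"Glock (type, number)", the type 2 entries look like inode glocks and the
type 5 entries like iopen glocks, apparently held shared (gl_state = 3)
with no owning process (owner = -1), but I am not sure of that reading.
If it helps, the whole dump can be summarized with a plain awk one-liner
over the gfs_tool output (nothing GFS-specific in it):

# count glocks per type in the dump
gfs_tool lockdump /mountpoint | awk -F'[(, ]+' '/^Glock/ {n[$2]++} END {for (t in n) print "type " t ": " n[t]}'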
Thanks in advance.
--
________________________________________
Bartolomé Rodríguez Bordallo
Departamento de Explotación de Servicios
FUJITSU ESPAÑA SERVICES, S.A.U.
Camino Cerro de los Gamos, 1
28224 Pozuelo de Alarcón, Madrid
Tel.: 902 11 40 10
Mail: brodriguezb@xxxxxxxxxx
________________________________________
The information contained in this e-mail is confidential and is intended solely for the recipient named as the addressee. If you have received this e-mail in error, please notify us immediately and delete it from your system. In that case, please do not copy it or use it for any purpose, do not disclose its contents to any person, and do not store or copy this information on any medium.
--
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster