Re: GFS directory freezing unexpectedly under pressure ...

Well, I really think so: I have always used 'gfs' in my commands, never 'gfs2'. gfs_fsck runs fine on that filesystem, while gfs2_fsck returns:

[root@orarac1 ~]# gfs2_fsck /dev/vg_share/lv_share_1
Initializing fsck
Old gfs1 file system detected.
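
By the way, the type column in /proc/mounts should also confirm what the kernel actually mounted ("gfs" rather than "gfs2"); something like this (the option string here is illustrative, the important field is the type):

[root@orarac1 ~]# grep /share /proc/mounts
/dev/vg_share/lv_share_1 /share gfs rw,noatime,noquota 0 0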

However, the same question you're now asking me crossed my own mind when I wrote the post: nobody seems to have problems with GFS1, only with GFS2 (which is not yet supported for production); maybe I'm messing something up ...

Adding to my doubts, I noticed (to my great surprise) that the gfs kernel module depends on the gfs2 kernel module:

[root@orarac1 ~]# lsmod
Module                  Size  Used by
....
gfs                   302204  2
lock_dlm               55385  3
gfs2                  522965  2 gfs,lock_dlm
dlm                   131525  24 lock_dlm
configfs               62301  2 dlm
vmnet                 106288  3
vmmon                 176716  0
sunrpc                195977  1
ipv6                  410017  22
cpufreq_ondemand       40401  2
dm_mirror              60993  0
dm_mod                 93841  6 dm_mirror
video                  51273  0
sbs                    49921  0
i2c_ec                 38593  1 sbs
i2c_core               56129  1 i2c_ec
button                 40545  0
battery                43849  0
asus_acpi              50917  0
acpi_memhotplug        40133  0
ac                     38729  0
parport_pc             62313  0
lp                     47121  0
parport                73165  2 parport_pc,lp
k8_edac                49537  0
edac_mc                58657  1 k8_edac
shpchp                 70765  0
bnx2                  119057  0
pcspkr                 36289  0
serio_raw              40517  0
sg                     69737  0
qla2400               242944  0
qla2300               159360  0
usb_storage           116257  0
cciss                  92361  4
ext3                  166609  2
jbd                    93873  1 ext3
ehci_hcd               65229  0
ohci_hcd               54493  0
uhci_hcd               57433  0
qla2xxx               309664  7 qla2400,qla2300
sd_mod                 54081  9
scsi_mod              184057  5 sg,usb_storage,cciss,qla2xxx,sd_mod
qla2xxx_conf          334856  1
intermodule            37508  2 qla2xxx,qla2xxx_conf
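
The dependency also shows up in the module metadata; if I understand the RHEL 5 packaging correctly, gfs.ko is simply built against gfs2.ko for the shared lock-module harness (the interface that lock_dlm plugs into), not for the on-disk filesystem code, but I'd be glad to have that confirmed:

[root@orarac1 ~]# modinfo gfs | grep depends
depends:        gfs2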

Is it possible that I'm unknowingly using GFS through GFS2 (or something like that), and that my configuration is unstable simply because the gfs2 kernel module is not yet ready for production? Maybe I have to change something ...

I'd greatly appreciate it if you could share your working GFS1 configuration with me.
Thank you
Tyzan


[root@orarac1 ~]# gfs_tool df
/share:
 SB lock proto = "lock_dlm"
 SB lock table = "orarac:gfs_share_1"
 SB ondisk format = 1309
 SB multihost format = 1401
 Block size = 4096
 Journals = 10
 Resource Groups = 796
 Mounted lock proto = "lock_dlm"
 Mounted lock table = "orarac:gfs_share_1"
 Mounted host data = "jid=1:id=196610:first=0"
 Journal number = 1
 Lock module flags = 0
 Local flocks = FALSE
 Local caching = FALSE
 Oopses OK = FALSE

 Type           Total          Used           Free           use%
 ------------------------------------------------------------------------
 inodes         30             30             0              100%
 metadata       13197          12193          1004           92%
 data           52080821       6084973        45995848       12%
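
If it can help the diagnosis, next time the directory freezes I can try to capture the glock state and the kernel stacks of the hung processes; as far as I understand, something like this should work (just a sketch, I haven't tried it during a hang yet):

[root@orarac1 ~]# gfs_tool lockdump /share > /tmp/gfs_lockdump.txt
[root@orarac1 ~]# echo t > /proc/sysrq-trigger    # task stack traces end up in dmesg/syslog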




Gordan Bobic wrote:
Are you sure you are using GFS1 and not GFS2? I've experienced that problem with GFS2, but not with GFS1.

Gordan

tam_annie@xxxxxxxxxxxxx wrote:
Hi everybody,

when my GFS (v1) filesystems come under heavy load (e.g. a VMware virtual machine OS installation, or an Oracle RMAN backup using a GFS filesystem as the flash recovery area), they "freeze" unexpectedly. More precisely, it is not the whole GFS filesystem that freezes, but only the directory affected by the load: I can't even ls the contents of that directory, and everything that touches it seems to hang hopelessly.

I can't find any related errors in my logs, and the output of the cluster utilities (clustat, group_tool -v, cman_tool nodes) looks absolutely normal (no fencing is occurring); I can even keep working in the other directories of the same GFS! The only way out I've found is to restart the cluster. I can reproduce the problem deterministically, but I don't know how to debug it.

I've noticed that the problem arises on both my 2-node and my 1-node cluster, whether or not I mount GFS with 'noquota,noatime'.

Your help is my hope:
thank you in advance!
Tyzan

___________________________________________________________________________________________________
Linux xxxxxxxxxxxxxxxx 2.6.18-8.1.8.el5 #1 SMP Tue Jul 10 06:39:17 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

lvm2-cluster-2.02.16-3.el5
kmod-gfs-0.1.16-5.2.6.18_8.1.8.el5
gfs2-utils-0.1.25-1.el5
gfs-utils-0.1.11-3.el5
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5


[root@orarac1 ~]# gfs_tool gettune /share
ilimit1 = 100
ilimit1_tries = 3
ilimit1_min = 1
ilimit2 = 500
ilimit2_tries = 10
ilimit2_min = 3
demote_secs = 300
incore_log_blocks = 1024
jindex_refresh_secs = 60
depend_secs = 60
scand_secs = 5
recoverd_secs = 60
logd_secs = 1
quotad_secs = 5
inoded_secs = 15
quota_simul_sync = 64
quota_warn_period = 10
atime_quantum = 3600
quota_quantum = 60
quota_scale = 1.0000   (1, 1)
quota_enforce = 1
quota_account = 1
new_files_jdata = 0
new_files_directio = 0
max_atomic_write = 4194304
max_readahead = 262144
lockdump_size = 131072
stall_secs = 600
complain_secs = 10
reclaim_limit = 5000
entries_per_readdir = 32
prefetch_secs = 10
statfs_slots = 64
max_mhc = 10000
greedy_default = 100
greedy_quantum = 25
greedy_max = 250
rgrp_try_threshold = 100
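
(For reference, these tunables can be changed at runtime with settune, in case any of them turns out to be relevant; e.g., to lower the glock demote interval:

[root@orarac1 ~]# gfs_tool settune /share demote_secs 200

though I have left all of the values above at their defaults.)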





--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
