I am running a cluster with GFS-formatted file systems mounted on multiple
nodes. What I was hoping to do was to set up one node running httpd to be
my webserver and another node running samba to share the same data
internally.
What I get with that setup is instability: the samba-serving node keeps
crashing. I have heartbeat set up so that failover happens to the
webserver node, at which point the system apparently behaves well.
After reading a few articles on the list it seemed to me that the problem
might be samba using oplocks or some other caching mechanism that breaks
synchronization. I tried setting oplocks=off in my smb.conf file, but that
made the system unusably slow (over 3 minutes to right-click on a 2 MB
file).
I am also not sure that oplocks are the whole problem, as I seem to be able
to re-create the crash simply by accessing the same file from multiple
clients, purely through samba (which its locking should be able to handle).
If the problem were merely that the remote node and the samba node were both
accessing an oplocked file I could understand it, but that doesn't always
seem to be the case.
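To make the kind of contention concrete, here is a minimal sketch of two processes competing for the same POSIX byte-range lock through fcntl(), which is the system call that appears at the bottom of the trace below. This is not what smbd does internally and I have not confirmed that it triggers the oops; the function names try_lock and contend are made up for illustration, and the path is assumed to be a file on the GFS mount.

```python
import fcntl
import os


def try_lock(path, start=0, length=1):
    """Try a non-blocking exclusive byte-range lock; return the fd, or None if refused."""
    fd = os.open(path, os.O_RDWR)
    try:
        # lockf() issues fcntl(F_SETLK) under the hood, i.e. a POSIX record lock.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return fd
    except OSError:
        # EAGAIN/EACCES: another process already holds a conflicting lock.
        os.close(fd)
        return None


def contend(path):
    """Hold a lock in the parent, then let a forked child request the same range.

    Returns True when the child was refused, which is the expected behaviour.
    """
    fd = try_lock(path)
    if fd is None:
        raise RuntimeError("could not take the initial lock on %s" % path)
    pid = os.fork()
    if pid == 0:
        # Child: POSIX locks are per process, so this request conflicts
        # with the lock the parent is still holding.
        child_fd = try_lock(path)
        os._exit(0 if child_fd is None else 1)
    _, status = os.waitpid(pid, 0)
    os.close(fd)  # closing the descriptor releases the parent's lock
    return os.WEXITSTATUS(status) == 0
```

Running contend() against a single node only exercises local contention. To approximate the multi-client case, the same try_lock() call would have to be issued from a second node that mounts the same file system while the first still holds its lock.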
Has anyone had any success running the same type of setup? I am also
serving NFS on the samba server, though with very little load there.
Below is the syslog output of a crash. I'm running 2.6.8-1.521smp with a
GFS CVS dump from mid-September.
-alan
Code: 8b 03 0f 18 00 90 3b 5c 24 04 75 97 8b 04 24 5b 5e 5b 5e 5f
<1>Unable to handle kernel paging request at virtual address 00100100
printing eip:
f2ef1e8d
*pde = 00003001
Oops: 0000 [#3]
SMP
Modules linked in: udf nfsd exportfs lock_dlm(U) dlm(U) cman(U) gfs(U) lock_harness(U) nfs lockd sunrpc tg3 floppy sg microcode joydev dm_mod ohci_hcd ext3 jbd aacraid megaraid sd_mod scsi_mod
CPU: 0
EIP: 0060:[<f2ef1e8d>] Not tainted
EFLAGS: 00010246 (2.6.8-1.521smp)
EIP is at query_lkb_queue+0x85/0x9b [dlm]
eax: ccf485d8 ebx: 00100100 ecx: 00000000 edx: 00000100
esi: 13012e48 edi: 00000000 ebp: 00000130 esp: 13012dc4
ds: 007b es: 007b ss: 0068
Process smbd (pid: 13049, threadinfo=13012000 task=7617b1f0)
Stack: 00000000 4543aad0 00000130 950670d8 13012e48 3644d458 f2ef209e 13012e48
00000000 00000000 f2ef133d 34326633 68478400 950670d8 00000137 000000d0
ef239980 dea26800 13012e48 00000380 f2b79169 13012e48 f2b7905d be437380
Call Trace:
[<f2ef209e>] query_locks+0x6f/0xad [dlm]
[<f2ef133d>] dlm_query+0x155/0x238 [dlm]
[<f2b79169>] get_conflict_global+0x104/0x2ae [lock_dlm]
[<f2b7905d>] query_ast+0x0/0x8 [lock_dlm]
[<0227c989>] release_sock+0xa5/0xab
[<f2b794c2>] lm_dlm_plock_get+0xcb/0x10f [lock_dlm]
[<f314b4e1>] do_plock+0xc2/0x171 [gfs]
[<f314b5d4>] gfs_lock+0x44/0x52 [gfs]
[<f314b590>] gfs_lock+0x0/0x52 [gfs]
[<02170571>] fcntl_getlk64+0x75/0x12e
[<02170841>] fcntl_setlk64+0x217/0x221
[<0216c7e0>] sys_fcntl64+0x4d/0x7b