NFS replacement, rest stopped

Hello all,

as mentioned earlier, we tried to replace an NFS server/client combination in
a semi-production environment with a trivial one-server glusterfs setup. We
thought at first that this pretty simple setup would allow some more testing.
Unfortunately we have to stop those tests, because it turns out that the
client system has networking trouble as soon as we start glusterfs.
The client has three network cards: the first is for internet use, the second
is for the connection to the glusterfs server, and the third is for collecting
data from several other boxes.
It turned out that the third interface had trouble soon after we started to
work with glusterfs. We could not ping several hosts on the same LAN, or
packet delay was very high (up to 20 s).
The effects were pretty weird and looked like a bad interface card. But after
switching back to kernel NFS, everything went back to normal.
It really looks like the glusterfs client has some problems, too. It looks like
buffer reuse, memory thrashing, a pointer mix-up or the like.
Interestingly, no problems were visible on the interface carrying the
glusterfs traffic; I have no idea how something like this happens.
Anyway, I expect someone will tell me it is the kernel networking that has
troubles, just as it is reiserfs that has troubles, or ext3 :-(
To give you an idea of what the ugly things look like:

Aug 31 08:20:16 heather kernel: ------------[ cut here ]------------
Aug 31 08:20:16 heather kernel: WARNING: at net/ipv4/tcp.c:1405 tcp_recvmsg+0x1c7/0x7b6()
Aug 31 08:20:16 heather kernel: Hardware name: empty
Aug 31 08:20:16 heather kernel: Modules linked in: nfs lockd nfs_acl sunrpc fuse loop i2c_i801 e100 i2c_core e1000e
Aug 31 08:20:16 heather kernel: Pid: 31500, comm: netcat Not tainted 2.6.30.5 #1
Aug 31 08:20:16 heather kernel: Call Trace:
Aug 31 08:20:16 heather kernel:  [<ffffffff80431497>] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel:  [<ffffffff80431497>] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel:  [<ffffffff8023282d>] ? warn_slowpath_common+0x77/0xa3
Aug 31 08:20:16 heather kernel:  [<ffffffff80431497>] ? tcp_recvmsg+0x1c7/0x7b6
Aug 31 08:20:16 heather kernel:  [<ffffffff80401340>] ? sock_common_recvmsg+0x30/0x45
Aug 31 08:20:16 heather kernel:  [<ffffffff8029b3d8>] ? mnt_drop_write+0x25/0x12e
Aug 31 08:20:16 heather kernel:  [<ffffffff803fee67>] ? sock_aio_read+0x109/0x11d
Aug 31 08:20:16 heather kernel:  [<ffffffff80287131>] ? do_sync_read+0xce/0x113
Aug 31 08:20:16 heather kernel:  [<ffffffff80244348>] ? autoremove_wake_function+0x0/0x2e
Aug 31 08:20:16 heather kernel:  [<ffffffff80293243>] ? poll_select_copy_remaining+0xd0/0xf3
Aug 31 08:20:16 heather kernel:  [<ffffffff80287b83>] ? vfs_read+0xbd/0x133
Aug 31 08:20:16 heather kernel:  [<ffffffff80287cb5>] ? sys_read+0x45/0x6e
Aug 31 08:20:16 heather kernel:  [<ffffffff8020ae6b>] ? system_call_fastpath+0x16/0x1b
Aug 31 08:20:16 heather kernel: ---[ end trace 31e61d5bab6e7cc0 ]---

Hopefully you will not tell me that netcat has problems, will you?
Hopefully we can agree that there are nasty things going on inside this code,
and that someone with a better brain and more kernel knowledge than me should
give it a very close look.
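
For anyone who wants to check their own logs for similar reports, here is a
minimal sketch that pulls kernel warning blocks out of a syslog file. It
assumes the same line layout as the excerpt above (a "cut here" marker, a
"WARNING: at file:line function()" line, and an "end trace" marker). The name
extract_warnings and the dictionary keys are made up for this illustration and
are not part of any existing tool.

```python
import re

# Markers as they appear in the log excerpt above.
CUT_MARKER = "------------[ cut here ]------------"
END_RE = re.compile(r"---\[ end trace ([0-9a-f]+) \]---")
WARN_RE = re.compile(r"WARNING: at (\S+):(\d+) (\S+)\(\)")


def extract_warnings(lines):
    """Return one dict per 'cut here' ... 'end trace' block found in lines."""
    reports = []
    current = None
    for line in lines:
        if CUT_MARKER in line:
            # Start a new report; an unfinished earlier one is discarded.
            current = {"location": None, "function": None,
                       "trace_id": None, "lines": []}
            continue
        if current is None:
            continue  # not inside a warning block
        current["lines"].append(line.rstrip("\n"))
        warn = WARN_RE.search(line)
        if warn:
            current["location"] = f"{warn.group(1)}:{warn.group(2)}"
            current["function"] = warn.group(3)
        end = END_RE.search(line)
        if end:
            current["trace_id"] = end.group(1)
            reports.append(current)
            current = None
    return reports
```

Feeding it an open log file, for example extract_warnings(open(path)) with
path pointing at whatever file your syslog writes kernel messages to, gives
one entry per warning, which makes it easier to see whether the warnings only
show up while glusterfs is mounted.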

-- 
Regards,
Stephan


