I have a fairly new gluster fs of 4 nodes, with 2 RAID6 bricks on each node, connected to a cluster via IPoIB on QDR IB. The servers are all SL6.2, running gluster 3.3-1; the clients are running the gluster-released glusterfs-fuse-3.3.0qa42-1 & glusterfs-3.3.0qa42-1.

The volume seems normal:

$ gluster volume info

Volume Name: gl
Type: Distribute
Volume ID: 21f480f7-fc5a-4fd8-a084-3964634a9332
Status: Started
Number of Bricks: 8
Transport-type: tcp,rdma
Bricks:
Brick1: bs2:/raid1
Brick2: bs2:/raid2
Brick3: bs3:/raid1
Brick4: bs3:/raid2
Brick5: bs4:/raid1
Brick6: bs4:/raid2
Brick7: bs1:/raid1
Brick8: bs1:/raid2
Options Reconfigured:
performance.cache-size: 268435456
nfs.disable: on
performance.io-cache: on
performance.quick-read: on
performance.io-thread-count: 64
auth.allow: 10.2.*.*,10.1.*.*

The logs on both the server and client are remarkable in their lack of anything amiss. (The server has the previously reported, endlessly repeating string of:

I [socket.c:1798:socket_event_handler] 0-transport: disconnecting now

which seems to be correlated with turning the NFS server off; this has been mentioned before.) The gluster volume log, stripped of that line, is here: <http://pastie.org/4309225>

Individual large-file reads and writes are in the >300MB/s range, which is not magnificent but tolerable.

However, we've recently detected what appears to be a conflict between reading and writing for some applications. When such an application is both reading from and writing to the gluster fs, the client /usr/sbin/glusterfs increases its CPU consumption to >100% and the IO drops to almost zero. When the inputs are on the gluster fs but the output goes to another fs, performance is as good as on a local RAID. This seems to be specific to particular applications (bedtools, perhaps some other OpenMP genomics apps - still checking). Other utilities (cp, perl, tar, etc.) that read and write to the gluster filesystem seem able to push and pull fairly large amounts of data to/from it.

The client is running a genomics utility (bedtools) which reads very large chunks of data from the gluster fs, then aligns them to a reference genome.
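To make the two cases concrete, the comparison looks roughly like this (the paths and the exact genomeCoverageBed arguments below are placeholders for illustration, not the user's actual command line):

# Case 1: input and output both on the gluster mount -- the glusterfs client
# pegs a core at >100% CPU and throughput collapses
genomeCoverageBed -i /glustermount/reads.bed -g /glustermount/genome.txt > /glustermount/coverage.out

# Case 2: same input on the gluster mount, output redirected to a local fs --
# runs at local-RAID speed
genomeCoverageBed -i /glustermount/reads.bed -g /glustermount/genome.txt > /scratch/coverage.out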
open("/data/users/tdlong/bin/genomeCoverageBed", O_RDONLY) = 3 ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fffcf0e5bb0) = -1 ENOTTY (Inappropriate ioctl for device) lseek(3, 0, SEEK_CUR) = 0 read(3, "#!/bin/sh\n${0%/*}/bedtools genom"..., 80) = 42 lseek(3, 0, SEEK_SET) = 0 getrlimit(RLIMIT_NOFILE, {rlim_cur=4*1024, rlim_max=4*1024}) = 0 dup2(3, 255) = 255 close(3) = 0 fcntl(255, F_SETFD, FD_CLOEXEC) = 0 fcntl(255, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE) fstat(255, {st_mode=S_IFREG|0755, st_size=42, ...}) = 0 lseek(255, 0, SEEK_CUR) = 0 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 read(255, "#!/bin/sh\n${0%/*}/bedtools genom"..., 42) = 42 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2ae9318729e0) = 8229 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 rt_sigaction(SIGINT, {0x436f40, [], SA_RESTORER, 0x3cb64302d0}, {SIG_DFL, [], SA_RESTORER, 0x3cb64302d0}, 8) = 0 wait4(-1, Does this indicate any optional tuning or operational parameters that we should be using? hjm -- Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697 Google Voice Multiplexer: (949) 478-4487 415 South Circle View Dr, Irvine, CA, 92697 [shipping] MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)