I've since ordered a different switch from the same manufacturer as the HBAs. We have decided to rebuild the lab, since we were having issues with oVirt as well. We can disregard this unless the issue is reproducible with the new equipment; I believe it is equipment related.
On Thu, Aug 11, 2016 at 2:53 AM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
Added Rafi, Raghavendra who work on RDMA.

On Mon, Aug 8, 2016 at 7:58 AM, Dan Lavu <dan@xxxxxxxxxx> wrote:

Hello,

I'm having some major problems with Gluster and oVirt; I've been ripping my hair out over this, so if anybody can provide insight, that would be fantastic. I've tried both transports, TCP and RDMA, and both are having instability problems.

The first thing I'm running into, intermittently and on one specific node, is that it gets spammed with the following message:

"[2016-08-08 00:42:50.837992] E [rpc-clnt.c:357:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7fb728b0f293] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7fb7288d73d1] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fb7288d74ee] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e)[0x7fb7288d8d0e] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7fb7288d9528] ))))) 0-vmdata1-client-0: forced unwinding frame type(GlusterFS 3.3) op(WRITE(13)) called at 2016-08-08 00:42:43.620710 (xid=0x6800b)"

Then the InfiniBand device gets bounced and VMs get stuck.

Another problem I'm seeing, once a day or every two days, is that an oVirt node will hang on the Gluster mounts. Issuing a df to check the mounts will just stall; this occurs hourly if RDMA is used. I can usually log into the hypervisor and remount the Gluster volumes (a rough sketch of that remount is included below, after the volume info).

This is on Fedora 23 with Gluster 3.8.1-1. The InfiniBand gear is 40Gb/s QDR QLogic, using the ib_qib module; this configuration was working with our old InfiniHost III. I couldn't get OFED to compile, so all of the InfiniBand modules are the ones Fedora installed.

So a volume looks like the following (please tell me if there is anything I need to adjust; the settings were pulled from several examples):
Volume Name: vmdata_ha
Type: Replicate
Volume ID: 325a5fda-a491-4c40-8502-f89776a3c642
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp,rdma
Bricks:
Brick1: deadpool.ib.runlevelone.lan:/gluster/vmdata_ha
Brick2: spidey.ib.runlevelone.lan:/gluster/vmdata_ha
Brick3: groot.ib.runlevelone.lan:/gluster/vmdata_ha (arbiter)
Options Reconfigured:
performance.least-prio-threads: 4
performance.low-prio-threads: 16
performance.normal-prio-threads: 24
performance.high-prio-threads: 24
cluster.self-heal-window-size: 32
cluster.self-heal-daemon: on
performance.md-cache-timeout: 1
performance.cache-max-file-size: 2MB
performance.io-thread-count: 32
network.ping-timeout: 5
performance.write-behind-window-size: 4MB
performance.cache-size: 256MB
performance.cache-refresh-timeout: 10
server.allow-insecure: on
network.remote-dio: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
nfs.disable: on
config.transport: tcp,rdma
performance.stat-prefetch: off
cluster.eager-lock: enable
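
For reference, everything under "Options Reconfigured" above was applied with gluster volume set, so if any of these values need changing that is how I would adjust them; for example, with the values currently in place:

  gluster volume set vmdata_ha network.ping-timeout 5
  gluster volume set vmdata_ha performance.write-behind-window-size 4MB
  gluster volume info vmdata_ha
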
Volume Name: vmdata1
Type: Distribute
Volume ID: 3afefcb3-887c-4315-b9dc-f4e890f786eb
Status: Started
Number of Bricks: 2
Transport-type: tcp,rdma
Bricks:
Brick1: spidey.ib.runlevelone.lan:/gluster/vmdata1
Brick2: deadpool.ib.runlevelone.lan:/gluster/vmdata1
Options Reconfigured:
config.transport: tcp,rdma
network.remote-dio: enable
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
nfs.disable: on
storage.owner-gid: 36
storage.owner-uid: 36
performance.readdir-ahead: on
server.allow-insecure: on
performance.stat-prefetch: off
performance.cache-refresh-timeout: 10
performance.cache-size: 256MB
performance.write-behind-window-size: 4MB
network.ping-timeout: 5
performance.io-thread-count: 32
performance.cache-max-file-size: 2MB
performance.md-cache-timeout: 1
performance.high-prio-threads: 24
performance.normal-prio-threads: 24
performance.low-prio-threads: 16
performance.least-prio-threads: 4
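
As an aside, the hypervisors mount these volumes with the GlusterFS FUSE client, and when a mount hangs I remount it by hand. It looks roughly like the following; the mount point here is only an example, since oVirt manages the real paths, and transport=rdma is how I force the RDMA transport on a tcp,rdma volume:

  umount -l /mnt/vmdata_ha
  mount -t glusterfs -o transport=rdma spidey.ib.runlevelone.lan:/vmdata_ha /mnt/vmdata_ha
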
/etc/glusterfs/glusterd.vol:

volume management
type mgmt/glusterd
option working-directory /var/lib/glusterd
option transport-type socket,tcp
option transport.socket.keepalive-time 10
option transport.socket.keepalive-interval 2
option transport.socket.read-fail-log off
option ping-timeout 0
option event-threads 1
# option rpc-auth-allow-insecure on
option transport.socket.bind-address 0.0.0.0
# option transport.address-family inet6
# option base-port 49152
end-volume
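
One note on this file: glusterd only picks up changes to glusterd.vol after the management daemon is restarted, which on these Fedora hosts is simply:

  systemctl restart glusterd
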
I think that's a good start; thank you so much for taking the time to look at this. You can find me on Freenode (nick: side_control) if you want to chat. I'm GMT-5.

Cheers,
Dan

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
--
Pranith