Glusterfs 3.1.2 NFS offline when one brick is down

Etienne.Lyard at unige.ch (Etienne Lyard) · Thu, 10 Mar 2011 17:25:27 +0100

Hello,

I tried more or less the same setup as you did (CentOS 5.5, gluster 3.1.2) but I got different behaviour: in replica mode, a single brick going down didn't disturb the filesystem. However, in distribute mode, one brick going down did take the whole filesystem down.

Cheers,

Etienne
---
Etienne Lyard
ISDC, Observatoire de Geneve
Chemin d'Ecogia 16, CH-1290 VERSOIX
+41 22 379 21 14 (direct)
+41 22 379 21 00 (switchboard)
+41 22 379 21 33 (fax)
http://www.isdc.unige.ch/

Date: Thu, 10 Mar 2011 05:33:28 +0800
From: Hugh Zhu <itgs_zhu at hotmail.com>
Subject: Glusterfs 3.1.2 NFS offline when one brick is down
To: <gluster-users at gluster.org>
Message-ID: <SNT143-w5BB487B6553310F6EAF26FEC90 at phx.gbl>

Hi Everyone,

I followed the simple documentation for 3.1 to set up two boxes in replication mode, and mounted the volume over NFS on ESXi 4.1.0. Everything worked right away until I took down the box that ESXi is not directly pointed to. The whole NFS share is then inaccessible until that box comes back online.

The boxes are running CentOS 5.5. My installation steps are:

1. Install CentOS binary
2. Create trusted storage pool
3. Create replica 2 volume, start the volume
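
For reference, steps 2 and 3 correspond roughly to the commands generated below. This is only a sketch: the host names box1/box2 and the brick path /export/brick1 are placeholders I made up, gluster_setup_commands is a hypothetical helper, and the volume name test-volume is taken from the log further down. The script only builds and prints the gluster CLI invocations; it does not run them.

```python
def gluster_setup_commands(peer, volume, bricks, replica=2):
    """Return the gluster CLI invocations for steps 2 and 3 as argv lists.

    peer    -- host name of the second box (placeholder, not from the post)
    volume  -- volume name
    bricks  -- list of "host:/path" brick specifications (placeholders)
    replica -- replica count; the post uses 2
    """
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [
        # Step 2: create the trusted storage pool from the first box.
        ["gluster", "peer", "probe", peer],
        # Step 3a: create the replicated volume.
        ["gluster", "volume", "create", volume,
         "replica", str(replica), "transport", "tcp", *bricks],
        # Step 3b: start the volume.
        ["gluster", "volume", "start", volume],
    ]


if __name__ == "__main__":
    # Assumed names: box1/box2 and /export/brick1 are illustrative only.
    commands = gluster_setup_commands(
        peer="box2",
        volume="test-volume",
        bricks=["box1:/export/brick1", "box2:/export/brick1"],
    )
    for argv in commands:
        print(" ".join(argv))
```

The commands would be run as root on the first box, in the order printed.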

Then I mounted the first box through NFS on ESXi and, boom, everything started to work, thanks to the Gluster team's great work. This is by far the quickest and easiest open-source installation I have done.

When the second box, which my ESXi is not pointed to, is unplugged or shut down, the whole NFS share becomes inaccessible and nfs.log on the first box shows the following entries:

[2011-03-09 14:16:13.864156] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x391a60f779] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x391a60ef2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x391a60ee9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-03-09 14:16:01.154175
[2011-03-09 14:16:13.864207] E [rpc-clnt.c:338:saved_frames_unwind] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0xb9) [0x391a60f779] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7e) [0x391a60ef2e] (-->/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x391a60ee9e]))) rpc-clnt: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2011-03-09 14:16:10.154410
[2011-03-09 14:16:13.864244] I [client.c:1590:client_rpc_notify] test-volume-client-1: disconnected
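
In case it helps anyone correlate the outage with the log, here is a small sketch that pulls the "disconnected" events out of nfs.log. The regular expressions are inferred only from the three lines above, so they are an assumption about the log format rather than a documented one, and find_disconnects is a hypothetical helper name.

```python
import re

# Assumed line layout, inferred from the excerpt above:
#   [timestamp] LEVEL [file:line:function] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) \[(?P<src>[^\]]+)\] (?P<msg>.*)$"
)
# Message of the form "<client-name>: disconnected".
DISCONNECT = re.compile(r"^(?P<client>\S+): disconnected$")


def find_disconnects(lines):
    """Return (timestamp, client) pairs for every disconnect entry."""
    events = []
    for line in lines:
        entry = LOG_LINE.match(line.strip())
        if entry is None:
            continue  # not in the assumed format, skip it
        hit = DISCONNECT.match(entry.group("msg"))
        if hit is not None:
            events.append((entry.group("ts"), hit.group("client")))
    return events


if __name__ == "__main__":
    sample = [
        "[2011-03-09 14:16:13.864244] I [client.c:1590:client_rpc_notify] "
        "test-volume-client-1: disconnected",
    ]
    for ts, client in find_disconnects(sample):
        print(ts, client)
```

To use it on a real log, pass an open file object instead of the sample list, e.g. find_disconnects(open(path)) with path set to wherever nfs.log lives on your system.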

Has anyone else run into this?

Thanks in advance.

Hugh