Re: Gluster 3.7.13 with nfs-ganesha 2.3.0.1

Soumya Koduri <skoduri@xxxxxxxxxx> · Wed, 3 Aug 2016 00:01:04 +0530

Hi,

Does your test involve multiple ganesha servers or removing files?

http://review.gluster.org/#/c/14522/ (merged in 3.7.12)  caused a 
regression in upcall processing of nfs-ganesha. It is being fixed as 
part of http://review.gluster.org/14701 .

Could you please turn off upcalls (using the command below) and retry
the tests?

cmd: gluster v set <volname> features.cache-invalidation off
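
If you end up toggling this more than once (off for the test, back on
once the fix lands), a small wrapper may help. This is only a sketch:
the function name set_cache_invalidation and the DRY_RUN variable are
made up for illustration, and nfsvol1 is the volume name taken from the
logs quoted below. It uses the long form "gluster volume set", which is
the same command as above.

```shell
#!/bin/sh
# Sketch: toggle upcall-based cache invalidation on a Gluster volume.
# set_cache_invalidation and DRY_RUN are illustrative names, not part
# of the gluster CLI.
set_cache_invalidation() {
    volname="$1"   # volume name, e.g. nfsvol1
    state="$2"     # "on" or "off"

    # Reject anything other than on/off before touching the volume.
    case "$state" in
        on|off) ;;
        *)
            echo "usage: set_cache_invalidation <volname> on|off" >&2
            return 2
            ;;
    esac

    cmd="gluster volume set $volname features.cache-invalidation $state"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$cmd"
    else
        $cmd
    fi
}
```

With DRY_RUN=1 set, "set_cache_invalidation nfsvol1 off" only prints
the command it would run; without it, the command is executed against
the cluster.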

Thanks,
Soumya

On 08/02/2016 11:48 PM, ML Wong wrote:
When I use the packages from "centos-gluster37" to set up Gluster with
a ZFS backend, ganesha-nfsd aborts with an ABRT signal whenever I try
to copy, or simply rsync, a directory to the share exported from
nfs-ganesha.

Environment:
CentOS 7 - kernel 3.10.0-327.22.2.el7.x86_64
ZFS Version: 0.6.5.7, Release: 1.el7.centos
Gluster 3.7.13.1.el7 from centos-gluster37
nfs-ganesha 2.3.0.1.el7 from centos-gluster37

This is the only line I got from strace when attached to the PID of
ganesha-nfsd:

futex(0x7f623ffff9d0, FUTEX_WAIT, 38303, NULL <detached ...>

When I try to copy files, ganesha-gfapi.log shows the following
entries, which keep complaining about split-brain and stale-file-handle
issues in the Gluster volume.

[2016-08-01 23:06:09.423901] W [MSGID: 108008]
[afr-read-txn.c:244:afr_read_txn] 0-nfsvol1-replicate-0: Unreadable
subvolume -1 found with event generation 2 for gfid
3f713211-7573-45b1-aed8-503c8e17714b. (Possible split-brain)

[2016-08-01 23:06:09.425664] E [MSGID: 109040]
[dht-helper.c:1190:dht_migration_complete_check_task] 0-nfsvol1-dht:
<gfid:3f713211-7573-45b1-aed8-503c8e17714b>: failed to lookup the file
on nfsvol1-dht [Stale file handle]

I tried both Gluster 3.7.13 and 3.7.12; both versions give me the same
problem. Only after I downgrade Gluster to 3.7.11 does nfs-ganesha play
nicely with Gluster. I wondered whether this was related to my ZFS
backend setup, so I quickly set up a test on my laptop using XFS as the
backend, with 3 nodes running Gluster 3.7.13 and nfs-ganesha 2.3.0, and
I got the same result: rsync/copy of files to the NFS shares exported
from Gluster+Ganesha aborted after a few files were copied. For your
reference, pacemaker and corosync are both still running in the
background even when this happens.

I am wondering if something introduced since 3.7.12 somehow breaks the
interface between nfs-ganesha and Gluster. Any pointers will be
appreciated.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
