KERBOOM
[michael@fleming1 ~]$ sudo mount -a -t nfs
[sudo] password for michael:
mount: fearless1:/gv0 failed, reason given by server: No such file or directory
mount: fearless1:/gv0/fleming1/db0/ALTUS_config failed, reason given by server: unknown nfs status return value: 22
mount: fearless1:/gv0/fleming1/db0/ALTUS_data failed, reason given by server: unknown nfs status return value: 22
mount: fearless1:/gv0/fleming1/db0/ALTUS_flash failed, reason given by server: unknown nfs status return value: 22
mount.nfs: mount point /db/flash_recovery_area/ALTUS/onlinelog does not exist

nfs.log:
[2013-04-12 15:55:16.507084] E [nfs3.c:305:__nfs3_get_volume_id] (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo+0x22c) [0x7f45bfbb852c] (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo_reply+0x29) [0x7f45bfbb2ce9] (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51) [0x7f45bfbb2481]))) 0-nfs-nfsv3: invalid argument: xl
[2013-04-12 15:55:16.538560] E [nfs3.c:4706:nfs3_fsinfo] 0-nfs-nfsv3: Bad Handle
[2013-04-12 15:55:16.538580] W [nfs3-helpers.c:3389:nfs3_log_common_res] 0-nfs-nfsv3: XID: 242c1550, FSINFO: NFS: 10001(Illegal NFS file handle), POSIX: 14(Bad address)
[2013-04-12 15:55:16.538617] E [nfs3.c:305:__nfs3_get_volume_id] (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo+0x22c) [0x7f45bfbb852c] (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo_reply+0x29) [0x7f45bfbb2ce9] (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51) [0x7f45bfbb2481]))) 0-nfs-nfsv3: invalid argument: xl

(I tried both with and without modifying your uint32_t size to a 'int32_t size' to correct the signedness of the argument)

Get ahold of me in IRC and let's get this figured out. I've got a debugger attached.

M.

On 13-04-12 11:32 AM, Niels de Vos wrote:
> On Fri, Apr 12, 2013 at 05:23:08PM +0200, Niels de Vos wrote:
>> On Thu, Apr 11, 2013 at 12:37:30PM -0400, Michael Brown wrote:
>>> That actually broke everything (including Linux trying to mount NFS).
>>>
>>> I've modified it slightly to be:
>>>
>>> bool_t
>>> xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
>>> {
>>>         if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *) &objp->data.data_len, NFS3_FHSIZE))
>>>                 if (!xdr_opaque (xdrs, &objp, (u_int *) &objp->data.data_len))
>>>                         return FALSE;
>>>         return TRUE;
>>> }
>>>
>>> (i.e. only call the xdr_opaque function if the xdr_bytes decode fails)
>>
>> Nah, that won't work. The xdr_* functions are modifying the position of
>> the cursor in the XDR-stream. Subsequent reads will continue where the
>> previous one finished.
>>
>> What you probably need to do is something like this:
>>
>> xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
>> {
>>         uint32_t size;
>>
>>         if (!xdr_int (xdrs, &size))
>>                 if (!xdr_opaque (xdrs, (u_int *)&objp->data.data_len, size))
>
> ^ that should be objp->data.data_val of course :-/
>
>>                         return FALSE
>>         return TRUE;
>> }
>>
>> That will read the size of the fhandle first, to determine how long the
>> opaque fhandle is, and use that size to read it.
>>
>> Cheers,
>> Niels
>>
>>> But I get no change in behaviour.
>>>
>>> Also get these warnings:
>>> xdr-nfs3.c: In function 'xdr_nfs_fh3':
>>> xdr-nfs3.c:197: warning: passing argument 2 of 'xdr_opaque' from incompatible pointer type
>>> /usr/include/rpc/xdr.h:313: note: expected 'caddr_t' but argument is of type 'struct nfs_fh3 **'
>>> xdr-nfs3.c:197: warning: passing argument 3 of 'xdr_opaque' makes integer from pointer without a cast
>>> /usr/include/rpc/xdr.h:313: note: expected 'u_int' but argument is of type 'u_int *'
>>>
>>> M.
>>>
>>> On 13-04-11 07:42 AM, Niels de Vos wrote:
>>>> My guess is that this (untested) change would fix it, can you try that?
>>>>
>>>> --- a/rpc/xdr/src/xdr-nfs3.c
>>>> +++ b/rpc/xdr/src/xdr-nfs3.c
>>>> @@ -184,7 +184,7 @@ xdr_specdata3 (XDR *xdrs, specdata3 *objp)
>>>>  bool_t
>>>>  xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
>>>>  {
>>>> -        if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *) &objp->data.data_len, NFS3_FHSIZE))
>>>> +        if (!xdr_opaque (xdrs, &objp, (u_int *) &objp->data.data_len))
>>>>                  return FALSE;
>>>>          return TRUE;
>>>>  }
>>>>
>>>> HTH,
>>>> Niels
>>>>
>>>>> All I get out of gluster is:
>>>>> [2013-04-08 12:54:32.206312] E [nfs3.c:4741:nfs3svc_fsinfo] 0-nfs-nfsv3: Error decoding arguments
>>>>>
>>>>> I've attached abridged packet captures and text explanations of the
>>>>> packets (thanks to wireshark).
>>>>>
>>>>> Can someone please look at this and determine if it's gluster's
>>>>> parsing of the RPC call to blame, or if it's Oracle?
>>>>>
>>>>> This is the same setup on which I reported the NFS race condition
>>>>> bug. It does have that patch applied.
>>>>>
>>>>> Details: http://lists.gnu.org/archive/html/gluster-devel/2013-04/msg00014.html
>>>>>
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>> --
>>>>> Michael Brown            | `One of the main causes of the fall of
>>>>> Systems Consultant       | the Roman Empire was that, lacking zero,
>>>>> Net Direct Inc.          | they had no way to indicate successful
>>>>> ☎: +1 519 883 1172 x5106 | termination of their C programs.' - Firth
>>>>>
>>>>> _______________________________________________
>>>>> Gluster-devel mailing list
>>>>> Gluster-devel@xxxxxxxxxx
>>>>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
>
> --
> Niels de Vos
> Sr. Software Maintenance Engineer
> Support Engineering Group
> Red Hat Global Support Services
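
Editorial note on the cursor behaviour Niels describes in the quoted thread: the sketch below is plain C and is not gluster's code or the Sun RPC xdr_* API. The names struct xdr_cursor, cur_get_u32 and cur_get_fh3 are hypothetical and exist only for this illustration. It assumes the standard XDR wire layout for a variable-length opaque (a 4-byte big-endian length followed by the data, padded to a multiple of 4 bytes) and the NFSv3 limit of 64 bytes for a file handle. It shows the "read the size first, then read that many bytes" approach, and why every read moves the stream position forward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NFS3_FHSIZE 64  /* maximum NFSv3 file handle size in bytes */

/* Hypothetical minimal read cursor over an XDR-encoded buffer.
 * Invariant: pos <= len. */
struct xdr_cursor {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Read one big-endian 32-bit unsigned integer and advance the cursor.
 * Returns 1 on success, 0 if fewer than 4 bytes remain. */
static int cur_get_u32(struct xdr_cursor *c, uint32_t *out)
{
    if (c->len - c->pos < 4)
        return 0;
    const uint8_t *p = c->buf + c->pos;
    *out = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    c->pos += 4;
    return 1;
}

/* Decode a variable-length opaque of at most NFS3_FHSIZE bytes:
 * first the length word, then the data, skipping the padding that
 * rounds the data up to a multiple of 4 bytes.
 * fh must have room for NFS3_FHSIZE bytes. */
static int cur_get_fh3(struct xdr_cursor *c, uint8_t *fh, uint32_t *fh_len)
{
    uint32_t size;

    if (!cur_get_u32(c, &size))
        return 0;
    if (size > NFS3_FHSIZE)
        return 0;               /* note: the cursor has already moved */

    size_t padded = ((size_t)size + 3) & ~(size_t)3;
    if (c->len - c->pos < padded)
        return 0;

    memcpy(fh, c->buf + c->pos, size);
    c->pos += padded;
    *fh_len = size;
    return 1;
}
```

A caller would point a struct xdr_cursor at the received bytes, call cur_get_fh3 once, and then keep decoding the remaining arguments from the same cursor. A failed call still leaves the cursor 4 bytes further along, which is the reason a "try one decoder, then fall back to another" chain on the same stream does not start from the same position twice.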