Hi,

I have a working glusterfs setup running on CentOS 5.3 with:

  glusterfs-2.0.4 (compiled from the source RPM)
  fuse-2.7.4-1
  dkms-fuse-2.7.4-1.rf
  autofs-5.0.1-0.rc2.102
  kernel 2.6.18-128.1.10.el5

and this all works just fine - autofs mounts the file system as you would expect, and this has been in production for some time.

However, if I try to upgrade any of the components it breaks, in that the autofs mount hangs rather than completing. Mounting the file system by hand with an explicit mount command always works correctly.

I've tried several versions of glusterfs later than the above, including the latest 3.0.2-1, with exactly the same result. Additionally, keeping that version of gluster and updating any of the other components also seems to break it, although I've not been able to test all the combinations - certainly the following set doesn't work either:

  glusterfs-3.0.2-1
  dkms-fuse-2.7.4-1.nodist.rf
  fuse-2.7.4-8.el5
  autofs-5.0.1-0.rc2.131.el5_4.1
  kernel 2.6.18-164.11.1.el5

I wonder if someone on the list can help me, as I've seen nothing in bugzilla relating to this.
Relevant information follows (for my test rig only).

Server volfile is:

---snip---
[l3admin at oy-centos-5_3-buildserver glusterfs]$ cat /etc/glusterfs/glusterfsd.vol
## Export volume "images-brick" with the contents of /export/images directory
volume posix
  type storage/posix
  option directory /export/shared/
end-volume

volume locks
  type features/locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes locks
  option auth.addr.locks.allow *
end-volume
---snip---

Client volfile:

---snip---
[l3admin at oy-centos-5_3-buildserver glusterfs]$ cat /etc/glusterfs/glusterfs.vol
volume oy-centos-5_3-buildserver
  type protocol/client
  option transport-type tcp/client
  option remote-host 127.0.0.1
  option remote-subvolume locks
end-volume
---snip---

/etc/auto.master has the following:

---snip---
/mnt/auto /etc/auto.d/auto.gluster --timeout=60 --ghost
---snip---

and auto.gluster has:

---snip---
# Mount the glustered file system
shared -fstype=glusterfs :/etc/glusterfs/glusterfs.vol
---snip---

Mounting the gluster file system directly works fine:

---snip---
[l3admin at oy-centos-5_3-buildserver ~]$ sudo mount -t glusterfs /etc/glusterfs/glusterfs.vol /mnt/auto/shared/
[l3admin at oy-centos-5_3-buildserver ~]$ df /mnt/auto/shared
Filesystem           1K-blocks      Used Available Use% Mounted on
glusterfs#/etc/glusterfs/glusterfs.vol
                       2031360    543744   1382656  29% /mnt/auto/shared
---snip---

Starting autofs and then attempting to access the mounted directory (e.g. ls /mnt/auto/shared/) causes glusterfs to hang, leaving a process list like this:

[l3admin at oy-centos-5_3-buildserver glusterfs]$ pstree -p | grep glu
 |-automount(22819)-+-mount(22830)---mount.glusterfs(22831)---glusterfs(22880)---glusterfs(22881)
 |-glusterfs(22882)---{glusterfs}(22883)
 |-glusterfsd(22356)---{glusterfsd}(22357)

Running gdb against 22882 during the hang shows:

(gdb) bt
#0  0x00c81402 in __kernel_vsyscall ()
#1  0x00ee7473 in __xstat64@GLIBC_2.1 () from /lib/libc.so.6
#2  0x00df66ec in stat64 () from
/usr/lib/glusterfs/3.0.0/xlator/mount/fuse.so
#3  0x00df419c in init (this_xl=0x88db1f8) at fuse-bridge.c:3368
#4  0x00b2293d in xlator_init (xl=0x88db1f8) at xlator.c:940
#5  0x00b22583 in xlator_init_rec (xl=0x88db1f8) at xlator.c:833
#6  0x00b226e6 in xlator_tree_init (xl=0x88db1f8) at xlator.c:871
#7  0x0804b299 in _xlator_graph_init ()
#8  0x0804b433 in glusterfs_graph_init ()
#9  0x0804d40c in main ()

(gdb) directory /home/l3admin/rpmbuild/BUILD/glusterfs-3.0.0/glusterfsd/src
Source directories searched: /home/l3admin/rpmbuild/BUILD/glusterfs-3.0.0/glusterfsd/src:$cdir:$cwd
(gdb) list *0x00df419c
0xdf419c is in init (fuse-bridge.c:3368).
3363                    gf_log ("fuse", GF_LOG_ERROR,
3364                            "Mandatory option 'mountpoint' is not specified.");
3365                    goto cleanup_exit;
3366            }
3367
3368            if (stat (value_string, &stbuf) != 0) {
3369                    if (errno == ENOENT) {
3370                            gf_log (this_xl->name, GF_LOG_ERROR,
3371                                    "%s %s does not exist",
3372                                    ZR_MOUNTPOINT_OPT, value_string);
(gdb) select-frame 3
(gdb) print value_string
$1 = 0x88da198 "/mnt/auto/shared"

By the time I'd got here, the spawned process with pid 22883 had died (is there a watchdog of some sort?), so I repeated the exercise and ran gdb on the watchdog process (which I think was pid 22881), getting this:

(gdb) bt
#0  0x0013d402 in __kernel_vsyscall ()
#1  0x001cf996 in nanosleep () from /lib/libc.so.6
#2  0x0020915c in usleep () from /lib/libc.so.6
#3  0x00d7fa2d in gf_timer_proc (ctx=0x8560008) at timer.c:177
#4  0x0068573b in start_thread () from /lib/libpthread.so.0
#5  0x0020fcfe in clone () from /lib/libc.so.6

And so I presume that this process is waiting for some communication from the process which spawned it, indicating that the mount is complete?

Regards to all

Phil

-- 
Director, Layer3 Systems Ltd
Layer3 Systems Limited is registered in England. Company no 3130393
43 Pendle Road, Streatham, London, SW16 6RT
tel: 020 8769 4484