Re: Problem with autofs configuration - sometimes mount does not complete fast enough?

On 09/06/2009 11:19 PM, Mark Mielke wrote:
On 09/06/2009 10:42 PM, Mark Mielke wrote:
This seems to happen about 50% of the time:

[root@wcarh035 ~]# ls /gluster/data
ls: cannot open directory /gluster/data: No such file or directory
[root@wcarh035 ~]# ls /gluster/data
00      06.fun  15      23.fun  32      40.fun  47      55.fun  64
00.fun  07      15.fun  24      32.fun  41      47.fun  56      64.fun

My current guess is that GlusterFS is telling AutoFS that the mount is complete before the actual mount operation takes effect. 50% of the time GlusterFS is able to complete the mount before AutoFS lets the user continue, and all is well. The other 50% of the time, GlusterFS does not quite finish the mount, and AutoFS gives the user a broken directory.

I might try and prove this by adding a sleep 5 to /sbin/mount.glusterfs, although I do not consider this a valid solution, as it just reduces the effect of the race - it does not eliminate the race.

Uhh... Hmm... It already has a "sleep 3", and changing it to "sleep 5" does not reduce the frequency of the problem. Changing it to "sleep 10" also has no effect.
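
To put a number on "about 50% of the time" when comparing different sleep values, a small counting loop could help. This is only a sketch: count_failures is a hypothetical helper name, not something shipped with GlusterFS or autofs, and it simply runs whatever command it is given and counts the non-zero exit statuses.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures N COMMAND [ARGS...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, "count_failures 20 ls /gluster/data" would print the number of failed listings out of 20. Each attempt has to start with the automount expired for the count to mean anything; how to force that between runs depends on the setup and is not shown here.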

Why does it sometimes work and sometimes not?

I note that the fusermount from the FUSE libraries does not seem to have the same problem:

$ /stage/linux/fuse-2.7.4/example/fusexmp_fh /tmp/t ; ls /tmp/t
backup/ boot/ etc/ lib64/ media/ pccyber/ sbin/ stage/ usr/ backup2/ db/ home/ lost+found/ mnt/ proc/ selinux/ sys/ var/ bin/ dev/ lib/ mail/ opt/ root/ srv/ tmp/ www/

It works immediately. Compare this to:

[root@wcarh033]~# echo hi >/tmp/t/hi
[root@wcarh033]~# time /opt/glusterfs/sbin/glusterfs --volfile=/etc/glusterfs/gluster-data.vol /tmp/t ; ls /tmp/t ; sleep 1 ; ls /tmp/t
/opt/glusterfs/sbin/glusterfs --volfile=/etc/glusterfs/gluster-data.vol /tmp/ 0.00s user 0.00s system 113% cpu 0.003 total
hi
00      06.fun  15      23.fun  32      40.fun  47      55.fun  64
00.fun  07      15.fun  24      32.fun  41      47.fun  56      64.fun
01      07.fun  16      24.fun  33      41.fun  50      56.fun  65
01.fun  10      16.fun  25      33.fun  42      50.fun  57      65.fun
02      10.fun  17      25.fun  34      42.fun  51      57.fun  66
02.fun  11      17.fun  26      34.fun  43      51.fun  60      66.fun
03      11.fun  20      26.fun  35      43.fun  52      60.fun  67
03.fun  12      20.fun  27      35.fun  44      52.fun  61      67.fun
04      12.fun  21      27.fun  36      44.fun  53      61.fun  lost+found
04.fun  13      21.fun  30      36.fun  45      53.fun  62
05      13.fun  22      30.fun  37      45.fun  54      62.fun
05.fun  14      22.fun  31      37.fun  46      54.fun  63
06      14.fun  23      31.fun  40      46.fun  55      63.fun

Note that the first 'ls' returns 'hi', and a second later, 'ls' returns the glusterfs content.

For fusexmp, it appears to complete the mount before the command returns. For glusterfs, the mount seems to complete a short time after the command returns.

I think this is where autofs is getting confused, serving the handle to the directory to the client too early. It thinks glusterfs is done mounting and gives the handle to the client, but this handle is broken and fails. Glusterfs completes the mount, and a short time later the lookups succeed. Adding 'sleep' in mount.glusterfs does not seem to be good enough - as 'sleep 1' and 'sleep 20' do not change the frequency. The existing 'sleep 3' in /sbin/mount.glusterfs should be completely unnecessary. Instead, we should figure out why GlusterFS cannot ensure the mount is in place before it returns.
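
As an illustration of the alternative to a fixed sleep, a mount helper could poll until the mount point actually shows up in the kernel's mount table before returning. The following is a minimal sketch under my own assumptions: wait_for_mount is a hypothetical function, not part of /sbin/mount.glusterfs, and the default of 10 tries one second apart is arbitrary. An entry in /proc/mounts shows that the mount is in place; it does not by itself prove that the filesystem is already answering lookups, so this narrows the race rather than explaining it.

```shell
#!/bin/sh
# Hypothetical helper, not part of GlusterFS or autofs.
# Polls a mount table until MOUNTPOINT appears, or gives up.
# Usage: wait_for_mount MOUNTPOINT [TRIES] [MOUNT_TABLE]
wait_for_mount() {
    mountpoint=$1
    tries=${2:-10}              # assumption: up to 10 checks, 1 second apart
    table=${3:-/proc/mounts}    # field 2 of each line is the mount point
    while [ "$tries" -gt 0 ]; do
        # awk exits 0 only if some line has MOUNTPOINT in field 2.
        if awk -v mp="$mountpoint" \
            '$2 == mp { found = 1 } END { exit !found }' "$table"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A call such as "wait_for_mount /tmp/t || echo 'mount did not appear'" right after starting glusterfs would then return as soon as the mount is visible instead of always waiting a fixed time. The third argument exists only so the function can be pointed at a different file when trying it out.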

I'm worn out investigating for today - hopefully somebody can help me? :-)

Cheers,
mark

--
Mark Mielke<mark@xxxxxxxxx>




