Hello,

Some time ago I spent quite some time debugging a problem reported by a customer. I'd like to share it here, hoping that it can help others who have seen similar symptoms.

The short problem description is: a program tries to open/access a file in an automounted volume (originally unmounted) and open()/access() fails with ENOENT. If the operation is retried immediately, it succeeds.

To reproduce it in a reasonable time (<1h) I had to set up many (~15k) direct mounts and really stress the daemon, triggering many mounts simultaneously (calling stat() twice on some file and checking whether the first call failed) and reloading the daemon from time to time. Also, I think it requires LDAP, but I can't say for sure.

A more detailed description:

There is a rare race condition between do_mount_direct() and do_readmap() that can make the automount daemon return success for a mount that it hasn't really mounted.

This is what the involved threads look like at the race window:

> Thread 8 (readmap thread)              Thread 1 (mount thread)
> ========                               ========
> do_readmap (state.c:505)               do_mount_direct (direct.c:1229)
> [traverses list of cached mounts       lookup_nss_mount (lookup.c:935)
>  calling do_readmap_mount()]           lookup_map_name (lookup.c:755)
> do_readmap_mount (state.c:429)         lookup_name_source_instance (lookup.c:754)
> do_mount_autofs_direct (direct.c:347)  lookup_mount (lookup_ldap.c:2995)
> [sets MOUNT_FLAG_REMOUNT in ap->flags  parse_mount (parse_sun.c:1609)
>  and calls try_remount]                sun_mount (parse_sun.c:695)
> try_remount (mounts.c:1375)            mount_mount (mount_nfs.c:75)
>                                        [finds MOUNT_FLAG_REMOUNT set in
>                                         ap->flags and returns immediately]

Both threads are working on the same autofs_point (ap) structure (the one that describes the standard direct mount point -> "/-"). Thread 1 tries to mount a volume from the direct map while Thread 8 is re-reading the map. During the re-read, MOUNT_FLAG_REMOUNT is set and unset once for each entry in the direct map (~15k times in my test setup).
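
To make the shared-flag problem concrete, here is a simplified model of the interleaving, replayed sequentially in a single thread. It is only a sketch and not autofs code: struct sketch_ap, SKETCH_FLAG_REMOUNT, sketch_mount() and the return value 0 for "success" are hypothetical stand-ins for ap, MOUNT_FLAG_REMOUNT and mount_mount().

```c
#include <stdbool.h>

/* Hypothetical stand-in; the real flag is defined in the autofs sources. */
#define SKETCH_FLAG_REMOUNT 0x1u

struct sketch_ap {
    unsigned int flags;   /* shared by both threads, not protected by a lock */
    bool mounted;         /* whether the volume was really mounted */
};

/* Models the early return in the mount path: if the remount flag is seen,
 * nothing is mounted, yet success (0, an assumption) is reported. */
static int sketch_mount(struct sketch_ap *ap)
{
    if (ap->flags & SKETCH_FLAG_REMOUNT)
        return 0;         /* "success" without mounting anything */
    ap->mounted = true;
    return 0;
}

/* Replays the bad interleaving step by step. Returns true if success was
 * reported for a volume that was never mounted. */
static bool sketch_replay_race(void)
{
    struct sketch_ap ap = { 0, false };
    int rc;

    ap.flags |= SKETCH_FLAG_REMOUNT;    /* readmap thread: before try_remount */
    rc = sketch_mount(&ap);             /* mount thread: lands in the window  */
    ap.flags &= ~SKETCH_FLAG_REMOUNT;   /* readmap thread: after try_remount  */

    return rc == 0 && !ap.mounted;
}
```

In this model sketch_replay_race() reports the false success, while a call to sketch_mount() outside the window mounts normally.
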
As there's no lock protecting ap->flags, the race window is the time spent executing try_remount(). If the other thread reaches mount_mount() during the window, the volume won't be mounted but the daemon will inform the kernel that it was.

What makes the race window small is:
- The readmap doesn't run that often, especially when using long timeouts.
- MOUNT_FLAG_REMOUNT is set for a very short time.

What makes it a bit more interesting: I'm able to reproduce this only with kernels <= 2.6.37. There were quite a few changes to the autofs kernel module in 2.6.38, and I'm not able to explain which of them masked or fixed the problem. Considering how much time I spent on this, I'd be very grateful if someone could shed some light on it.

To work around the problem on systems that can't update the kernel, I patched the daemon (mount_mount() in mount_nfs.c) to retry the test after some microseconds when it finds MOUNT_FLAG_REMOUNT set. I know... not nice :-)

Thanks,
Leonardo
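
A minimal sketch of the retry idea behind the workaround, not the actual patch: the flag is re-tested a few times with a short sleep in between, and only treated as a real remount if it stays set. The names (RETRY_FLAG_REMOUNT, remount_flag_persists), the retry count and the delay are hypothetical, and the atomic type is an assumption of the sketch; it is not how ap->flags is declared in the daemon.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */
#endif
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for MOUNT_FLAG_REMOUNT. */
#define RETRY_FLAG_REMOUNT 0x1u

/* Re-tests the flag up to `retries` times, sleeping `delay_us` microseconds
 * after each test that finds it set. Returns false as soon as the flag is
 * seen clear (the caller should go on and really mount), true if it is
 * still set after the last test. */
static bool remount_flag_persists(atomic_uint *flags, int retries, long delay_us)
{
    struct timespec delay = {
        .tv_sec  = delay_us / 1000000L,
        .tv_nsec = (delay_us % 1000000L) * 1000L,
    };

    for (int i = 0; i < retries; i++) {
        if (!(atomic_load(flags) & RETRY_FLAG_REMOUNT))
            return false;
        nanosleep(&delay, NULL);   /* give the readmap thread time to clear it */
    }
    return (atomic_load(flags) & RETRY_FLAG_REMOUNT) != 0;
}
```

A caller would skip the mount only when remount_flag_persists() returns true. This narrows the window rather than closing it; a lock around the flag would be the cleaner fix.
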