On Sun, May 27, 2012 at 11:02 PM, Ian Kent <raven@xxxxxxxxxx> wrote: > On Wed, 2012-02-29 at 14:10 -0400, Leonardo Chiquitto wrote: >> On Wed, Dec 14, 2011 at 2:05 AM, Ian Kent <raven@xxxxxxxxxx> wrote: >> > On Mon, 2011-12-12 at 19:08 -0300, Leonardo Chiquitto wrote: >> >> On Mon, Dec 12, 2011 at 3:39 AM, Ian Kent <raven@xxxxxxxxxx> wrote: >> >> > On Mon, 2011-12-05 at 12:52 -0300, Leonardo Chiquitto wrote: >> >> >> Hello, >> >> >> >> >> >> There is a problem in AutoFS that makes the daemon hang forever when >> >> >> starting up. It can be reproduced reliably with the latest git checkout, >> >> >> if you use the following configure options: --disable-mount-locking >> >> >> --enable-forced-shutdown --enable-ignore-busy. >> >> >> >> >> >> You'll also need to setup a nested direct map: >> >> >> >> >> >> # cat /etc/auto.master >> >> >> /- /etc/auto.direct >> >> >> >> >> >> # cat /etc/auto.direct >> >> >> /test \ >> >> >> / -nfsvers=3 10.1.1.65:/nfs \ >> >> >> /var -nfsvers=3 10.1.1.65:/nfs/var >> >> >> >> >> >> And then run these steps: >> >> >> >> >> >> 1. Start AutoFS >> >> >> 2. cd /test >> >> >> 3. Stop AutoFS >> >> >> 4. Start AutoFS >> >> >> >> >> >> The automounter will deadlock when starting up because it calls >> >> >> cache_writelock() in a lock that is already "readlocked" by the >> >> >> same thread. Here's the call traces (Thread 3 is the one that >> >> >> deadlocks): >> >> >> >> >> >> Thread 4 (Thread 0x7f63a0b18700 (LWP 6923)): >> >> >> #0 0x00007f63a0702e6c in pthread_cond_wait@@GLIBC_2.3.2 () >> >> >> from /lib64/libpthread.so.0 >> >> >> #1 0x00007f63a0b665c8 in master_do_mount (entry=0x7f63a0d9dd30) >> >> >> at master.c:1065 >> >> >> #2 0x00007f63a0b66a38 in master_mount_mounts (master=0x7f63a0d82310, >> >> >> age=1323091131, readall=0) at master.c:1206 >> >> >> #3 0x00007f63a0b65c03 in master_read_master (master=0x7f63a0d82310, >> >> >> age=1323091131, readall=0) at master.c:860 >> >> >> #4 0x00007f63a0b45d45 in main (argc=0, argv=0x7fff20d087a0) >> >> >> at automount.c:2202 >> >> >> >> >> >> Thread 3 (Thread 0x7f639f564700 (LWP 6928)): >> >> >> #0 0x00007f63a0702610 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0 >> >> >> #1 0x00007f63a0b56d48 in cache_writelock (mc=0x7f63a0d9e010) at cache.c:72 >> >> >> #2 0x00007f639e706c23 in parse_mount (ap=0x7f63a0d9de50, >> >> >> name=0x7f639f563400 "/nfs", name_len=4, >> >> >> mapent=0x7f639f562400 "/ -nfsvers=3 10.1.1.65:/nfs \t/var -nfsvers=3 10.1.1.65:/nfs/var", context=0x7f63a0d83c90) at parse_sun.c:1550 >> >> >> #3 0x00007f639eb498ea in lookup_mount (ap=0x7f63a0d9de50, >> >> >> name=0x7f63a0da4460 "/nfs", name_len=4, context=0x7f63a0d899a0) >> >> >> at lookup_file.c:1074 >> >> >> #4 0x00007f63a0b523d5 in do_lookup_mount (ap=0x7f63a0d9de50, >> >> >> map=0x7f63a0da40d0, name=0x7f63a0da4460 "/nfs", name_len=4) at lookup.c:678 >> >> >> #5 0x00007f63a0b5258c in lookup_name_file_source_instance (ap=0x7f63a0d9de50, >> >> >> map=0x7f63a0d9df60, name=0x7f63a0da4460 "/nfs", name_len=4) at lookup.c:718 >> >> >> #6 0x00007f63a0b52cc3 in lookup_nss_mount (ap=0x7f63a0d9de50, source=0x0, >> >> >> name=0x7f63a0da4460 "/nfs", name_len=4) at lookup.c:909 >> >> >> #7 0x00007f63a0b5c678 in do_remount_direct (ap=0x7f63a0d9de50, fd=11, >> >> >> path=0x7f63a0da4460 "/nfs") at mounts.c:1159 >> >> >> #8 0x00007f63a0b5d003 in remount_active_mount (ap=0x7f63a0d9de50, >> >> >> mc=0x7f63a0d9e010, path=0x7f63a0da4460 "/nfs", devid=31, type=2, >> >> >> ioctlfd=0x7f639f563a30) at mounts.c:1329 >> >> >> #9 0x00007f63a0b5d0f3 in try_remount (ap=0x7f63a0d9de50, me=0x7f63a0da4390, >> >> >> type=2) at mounts.c:1357 >> >> >> #10 0x00007f63a0b49dee in do_mount_autofs_direct (ap=0x7f63a0d9de50, >> >> >> mnts=0x7f63a0da44d0, me=0x7f63a0da4390) at direct.c:339 >> >> >> #11 0x00007f63a0b4a811 in mount_autofs_direct (ap=0x7f63a0d9de50) >> >> >> at direct.c:511 >> >> >> #12 0x00007f63a0b42f2a in mount_autofs (ap=0x7f63a0d9de50, >> >> >> root=0x7f63a0d83500 "/-") at automount.c:1003 >> >> >> #13 0x00007f63a0b44a80 in handle_mounts (arg=0x7fff20d08440) >> >> >> at automount.c:1561 >> >> >> #14 0x00007f63a06fef05 in start_thread () from /lib64/libpthread.so.0 >> >> >> #15 0x00007f639f84853d in clone () from /lib64/libc.so.6 >> >> >> >> >> >> Thread 2 (Thread 0x7f63a0b16700 (LWP 6925)): >> >> >> #0 0x00007f63a0702e6c in pthread_cond_wait@@GLIBC_2.3.2 () >> >> >> from /lib64/libpthread.so.0 >> >> >> #1 0x00007f63a0b5625a in st_queue_handler (arg=0x0) at state.c:1073 >> >> >> #2 0x00007f63a06fef05 in start_thread () from /lib64/libpthread.so.0 >> >> >> #3 0x00007f639f84853d in clone () from /lib64/libc.so.6 >> >> >> >> >> >> Thread 1 (Thread 0x7f63a0b32700 (LWP 6924)): >> >> >> #0 0x00007f63a0702e6c in pthread_cond_wait@@GLIBC_2.3.2 () >> >> >> from /lib64/libpthread.so.0 >> >> >> #1 0x00007f63a0b629f8 in alarm_handler (arg=0x0) at alarm.c:186 >> >> >> #2 0x00007f63a06fef05 in start_thread () from /lib64/libpthread.so.0 >> >> >> #3 0x00007f639f84853d in clone () from /lib64/libc.so.6 >> >> >> >> >> >> And the debug logs: >> >> >> >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: Starting automounter version 5.0.6, master map auto.master >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: using kernel protocol version 5.02 >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_nss_read_master: reading master files auto.master >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_init: parse(sun): init gathered global options: (null) >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_read_master: lookup(file): read entry /- >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: master_do_mount: mounting /- >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: automount_path_to_fifo: fifo name /run/autofs.fifo-- >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_nss_read_map: reading map file /etc/auto.direct >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_init: parse(sun): init gathered global options: (null) >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: remount_active_mount: trying to re-connect to mount /nfs >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: mounted direct on /nfs with timeout 300, freq 75 seconds >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_mount: lookup(file): looking up /nfs >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: lookup_mount: lookup(file): /nfs -> / -nfsvers=3 10.1.1.65:/nfs #011/var -nfsvers=3 10.1.1.65:/nfs/var >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_mount: parse(sun): expanded entry: / -nfsvers=3 10.1.1.65:/nfs #011/var -nfsvers=3 10.1.1.65:/nfs/var >> >> >> Dec 5 10:18:51 linux-2alv automount[6923]: parse_mount: parse(sun): gathered options: >> >> >> >> >> >> Here's an initial analysis of the Thread 3 behavior: >> >> >> >> >> >> #15 clone () >> >> >> #14 start_thread () >> >> >> #13 handle_mounts at automount.c:1561 >> >> >> #12 mount_autofs at automount.c:1003 >> >> >> #11 mount_autofs_direct at direct.c:511 >> >> >> >> >> >> In mount_autofs_direct() (daemon/direct.c:486), cache_readlock() >> >> >> acquires the lock that will be unlocked by pthread_cleanup_pop(1) >> >> >> (daemon/direct.c:515). However, before the unlock, the remount >> >> >> code path goes on like this: >> >> >> >> >> >> #10 do_mount_autofs_direct at direct.c:339 >> >> >> #9 try_remount at mounts.c:1357 >> >> >> #8 remount_active_mount at mounts.c:1329 >> >> >> #7 do_remount_direct at mounts.c:1159 >> >> >> #6 lookup_nss_mount at lookup.c:909 >> >> >> #5 lookup_name_file_source_instance at lookup.c:718 >> >> >> #4 do_lookup_mount at lookup.c:678 >> >> >> #3 lookup_mount at lookup_file.c:1074 >> >> >> #2 parse_mount at parse_sun.c:1550 >> >> >> >> >> >> ... calling cache_writelock() in the same mapent_cache that was >> >> >> locked above. >> >> >> >> >> >> #1 cache_writelock at cache.c:72 >> >> >> >> >> >> Any ideas on how this could be fixed? >> >> > >> >> > Give this patch a try please. >> >> > I still need to test using the submount test since there may be a >> >> > problem with them, based on the comment in the code that's been >> >> > removed. I'll report back on how that goes. >> >> > >> >> > autofs-5.0.6 - fix remount deadlock >> >> > >> >> > From: Ian Kent <raven@xxxxxxxxxx> >> >> > >> >> > When reconstructing the mount tree upon restart a writelock to the map >> >> > entry cache cannot be taken when parsing an entry because a readlock >> >> > already is held higher up in the call tree. >> >> > >> >> > In the place this is done it shouldn't be necessary to alter the map >> >> > entries in the cache and it shouldn't be necessary to delete an existing >> >> > multi-mount cache entries to avoid a duplicate multi-mount entry error >> >> > return. The check for a duplicate should be able to be done in the cache >> >> > handling functions. >> >> > >> >> > I don't remember how this code came to be in the parsing function but >> >> > it is likely it was due to a problem observed when testing using the >> >> > autofs connectathon tests. With the change here the tests still pass >> >> > and attempts to add a duplicate multi-mount entry are still detected. >> >> > >> >> > But the deadlock reported by Leonardo Chiquitto also appears to be >> >> > resolved. >> >> > --- >> >> > >> >> > lib/cache.c | 8 +++++--- >> >> > modules/parse_sun.c | 38 ++++++-------------------------------- >> >> > 2 files changed, 11 insertions(+), 35 deletions(-) >> >> > >> >> > >> >> > diff --git a/lib/cache.c b/lib/cache.c >> >> > index 9acf736..3464e7d 100644 >> >> > --- a/lib/cache.c >> >> > +++ b/lib/cache.c >> >> > @@ -658,10 +658,12 @@ int cache_add_offset(struct mapent_cache *mc, const char *mkey, const char *key, >> >> > return CHE_FAIL; >> >> > >> >> > me = cache_lookup_distinct(mc, key); >> >> > - if (me && me != owner) >> >> > - return CHE_DUPLICATE; >> >> > + if (me && me->age == age) { >> >> > + if (me != owner) >> >> > + return CHE_DUPLICATE; >> >> > + } >> >> > >> >> > - ret = cache_add(mc, owner->source, key, mapent, age); >> >> > + ret = cache_update(mc, owner->source, key, mapent, age); >> >> > if (ret == CHE_FAIL) { >> >> > warn(logopt, "failed to add key %s to cache", key); >> >> > return CHE_FAIL; >> >> > diff --git a/modules/parse_sun.c b/modules/parse_sun.c >> >> > index 13b8af8..0da1802 100644 >> >> > --- a/modules/parse_sun.c >> >> > +++ b/modules/parse_sun.c >> >> > @@ -843,12 +843,14 @@ add_offset_entry(struct autofs_point *ap, const char *name, >> >> > strcpy(m_mapent, loc); >> >> > >> >> > ret = cache_add_offset(mc, name, m_key, m_mapent, age); >> >> > - if (ret == CHE_OK) >> >> > - debug(ap->logopt, MODPREFIX >> >> > - "added multi-mount offset %s -> %s", path, m_mapent); >> >> > - else >> >> > + if (ret != CHE_UPDATED) >> >> > warn(ap->logopt, MODPREFIX >> >> > "syntax error or duplicate offset %s -> %s", path, loc); >> >> > + else { >> >> > + debug(ap->logopt, MODPREFIX >> >> > + "added multi-mount offset %s -> %s", path, m_mapent); >> >> > + ret = CHE_OK; >> >> > + } >> >> > >> >> > return ret; >> >> > } >> >> > @@ -1547,34 +1549,6 @@ int parse_mount(struct autofs_point *ap, const char *name, >> >> > strcat(m_root, name); >> >> > } >> >> > >> >> > - cache_writelock(mc); >> >> > - me = cache_lookup_distinct(mc, name); >> >> > - if (!me) { >> >> > - int ret; >> >> > - /* >> >> > - * Not in the cache, perhaps it's a program map >> >> > - * or one that doesn't support enumeration >> >> > - */ >> >> > - ret = cache_add(mc, source, name, mapent, time(NULL)); >> >> > - if (ret == CHE_FAIL) { >> >> > - cache_unlock(mc); >> >> > - free(options); >> >> > - return 1; >> >> > - } >> >> > - } else { >> >> > - /* >> >> > - * If the entry exists it must not have any existing >> >> > - * multi-mount subordinate entries since we are >> >> > - * mounting this afresh. We need to do this to allow >> >> > - * us to fail on the check for duplicate offsets in >> >> > - * we don't know when submounts go away. >> >> > - */ >> >> > - cache_multi_writelock(me); >> >> > - cache_delete_offset_list(mc, name); >> >> > - cache_multi_unlock(me); >> >> > - } >> >> > - cache_unlock(mc); >> >> > - >> >> > cache_readlock(mc); >> >> > me = cache_lookup_distinct(mc, name); >> >> > if (me) { >> >> >> >> Hello Ian, >> >> >> >> Thanks for the patch! I confirm the deadlock is gone. I'm investigating a >> >> new problem in the same test case (restarting AutoFS when a mount >> >> with nested mounts is busy), but it's probably unrelated to the patch. >> >> I'll gather more information and report that in a new thread to avoid >> >> confusion. >> > >> > But testing using the submount test indicates there may be a problem >> > with submounts using this patch or a new problem introduced with recent >> > changes. I'm still working on it. >> > >> > The handling of nested mounts is difficult, especially upon restart in >> > the presence of in use mounts. I look forward to your feedback on the >> > problem. >> >> Hello Ian, >> >> Did you have time to revisit this problem? The "submounts problem" you >> mentioned above was really introduced by the patch (fix remount deadlock) >> you posted here? > > The problem I saw might not have been introduced by the patch above but > that code most likely there for a reason that's still relevant. > > This patch I think is the way to do it, can you try it please. > > autofs-5.0.6 - fix remount deadlock > > From: Ian Kent <raven@xxxxxxxxxx> > > When reconstructing the mount tree upon restart a writelock to the map > entry cache cannot be taken when parsing a direct map entry because a > readlock is already held higher up in the call tree. > > In the place this is done it isn't be necessary to alter the direct map > entries in the cache. Also, it shouldn't be necessary to delete existing > multi-mount cache entries to avoid a duplicate multi-mount entry error > return. The check for a duplicate can be done in the cache handling > functions. > --- > > lib/cache.c | 8 ++++--- > modules/parse_sun.c | 60 ++++++++++++++++++++++++++------------------------- > 2 files changed, 35 insertions(+), 33 deletions(-) > > > diff --git a/lib/cache.c b/lib/cache.c > index 9acf736..3464e7d 100644 > --- a/lib/cache.c > +++ b/lib/cache.c > @@ -658,10 +658,12 @@ int cache_add_offset(struct mapent_cache *mc, const char *mkey, const char *key, > return CHE_FAIL; > > me = cache_lookup_distinct(mc, key); > - if (me && me != owner) > - return CHE_DUPLICATE; > + if (me && me->age == age) { > + if (me != owner) > + return CHE_DUPLICATE; > + } > > - ret = cache_add(mc, owner->source, key, mapent, age); > + ret = cache_update(mc, owner->source, key, mapent, age); > if (ret == CHE_FAIL) { > warn(logopt, "failed to add key %s to cache", key); > return CHE_FAIL; > diff --git a/modules/parse_sun.c b/modules/parse_sun.c > index d7e74e2..8b0f8b6 100644 > --- a/modules/parse_sun.c > +++ b/modules/parse_sun.c > @@ -843,12 +843,17 @@ add_offset_entry(struct autofs_point *ap, const char *name, > strcpy(m_mapent, loc); > > ret = cache_add_offset(mc, name, m_key, m_mapent, age); > - if (ret == CHE_OK) > + if (ret == CHE_DUPLICATE) > + warn(ap->logopt, MODPREFIX > + "syntax error or duplicate offset %s -> %s", path, loc); > + else if (ret == CHE_FAIL) > + debug(ap->logopt, MODPREFIX > + "failed to add multi-mount offset %s -> %s", path, m_mapent); > + else { > + ret = CHE_OK; > debug(ap->logopt, MODPREFIX > "added multi-mount offset %s -> %s", path, m_mapent); > - else > - warn(ap->logopt, MODPREFIX > - "syntax error or duplicate offset %s -> %s", path, loc); > + } > > return ret; > } > @@ -1410,7 +1415,7 @@ int parse_mount(struct autofs_point *ap, const char *name, > char buf[MAX_ERR_BUF]; > struct map_source *source; > struct mapent_cache *mc; > - struct mapent *me = NULL; > + struct mapent *me; > char *pmapent, *options; > const char *p; > int mapent_len, rv = 0; > @@ -1561,33 +1566,28 @@ int parse_mount(struct autofs_point *ap, const char *name, > strcat(m_root, name); > } > > - cache_writelock(mc); > - me = cache_lookup_distinct(mc, name); > - if (!me) { > - int ret; > - /* > - * Not in the cache, perhaps it's a program map > - * or one that doesn't support enumeration > - */ > - ret = cache_add(mc, source, name, mapent, time(NULL)); > - if (ret == CHE_FAIL) { > - cache_unlock(mc); > - free(options); > - return 1; > + /* > + * Can't take the write lock for direct mount entries here > + * but they should always be present in the map entry cache. > + */ > + if (ap->type == LKP_INDIRECT) { > + cache_writelock(mc); > + me = cache_lookup_distinct(mc, name); > + if (!me) { > + int ret; > + /* > + * Not in the cache, perhaps it's a program map > + * or one that doesn't support enumeration. > + */ > + ret = cache_add(mc, source, name, mapent, age); > + if (ret == CHE_FAIL) { > + cache_unlock(mc); > + free(options); > + return 1; > + } > } > - } else { > - /* > - * If the entry exists it must not have any existing > - * multi-mount subordinate entries since we are > - * mounting this afresh. We need to do this to allow > - * us to fail on the check for duplicate offsets in > - * we don't know when submounts go away. > - */ > - cache_multi_writelock(me); > - cache_delete_offset_list(mc, name); > - cache_multi_unlock(me); > + cache_unlock(mc); > } > - cache_unlock(mc); > > cache_readlock(mc); > me = cache_lookup_distinct(mc, name); Hello Ian, Thanks for getting back to this issue. I've tested the patch now using the test case reported earlier on this thread and I confirm that it resolves the deadlock on startup. The problem of not reconnecting to the nested mounts remains, but I don't want to mix up the threads. Thanks again, Leonardo -- To unsubscribe from this list: send the line "unsubscribe autofs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html