On 20/2/23 08:42, Ian Kent wrote:
The mount map uses LDAP and changes quite often. My guess is that
automountd notices that some directory has been removed from the map,
and so removes the map entry. This presumably races with the expiry
process. The mount gets unmounted because it is removed from the map
at the same time that expiry wants to remove it, and confusion
results.
That sounds different to the terminology I'd use but I think I get what
you're saying.
I would describe it as: a map entry has been removed from the map while
it's in use, so expires for that map entry are done on an entry that's
been removed from the index we need for the map entry lookup.
The map entry shouldn't be removed in this case.
My current thought for a solution is to change the way the kernel waits
for NFY_EXPIRE replies. Instead of waiting indefinitely it waits with
a timeout. If the wait times out and the filesystem is still mounted,
it just loops around and waits again. If after the timeout the
filesystem has been unmounted it waits one more time (just in case
automountd is about to reply) and then aborts the wait with -EAGAIN.
I've provided the customer with a patch to do this using a 5 second
wait. I don't have test results yet.
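The wait logic described above can be sketched in user space as follows. This is only an illustration of the control flow, not the actual kernel patch: `wait_for_expire_reply()`, `struct expire_wait_ops`, the `sim_*` helpers and `simulate()` are all hypothetical names, and the simulated wait returns immediately instead of sleeping for the 5 second timeout.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcome of one timed wait for the daemon's reply. */
enum wait_result { WAIT_REPLIED, WAIT_TIMED_OUT };

/* Hypothetical hooks standing in for the real kernel primitives. */
struct expire_wait_ops {
    enum wait_result (*wait_reply)(void *ctx, int timeout_secs);
    bool (*still_mounted)(void *ctx);
    void *ctx;
};

#define EXPIRE_WAIT_SECS 5

/*
 * Returns 0 if the daemon replied, or -EAGAIN if the filesystem was
 * found unmounted and no reply arrived during one extra grace wait.
 */
static int wait_for_expire_reply(const struct expire_wait_ops *ops)
{
    for (;;) {
        if (ops->wait_reply(ops->ctx, EXPIRE_WAIT_SECS) == WAIT_REPLIED)
            return 0;
        if (ops->still_mounted(ops->ctx))
            continue;   /* still mounted: loop around and wait again */
        /* Unmounted: wait once more in case a reply is about to arrive. */
        if (ops->wait_reply(ops->ctx, EXPIRE_WAIT_SECS) == WAIT_REPLIED)
            return 0;
        return -EAGAIN;
    }
}

/* Scripted scenario used to exercise the loop without real sleeping. */
struct sim {
    int timeouts_before_reply;  /* -1 means the daemon never replies */
    int unmount_after_waits;    /* waits after which the fs is unmounted */
    int waits;                  /* waits performed so far */
};

static enum wait_result sim_wait(void *ctx, int timeout_secs)
{
    struct sim *s = ctx;

    (void)timeout_secs;         /* the simulation does not sleep */
    s->waits++;
    if (s->timeouts_before_reply >= 0 && s->waits > s->timeouts_before_reply)
        return WAIT_REPLIED;
    return WAIT_TIMED_OUT;
}

static bool sim_mounted(void *ctx)
{
    struct sim *s = ctx;

    return s->waits < s->unmount_after_waits;
}

static int simulate(int timeouts_before_reply, int unmount_after_waits)
{
    struct sim s = { timeouts_before_reply, unmount_after_waits, 0 };
    struct expire_wait_ops ops = { sim_wait, sim_mounted, &s };

    return wait_for_expire_reply(&ops);
}
```

For example, `simulate(-1, 2)` models a daemon that never replies while the mount disappears after the second wait, which ends in -EAGAIN after the grace wait; `simulate(2, 2)` models a reply that arrives during the grace wait, which returns 0.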
I really don't think this is a kernel problem; it's a user space problem.
Some time ago there was a weird case where an active map entry was being
removed from the map entry cache. I had a little trouble even working out
what I had done when I came across it in a clean up a while ago. So if
this is what you're seeing we'll need to do some work to work out what
I saw and what I was doing to fix it.
Let me check 5.1.3 and get back to you.
I had a look and what I was thinking of is already present in 5.1.3.
I did however find something that looks like it's worth considering.
Have a look at this; it might help, not sure though:
commit 21ce28df1f4529948df876243fc977908e070296
Author: Ian Kent <raven@xxxxxxxxxx>
Date: Tue Aug 7 12:05:21 2018 +0800
autofs-5.1.4 - mark removed cache entry negative
When re-reading a map, entries that have been removed are detected
and deleted from the map entry cache by lookup_prune_cache().
If a removed map entry is mounted at the time lookup_prune_cache()
is called the map entry is skipped. This is done because the next
lookup (following the mount expire, which needs the cache entry to
remain) will detect the stale cache entry and a map update done
resulting in the stale entry being removed.
But if a map re-read is performed while the cache entry is mounted
the cache will appear to be up to date so the removed entry will remain
valid even after it has expired.
To cover this case it's sufficient to mark the mounted cache entry
negative during the cache prune which prevents further lookups from
using the stale entry.
Signed-off-by: Ian Kent <raven@xxxxxxxxxx>
There might have been other patches at the time but it doesn't look
like it from the patch description; worth checking though.
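The idea in the commit message above can be sketched roughly like this. It is a much simplified model under stated assumptions, not the autofs code: `struct entry`, `struct cache`, `prune_cache()`, `cache_lookup()` and `cache_find_for_expire()` are hypothetical stand-ins, and the real lookup_prune_cache() works on the actual map entry cache with its own locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical, simplified stand-in for a map entry cache entry. */
struct entry {
    const char *key;
    bool in_use;    /* slot holds an entry */
    bool mounted;   /* entry currently has an active mount */
    bool negative;  /* lookups must treat the entry as non-existent */
};

struct cache {
    struct entry e[MAX_ENTRIES];
};

static bool key_in_map(const char *key, const char *const *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i], key) == 0)
            return true;
    return false;
}

/* Prune cache entries that are absent from the freshly read map. */
static void prune_cache(struct cache *c, const char *const *map, size_t n)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        struct entry *en = &c->e[i];

        if (!en->in_use || key_in_map(en->key, map, n))
            continue;
        if (en->mounted)
            en->negative = true;  /* keep for expire, block new lookups */
        else
            en->in_use = false;   /* not mounted: drop it right away */
    }
}

/* Mount-time lookup: refuses entries that were marked negative. */
static struct entry *cache_lookup(struct cache *c, const char *key)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        struct entry *en = &c->e[i];

        if (en->in_use && !en->negative && strcmp(en->key, key) == 0)
            return en;
    }
    return NULL;
}

/* Expire-time lookup: still finds a mounted entry marked negative. */
static struct entry *cache_find_for_expire(struct cache *c, const char *key)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        struct entry *en = &c->e[i];

        if (en->in_use && strcmp(en->key, key) == 0)
            return en;
    }
    return NULL;
}
```

The point of the two lookups is that expire can still find the entry it needs for a mounted key that has been removed from the map, while new mount lookups no longer see the stale entry.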
Mostly I would be looking at debug logs to find out where the map entry
mistakenly gets deleted. That's not at all straightforward but I think
it's the only way to tackle this problem.
I'd like to do more to help but I have a difficult problem of my own to
work out how to fix just now.
Anyway, maybe I can put some time into it a bit later if needed, ;)
Ian