Hi Ian, list,
On 2/2/19 2:16 AM, Ian Kent wrote:
On Thu, 2019-01-31 at 22:26 +0100, Frank Thommen wrote:
We are running autofs 5.0.7, release 70.el7_4.1 on CentOS 7.4.1708.
Updating the CentOS release is not possible due to hardware and
software constraints.
Before I go burning lots of time on trying to reproduce this
you should check if this happens with the latest CentOS autofs
package, revision 90 (the CentOS repo doesn't look like it
retains older revisions).
There were a couple of regressions fixed in 7.5, among other
things.
I don't think there were updates to dependent packages that
would cause problems in the subsequent RHEL releases (in fact
there shouldn't be).
Another question.
When you see the problem has occurred, did you check that
automount is actually still running (IOW, did you check whether
it had crashed)?
Ian
Oops, already > three months and I haven't replied yet. I'm very sorry
for that, because I really appreciate your responsiveness and helpfulness.
Unfortunately other IT problems have taken precedence over this one.
Summary: In the end we "flattened" the automounter file structure, so
that instead of using

  auto.master:  /base /etc/auto.base browse
  auto.base:    sub1 /sub11 -fstype=autofs,vers=3 file:/etc/auto.sub11
  auto.sub11:   sub11-1 server:/export1
                sub11-2 server:/export2

we now use

  auto.master:  /base/sub1 /etc/auto.sub1 browse
  auto.sub1:    sub11 -fstype=nfs,vers=3 \
                  sub11-1 server:/export1 \
                  sub11-2 server:/export2
This solution is as manageable as the first one, and the problems
described in my original post have been gone since then. It "works for
us", even though we don't understand what didn't work. Since the issue
- as we have learned in the meantime - overlapped with networking
problems of the central storage, the described behaviour /could/ simply
have been an unfortunate coincidence triggered by those network problems.
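For the record, a rough way to sanity-check such a map restructuring
(not something we have scripted in exactly this form; the path below is
just my reading of the example entries above, and would need to be
adjusted to the real map names):

  automount --dumpmaps             # show the maps as automount parsed them
  ls /base/sub1/sub11/sub11-1      # trigger one of the mounts by hand
  mount | grep 'server:/export'    # check that the NFS mount really appeared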
For the sake of completeness and documentation I'll answer your last and
still unanswered questions:
Is it always the same directory that becomes unresponsive?
It's all the directories managed by this table.
My original reading of the problem description made me think
that only certain automount points became unresponsive.
If "all" the automounts become unresponsive that's a very different
problem.
Only /certain/ directories got lost, but not always the same ones, and
not the same ones on all hosts.
Does the problem also occur if you use a HUP signal to re-read the
maps?
Haven't tried this yet. We usually just restart autofs.
I think this is another misunderstanding of the problem I have.
The description sounded like it was the restart with a modified
map that resulted in the problem but based on this and your later
reply it sounds like the restart fixes the problem.
That implies that modifying the map results in this automount
becoming unresponsive at some later time after the map change.
Have I got it right now?
Not quite :-) An automounter restart with the /un/modified map always
solved the issue... for some time, until some of the directories became
unavailable again...
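Should it ever come back, my plan would be to try the HUP route before
a full restart; roughly the following (untested on our side so far; the
systemd reload assumes the stock CentOS 7 autofs unit, which as far as
I remember just sends a HUP to the daemon):

  # re-read the maps without restarting the daemon
  kill -HUP "$(pidof automount)"
  # or, on systemd hosts, the rough equivalent:
  systemctl reload autofs

  # what we have been doing instead: a full restart
  systemctl restart autofs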
Is there anything in the debug log about a map re-read (and
following log entries from that) between the time the map is
deployed and when the problem occurs?
Are you sure you're getting all the debug logging?
If you're assuming that setting "logging = debug" in the autofs
configuration and using syslog with a default configuration is
enough, you might not be. How have you set up collection of the
debug log?
I haven't looked into the details of complete autofs debugging. Basically
the daemon is running with "-d --foreground --dont-check-daemon" (set in
/etc/sysconfig/autofs as 'OPTIONS="-d"').
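If we ever need the full debug trail, my understanding (from the
documentation, not from something we have actually deployed yet) is
that it would take roughly the following; the log file name is just an
example:

  # 1. enable debug logging in the autofs configuration, i.e. the
  #    "logging = debug" setting you mentioned (the exact file depends
  #    on the autofs release, e.g. /etc/autofs.conf or /etc/sysconfig/autofs)
  logging = debug

  # 2. make rsyslog actually keep daemon-level debug messages, e.g. in
  #    a drop-in like /etc/rsyslog.d/autofs-debug.conf:
  daemon.debug    /var/log/autofs-debug.log

  # 3. restart both daemons so they pick up the changes
  systemctl restart rsyslog autofs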
Again thank you very much for your efforts
frank