On Thu, 2020-10-08 at 08:00 -0500, Patrick Goetz wrote:
>
> On 10/7/20 7:38 PM, Ian Kent wrote:
> > On Wed, 2020-10-07 at 18:39 -0500, Patrick Goetz wrote:
> > > I have a rather puzzling timing issue which I hope someone can
> > > shed some light on. We are using a centralized authentication
> > > service (Microsoft AD) with authorization restricted to members
> > > of a particular security group. Home directories are housed on
> > > an NFS server (Ubuntu 20.04) and automounted to a collection of
> > > compute workstations (currently a mixture of Ubuntu 18.04 and
> > > 20.04 systems).
> > >
> > > Because there is a reasonably high level of turnover (grad
> > > students and postdocs) and because I don't want people to login
> > > directly to the NFS server any more (hence can't use
> > > pam_mkhomedir without compromising security by setting
> > > no_root_squash on the NFS server) I decided to try and automate
> > > the creation of home directories a different way. To wit, I run
> > > a pam_exec script in the session configuration:
> > >
> > > /etc/pam.d/common-session:
> > > -------------------------
> > > session optional pam_exec.so log=/tmp/pam_exec.log /usr/local/sbin/make-nfs_homedir.sh
> > >
> > > Which runs this script:
> > >
> > > -----------------------------------------------
> > > #!/bin/bash
> > >
> > > if [ $PAM_TYPE = "open_session" ]; then
> > >
> > >    UTEID="$(cut -d'@' -f1 <<<"$PAM_USER")"
> > >    USERGID=$(id -g $UTEID)
> > >    PROC=$$
> > >    TEMPDIR=$UTEID$PROC
> > >
> > >    if (($USERGID > 100000)); then
> > >
> > >      mkdir /tmp/$TEMPDIR
> > >      mount helios:/home /tmp/$TEMPDIR
> > >
> > >      if [ ! -d /tmp/$TEMPDIR/$UTEID ]; then
> > >        mkdir /tmp/$TEMPDIR/temp/$UTEID
> > >        sleep 70
> > >      fi
> > >
> > >      cd /
> > >      umount /tmp/$TEMPDIR
> > >      rmdir /tmp/$TEMPDIR
> > >    fi
> > > fi
> > > -----------------------------------------------
> > >
> > > A brief explanation is in order. nfs_server:/home is mounted to
> > > /tmp/$uid$pid on the client workstation and the script checks to
> > > see if the user's home directory already exists. If so do
> > > nothing. If not, create it in /home/temp. On the NFS server, the
> > > /home/temp directory has 1777 permissions, so anyone can write
> > > to it, including the nobody user from the workstations.
> > >
> > > Now for some systemd black magic. On the NFS server there is a
> > > systemd path unit file and accompanying service file. Whenever a
> > > directory is created in /home/temp, there's an NFS server script
> > > which moves it to /home and sets the appropriate user
> > > permissions. This process is nearly instantaneous; i.e. if I
> > > type `mkdir stuff` in /home/temp on the NFS server, I don't have
> > > time to type `ls` before the directory is moved to /home with
> > > appropriate user permissions set.
> > >
> > > One last detail: /etc/auto.home on the workstation:
> > >
> > > *   -tcp,vers=4.2   helios.biosci.utexas.edu:/home/&
> > >
> > >
> > > OK, now for the puzzling part. Notice the
> > >
> > >    sleep 70
> > >
> > > directive in the client side pam_exec script. I've fiddled
> > > around with this quite a bit and anything less results in a
> > > message of
> > >
> > >    Could not chdir to home directory /home/pgoetz: No such file
> > >    or directory
> > >
> > > on first log in. I've checked, and indeed the directory is not
> > > mounted. If you hang around in / long enough (roughly 70
> > > seconds) you can eventually cd to your automounted home
> > > directory.
> > >
> > > So, Question: Why the delay? The home directory on the NFS
> > > server is created nearly instantly, so it can't be that. And as
> > > mentioned, if I sleep for say, 60 seconds, the home directory
> > > isn't immediately accessible on login, although one can cd to it
> > > a few seconds later. I can't fathom why the required delay.
> >
> > It sounds like the NFS client isn't seeing the attribute change of
> > the server directory /home.
> >
> > IIUC basically there are two things to worry about, first that the
> > VFS path walk on the client actually results in calls into the NFS
> > client code (it might not for various reasons, like the VFS doesn't
> > think the directory dentry needs revalidation), and second, the
> > attributes used by the NFS client to detect staleness aren't being
> > changed by the server operations being done so revalidation isn't
> > done by the NFS client.
> >
> > Not sure what to do about it short of diving into the NFS client
> > kernel code.
> >
> > Ian
> >
>
> Thanks, Ian. I just wanted to make sure I'm not overlooking something
> dumb. Looking at the NFS client code is not warranted, given the
> minor inconvenience this causes. While this is obviously a bit of an
> edge case, since there are other ways to get this done, given the
> ongoing shift to dynamic containerized computing these kinds of
> issues will continue to gain prominence though.

All that attribute checking the NFS kernel client does is meant to
identify when a path component has changed, and there's a lot of it.

Possibly what's happening is that the VFS isn't calling
->d_revalidate(), perhaps during the lockless part of the path walk
(so-called rcu-walk), because /home is the parent, not the last
component of the path, and it doesn't appear to have changed based on
what the VFS can check in that mode.

If that is the case it will likely be rather difficult for the VFS to
deal with, so essentially I suspect it's an NFS problem.

Ian
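
For what it's worth, one way to avoid hard-coding the 70 second sleep in the pam_exec script would be to poll for the directory and give up after a timeout. The following is only a rough sketch: wait_for_dir is a hypothetical helper name, and the default timeout and interval are assumptions, not values taken from the setup described above. It does nothing about the client-side caching itself; it only stops waiting as soon as the client can actually see the directory.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original script): wait until a
# directory becomes visible, or give up once the timeout has elapsed.
# Usage: wait_for_dir DIR [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_dir() {
    local dir=$1
    local timeout=${2:-90}    # assumed default; adjust to taste
    local interval=${3:-2}    # assumed default polling interval
    local waited=0

    while [ ! -d "$dir" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1          # directory never showed up
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0                  # directory is visible to this client
}
```

In the script quoted above, the `sleep 70` line could then become something like `wait_for_dir "/tmp/$TEMPDIR/$UTEID" 90 2`. Whether the automounted path is usable at the moment the poll succeeds is an assumption that would need testing, since the automount is a separate mount and may be subject to the same revalidation behaviour.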