>>> I've found one other place that has insufficient locking but the race to hit it is fairly small. It's in the Kerberos machine principal cache when it refreshes the machine credentials.

> These types of patches are always welcome. :-)
> In the recent past, some of our scientific staff experienced strange problems with Kerberos authentication against our NFSv4 file servers.
> Maybe the outages were connected to this type of race condition. But I do not know for sure, as the authentication errors happened on a rather sporadic basis.

We (Linköping University in Sweden) have seen these problems before too. I sent a patch for rpc.gssd this spring that “fixed” this problem too (well, fixed the symptom and not the root cause, so it wasn’t the right fix). Without that patch we typically had rpc.gssd crash on our multiuser client servers every other day. It was partly masked by Puppet detecting that it was down and restarting it, but the users got strange errors that they reported, and then when the support folks checked, everything was running :-).

It also crashed very often on a set of test machines that every minute would connect to our NFS servers in order to verify that they were running and giving good response times. Multiple NFS connections being set up and torn down concurrently many times easily forced this problem to happen after a day or two.

> A question far apart from this:
> How widespread are NFSv4+Kerberos setups within the academic community and commercial environments?

We are using NFSv4+Kerberos. Most of our users are SMBv3 clients (Windows & Mac, 10x the Linux users), but we have some 600 NFS clients (99.9% Linux based, mostly CentOS & Ubuntu; servers are FreeBSD with ZFS).
We used to be a big Sun/Solaris NFS shop previously so NFS comes “naturally” for us :-)

(Would have loved to use NFSv4+Kerberos on the MacOS clients but unfortunately MacOS panics when the Kerberos ticket expires and you have an active NFS share mounted which is a bit of a bummer :-)

(Using NFS v3 or lower and without Kerberos isn’t really an option - real ACLs and some sort of security are really needed)

Anyway - it’s good to see that the root cause for this bug has been found and fixed the right way :-)

- Peter

> Are there, to your knowledge, any bigger on-premise or cloud setups out there?
> And are there any companies running dedicated NFSv4+Kerberos setups?
>
>
> Best and keep well and fit
> Sebastian
>
> _________________
> Sebastian Kraus
> Team IT am Institut für Chemie
> Gebäude C, Straße des 17. Juni 115, Raum C7
>
> Technische Universität Berlin
> Fakultät II
> Institut für Chemie
> Sekretariat C3
> Straße des 17. Juni 135
> 10623 Berlin
>
>
> Tel.: +49 30 314 22263
> Fax: +49 30 314 29309
> Email: sebastian.kraus@xxxxxxxxxxxx
>
> ________________________________________
> From: Doug Nazar <nazard@xxxxxxxx>
> Sent: Monday, June 29, 2020 16:09
> To: Kraus, Sebastian
> Cc: linux-nfs@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v2] Re: Strange segmentation violations of rpc.gssd in Debian Buster
>
> On 2020-06-29 01:39, Kraus, Sebastian wrote:
>> Hi Doug,
>> thanks very much for your patch and efforts.
>> I manually backported the patch to nfs-utils 1.3.4-2.5 source in Debian Buster.
>> I am now testing the modified build on one of my NFSv4 file servers. Looks promising.
>>
>> One additional question: Which nfs-utils branch are you working on - steved/nfs-utils.git ?
>
> Yes, I'm working against upstream. I did check briefly that the code
> hadn't changed too much since 1.3.4 in that area.
>
> I've found one other place that has insufficient locking but the race to
> hit it is fairly small. It's in the Kerberos machine principal cache
> when it refreshes the machine credentials. I have a patch for that, but
> it's pretty invasive due to some other changes I'm currently working on.
> Let me know if you hit it, and I can work on a simple version to backport.
>
> Doug
>
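
For anyone following along, the kind of race described above (a shared credential cache refreshed without sufficient locking) can be illustrated with a small sketch. This is not the rpc.gssd code and not Doug's patch: rpc.gssd is written in C, and the class, function and parameter names below (MachineCredCache, fetch, lifetime, demo) are all hypothetical. It only assumes that several threads share one cached credential and that the expiry check and the refresh must happen under the same lock.

```python
import threading
import time


class MachineCredCache:
    """Hypothetical cache holding one machine credential with an expiry time."""

    def __init__(self, fetch, lifetime):
        self._fetch = fetch          # callable returning a fresh credential
        self._lifetime = lifetime    # seconds a credential stays valid
        self._lock = threading.Lock()
        self._cred = None
        self._expires = 0.0
        self.refreshes = 0           # how many times fetch() was called

    def get(self):
        # Hold the lock across both the expiry check and the refresh, so two
        # threads cannot both see a stale entry and refresh it concurrently.
        with self._lock:
            now = time.monotonic()
            if self._cred is None or now >= self._expires:
                self._cred = self._fetch()
                self._expires = now + self._lifetime
                self.refreshes += 1
            return self._cred


def demo(num_threads=8):
    """Start several threads that all ask for the credential at once."""

    def slow_fetch():
        time.sleep(0.05)  # a slow refresh would widen the window without a lock
        return object()   # stand-in for a real credential

    cache = MachineCredCache(slow_fetch, lifetime=60.0)
    results = []
    results_lock = threading.Lock()

    def worker():
        cred = cache.get()
        with results_lock:
            results.append(cred)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Number of refreshes performed, and number of distinct credentials seen.
    return cache.refreshes, len({id(c) for c in results})
```

With the lock held around the whole check-and-refresh step, every thread in demo() ends up with the same credential object and the refresh runs once. Moving the refresh outside the lock is what would let concurrent callers replace the cached credential under each other.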