On 2020-07-01 03:39, Kraus, Sebastian wrote:
OK, thanks for the info. I wondered, because your patch did not show
up as a commit within upstream.
Your patch seems to do a good job - no more segfaults for four days now. :-)
I'm not a maintainer, just an enthusiastic user with a compiler... ;-)
I'm sure it'll get applied in the near future, as time permits.
I've found one other place that has insufficient locking, but the race window is fairly small. It's in the Kerberos machine principal cache, when it refreshes the machine credentials.
These types of patches are always welcome. :-)
In the recent past, some of our scientific staff experienced strange problems with Kerberos authentication against our NFSv4 file servers.
Maybe the outages were connected to this type of race condition. But I do not know for sure, as the authentication errors happened only sporadically.
The previous bug could also cause authentication issues without crashing
depending on load, timing, memory usage, malloc library, etc. This one
would only crop up during machine credentials refresh, which by default
is once every 24 hours. I've just posted a patch, 'gssd: Fix locking for
machine principal list'; the interesting part for backporting is around
line 447. It used to always strdup() even if the cache name was the same.
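
To illustrate the idea for anyone following along, here is a minimal sketch of that pattern: take a lock around the cached name, and only replace the string when the name has actually changed. This is not the code from the patch. The struct cred_entry type, its fields, and the cred_entry_set_ccname() helper are all hypothetical names made up for this example; only the pthread and libc calls are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; NOT the actual gssd data structure. */
struct cred_entry {
    pthread_mutex_t lock;   /* guards ccname */
    char *ccname;           /* heap-allocated cache name, or NULL */
};

/*
 * Store a new cache name under the lock.
 * The string is only reallocated when the name differs, so an
 * unchanged name keeps its existing pointer.
 * Returns 0 on success, -1 if strdup() fails (old name is kept).
 */
static int cred_entry_set_ccname(struct cred_entry *e, const char *name)
{
    int ret = 0;

    assert(e != NULL && name != NULL);

    pthread_mutex_lock(&e->lock);
    if (e->ccname == NULL || strcmp(e->ccname, name) != 0) {
        char *copy = strdup(name);
        if (copy == NULL) {
            ret = -1;
        } else {
            free(e->ccname);
            e->ccname = copy;
        }
    }
    pthread_mutex_unlock(&e->lock);

    return ret;
}
```

Readers of ccname would have to take the same lock in this sketch; skipping the strdup() for an unchanged name just avoids needlessly freeing and reallocating a string on every refresh.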
I have a patch for that, but it's pretty invasive due to some other changes I'm currently working on. Let me know if you hit it, and I can work on a simple version to backport.
NFSv4+Kerberos is not for the faint-hearted. I am not afraid of invasive patches - as long as they are technically correct. ;-)
No guarantees... but I do try. ;-)
An unrelated question:
How widespread are NFSv4+Kerberos setups in the academic community and in commercial environments?
Are there, to your knowledge, any bigger on-premises or cloud setups out there?
And are there any companies running dedicated NFSv4+Kerberos setups?
I really have no idea. I only run it on my home network of a few dozen
(old) machines. From what I've seen while googling
to figure out how the code base works, there are a fair number of
users. There's also been a large amount of work in
recent years, which would point to something driving that.
Doug