On Mon 18-11-19 16:25:58, Pavel Machek wrote: > Hi! > > > > > From: Michal Hocko <mhocko@xxxxxxxx> > > > > > > > > commit b0f53dbc4bc4c371f38b14c391095a3bb8a0bb40 upstream. > > > > > > > > Partially revert 16db3d3f1170 ("kernel/sysctl.c: threads-max observe > > > > limits") because the patch is causing a regression to any workload which > > > > needs to override the auto-tuning of the limit provided by kernel. > > > > > > > > set_max_threads is implementing a boot time guesstimate to provide a > > > > sensible limit of the concurrently running threads so that runaways will > > > > not deplete all the memory. This is a good thing in general but there > > > > are workloads which might need to increase this limit for an application > > > > to run (reportedly WebSpher MQ is affected) and that is simply not > > > > possible after the mentioned change. It is also very dubious to > > > > override an admin decision by an estimation that doesn't have any direct > > > > relation to correctness of the kernel operation. > > > > > > > > Fix this by dropping set_max_threads from sysctl_max_threads so any > > > > value is accepted as long as it fits into MAX_THREADS which is important > > > > to check because allowing more threads could break internal robust futex > > > > restriction. While at it, do not use MIN_THREADS as the lower boundary > > > > because it is also only a heuristic for automatic estimation and admin > > > > might have a good reason to stop new threads to be created even when > > > > below this limit. > > > > > > Ok, why not, but I smell followup work could be done: > > > > > > > @@ -2635,7 +2635,7 @@ int sysctl_max_threads(struct ctl_table > > > > if (ret || !write) > > > > return ret; > > > > > > > > - set_max_threads(threads); > > > > + max_threads = threads; > > > > > > > > > > AFAICT set_max_threads can now become __init. > > > > Yes. Care to send a patch? > > I'm not usually hacking in that area. Could you do that? I can put it on my ever growing todo list. But this should be a low hanging fruit that doesn't really require a deep understanding of the specific subsystem. > > > Plus, I don't see any locking here, should this be WRITE_ONCE() at > > > minimum? > > > > Why would that matter? Do you expect several root processes race to set > > the value? > > Well, for example to warn humans that this code is accessing unlocked > variable. Second, as is, code is not valid C and compilers are > allowed to do strange stuff ("undefined behaviour"). Third, there are > concurency checkers that will not like this one. I do not see any undefined behahvior in assigning an integer in a lockless manner if there are no actual consistency issues. If this is not the case then please do describe them in a specific manner. -- Michal Hocko SUSE Labs