On Thu, May 25, 2017 at 10:30 AM, Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx> wrote: > On Thu, May 25, 2017 at 09:50:15AM -0700, Luis R. Rodriguez wrote: >> On Thu, May 25, 2017 at 9:38 AM, Dmitry Torokhov >> <dmitry.torokhov@xxxxxxxxx> wrote: >> > On Thu, May 25, 2017 at 06:22:01PM +0200, Luis R. Rodriguez wrote: >> >> On Fri, May 19, 2017 at 02:58:29PM -0700, Dmitry Torokhov wrote: >> >> > On Fri, May 19, 2017 at 02:45:29PM -0700, Luis R. Rodriguez wrote: >> >> > > On May 19, 2017 1:45 PM, "Dmitry Torokhov" <dmitry.torokhov@xxxxxxxxx> >> >> > > wrote: >> >> > > >> >> > > On Thu, May 18, 2017 at 08:24:39PM -0700, Luis R. Rodriguez wrote: >> >> > > > We currently statically limit the number of modprobe threads which >> >> > > > we allow to run concurrently to 50. As per Keith Owens, this was a >> >> > > > completely arbitrary value, and it was set in the 2.3.38 days [0] >> >> > > > over 16 years ago in year 2000. >> >> > > > >> >> > > > Although we haven't yet hit our lower limits, experimentation [1] >> >> > > > shows that when and if we hit this limit in the worst case, will be >> >> > > > fatal -- consider get_fs_type() failures upon mount on a system which >> >> > > > has many partitions, some of which might even be with the same >> >> > > > filesystem. Its best to be prudent and increase and set this >> >> > > > value to something more sensible which ensures we're far from hitting >> >> > > > the limit and also allows default build/user run time override. >> >> > > > >> >> > > > The worst case is fatal given that once a module fails to load there >> >> > > > is a period of time during which subsequent request for the same module >> >> > > > will fail, so in the case of partitions its not just one request that >> >> > > > could fail, but whole series of partitions. This later issue of a >> >> > > > module request failure domino effect can be addressed later, but >> >> > > > increasing the limit to something more meaninful should at least give us >> >> > > > enough cushion to avoid this for a while. >> >> > > > >> >> > > > Set this value up with a bit more meaninful modern limits: >> >> > > > >> >> > > > Bump this up to 64 max for small systems (CONFIG_BASE_SMALL) >> >> > > > Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL) >> >> > > > >> >> > > > Also allow the default max limit to be further fine tuned at compile >> >> > > > time and at initialization at run time at boot up using the kernel >> >> > > > parameter: max_modprobes. >> >> > > > >> >> > > > [0] https://git.kernel.org/cgit/linux/kernel/git/history/ >> >> > > history.git/commit/?id=ab1c4ec7410f6ec64e1511d1a7d850fc99c09b44 >> >> > > > [1] https://github.com/mcgrof/test_request_module >> >> > > >> >> > > If we actually run into this issue, instead of slamming the system with >> >> > > bazillion concurrent requests, can we wait for the other modprobes to >> >> > > finish and then continue? >> >> > > >> >> > > >> >> > > Yes ! That I have a patch that does precisely that ! That is actually still >> >> > > *not enough* to not fail fatally but this would be subject of another >> >> > > series with more debatable approaches. >> >> > > >> >> > >> >> > Then please post it. >> >> >> >> Will do. >> >> >> >> > > This at least pushes us to closer safer limits for now while also making it >> >> > > configurable. >> >> > >> >> > Making it configurable depending on how big/little box is makes no >> >> > sense, >> >> >> >> If we set a hard limit then we need to patch a system if we need to increment >> >> it. This is rather stupid given we have no current heuristics to make kmod >> >> loading deterministic from userspace, and in the worst case this can be fatal. >> >> General system size is a good first guess, but making it configurable is >> >> really key given current limitations. I'll post further patches which reveals >> >> some of these issues more clearly. >> >> >> >> > especially if the above is implemented, as depth of modprobe >> >> > invocations depends on configuration and not computing power of the >> >> > hardware the system is running on. >> >> >> >> You seem to agree making it configurable is sensible , but not depending on >> >> the system size ? >> > >> > No, I am saying that making it configurable based on system size makes >> > no sense at all, and making it configurable given you already have >> > patches removing hard failures gives no benefit. >> >> Ah no, the problem is that hard failures are not yet removed in this >> patch set at all! This series only contains the things I thought were >> non-radical really. > > I know they are not removed in this patch set. > >> >> In fact even with the subsequent patches from my 2nd series I'll >> eventually post post -- these fatal issues are not cured at all unless >> we dance with userspace a bit, or unless as you suggest we have *all* >> pending theads wait without killing any. > > Well, that is too bad, I understood you already implemented what I > suggested. We seem to want the same so that is good actually. >> *This* small patch should enable folks to move the needle to a more >> *fair* limit, its also useful to backport a simple fix even if the >> other stuff is not merged, *but* it *also* provides a way for systems >> to move away from the slippery slope if they know what they are doing. > > Look, you are trying to push a band-aid solution for a problem that is > purely theoretical (as you say in your patch description we are not > hitting this problem in practice, only your test does). I have a stress test driver which reveals we can easily hit it. In practice the only way to know if we hit the limit is a: "request_module: runaway loop modprobe %s" message on dmesg, however its fatal, how often people inspect a kernel log to see if that came up though... not sure. So a module could not be loaded and we may not realize it. > There is > no slippery slope for systems to move away, no need to backport > anything. We seem to agree that a better solution is possible (throttle > number of concurrently running modprobes without killing requesters), > and with that solution the band-aid will no longer be needed. > > So please implement and post the proper fix for the issue. Alright, will do away with this patch and just go for the jugular of the issue. Luis -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html