On Fri, Oct 3, 2014 at 1:23 AM, Tom Gundersen <teg@xxxxxxx> wrote:
> On Thu, Oct 2, 2014 at 10:06 PM, Luis R. Rodriguez <mcgrof@xxxxxxxx> wrote:
>> On Thu, Oct 02, 2014 at 08:12:37AM +0200, Tom Gundersen wrote:
>>> Making kmod a special case is of course possible. However, as long as
>>> there is no fundamental reason why kmod should get this special
>>> treatment, this just looks like a work-around to me.
>>
>> I've mentioned a series of five reasons why it's a bad idea right now to
>> sigkill modules [0]; we've reviewed each of them and at least items 2-4
>> still remain particularly valid, fundamental reasons to avoid it.
>
> So items 2-4 basically say "there currently are drivers that cannot
> deal with sigkill after a three minute timeout".

No, dealing with the sigkill gracefully relates only to 2), which says
it's probably a terrible idea to trigger exit paths at random points in
device drivers during init / probe. And while one could argue that
perhaps that can be cleaned up, I provided tons of references and even
*research effort* on this particular area, so the issues raised by this
point should by no means be easily brushed off. It may be true that we
can fix some things in Linux, but a) that requires users to upgrade
their kernel, and b) some users may end up buying hardware that is only
supported through a proprietary driver, and getting those fixes is
non-trivial and in some cases almost impossible.
3) says it is fundamentally incorrect to limit the bus probe routine
with any arbitrary timeout.

4) talks about how the timeout creates a limit on the number of devices
a device driver can support on Linux, given that the driver core
batches *all* probes for one device driver serially:

                            systemd_timeout
  number_devices = ---------------------------------
                    max known probe time for driver

We have device drivers which we *know* will take over 1 minute just on
*probe*; this means that by default folks can only install 3 devices
of that type on a system. One can surely address things in the kernel,
but again, assuming folks use the defaults and don't upgrade their
kernel, the sigkill is simply limiting Linux right now, even if only in
the short term.

> In the short-term we already have the solution: increase the timeout.

"Short term" here means what will be shipped and supported for a while
across tons of systemd deployments. The kernel command line workaround
for increasing the timeout is a reactive measure; it does not address
the problem architecturally. If the sigkill is going to be kept for
kmod, its implications should be well documented, both in terms of the
impact on device drivers and the limit it places on the number of
devices a driver can support.

> In the long-term, we have two choices, either permanently add some
> heuristic to udev to deal with drivers taking a very long time to be
> inserted, or fix the drivers not to take such a long time.

Drivers taking long on init should probably be addressed, but drivers
taking long on probe are not broken, especially since the driver core
probes all devices supported by one device driver serially, so the
probe time is actually cumulative.

> A priori,
> it makes no sense to me that drivers spend unbounded amounts of time
> to get inserted, so fixing the drivers seems like the most reasonable
> approach to me.
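The arithmetic behind item 4 can be sketched in a few lines. This is a
minimal sketch under stated assumptions, not kernel or udev code: it
assumes the driver core probes a driver's devices strictly one after
another, that every probe takes the same worst-case time, and that
finishing exactly at the timeout still counts as success. The function
names max_devices, cumulative_probe_time and survives_timeout are
hypothetical.

```python
import math
from typing import Iterable

# Hypothetical helpers illustrating item 4; not part of the kernel or udev.
# Assumption: the driver core probes every device bound to one driver serially.


def max_devices(timeout_s: float, max_probe_s: float) -> int:
    """Devices a driver can probe before the timeout, at max_probe_s each."""
    if max_probe_s <= 0:
        raise ValueError("probe time must be positive")
    # number_devices = systemd_timeout / max known probe time, rounded down
    return math.floor(timeout_s / max_probe_s)


def cumulative_probe_time(probe_times_s: Iterable[float]) -> float:
    """Serial probing means the per-device probe times simply add up."""
    return sum(probe_times_s)


def survives_timeout(probe_times_s: Iterable[float], timeout_s: float) -> bool:
    """True if probing all devices in turn finishes within the timeout.

    Assumption: finishing exactly at the timeout counts as success.
    """
    return cumulative_probe_time(probe_times_s) <= timeout_s


if __name__ == "__main__":
    # Illustrative numbers: 180 s timeout, ~60 s probe per device.
    print(max_devices(180, 60))                     # 3
    print(max_devices(30, 60))                      # 0 with the old 30 s timeout
    print(survives_timeout([60, 60, 60, 60], 180))  # False: 240 s > 180 s
```

With the old 30 second udev default, a driver whose per-device probe
takes a minute cannot finish even one device, and a fourth such device
pushes past 180 seconds: no single probe is unbounded, the total just
grows with the number of devices.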
> That said, I'm of course open to be proven wrong if
> there are some drivers that fundamentally _must_ take a long time to
> insert (but we should then discuss why that is and how we can best
> deal with the situation, rather than adding some hack up-front when we
> don't even know if it is needed).

Ok, hold on. Async probe in the driver core will be a new feature, and
there are even caveats that Tejun pointed out which are important for
distributions to consider before embracing it. Of course folks can
ignore these, but by no means should it be concluded that tons of
device drivers were broken; what we are providing is a new mechanism.
And then there are device drivers which will need work in order to use
async probe: some will require fixes to init / probe assumptions, as I
provided for the amd64_edac driver, but for others only time will tell
what is required.

> Your patch series should go a long way towards fixing the drivers (and
> I imagine there being a lot of low-hanging fruit that can easily be
> fixed once your series has landed), and the fact that we have now
> increased the udev timeout from 30 to 180 seconds should also greatly
> reduce the problem.

Sure, but I do ask folks to revisit the short term solution; I did my
best to communicate / document the issues.

  Luis
-- 
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html