Re: Should commit abb139e75c2 about "kernel loading firmware directly from fs" be backported to stable trees ?

Francis Moreau <francis.moro@xxxxxxxxx> · Sun, 21 Jul 2013 09:56:02 +0200

On Sun, Jul 21, 2013 at 12:50 AM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Sat, Jul 20, 2013 at 2:22 PM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>>      For older kernels, like 3.4, that version of udev should
>> still work ok, but I can't remember where the problem started showing up
>> so I could be totally wrong...
>
> It was never kernel-related, it was an udev regression. So if somebody
> is updating udev without updating the kernel... I forget which udev
> version it would be that introduced the "wait synchronously for
> firmware load", though. It was mentioned somewhere deep in that
> thread.

I haven't found that information. The only hint I found was in one of your posts:

"I don't know where the problem started in udev, but the report I saw
was that udev175 was fine, and udev182 was broken, and would deadlock
if module_init() did a request_firmware()."

here: http://thread.gmane.org/gmane.linux.kernel/1368617

>
> The Fedora 17 problems with media driver firmware happened with
> udev-182, and Francis is talking about 181, but I don't know if the
> problem was _introduced_ in 182 or might affect the earlier 181
> version too.

I tried to dig into the udev repository but was unable to find the
exact culprit commit.

But I found commit e64fae5573e566ce4fd9b23c68ac8f3096603314 whose message is:

    udevd: kill hanging event processes after 30 seconds

    Some broken kernel drivers load firmware synchronously in the module init
    path and block modprobe until the firmware request is fulfilled.

    The modprobe-generated firmware request is a direct child device of the
    device which caused modprobe to run. Child device event are blocked until
    the parent device is handled. This dead-locks until the kernel firmware
    loading timeout of 60 seconds is reached.

    The hanging modprobe event should now time-out and allow the firmware
    event to run before the 60 second kernel timeout.

which already describes the deadlock in udev, suggesting that the
problem was introduced before this commit.
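To make the timing in that commit message concrete, here is a toy model
of it. It is only a sketch of my reading of the message, not udev code:
the function name and the "firmware can still load" flag are made up for
illustration, and the only numbers taken from the commit message are the
30 and 60 second timeouts.

```python
from typing import Optional, Tuple

KERNEL_FIRMWARE_TIMEOUT = 60  # seconds, kernel firmware loading timeout (from the commit message)
UDEV_EVENT_KILL_TIMEOUT = 30  # seconds, udevd kill timeout added by e64fae5 (from the commit message)


def firmware_event_outcome(
    kill_timeout: Optional[int],
    kernel_timeout: int = KERNEL_FIRMWARE_TIMEOUT,
) -> Tuple[int, bool]:
    """Toy model: return (second at which the child firmware event is
    unblocked, whether the kernel is still waiting for the firmware then).

    The parent event (modprobe) blocks on the firmware request, and the
    child firmware event is blocked until the parent event is handled.
    kill_timeout=None models a udevd that never kills the hanging parent.
    """
    if kill_timeout is not None and kill_timeout < kernel_timeout:
        # udevd kills the hanging modprobe event first, so the firmware
        # event runs while the kernel request is still pending.
        return kill_timeout, True
    # Nothing releases the parent until the kernel gives up on the
    # firmware request, so the firmware event runs too late.
    return kernel_timeout, False


if __name__ == "__main__":
    print(firmware_event_outcome(None))
    print(firmware_event_outcome(UDEV_EVENT_KILL_TIMEOUT))
```

So in this model the 30 second kill is a workaround that shortens the
stall; it does not remove the synchronous wait itself.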

"git describe" reports this commit as 177-4-ge64fae5, so I think it's
reasonable to claim that all versions of udev >= 178 are affected.
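For reference, the arithmetic behind ">= 178" can be sketched as follows.
"git describe" prints <tag>-<n>-g<abbreviated hash>, meaning the commit
sits n commits after the nearest tag. The helper below is hypothetical
and assumes that udev release tags are plain, consecutive integers, which
is my assumption rather than something I verified in the repository.

```python
import re


def first_release_containing(describe: str) -> int:
    """Given `git describe` output of the form <tag>-<n>-g<hash>, return
    the first release expected to contain the commit, ASSUMING integer,
    consecutive release tags (an assumption, not checked against udev)."""
    m = re.fullmatch(r"(\d+)-(\d+)-g([0-9a-f]+)", describe)
    if m is None:
        raise ValueError(f"unexpected git describe output: {describe!r}")
    tag, commits_after_tag = int(m.group(1)), int(m.group(2))
    # A commit made after tag N is not part of release N itself; it
    # first ships in the next release.
    return tag if commits_after_tag == 0 else tag + 1


if __name__ == "__main__":
    print(first_release_containing("177-4-ge64fae5"))
```

This only says where the 30 second workaround first shipped; the
regression itself may be older, as noted above.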

Back to my initial question: assuming it's reasonable to run an old
kernel with an affected udev (>= 178, or maybe earlier), how should
this be fixed? Do you think it makes sense to backport your initial
fix?

Thanks
--
Francis