Re: Oops after register_netdev() failure in 2.6.3-bk5

viro@parcelfarce.linux.theplanet.co.uk · Tue, 24 Feb 2004 04:16:06 +0000

On Mon, Feb 23, 2004 at 10:49:50PM -0500, Pavel Roskin wrote:
> > -	for (i = 0; i < numdummies && !err; i++)
> > +	for (i = 0; !err && i < numdummies && !err; i++)
> >  		err = dummy_init_one(i);
> >  	if (err) {

<stares at the line above>
Duh.

> Add "i--;" here.  The device that failed doesn't need another
> free_netdev().

Right you are - and the change above didn't do anything at all, since we
already had the !err check there.  How about a simpler variant that makes
the check explicit?

        for (i = 0; i < numdummies; i++) {
                err = dummy_init_one(i);
                if (err) {
                        while (--i >= 0)
                                dummy_free_one(i);
                        break;
                }
        }
        return err;

> > Now, which driver have you actually seen it in?
> 
> orinoco_plx.  I've put the current snapshot here:
> http://www.red-bean.com/~proski/tmp/orinoco-oops.tar.gz
> 
> orinoco_init() in orinoco.c has been modified to fail always.
> 
> I could try to reduce the driver to another dummy that can be run on any
> system, but it will take time.
> 
> # modprobe orinoco_plx
> orinoco.c 0.14alpha2HEAD (David Gibson <hermes@gibson.dropbear.id.au>, Pavel Roskin <proski@gnu.org>, et al)
> orinoco_plx.c 0.14alpha2HEAD (Daniel Barlow <dan@telent.net>, David Gibson <hermes@gibson.dropbear.id.au>)
> orinoco_plx: CIS: 01:03:00:00:FF:17:04:67:5A:08:FF:1D:05:01:67:5A:
> orinoco_plx: Local Interrupt already enabled
> Detected Orinoco/Prism2 PLX device at 0000:01:00.0 irq:12, io addr:0xc400
> orinoco_plx: init_one(), FAIL!
> orinoco_plx: probe of 0000:01:00.0 failed with error -16
> Unable to handle kernel paging request at virtual address 6b6b6b77
>  printing eip:
> c02d0d88
> *pde = 00000000
> Oops: 0000 [#1]
> CPU:    0
> EIP:    0060:[<c02d0d88>]    Not tainted
> EFLAGS: 00010202
> EIP is at rtnetlink_fill_ifinfo+0x278/0x430
> eax: 6b6b6b6b   ebx: cf03e0c0   ecx: 00000640   edx: cd4e3920
> esi: 00000000   edi: cd4e38a8   ebp: c12b9ed8   esp: c12b9eb4
> ds: 007b   es: 007b   ss: 0068
> Process events/0 (pid: 3, threadinfo=c12b8000 task=c12bcc80)
> Stack: c12b9ec8 00000640 00000f40 cd4e3000 cf05fde4 6b6b6b6b cf05fde4 00000000
>        00000010 c12b9efc c02d11b6 00000000 00000000 00000000 cf03e0c0 cf03e0c0
>        c12b9f20 c12b9f20 c12b9f34 c02d1a67 00000001 00000003 cf121340 5de4c35d
> Call Trace:
>  [<c02d11b6>] rtmsg_ifinfo+0x46/0xb0
>  [<c02d1a67>] linkwatch_run_queue+0x147/0x1f0
>  [<c02d1b52>] linkwatch_event+0x42/0x70
>  [<c01363b4>] worker_thread+0x1f4/0x3e0
>  [<c0119f67>] recalc_task_prio+0x97/0x1c0
>  [<c02d1b10>] linkwatch_event+0x0/0x70
>  [<c011b6e0>] default_wake_function+0x0/0x10
>  [<c011b6e0>] default_wake_function+0x0/0x10
>  [<c013b105>] kthread+0x95/0xa0
>  [<c01361c0>] worker_thread+0x0/0x3e0
>  [<c013b070>] kthread+0x0/0xa0
>  [<c0107019>] kernel_thread_helper+0x5/0xc

> Reordering statements in orinoco_plx_init_one() after "fail:" may prevent
> the oops, but only the first time.  If the module is unloaded and loaded
> again, it crashes.
> 
> Commenting out free_orinocodev() (wrapper around free_netdev()) fixes the
> oops, but I think it could leave some allocated memory.  That's the likely
> workaround if we fail to fix the problem.

Very interesting.  OK, I'll check that one out.