"Du, ChangbinX" <changbinx.du@xxxxxxxxx> writes: >> From: Bjørn Mork [mailto:bjorn@xxxxxxx] >> Sent: Tuesday, October 29, 2013 4:41 PM >> To: Du, ChangbinX >> Cc: oliver@xxxxxxxxxx; linux-usb@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; >> linux-kernel@xxxxxxxxxxxxxxx >> Subject: Re: [PATCH] net/cdc_ncm: fix null pointer panic at usbnet_link_change >> >> "Du, ChangbinX" <changbinx.du@xxxxxxxxx> writes: >> >> > From: "Du, Changbin" <changbinx.du@xxxxxxxxx> >> > >> > In cdc_ncm_bind() function, it call cdc_ncm_bind_common() to setup usb. >> > But cdc_ncm_bind_common() may meet error and cause usbnet_disconnect() >> > be called which calls free_netdev(net). >> >> I am sure you are right, but I really don't see how that can happen. >> >> AFAICS, we ensure that the intfdata is set to NULL before calling >> usb_driver_release_interface() in the error cleanup in >> cdc_ncm_bind_common(): >> >> >> error2: >> usb_set_intfdata(ctx->control, NULL); >> usb_set_intfdata(ctx->data, NULL); >> if (ctx->data != ctx->control) >> usb_driver_release_interface(driver, ctx->data); >> error: >> cdc_ncm_free((struct cdc_ncm_ctx *)dev->data[0]); >> dev->data[0] = 0; >> dev_info(&dev->udev->dev, "bind() failure\n"); >> return -ENODEV; >> >> >> Thus we hit the test in disconnect, and usbnet_disconnect() is never >> called: >> >> static void cdc_ncm_disconnect(struct usb_interface *intf) >> { >> struct usbnet *dev = usb_get_intfdata(intf); >> >> if (dev == NULL) >> return; /* already disconnected */ >> >> usbnet_disconnect(intf); >> } >> >> So we should really be OK, but we are not???? I would appreciate it if >> anyone took the time to feed me why, with a very small spoon... >> > > Yes, you are right. It should not happen if cdc_ncm_disconnect is not > called. But this really happened. It produced on kernel 3.10. Yes, I do not doubt it. I just don't understand it. There is no contradiction there. I believe lots of stuff I don't understand :-) The version this happened is very useful, although I don't think there are any relevant changes after v3.10. > Here is the full panic message for you to refer. I will get a method try to > reproduce it. That would be great if you were able to. Otherwise it will be difficult to verify that the proposed fix works. In any case, as I said: I believe a fix which avoids the call to usbnet_link_change() in case of bind failure is correct and appropriate. But I suggest that you do it by removing the call and comment from cdc_ncm_bind() and instead set the FLAG_LINK_INTR in the driver_info. That's how this should have been implemented from the beginning. > [ 15.635727] BUG: Bad page state in process khubd pfn:31994 > [ 15.641989] page:f3ad9280 count:0 mapcount:0 mapping:00000020 index:0x0 > [ 15.649384] page flags: 0x40000000() > [ 15.670096] BUG: unable to handle kernel NULL pointer dereference at 00000078 > [ 15.678078] IP: [<c24b7f5e>] usbnet_link_change+0x1e/0x80 > [ 15.684120] *pde = 00000000 > [ 15.687339] Oops: 0000 [#1] SMP > [ 15.690953] Modules linked in: atmel_mxt_ts vxd392 videobuf_vmalloc videobuf_core bcm_bt_lpm bcm432) > [ 15.702658] CPU: 0 PID: 573 Comm: khubd Tainted: G B W O 3.10.1+ #1 > [ 15.710241] task: f27e8f30 ti: f2084000 task.ti: f2084000 > [ 15.716270] EIP: 0060:[<c24b7f5e>] EFLAGS: 00010246 CPU: 0 > [ 15.722396] EIP is at usbnet_link_change+0x1e/0x80 > [ 15.727747] EAX: f18524c0 EBX: 00000000 ECX: f18524c0 EDX: 00000000 > [ 15.734746] ESI: f18524c0 EDI: 00000000 EBP: f2085b7c ESP: f2085b70 > [ 15.741746] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 > [ 15.747775] CR0: 8005003b CR2: 00000078 CR3: 02c3b000 CR4: 001007d0 > [ 15.754774] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > [ 15.761774] DR6: ffff0ff0 DR7: 00000400 > [ 15.766054] Stack: > [ 15.768295] 00000000 f18524c0 c2a03818 f2085b8c c24bc69a f1852000 f1f52e00 f2085be0 > [ 15.776991] c24b8d32 00000000 00000001 00000000 c2167f2c f2085bb4 c2821253 f2085bec > [ 15.785687] 00000246 00000246 f18d8088 f1f52e1c f1fce464 f1fce400 f18524c0 c28a06e0 > [ 15.794376] Call Trace: > [ 15.797109] [<c24bc69a>] cdc_ncm_bind+0x3a/0x50 > [ 15.802267] [<c24b8d32>] usbnet_probe+0x282/0x7d0 > [ 15.807621] [<c2167f2c>] ? sysfs_new_dirent+0x6c/0x100 > [ 15.813460] [<c2821253>] ? mutex_lock+0x13/0x40 > [ 15.818618] [<c24bb278>] cdc_ncm_probe+0x8/0x10 > [ 15.823777] [<c24e0ef7>] usb_probe_interface+0x187/0x2c0 I still do not understand how this happens, but usbnet_link_change will schedule some delayed work which absolutely is not appropriate if usbnet_probe fails. There are no cancel_work calls in the usbnet_probe exit path.... So the call should definitely be avoided if bind fails. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html