Hi, On Fri, Oct 20, 2023 at 4:31 AM Hayes Wang <hayeswang@xxxxxxxxxxx> wrote: > > Douglas Anderson <dianders@xxxxxxxxxxxx> > > Sent: Friday, October 20, 2023 5:20 AM > [...] > > static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size, > > @@ -8265,6 +8353,17 @@ static int rtl8152_pre_reset(struct usb_interface *intf) > > if (!tp) > > return 0; > > > > + /* We can only use the optimized reset if we made it to the end of > > + * probe without any register access fails, which sets > > + * `PROBED_WITH_NO_ERRORS` to true. If we didn't have that then return > > + * an error here which tells the USB framework to fully unbind/rebind > > + * our driver. > > + */ > > + if (!test_bit(PROBED_WITH_NO_ERRORS, &tp->flags)) { > > + mutex_unlock(&tp->control); > > I think you forget to remove mutex_unlock here. Ugh, thanks for catching. I tested it with a bootup or two but I didn't re-run all tests or spend lots of time looking through the logs so I missed this. I'll run a few more cycles this time. > > + return -EIO; > > + } > > + > > netdev = tp->netdev; > > if (!netif_running(netdev)) > > return 0; > > @@ -8277,7 +8376,9 @@ static int rtl8152_pre_reset(struct usb_interface *intf) > > napi_disable(&tp->napi); > > if (netif_carrier_ok(netdev)) { > > mutex_lock(&tp->control); > > + set_bit(IN_PRE_RESET, &tp->flags); > > tp->rtl_ops.disable(tp); > > + clear_bit(IN_PRE_RESET, &tp->flags); > > mutex_unlock(&tp->control); > > } > > > > @@ -8293,6 +8394,8 @@ static int rtl8152_post_reset(struct usb_interface *intf) > > if (!tp) > > return 0; > > > > + rtl_set_accessible(tp); > > + > > Excuse me. I have a new idea. You could check if it is possible. > If you remove test_bit(PROBED_WITH_NO_ERRORS, &tp->flags) in pre_reset(), > the driver wouldn't be unbound and rebound. Instead, you test PROBED_WITH_NO_ERRORS > here to re-initialize the device. Then, you could limit the times of USB reset, and > the infinite loop wouldn't occur. The code would be like the following, > > if (!test_bit(PROBED_WITH_NO_ERRORS, &tp->flags)) { > /* re-init */ > mutex_lock(&tp->control); > tp->rtl_ops.init(tp); > mutex_unlock(&tp->control); > rtl_hw_phy_work_func_t(&tp->hw_phy_work.work); > > /* re-open(). Maybe move after checking netif_running(netdev) */ > mutex_lock(&tp->control); > tp->rtl_ops.up(tp); > mutex_unlock(&tp->control); > > /* check if there is any control error */ > if (test_bit(RTL8152_INACCESSIBLE, &tp->flags) { > if (tp->reg_access_reset_count < REGISTER_ACCESS_MAX_RESETS) { > /* queue reset again ? */ > } else { > ... > } > /* return 0 ? */ > } else { > set_bit(PROBED_WITH_NO_ERRORS, &tp->flags) > } > } The above solution worries me. I guess one part of this is that it replicates some logic that's in probe(). That's not necessarily awful, but we'd at least want to reorganize things so that they could share code if possible, though maybe that's hard to do with the extra grabs of the mutex? The other part that worries me is that in the core when we added the network device that something in the core might have cached bogus data about our network device. This doesn't seem wonderful to me. I guess yet another part is that your proposed solution there has a whole bunch of question marks on it. If it's not necessarily obvious what we should do in this case then it doesn't feel like a robust solution. It seems like your main concern here is with the potential for an infinite number of resets. I have sent up a small patch to the USB core [1] addressing this concern. Let's see what folks say about that patch. If it is accepted then it seems like we could just not worry about it. If it's not accepted then perhaps feedback on that patch will give us additional guidance. In the meantime I'll at least post v5 since I don't want to leave the patch up there with the mismatched mutex. I'll have my v5 point at my USB core patch. [1] https://lore.kernel.org/r/20231020083125.1.I3e5f7abcbf6f08d392e31a5826b7f234df662276@changeid -Doug