Re: [PATCH v4 5/5] r8152: Block future register access if register access fails

Doug Anderson <dianders@xxxxxxxxxxxx> · Fri, 20 Oct 2023 08:42:07 -0700

Hi,

On Fri, Oct 20, 2023 at 4:31 AM Hayes Wang <hayeswang@xxxxxxxxxxx> wrote:
>
> Douglas Anderson <dianders@xxxxxxxxxxxx>
> > Sent: Friday, October 20, 2023 5:20 AM
> [...]
> >  static int generic_ocp_read(struct r8152 *tp, u16 index, u16 size,
> > @@ -8265,6 +8353,17 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
> >         if (!tp)
> >                 return 0;
> >
> > +       /* We can only use the optimized reset if we made it to the end of
> > +        * probe without any register access fails, which sets
> > +        * `PROBED_WITH_NO_ERRORS` to true. If we didn't have that then return
> > +        * an error here which tells the USB framework to fully unbind/rebind
> > +        * our driver.
> > +        */
> > +       if (!test_bit(PROBED_WITH_NO_ERRORS, &tp->flags)) {
> > +               mutex_unlock(&tp->control);
>
> I think you forget to remove mutex_unlock here.

Ugh, thanks for catching. I tested it with a bootup or two but I
didn't re-run all tests or spend lots of time looking through the logs
so I missed this. I'll run a few more cycles this time.

> > +               return -EIO;
> > +       }
> > +
> >         netdev = tp->netdev;
> >         if (!netif_running(netdev))
> >                 return 0;
> > @@ -8277,7 +8376,9 @@ static int rtl8152_pre_reset(struct usb_interface *intf)
> >         napi_disable(&tp->napi);
> >         if (netif_carrier_ok(netdev)) {
> >                 mutex_lock(&tp->control);
> > +               set_bit(IN_PRE_RESET, &tp->flags);
> >                 tp->rtl_ops.disable(tp);
> > +               clear_bit(IN_PRE_RESET, &tp->flags);
> >                 mutex_unlock(&tp->control);
> >         }
> >
> > @@ -8293,6 +8394,8 @@ static int rtl8152_post_reset(struct usb_interface *intf)
> >         if (!tp)
> >                 return 0;
> >
> > +       rtl_set_accessible(tp);
> > +
>
> Excuse me. I have a new idea. You could check if it is possible.
> If you remove test_bit(PROBED_WITH_NO_ERRORS, &tp->flags) in pre_reset(),
> the driver wouldn't be unbound and rebound. Instead, you test PROBED_WITH_NO_ERRORS
> here to re-initialize the device. Then, you could limit the times of USB reset, and
> the infinite loop wouldn't occur. The code would be like the following,
>
>         if (!test_bit(PROBED_WITH_NO_ERRORS, &tp->flags)) {
>                 /* re-init */
>                 mutex_lock(&tp->control);
>                 tp->rtl_ops.init(tp);
>                 mutex_unlock(&tp->control);
>                 rtl_hw_phy_work_func_t(&tp->hw_phy_work.work);
>
>                 /* re-open(). Maybe move after checking netif_running(netdev) */
>                 mutex_lock(&tp->control);
>                 tp->rtl_ops.up(tp);
>                 mutex_unlock(&tp->control);
>
>                 /* check if there is any control error */
>                 if (test_bit(RTL8152_INACCESSIBLE, &tp->flags) {
>                         if (tp->reg_access_reset_count < REGISTER_ACCESS_MAX_RESETS) {
>                                 /* queue reset again ? */
>                         } else {
>                                 ...
>                         }
>                         /* return 0 ? */
>                 } else {
>                         set_bit(PROBED_WITH_NO_ERRORS, &tp->flags)
>                 }
>         }

The above solution worries me.

I guess one part of this is that it replicates some logic that's in
probe(). That's not necessarily awful, but we'd at least want to
reorganize things so that they could share code if possible, though
maybe that's hard to do with the extra grabs of the mutex?

The other part that worries me is that in the core when we added the
network device that something in the core might have cached bogus data
about our network device. This doesn't seem wonderful to me.

I guess yet another part is that your proposed solution there has a
whole bunch of question marks on it. If it's not necessarily obvious
what we should do in this case then it doesn't feel like a robust
solution.

It seems like your main concern here is with the potential for an
infinite number of resets. I have sent up a small patch to the USB
core [1] addressing this concern. Let's see what folks say about that
patch. If it is accepted then it seems like we could just not worry
about it. If it's not accepted then perhaps feedback on that patch
will give us additional guidance.

In the meantime I'll at least post v5 since I don't want to leave the
patch up there with the mismatched mutex. I'll have my v5 point at my
USB core patch.

[1] https://lore.kernel.org/r/20231020083125.1.I3e5f7abcbf6f08d392e31a5826b7f234df662276@changeid

-Doug