Re: [PATCH net-next 2/4] net: mvpp2: add mvpp2_phylink_to_port() helper

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Russell,

czw., 10 gru 2020 o 16:46 Russell King - ARM Linux admin
<linux@xxxxxxxxxxxxxxx> napisał(a):
>
> On Thu, Dec 10, 2020 at 03:35:29PM +0100, Marcin Wojtas wrote:
> > Hi Greg,
> >
> > śr., 9 gru 2020 o 11:59 Greg Kroah-Hartman
> > <gregkh@xxxxxxxxxxxxxxxxxxx> napisał(a):
> > > What part fixes the issue?  I can't see it...
> >
> > I re-checked in my setup and here's the smallest part of the original
> > patch, that fixes previously described issue:
> > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > index e98be8372780..9d71a4fe1750 100644
> > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> > @@ -4767,6 +4767,11 @@ static void mvpp2_port_copy_mac_addr(struct
> > net_device *dev, struct mvpp2 *priv,
> >         eth_hw_addr_random(dev);
> >  }
> >
> > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config *config)
> > +{
> > +       return container_of(config, struct mvpp2_port, phylink_config);
> > +}
> > +
> >  static void mvpp2_phylink_validate(struct phylink_config *config,
> >                                    unsigned long *supported,
> >                                    struct phylink_link_state *state)
> > @@ -5105,13 +5110,12 @@ static void mvpp2_gmac_config(struct
> > mvpp2_port *port, unsigned int mode,
> >  static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode,
> >                              const struct phylink_link_state *state)
> >  {
> > -       struct net_device *dev = to_net_dev(config->dev);
> > -       struct mvpp2_port *port = netdev_priv(dev);
> > +       struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> >         bool change_interface = port->phy_interface != state->interface;
> >
> >         /* Check for invalid configuration */
> >         if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) {
> > -               netdev_err(dev, "Invalid mode on %s\n", dev->name);
> > +               netdev_err(port->dev, "Invalid mode on %s\n", port->dev->name);
> >                 return;
> >         }
> >
> > @@ -5151,8 +5155,7 @@ static void mvpp2_mac_link_up(struct
> > phylink_config *config,
> >                               int speed, int duplex,
> >                               bool tx_pause, bool rx_pause)
> >  {
> > -       struct net_device *dev = to_net_dev(config->dev);
> > -       struct mvpp2_port *port = netdev_priv(dev);
> > +       struct mvpp2_port *port = mvpp2_phylink_to_port(config);
> >         u32 val;
> >
> >         if (mvpp2_is_xlg(interface)) {
> > @@ -5199,7 +5202,7 @@ static void mvpp2_mac_link_up(struct
> > phylink_config *config,
> >
> >         mvpp2_egress_enable(port);
> >         mvpp2_ingress_enable(port);
> > -       netif_tx_wake_all_queues(dev);
> > +       netif_tx_wake_all_queues(port->dev);
> >  }
> >
> >  static void mvpp2_mac_link_down(struct phylink_config *config,
>
> The problem is caused by this hack:
>
>                 /* Phylink isn't used as of now for ACPI, so the MAC has to be
>                  * configured manually when the interface is started. This will
>                  * be removed as soon as the phylink ACPI support lands in.
>                  */
>                 struct phylink_link_state state = {
>                         .interface = port->phy_interface,
>                 };
>                 mvpp2_mac_config(&port->phylink_config, MLO_AN_INBAND, &state);
>                 mvpp2_mac_link_up(&port->phylink_config, MLO_AN_INBAND,
>                                   port->phy_interface, NULL);
>
> which passes an un-initialised (zeroed) port->phylink_config, as
> phylink is not used in ACPI setups.
>
> The problem occurs because port->phylink_config.dev (which is a
> NULL pointer in this instance) is passed to to_net_dev():
>
> #define to_net_dev(d) container_of(d, struct net_device, dev)
>
> Which then means netdev_priv(dev) attempts to dereference a not-quite
> NULL pointer, leading to an oops.

Correct. Hopefully the MDIO+ACPI patchset will land and I'll be able
to get rid of this hack and do not maintain dual handling paths any
longer.

>
> The problem here is that the bug was not noticed; it seems hardly
> anyone bothers to run mainline kernels with ACPI on Marvell platforms,
> or if they do, they don't bother reporting to mainline communities
> when they have problems. Likely, there's posts on some random web-based
> bulletin board or mailing list that kernel developers don't read
> somewhere complaining that there's an oops.
>

I must admit that due to other duties I did not follow the mainline
mvpp2 for a couple revisions (and I am not maintainer of it). However
recently I got reached-out directly by different developers - the
trigger was different distros upgrading the kernel above v5.4+ and for
some reasons the DT path is not chosen there (and the ACPI will be
chosen more and more in the SystemReady world).

> Like...
>
> https://lists.einval.com/pipermail/macchiato/2020-January/000309.html
> https://gist.github.com/AdrianKoshka/ff9862da2183a2d8e26d47baf8dc04b9
>
> This kind of segmentation is very disappointing; it means potentially
> lots of bugs go by unnoticed by kernel developers, and bugs only get
> fixed by chance.  Had it been reported to somewhere known earlier
> this year, it is likely that a proper fix patch would have been
> created.

Totally agree. As soon as I got noticed directly it took me no time to
find the regression root cause.

>
> How this gets handled is ultimately up to the stable developers to
> decide what they wish to accept. Do they wish to accept a back-ported
> full version of my commit 6c2b49eb9671 ("net: mvpp2: add
> mvpp2_phylink_to_port() helper") that unintentionally fixed this
> unknown issue, or do they want a more minimal fix such as the cut-down
> version of that commit that Marcin has supplied.
>
> Until something changes in the way bugs get reported, I suspect this
> won't be the only instance of bug-fixing-by-accident happening.
>

Thank you Russell for your input.

Best regards,
Marcin




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux