Hi Russell, czw., 10 gru 2020 o 16:46 Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> napisał(a): > > On Thu, Dec 10, 2020 at 03:35:29PM +0100, Marcin Wojtas wrote: > > Hi Greg, > > > > śr., 9 gru 2020 o 11:59 Greg Kroah-Hartman > > <gregkh@xxxxxxxxxxxxxxxxxxx> napisał(a): > > > What part fixes the issue? I can't see it... > > > > I re-checked in my setup and here's the smallest part of the original > > patch, that fixes previously described issue: > > diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > index e98be8372780..9d71a4fe1750 100644 > > --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c > > @@ -4767,6 +4767,11 @@ static void mvpp2_port_copy_mac_addr(struct > > net_device *dev, struct mvpp2 *priv, > > eth_hw_addr_random(dev); > > } > > > > +static struct mvpp2_port *mvpp2_phylink_to_port(struct phylink_config *config) > > +{ > > + return container_of(config, struct mvpp2_port, phylink_config); > > +} > > + > > static void mvpp2_phylink_validate(struct phylink_config *config, > > unsigned long *supported, > > struct phylink_link_state *state) > > @@ -5105,13 +5110,12 @@ static void mvpp2_gmac_config(struct > > mvpp2_port *port, unsigned int mode, > > static void mvpp2_mac_config(struct phylink_config *config, unsigned int mode, > > const struct phylink_link_state *state) > > { > > - struct net_device *dev = to_net_dev(config->dev); > > - struct mvpp2_port *port = netdev_priv(dev); > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > bool change_interface = port->phy_interface != state->interface; > > > > /* Check for invalid configuration */ > > if (mvpp2_is_xlg(state->interface) && port->gop_id != 0) { > > - netdev_err(dev, "Invalid mode on %s\n", dev->name); > > + netdev_err(port->dev, "Invalid mode on %s\n", port->dev->name); > > return; > > } > > > > @@ -5151,8 +5155,7 @@ static void mvpp2_mac_link_up(struct > > phylink_config *config, > > int speed, int duplex, > > bool tx_pause, bool rx_pause) > > { > > - struct net_device *dev = to_net_dev(config->dev); > > - struct mvpp2_port *port = netdev_priv(dev); > > + struct mvpp2_port *port = mvpp2_phylink_to_port(config); > > u32 val; > > > > if (mvpp2_is_xlg(interface)) { > > @@ -5199,7 +5202,7 @@ static void mvpp2_mac_link_up(struct > > phylink_config *config, > > > > mvpp2_egress_enable(port); > > mvpp2_ingress_enable(port); > > - netif_tx_wake_all_queues(dev); > > + netif_tx_wake_all_queues(port->dev); > > } > > > > static void mvpp2_mac_link_down(struct phylink_config *config, > > The problem is caused by this hack: > > /* Phylink isn't used as of now for ACPI, so the MAC has to be > * configured manually when the interface is started. This will > * be removed as soon as the phylink ACPI support lands in. > */ > struct phylink_link_state state = { > .interface = port->phy_interface, > }; > mvpp2_mac_config(&port->phylink_config, MLO_AN_INBAND, &state); > mvpp2_mac_link_up(&port->phylink_config, MLO_AN_INBAND, > port->phy_interface, NULL); > > which passes an un-initialised (zeroed) port->phylink_config, as > phylink is not used in ACPI setups. > > The problem occurs because port->phylink_config.dev (which is a > NULL pointer in this instance) is passed to to_net_dev(): > > #define to_net_dev(d) container_of(d, struct net_device, dev) > > Which then means netdev_priv(dev) attempts to dereference a not-quite > NULL pointer, leading to an oops. Correct. Hopefully the MDIO+ACPI patchset will land and I'll be able to get rid of this hack and do not maintain dual handling paths any longer. > > The problem here is that the bug was not noticed; it seems hardly > anyone bothers to run mainline kernels with ACPI on Marvell platforms, > or if they do, they don't bother reporting to mainline communities > when they have problems. Likely, there's posts on some random web-based > bulletin board or mailing list that kernel developers don't read > somewhere complaining that there's an oops. > I must admit that due to other duties I did not follow the mainline mvpp2 for a couple revisions (and I am not maintainer of it). However recently I got reached-out directly by different developers - the trigger was different distros upgrading the kernel above v5.4+ and for some reasons the DT path is not chosen there (and the ACPI will be chosen more and more in the SystemReady world). > Like... > > https://lists.einval.com/pipermail/macchiato/2020-January/000309.html > https://gist.github.com/AdrianKoshka/ff9862da2183a2d8e26d47baf8dc04b9 > > This kind of segmentation is very disappointing; it means potentially > lots of bugs go by unnoticed by kernel developers, and bugs only get > fixed by chance. Had it been reported to somewhere known earlier > this year, it is likely that a proper fix patch would have been > created. Totally agree. As soon as I got noticed directly it took me no time to find the regression root cause. > > How this gets handled is ultimately up to the stable developers to > decide what they wish to accept. Do they wish to accept a back-ported > full version of my commit 6c2b49eb9671 ("net: mvpp2: add > mvpp2_phylink_to_port() helper") that unintentionally fixed this > unknown issue, or do they want a more minimal fix such as the cut-down > version of that commit that Marcin has supplied. > > Until something changes in the way bugs get reported, I suspect this > won't be the only instance of bug-fixing-by-accident happening. > Thank you Russell for your input. Best regards, Marcin