On Wed, Dec 12, 2018 at 01:50:10PM +0000, Igor Russkikh wrote: > From: Dmitry Bezrukov <dmitry.bezrukov@xxxxxxxxxxxx> > > Lab testing shows that chip may get overheated when > network link is connected on high speed (2.5G/5G). > > Default hardware design uses only passive heatsink without a cooler, > and that makes things worse. > > To prevent possible chip damage here we add throttling mechanism. > > There is a worker that monitors the PHY's temperature via reading > PHY registers. When PHY's temperature reaches the critical threshold > (it is 106 degrees of Celsius) it changes the link speed to the lowest > regardless user/default link speed settings. It should reduce the PHY's > temperature to normal numbers. > > When the PHY's temparature is back to normal (low threshold, it is > 85 degrees) it restores user/default link speed settings. Hi Igor Please could you also export the temperature using HWMON. The Marvell PHY drivers are good examples. I also have a patch for driver/net/phy/aquantia.c which adds HWMON support to that PHY. > > Signed-off-by: Dmitry Bezrukov <dmitry.bezrukov@xxxxxxxxxxxx> > Signed-off-by: Igor Russkikh <igor.russkikh@xxxxxxxxxxxx> > --- > drivers/net/usb/aqc111.c | 78 ++++++++++++++++++++++++++++++++++++++++ > drivers/net/usb/aqc111.h | 8 +++++ > 2 files changed, 86 insertions(+) > > diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c > index 8867f6a3eaa7..fa6b9dfc2a6f 100644 > --- a/drivers/net/usb/aqc111.c > +++ b/drivers/net/usb/aqc111.c > @@ -17,6 +17,7 @@ > #include <linux/usb/cdc.h> > #include <linux/usb/usbnet.h> > #include <linux/linkmode.h> > +#include <linux/workqueue.h> > > #include "aqc111.h" > > @@ -334,6 +335,9 @@ static void aqc111_set_phy_speed(struct usbnet *dev, u8 autoneg, u16 speed) > aqc111_data->phy_cfg |= (3 << AQ_DSH_RETRIES_SHIFT) & > AQ_DSH_RETRIES_MASK; > > + if (aqc111_data->thermal_throttling) > + speed = SPEED_100; > + That is a big jump. Do you need to cool is down quickly, or would 1Gbps be sufficient? I think as a user, i would prefer it ping-pongs between 5G and 1G, not 5G and 100M. > if (autoneg == AUTONEG_ENABLE) { > switch (speed) { > case SPEED_5000: It looks like this only works when auto-neg is enabled. If i've fixed configured it i don't think this works? > @@ -714,6 +718,8 @@ static int aqc111_bind(struct usbnet *dev, struct usb_interface *intf) > /* store aqc111_data pointer in device data field */ > dev->driver_priv = aqc111_data; > > + aqc111_data->dev = dev; > + > /* Init the MAC address */ > ret = aqc111_read_perm_mac(dev); > if (ret) > @@ -991,6 +997,71 @@ static int aqc111_link_reset(struct usbnet *dev) > return 0; > } > > +int aqc111_get_temperature(struct usbnet *dev, u32 *temperature) > +{ > + u16 reg16 = 0; > + > + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS2, AQ_PHY_GLOBAL_ADDR, > + ®16) < 0) > + goto err; > + > + if (!(reg16 & AQ_THERM_ST_READY)) > + goto err; > + > + if (aqc111_mdio_read(dev, AQ_GLB_THERMAL_STATUS1, AQ_PHY_GLOBAL_ADDR, > + ®16) < 0) > + goto err; > + > + /*convert from 1/256 to 1/100 degrees of Celsius*/ > + *temperature = (u32)reg16 * 100 / 256; > + return 0; > + > +err: > + *temperature = 0; > + return -1; > +} > + > +void aqc111_thermal_work_cb(struct work_struct *w) > +{ > + struct delayed_work *dw = to_delayed_work(w); > + struct aqc111_data *aqc111_data = container_of(dw, struct aqc111_data, > + therm_work); > + unsigned long timeout = msecs_to_jiffies(AQ_THERMAL_TIMER_MS); > + struct usbnet *dev = aqc111_data->dev; > + u32 temperature = 0; > + u8 reset_speed = 0; > + > + if (!aqc111_data->link) > + /* poll not so frequently */ > + timeout *= 2; > + > + if (aqc111_get_temperature(dev, &temperature) != 0) > + goto end; > + > + if (aqc111_data->thermal_throttling && > + temperature <= AQ_NORMAL_TEMP_THRESHOLD) { > + netdev_info(dev->net, "The temperature is back to normal(%u)", > + AQ_NORMAL_TEMP_THRESHOLD / 100); How often do you see these messages? I'm wondering if they need to be rate limited? > + aqc111_data->thermal_throttling = 0; > + reset_speed = 1; > + } > + > + if (!aqc111_data->thermal_throttling && > + temperature >= AQ_CRITICAL_TEMP_THRESHOLD) { Should there be some hysteresis in here? In fact, if temperature is AQ_CRITICAL_TEMP_THRESHOLD it is both back to normal, and throttled at the same time! > + netdev_warn(dev->net, "Critical temperature(%u) is reached", > + AQ_CRITICAL_TEMP_THRESHOLD / 100); > + aqc111_data->thermal_throttling = 1; > + reset_speed = 1; update_speed might be a better name, since you are not always resetting it. Andrew