> > > +static int pd692x0_sendrecv_msg(struct pd692x0_priv *priv, > > > + struct pd692x0_msg *msg, > > > + struct pd692x0_msg_content *buf) > > > +{ > > > + struct device *dev = &priv->client->dev; > > > + int ret; > > > + > > > + ret = pd692x0_send_msg(priv, msg); > > > + if (ret) > > > + return ret; > > > + > > > + ret = pd692x0_recv_msg(priv, msg, buf); > > > > So this function takes at least 10 seconds? > > No, on normal communication it takes a bit more than 30ms. So i think the first step is to refactor this code to make it clear what the normal path is, and what the exception path is, and the timing of each. > > > + msg.content.sub[2] = id; > > > + ret = pd692x0_sendrecv_msg(priv, &msg, &buf); > > > > So this is also 10 seconds? > > > > Given its name, it looks like this is called via ethtool? Is the > > ethtool core holding RTNL? It is generally considered bad to hold RTNL for > > that long. > > Yes it is holding RTNL lock. Should I consider another behavior in case of > communication loss to not holding RTNL lock so long? How often does it happen? On the scale of its a theoretical possibility, through to it happens every N calls? Also, does it happen on cold boot and reboot? If its never supposed to happen, i would keep holding RTNL, and add a pr_warn() that the PSE has crashed and burned, waiting for it to reboot. If this is likely to happen on the first communication with the device, we might want to do a dummy transfer during probe to get is synchronized before we start using it with the RTNL held. Andrew