On Wed, 22 Nov 2023 17:54:53 +0100 Andrew Lunn <andrew@xxxxxxx> wrote: > > > > +static int pd692x0_sendrecv_msg(struct pd692x0_priv *priv, > > > > + struct pd692x0_msg *msg, > > > > + struct pd692x0_msg_content *buf) > > > > +{ > > > > + struct device *dev = &priv->client->dev; > > > > + int ret; > > > > + > > > > + ret = pd692x0_send_msg(priv, msg); > > > > + if (ret) > > > > + return ret; > > > > + > > > > + ret = pd692x0_recv_msg(priv, msg, buf); > > > > > > So this function takes at least 10 seconds? > > > > No, on normal communication it takes a bit more than 30ms. > > So i think the first step is to refactor this code to make it clear > what the normal path is, and what the exception path is, and the > timing of each. Ok I will try to refactor it to makes it more readable. > > > > + msg.content.sub[2] = id; > > > > + ret = pd692x0_sendrecv_msg(priv, &msg, &buf); > > > > > > So this is also 10 seconds? > > > > > > Given its name, it looks like this is called via ethtool? Is the > > > ethtool core holding RTNL? It is generally considered bad to hold RTNL for > > > that long. > > > > Yes it is holding RTNL lock. Should I consider another behavior in case of > > communication loss to not holding RTNL lock so long? > > How often does it happen? On the scale of its a theoretical > possibility, through to it happens every N calls? Also, does it happen > on cold boot and reboot? > > If its never supposed to happen, i would keep holding RTNL, and add a > pr_warn() that the PSE has crashed and burned, waiting for it to > reboot. If this is likely to happen on the first communication with > the device, we might want to do a dummy transfer during probe to get > is synchronized before we start using it with the RTNL held. I would say it never supposed to happen. I never faced the issue playing with the controller. The first communication is a simple i2c_master_recv of the controller status without entering the pd692x0_sendrecv_msg function, therefore it won't be an issue. Another solution could be to raise a flag if I enter in communication loss and release the rtnlock. We would lock again the rtnl when the flags got disabled. The controler won't be accessible until the flag is disabled. Regards, -- Köry Maincent, Bootlin Embedded Linux and kernel engineering https://bootlin.com