On 29/09/14 15:02, Konrad Rzeszutek Wilk wrote:
> On Sat, Sep 27, 2014 at 12:36:42AM +0800, Chen Gang wrote:
>> When xenbus_switch_state() fails, it will call xenbus_switch_fatal()
>
> Only on the first depth, not on the subsequent ones (as in if
> the first xenbus_switch_fail fails, it won't try to call
> xenbus_switch_state again and again).
>
>> internally, so need not return any status value, then use 'void' instead
>> of 'int' for xenbus_switch_state() and __xenbus_switch_state().
>
> When that switch occurs (to XenbusStateConnected) won't the watches
> fire - meaning we MUST make sure that the watch functions - if they
> use the xenbus_switch_state() they MUST not hold any locks - because
> they could be executed once more?
>
> Oh wait, we don't have to worry about that right now as the callbacks
> that pick up the messages from the XenBus are all gated on one mutex
> anyhow.
>
> Hm, anyhow, I would add this extra piece of information to the patch:
>
>
> diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
> index c214daa..f7399fd 100644
> --- a/drivers/xen/xen-pciback/xenbus.c
> +++ b/drivers/xen/xen-pciback/xenbus.c
> @@ -661,6 +661,12 @@ static void xen_pcibk_be_watch(struct xenbus_watch *watch,
>
>  	switch (xenbus_read_driver_state(pdev->xdev->nodename)) {
>  	case XenbusStateInitWait:
> +		/*
> +		 * xenbus_switch_state can call xenbus_switch_fatal which will
> +		 * immediately set the state to XenbusStateClosing which
> +		 * means if we were reading for it here we MUST drop any
> +		 * locks so that we don't dead-lock.
> +		 */

Watches are asynchronous and serialised by the xenwatch thread. I can't
see what deadlock you're talking about here. Particularly since the
backend doesn't watch its own state node (it watches the frontend one).

>  		xen_pcibk_setup_backend(pdev);
>  		break;
>
>>
>> Also need be sure that all callers which check the return value must let
>> 'err' be 0.
>
> I am bit uncomfortable with that, that is due to:
>
>
> .. snip..
>> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
>> index 9c47b89..b5c3d47 100644
>> --- a/drivers/net/xen-netback/xenbus.c
>> +++ b/drivers/net/xen-netback/xenbus.c
>> @@ -337,10 +337,7 @@ static int netback_probe(struct xenbus_device *dev,
>>  	if (err)
>>  		pr_debug("Error writing multi-queue-max-queues\n");
>>
>> -	err = xenbus_switch_state(dev, XenbusStateInitWait);
>> -	if (err)
>> -		goto fail;
>> -
>> +	xenbus_switch_state(dev, XenbusStateInitWait);
>
> Which if it fails it won't call:
>
> 354 fail:
> 355 	pr_debug("failed\n");
> 356 	netback_remove(dev);
> 357 	return err;
>
>
> And since there is no watch on the backend state to go in Closing it won't
> ever call those and we leak memory.

It's not leaking the memory. All resources will be recovered when the
device is removed.

> The same is for xen-blkback mechanism in the probe function.

David
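
P.S. To make the "only on the first depth" point concrete, here is a
small user-space model of the behaviour being discussed. It is a sketch
under stated assumptions, not the kernel code: every model_* name, the
store_fails flag and the two counters are hypothetical and exist only
for illustration. The one thing it tries to capture is that a failed
switch reports a fatal error, the fatal path retries a switch to the
closing state exactly once, and a second failure is reported without
recursing further.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the XenbusState* values named in the thread. */
enum model_state { MODEL_INIT_WAIT, MODEL_CONNECTED, MODEL_CLOSING };

/* Hypothetical device model; not struct xenbus_device. */
struct model_device {
	enum model_state state;
	bool store_fails;	/* assumption: simulate every state write failing */
	int write_attempts;	/* how many state writes were tried */
	int fatal_reports;	/* how many times the fatal path ran */
};

/* Try to record the new state; returns false when the simulated write fails. */
static bool model_write_state(struct model_device *dev, enum model_state s)
{
	dev->write_attempts++;
	if (dev->store_fails)
		return false;
	dev->state = s;
	return true;
}

static void model_switch_state(struct model_device *dev,
			       enum model_state s, int depth);

/* Report the failure; only the outermost call (depth 0) retries with Closing. */
static void model_switch_fatal(struct model_device *dev, int depth)
{
	dev->fatal_reports++;
	if (depth == 0)
		model_switch_state(dev, MODEL_CLOSING, 1);
}

/* Returns void: a failure is handled internally rather than passed back. */
static void model_switch_state(struct model_device *dev,
			       enum model_state s, int depth)
{
	assert(dev != NULL);
	if (!model_write_state(dev, s))
		model_switch_fatal(dev, depth);
}
```

Calling model_switch_state(&dev, MODEL_CONNECTED, 0) on a device with
store_fails set makes two write attempts and then stops, which is the
bounded recursion Konrad describes. Note that the caller gets no
indication of the failure, which is the trade-off under discussion for
the probe functions.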