On Wed, 2014-02-26 at 10:49 -0500, Alan Stern wrote: > On Tue, 25 Feb 2014, Dan Williams wrote: > > > > Look, we guarantee that the peer relation is dropped when either of the > > > devices is unregistered. Therefore there's no need to take a reference > > > to the peer device. > > > > Ugh, yes. Reference counts alone don't save us from this scenario: > > > > CPU1 CPU2 > > mutex_lock(peer_lock) > > unlink_peers(portA, portC) > > mutex_unlock(peer_lock) > > mutex_lock(peer_lock) > > device_unregister(portA) > > link_peers(portB, portA) > > mutex_unlock(peer_lock) > > > > > This will cause a problem only if something else goes wrong. That's > > > why I suggested, in patch 2, that when a peering error is detected, you > > > print out the erroneous pointer rather than assuming it points to a > > > valid port structure. > > > > ...but even with the invalid port flag portA can theoretically still be > > a stale pointer. > > I don't see how. When CPU2 takes the lock it will certainly be after hubA->disconnected has been set, but I don't see what prevents de-referencing the hubA pointer from crashing. Continuing the above example: CPU1 CPU2 mutex_lock(peer_lock) unlink_peers(portA, portC) mutex_unlock(peer_lock) mutex_lock(peer_lock) device_unregister(portA) hubA = to_usb_hub(portA) kref_put(hubA) free(hubA) if (!hubA->disconnected) /* crash */ link_peers(portB, portA) mutex_unlock(peer_lock) I think this needs to be solved by portB not being able to lookup portA in the first place. Which I believe is addressed by wider synchronizing usb_hub_create_port_device() vs setting hub->disconnected (as you mention below). > > > > This is also why I suggested that you close the race in > > > usb_hub_remove_port_device: to make sure that the guarantee really is > > > enforced. > > > > You proposed a flag on the port_dev once it goes invalid. It seems to > > me the flag to check if a port is valid is parent_hub->disconnected. > > Good suggestion. > > > However, in the case above, I believe find_port_by_location() and > > find_default_peer() need to prevent the "link_peers(portB, portA)" > > event. When these routines walk down the tree they should be taking > > usb_lock_device() to make sure they are not racing with disconnect > > operations. > > > > Locking while traversing the list and otherwise checking for > > hub->disconnected closes the race. We can't solve the race at the > > port_dev level alone. Agreed? > > No. We already have the port_lock mutex; nothing else is needed. > Simply make link_peers fail if either of the hub->disconnected flags is > set. Then the scenario shown above can't happen, because link_peers > running on CPU2 will see that portA is stale. > > However, it might be a good idea for usb_hub_create_port_device to hold > the port_lock while doing > > hub->ports[port1 - 1] = port_dev; > > And perhaps this line should be moved down to just before the > device_register call. In fact, maybe that call should occur inside the > scope of the mutex. Something along these lines, yes. Would need to change the tier mismatch code to assume the peer_lock is already held, but that may work. I'll take a look. > > There may be other places where we have to be more careful about > assuming that hub->ports[port1 - 1] is non-NULL. Even though checking > the disconnected flag will prevent races with port destruction, there > could be races with port creation. *nod* -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html