On Fri, Jul 14, 2017 at 11:37:36AM -0400, John Ferlan wrote: > > > On 07/14/2017 05:23 AM, Erik Skultety wrote: > > On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote: > >> > >> > >> On 07/11/2017 10:38 AM, Erik Skultety wrote: > >>> On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote: > >>>> Since virnodedeviceobj now has a self-lockable hash table, there's no > >>>> need to lock the table from the driver for processing. Thus remove the > >>>> locks from the driver for NodeDeviceObjList mgmt. > >>>> > >>>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx> > >>> > >>> [..] > >>> > >>>> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device) > >>>> return -1; > >>>> def = virNodeDeviceObjGetDef(obj); > >>>> > >>>> - nodeDeviceLock(); > >>>> - > >>> > >>> Consider the following scenario handling the same device: > >>> > >> > >> What if I told you that's impossible? You cannot "have" a scsi_hostN, > >> delete a scsi_hostN, and then have a new one created with the same name. > > > > Except I wasn't considering creation, rather than plain change. Although I > > didn't manage to find out under what circumstances would kernel actually emit > > the 'change' event (I tried to write to various writable attributes - but since > > it either lacks documentation completely or it's just well hidden from me - I > > wasn't able to trigger it), other than explicitly triggering it by writing to > > sysfs' uevent - so in that aspect, as I wrote below in my previous response, > > it's highly unlikely but not impossible to hit the race. > > > > Not quite sure what would trigger a change event. A vHBA has a wwnn/wwpn > and a parent wwnn/wwpn and WWN. If any of those change, the vHBA would > need to be deleted and recreated. > > I do have a faint recollection of considering the ramifications of > dropping the obj lock in that path and the race drawback, but I > dismissed it mainly because of "how" vHBA's are created and what could > constitute a change event for a vHBA essentially redefines it. > > >> > >> The scsi_hostN's are an ever increasing value of N. Once created and > >> deleted, the N will not be reused. > >> > >>> Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy) > >>> =======================================|============================= > >>> | # attempt to destroy a device > >>> | obj = nodeDeviceObjFindByName() > >>> | > >>> | # @obj is locked > >>> | def = virNodeDeviceObjGetDef(obj) > >>> | > >>> | virNodeDeviceObjEndAPI(&obj) > >>> | # @obj is unlocked > >>> <------ > >>> # change event | > >>> udevAddOneDevice() | > >>> | > >>> obj = virNodeDeviceObjListFindByName | > >>> # @obj locked | > >>> new_device = false | > >>> # @obj unlocked | > >>> | > >>> obj = virNodeDeviceObjListAssignDef | > >>> # @obj locked | > >>> virNodeDeviceObjEndAPI(&obj) | > >>> # @obj unlocked and @def changed | > >>> ------> > >>> | virNodeDeviceObjListGetParentHost() > >>> | if (def->parent) ===> SIGSEGV > >>> > >>> In patch 12/14 this would have been a deadlock because you first locked the > >>> @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I > >>> don't know about NPIV so I'm not sure whether there is another way of removing > >>> a device and updating the parent device tree, might require an updated model, > >>> but for now, you need to make a deep copy of @def. I can see that the chance of > >>> getting an 'change' event from udev on a vHBA device is low, but I'm trying to > >>> look at it from a higher perspective, as we want to be able to remove mdevs > >>> this way, possibly other devices in the future. > >> > >> I think what happens is code from virNodeDeviceGetWWNs through > >> virVHBAManageVport gets placed into a function that handles vHBA's on > >> deletion. Similarly for CreateXML, vHBA specific code ends up in a > >> helper function. Those helpers would be called based on the type of > >> object/device we're talking about (vHBA/mdev). > > > > Despite the likelihood of the case I'm describing, the main point I'm trying to > > make is that the lock protects a mutable resource (@def) and by releasing it > > followed by querying it without actually holding the lock violates thread > > safety. > > > > I understand the position and while the likelihood is essentially next > to zero that something like that could happen it's also possible to > remove @def and pass copies of the name, parent, parent_wwnn, > parent_wwpn, and parent_fabric_wwn. > > So before I do that - can we close on patches 1-12? Yeah, I wasn't explicit with the ACK for 1-12 because I first wanted to have a small discussion about the changes, but we can now focus solely on the rest. So, ACK to 1-12 with the proposed change to virNode*ObjNew not consuming @def (I don't have a strong preference here that I could back with some reasonable pros/cons, to me, having it behave like a constructor for OOP, which we're essentially trying to copy in libvirt, naturally makes sense, but I also see Michal's point). Erik -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list