On 07/14/2017 05:23 AM, Erik Skultety wrote: > On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote: >> >> >> On 07/11/2017 10:38 AM, Erik Skultety wrote: >>> On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote: >>>> Since virnodedeviceobj now has a self-lockable hash table, there's no >>>> need to lock the table from the driver for processing. Thus remove the >>>> locks from the driver for NodeDeviceObjList mgmt. >>>> >>>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx> >>> >>> [..] >>> >>>> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device) >>>> return -1; >>>> def = virNodeDeviceObjGetDef(obj); >>>> >>>> - nodeDeviceLock(); >>>> - >>> >>> Consider the following scenario handling the same device: >>> >> >> What if I told you that's impossible? You cannot "have" a scsi_hostN, >> delete a scsi_hostN, and then have a new one created with the same name. > > Except I wasn't considering creation, rather than plain change. Although I > didn't manage to find out under what circumstances would kernel actually emit > the 'change' event (I tried to write to various writable attributes - but since > it either lacks documentation completely or it's just well hidden from me - I > wasn't able to trigger it), other than explicitly triggering it by writing to > sysfs' uevent - so in that aspect, as I wrote below in my previous response, > it's highly unlikely but not impossible to hit the race. > Not quite sure what would trigger a change event. A vHBA has a wwnn/wwpn and a parent wwnn/wwpn and WWN. If any of those change, the vHBA would need to be deleted and recreated. I do have a faint recollection of considering the ramifications of dropping the obj lock in that path and the race drawback, but I dismissed it mainly because of "how" vHBA's are created and what could constitute a change event for a vHBA essentially redefines it. >> >> The scsi_hostN's are an ever increasing value of N. Once created and >> deleted, the N will not be reused. >> >>> Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy) >>> =======================================|============================= >>> | # attempt to destroy a device >>> | obj = nodeDeviceObjFindByName() >>> | >>> | # @obj is locked >>> | def = virNodeDeviceObjGetDef(obj) >>> | >>> | virNodeDeviceObjEndAPI(&obj) >>> | # @obj is unlocked >>> <------ >>> # change event | >>> udevAddOneDevice() | >>> | >>> obj = virNodeDeviceObjListFindByName | >>> # @obj locked | >>> new_device = false | >>> # @obj unlocked | >>> | >>> obj = virNodeDeviceObjListAssignDef | >>> # @obj locked | >>> virNodeDeviceObjEndAPI(&obj) | >>> # @obj unlocked and @def changed | >>> ------> >>> | virNodeDeviceObjListGetParentHost() >>> | if (def->parent) ===> SIGSEGV >>> >>> In patch 12/14 this would have been a deadlock because you first locked the >>> @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I >>> don't know about NPIV so I'm not sure whether there is another way of removing >>> a device and updating the parent device tree, might require an updated model, >>> but for now, you need to make a deep copy of @def. I can see that the chance of >>> getting an 'change' event from udev on a vHBA device is low, but I'm trying to >>> look at it from a higher perspective, as we want to be able to remove mdevs >>> this way, possibly other devices in the future. >> >> I think what happens is code from virNodeDeviceGetWWNs through >> virVHBAManageVport gets placed into a function that handles vHBA's on >> deletion. Similarly for CreateXML, vHBA specific code ends up in a >> helper function. Those helpers would be called based on the type of >> object/device we're talking about (vHBA/mdev). > > Despite the likelihood of the case I'm describing, the main point I'm trying to > make is that the lock protects a mutable resource (@def) and by releasing it > followed by querying it without actually holding the lock violates thread > safety. > I understand the position and while the likelihood is essentially next to zero that something like that could happen it's also possible to remove @def and pass copies of the name, parent, parent_wwnn, parent_wwpn, and parent_fabric_wwn. So before I do that - can we close on patches 1-12? I can then post a v5 that alters the virNodeDeviceObjListGetParentHost to take parameters instead of @def. That should allay this concern and make patches 13/14 be 2 and 3 of the next series. John >> >> BTW: I recall doing a review suggesting perhaps creating an mdev pool >> driver of sorts. Daniel essentially responded that using the node device >> driver and augmenting it to fit the needs would be OK. At the time, I >> wasn't specifically thinking about this case, but it certainly qualifies >> as something of concern where the existing node device code can make >> some assumptions about names that the underlying udev code provides. >> >> You may need to add some extra layer of protection if names can be >> reused especially because of this event mgmt "problem"... You may also >> need to modify that "if (dev)" code to check if the dev is an mdev and >> if so, do something different. >> >> Looks like the code was added by commit id '546fa3ef'. Perhaps as a way >> to "follow" how other drivers did things. >> >> "Fortunately" the only consumer of CreateXML/Destroy ends up being vHBA >> and we know there's rules around the naming. IIRC - you were letting >> MDEV also set the name, right? That is, a 'name' on input is essentially > > No, mdev's name is a readonly attribute that is optional and exposed by the > vendor, the only thing libvirt can currently write to the mdev's sysfs > interface is UUID. > > Erik > >> and happily ignored. So what creates that name? And can you be assured >> it's going to be unique for the life/run time of the host? If so, >> there's no way a CreateXML could reuse a name that AddOneDevice would be >> using, right? >> >> Maybe I need to think some more - it's been a long day already >> >> John >> >>> >>> The rest of the patch looks okay, but I want to try it first. >>> >>> Erik >>> -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list