On Thu, Jul 13, 2017 at 07:23:58PM -0400, John Ferlan wrote: > > > On 07/11/2017 10:38 AM, Erik Skultety wrote: > > On Mon, Jul 03, 2017 at 05:25:30PM -0400, John Ferlan wrote: > >> Since virnodedeviceobj now has a self-lockable hash table, there's no > >> need to lock the table from the driver for processing. Thus remove the > >> locks from the driver for NodeDeviceObjList mgmt. > >> > >> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx> > > > > [..] > > > >> @@ -601,8 +565,6 @@ nodeDeviceDestroy(virNodeDevicePtr device) > >> return -1; > >> def = virNodeDeviceObjGetDef(obj); > >> > >> - nodeDeviceLock(); > >> - > > > > Consider the following scenario handling the same device: > > > > What if I told you that's impossible? You cannot "have" a scsi_hostN, > delete a scsi_hostN, and then have a new one created with the same name. Except I wasn't considering creation, rather than plain change. Although I didn't manage to find out under what circumstances would kernel actually emit the 'change' event (I tried to write to various writable attributes - but since it either lacks documentation completely or it's just well hidden from me - I wasn't able to trigger it), other than explicitly triggering it by writing to sysfs' uevent - so in that aspect, as I wrote below in my previous response, it's highly unlikely but not impossible to hit the race. > > The scsi_hostN's are an ever increasing value of N. Once created and > deleted, the N will not be reused. > > > Thread 1 (udev event handler callback) | Thread 2 (nodeDeviceDestroy) > > =======================================|============================= > > | # attempt to destroy a device > > | obj = nodeDeviceObjFindByName() > > | > > | # @obj is locked > > | def = virNodeDeviceObjGetDef(obj) > > | > > | virNodeDeviceObjEndAPI(&obj) > > | # @obj is unlocked > > <------ > > # change event | > > udevAddOneDevice() | > > | > > obj = virNodeDeviceObjListFindByName | > > # @obj locked | > > new_device = false | > > # @obj unlocked | > > | > > obj = virNodeDeviceObjListAssignDef | > > # @obj locked | > > virNodeDeviceObjEndAPI(&obj) | > > # @obj unlocked and @def changed | > > ------> > > | virNodeDeviceObjListGetParentHost() > > | if (def->parent) ===> SIGSEGV > > > > In patch 12/14 this would have been a deadlock because you first locked the > > @obj, then nodedev driver while udevAddOneDevice did in the reverse order. I > > don't know about NPIV so I'm not sure whether there is another way of removing > > a device and updating the parent device tree, might require an updated model, > > but for now, you need to make a deep copy of @def. I can see that the chance of > > getting an 'change' event from udev on a vHBA device is low, but I'm trying to > > look at it from a higher perspective, as we want to be able to remove mdevs > > this way, possibly other devices in the future. > > I think what happens is code from virNodeDeviceGetWWNs through > virVHBAManageVport gets placed into a function that handles vHBA's on > deletion. Similarly for CreateXML, vHBA specific code ends up in a > helper function. Those helpers would be called based on the type of > object/device we're talking about (vHBA/mdev). Despite the likelihood of the case I'm describing, the main point I'm trying to make is that the lock protects a mutable resource (@def) and by releasing it followed by querying it without actually holding the lock violates thread safety. > > BTW: I recall doing a review suggesting perhaps creating an mdev pool > driver of sorts. Daniel essentially responded that using the node device > driver and augmenting it to fit the needs would be OK. At the time, I > wasn't specifically thinking about this case, but it certainly qualifies > as something of concern where the existing node device code can make > some assumptions about names that the underlying udev code provides. > > You may need to add some extra layer of protection if names can be > reused especially because of this event mgmt "problem"... You may also > need to modify that "if (dev)" code to check if the dev is an mdev and > if so, do something different. > > Looks like the code was added by commit id '546fa3ef'. Perhaps as a way > to "follow" how other drivers did things. > > "Fortunately" the only consumer of CreateXML/Destroy ends up being vHBA > and we know there's rules around the naming. IIRC - you were letting > MDEV also set the name, right? That is, a 'name' on input is essentially No, mdev's name is a readonly attribute that is optional and exposed by the vendor, the only thing libvirt can currently write to the mdev's sysfs interface is UUID. Erik > and happily ignored. So what creates that name? And can you be assured > it's going to be unique for the life/run time of the host? If so, > there's no way a CreateXML could reuse a name that AddOneDevice would be > using, right? > > Maybe I need to think some more - it's been a long day already > > John > > > > > The rest of the patch looks okay, but I want to try it first. > > > > Erik > > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list