On 11/02/2017 09:10 AM, Cedric Bosdonnat wrote:
> Hi John,
>
> On Thu, 2017-11-02 at 07:03 -0400, John Ferlan wrote:
>>
>> On 11/02/2017 06:20 AM, Cedric Bosdonnat wrote:
>>> Hi all,
>>>
>>> I'm investigating a crash in the storage driver, and it seems the
>>> solution is not as obvious as I had hoped.
>>>
>>> Basically what happens is that libvirtd crashes because the pool
>>> refresh thread clears the list of pool volumes while someone is
>>> adding a new volume. Here is the valgrind log I have for that:
>>>
>>> http://paste.opensuse.org/58733866
>>>
>>
>> What version of libvirt? Your line numbers don't match up with current
>> HEAD.
>
> It's 3.8.0.
>
>>> I tried adding a call to virStoragePoolObjLock() in
>>> virStorageVolPoolRefreshThread(), since storageVolCreateXML()
>>> appears to have it, but I'm getting a deadlock. Here are the waiting
>>> threads I got from gdb:
>>
>> virStoragePoolObjFindByName returns a locked pool obj, so I'm not sure
>> where/why you added this...
>
> Oh, I overlooked that... so there is already a lock on it; the solution
> is elsewhere.
>
>>>
>>> http://paste.opensuse.org/45961070
>>>
>>> Any idea what a proper fix for that would be? My vision of the
>>> storage driver's locks is too limited, it seems ;)
>>
>> From just a quick look...
>>
>> virStorageVolPoolRefreshThread will take storageDriverLock, get a
>> locked pool object (virStoragePoolObjFindByName), clear out the pool,
>> add back the volumes it knows about, unlock the pool object, and call
>> storageDriverUnlock.
>>
>> If you were adding a new pool... You'd take storageDriverLock, then
>> eventually virStoragePoolObjAssignDef would be called and the pool's
>> object lock taken and unlocked; it returns a locked pool object which
>> gets unlocked later in cleanup, followed by a storageDriverUnlock.
>>
>> If you're adding a new volume object, you'd get a locked pool object
>> via virStoragePoolObjFromStoragePool; then, if building, that lock gets
>> dropped after increasing the async job count and setting the building
>> flag, the volume is built, then the object lock is retaken while
>> temporarily holding the storageDriverLock, the async job count gets
>> decremented and the building flag cleared, and eventually we fall into
>> cleanup, which unlocks the pool again.
>
> Thanks for the details: this is really useful for globally
> understanding how the locks work in the storage driver. Maybe it's
> worth turning into a doc somewhere?
>

You mean code isn't self-documenting? Shocking ;-) [tongue firmly planted
in cheek]. Perhaps when I complete the conversion for Storage Pool/Volume
and do more Domain fix-ups I can add something somewhere that "generally"
describes the internal flow. The goal I've had is to be fairly similar
across all drivers w/r/t object usage and code flow. It's taking a long
time to get there, though, because it's a lot of "roto-tilling" of code.

John

FWIW: I saw your query before we hopped in the car for a 7 hour drive -
figured at least providing something before I left would be helpful ;-).

>> So how to fix it - well, it seems to me the storageDriverLock in
>> VolCreateXML may be the way, since the refresh thread takes the driver
>> lock first, then the pool object lock second. Perhaps even like the
>> build code, where it takes it temporarily while getting the pool
>> object. I'd have to think a bit more about it, though. Still, it might
>> be something to try, since the Vol Refresh thread takes it while
>> running...
>
> Ok, I'll look into that direction.
>
>> John
>>
>> Not related to this problem per se, but what may help even more is if
>> I can get the patches converting the storage driver to a common object
>> model completely reviewed - but that's a different problem ;-)... I'll
>> have to go look and see if I may have fixed this there.
>> The newer model uses hash tables and RW locks, and reduces the storage
>> driver hammer lock, but this one condition may not have been addressed.
>
> I can have a go at it and see if I can reproduce the crash with it. I'm
> not sure I'll be the one skilled enough for the full review, though ;)
>
> --
> Cedric

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list