On 11/07/2017 04:36 AM, Ján Tomko wrote:
> On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
>> Resolve a storage driver crash as a result of a long running
>> storageVolCreateXML when the virStorageVolPoolRefreshThread is
>> run as a result of a storageVolUpload complete and ran the
>> virStoragePoolObjClearVols without checking if the creation
>> code was currently processing a buildVol after incrementing
>> the driver->asyncjob count.
>
> I thought the whole point of refreshing the pool in a thread
> was to have the storage driver usable for other things.
> This effectively disables pool refresh during long-running
> jobs.
>

This would "disable" the refresh during a long-running create/build
job, but that's already effectively the case: other than another
create, not much can be done while a create/build is "in progress".
The pool undefine, destroy, delete, and refresh calls are all blocked
(they fail) until the create completes.

The reason for virStorageVolPoolRefreshThread is that uploadVol can be
long running and, once it's done (the stream is closed), it was desired
that the pool sizing values be updated. That meant either refreshing
the entire pool (the easy way - theoretically) or updating just the
volume and adjusting the pool sizing values by the delta from the
upload (similar to how create/build alters the values). Unfortunately
the latter is not possible (yet) because only a subset of pool types
that support uploadVol also support refreshVol.

But this really is only a "problem" for the intersection with those
that support buildVol too. That is, the only time we'd be blocking is
if a pool type supports both buildVol/buildVolFrom and uploadVol - so
the fs, logical, and vstorage pools.

The virStoragePoolFCRefreshThread used similar logic w/r/t refreshing
the pool after a createVport completed. Here though, we're waiting for
udev to work its magic in creating the vHBA LUN. The asyncjob count
won't be > 0 for scsi pools yet, so perhaps patch 2 isn't necessary,
but it's preventative.

In any case, at this point in time I would think it's more desirable
to block during the build/upload condition than to crash.

An "alternative" is to refresh the pool after a vol create rather than
just altering the pool available/allocation values from the created
volume target allocation. That way the VolPoolRefreshThread could just
"exit" knowing that volCreate will do the update. However, this
refresh could only happen when there are no asyncjobs outstanding. In
the meantime, the pool sizing values would probably still need to be
updated while asyncjobs > 0, to keep multiple create jobs from
"exhausting" the pool because the sizing values weren't properly
updated.
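FWIW, the blocking being added is just the usual
wait-for-a-job-counter-to-drain pattern. A minimal, self-contained
sketch of that idea (plain pthreads, with invented demo_* names - not
the actual libvirt locking primitives or the patch itself):

#include <pthread.h>
#include <stdio.h>

/* Invented stand-in for the relevant pool state: a mutex-protected
 * count of in-flight build jobs plus a condition variable for the
 * refresh thread to wait on. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t no_jobs;   /* signaled when asyncjobs drops to 0 */
    int asyncjobs;            /* count of in-progress build jobs */
} demo_pool;

/* Creation path: bump the count before starting the long-running
 * build; decrement and signal when the build finishes. */
static void demo_build_begin(demo_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->asyncjobs++;
    pthread_mutex_unlock(&pool->lock);
}

static void demo_build_end(demo_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    if (--pool->asyncjobs == 0)
        pthread_cond_broadcast(&pool->no_jobs);
    pthread_mutex_unlock(&pool->lock);
}

/* Refresh path: instead of clearing the volume list while a builder
 * still holds a pointer into it, wait for the builders to finish. */
static void demo_refresh(demo_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    while (pool->asyncjobs > 0)
        pthread_cond_wait(&pool->no_jobs, &pool->lock);
    /* safe now: no builder can be referencing a volume def */
    printf("clearing and re-reading the volume list\n");
    pthread_mutex_unlock(&pool->lock);
}

int main(void)
{
    demo_pool pool = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
    };
    demo_build_begin(&pool);
    demo_build_end(&pool);
    demo_refresh(&pool);      /* returns immediately: no jobs left */
    return 0;
}

The real code would of course use the pool object's own lock rather
than a private mutex, but the shape is the same: the refresh thread
only touches the volume list once no builder can still be holding a
pointer into it.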
>>
>> The refreshThread needs to wait until all creation threads
>> are completed so as to not alter the volume list while the
>> volume is being built.
>>
>> Crash from valgrind is as follows (with a bit of editing):
>>
>> ==21309== Invalid read of size 8
>> ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
>> ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
>> ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
>> ==21309==    by 0x153DE29E: storageVolCreateXML
>
> This is a fault of storageVolCreateXML for using invalid objects
> after dropping the locks.
>

Partially, but the code also goes through the trouble of incrementing
the pool->asyncjobs count and setting the voldef->building flag, for
what appears to be the reasoning that nothing else should modify the
pool volume list while we create/build a new volume - yet the refresh
thread just ignores that. The asyncjobs count was added in much simpler
times, in commit id '2cd9b2d8'.

And this brings up a possible 3rd alternative. Perhaps the real problem
is that the virStoragePoolObjClearVols and refreshPool processing is
overly invasive: it frees the entire volume list and rebuilds it,
rather than doing some more intelligent processing to "add" new volumes
found in the pool that are not in the list, "remove" ones no longer in
the pool that are in the list, and "update" those that are still in the
list (see the rough sketch at the bottom of this mail). That to me is
outside the scope of this change, but a noble goal at some point in
time.

John

> Jan
>
>> ==21309==    by 0x562035B: virStorageVolCreateXML
>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>> ...
>> ==21309== Address 0x2590a720 is 64 bytes inside a block of size 336
>> free'd
>> ==21309==    at 0x4C2F2BB: free
>> ==21309==    by 0x54CB9FA: virFree
>> ==21309==    by 0x55BC800: virStorageVolDefFree
>> ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
>> ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
>> ...
>> ==21309== Block was alloc'd at
>> ==21309==    at 0x4C300A5: calloc
>> ==21309==    by 0x54CB483: virAlloc
>> ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
>> ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
>> ==21309==    by 0x55BE5A4: virStorageVolDefParse
>> ==21309==    by 0x153DDFF1: storageVolCreateXML
>> ==21309==    by 0x562035B: virStorageVolCreateXML
>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>> ...
>>
>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx>
>> ---
>>  src/storage/storage_driver.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
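[Sketch referenced from the "3rd alternative" paragraph above.]
Roughly what a merge-style refresh could look like in place of today's
clear-and-rebuild. Everything here is invented for illustration
(demo_* names, a bare linked list) - it is not libvirt code, and real
virStorageVolDef handling would be considerably more involved:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Invented, simplified volume record - a stand-in for
 * virStorageVolDef; the real object carries far more state. */
typedef struct demo_vol {
    char *name;
    unsigned long long capacity;
    bool seen;                 /* scratch flag used during a merge */
    struct demo_vol *next;
} demo_vol;

static demo_vol *find_vol(demo_vol *list, const char *name)
{
    for (; list; list = list->next)
        if (strcmp(list->name, name) == 0)
            return list;
    return NULL;
}

static void free_vol(demo_vol *vol)
{
    free(vol->name);
    free(vol);
}

/* Merge a freshly scanned list into the pool's current list: update
 * entries present in both, add new ones, and free only those that
 * truly disappeared. Takes ownership of the 'scanned' list. */
static void demo_refresh_merge(demo_vol **current, demo_vol *scanned)
{
    demo_vol *v;
    demo_vol **vp;

    for (v = *current; v; v = v->next)
        v->seen = false;

    while (scanned) {                      /* add or update */
        demo_vol *s = scanned;
        demo_vol *old;

        scanned = s->next;
        old = find_vol(*current, s->name);
        if (old) {
            old->capacity = s->capacity;   /* update in place */
            old->seen = true;
            free_vol(s);
        } else {
            s->seen = true;                /* brand new volume */
            s->next = *current;
            *current = s;
        }
    }

    vp = current;
    while (*vp) {
        if (!(*vp)->seen) {                /* gone from the pool */
            demo_vol *dead = *vp;
            *vp = dead->next;
            free_vol(dead);
        } else {
            vp = &(*vp)->next;
        }
    }
}

The property that matters is that a volume surviving the refresh keeps
its address, so a concurrent create/build job holding a pointer to the
volume it is building is never left reading freed memory - which is
exactly the failure in the valgrind trace.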