On 11/07/2017 08:38 AM, John Ferlan wrote:
>
>
> On 11/07/2017 04:36 AM, Ján Tomko wrote:
>> On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
>>> Resolve a storage driver crash during a long running
>>> storageVolCreateXML when the virStorageVolPoolRefreshThread,
>>> run after a storageVolUpload completes, called
>>> virStoragePoolObjClearVols without checking whether the creation
>>> code was currently processing a buildVol after incrementing
>>> the driver->asyncjob count.
>>
>> I thought the whole point of refreshing the pool in a thread
>> was to have the storage driver usable for other things.
>> This effectively disables pool refresh during long-running
>> jobs.
>>

Jan -

So I'm curious to know whether your comments amount to a NAK of the
design of the patches, or if the explanation provided here was
sufficient?

Tks -

John

>
> This would "disable" the refresh during a long running create/build
> job, but that's already effectively the case: other than another
> create, not much can be done while a create/build is "in progress".
> The pool undefine, destroy, delete, and refresh calls are all blocked
> (they fail) until the create completes.
>
> The reason for virStorageVolPoolRefreshThread is that uploadVol can be
> long running, and once it's done (the stream is closed) it was desired
> that the pool sizing values be updated. That meant either refreshing
> the entire pool (the easy way - theoretically) or updating just the
> volume and adjusting the pool sizing values with the delta from the
> upload (similar to how create/build alters the values).
>
> Unfortunately the latter is not possible (yet) because only a subset
> of pool types that support uploadVol also support refreshVol. But this
> really is only a "problem" for the intersection of those that also
> support buildVol. That is, the only time we'd block is when a pool
> type supports both buildVol/buildVolFrom and uploadVol - so the fs,
> logical, and vstorage pools.
>
> The virStoragePoolFCRefreshThread used similar logic w/r/t refreshing
> the pool after a createVport completed. Here though, we're waiting for
> udev to work its magic in creating the vHBA LUN. The asyncjobs count
> won't be > 0 for scsi pools yet, so perhaps patch 2 isn't necessary,
> but it's preventative.
>
> In any case, at this point in time I would think it's more desirable
> to block during the build/upload condition as opposed to crash.
>
> An "alternative" is to refresh the pool after a create vol rather than
> just altering the pool available/allocation values from the created
> volume target allocation. That way the VolPoolRefreshThread could just
> "exit" knowing that volCreate will do the update. However, this
> refresh could only happen when there are no asyncjobs outstanding. In
> the meantime the pool sizing values probably would still need to be
> updated while asyncjobs > 0, to keep multiple create jobs from
> "exhausting" the pool because the sizing values weren't properly
> updated.
>
>>>
>>> The refreshThread needs to wait until all creation threads
>>> are completed so as to not alter the volume list while the
>>> volume is being built.
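
To make the guard concrete: based on the description above, the check
added to the refresh thread would be roughly of the following shape.
This is a sketch, not a quote of the actual 9-line patch; the
surrounding locking and cleanup details follow my reading of the
storage driver code of this era and may differ slightly.

    static void
    virStorageVolPoolRefreshThread(void *opaque)
    {
        virStorageVolStreamInfoPtr cbdata = opaque;
        virStoragePoolObjPtr pool = NULL;

        storageDriverLock();
        if (!(pool = virStoragePoolObjFindByName(&driver->pools,
                                                 cbdata->pool_name)))
            goto cleanup;

        /* A creation thread bumped pool->asyncjobs before dropping
         * the pool lock for a long-running buildVol.  Clearing and
         * rebuilding the volume list now would free volume defs that
         * storageVolCreateXML still references, so skip this refresh
         * rather than crash. */
        if (pool->asyncjobs > 0)
            goto cleanup;

        virStoragePoolObjClearVols(pool);
        /* ... backend refreshPool() call and refresh event ... */

     cleanup:
        if (pool)
            virStoragePoolObjUnlock(pool);
        storageDriverUnlock();
    }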
>>>
>>> Crash from valgrind is as follows (with a bit of editing):
>>>
>>> ==21309== Invalid read of size 8
>>> ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
>>> ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
>>> ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
>>> ==21309==    by 0x153DE29E: storageVolCreateXML
>>
>> This is a fault of storageVolCreateXML for using invalid objects
>> after dropping the locks.
>>
>
> Partially, but the code also goes through the trouble of incrementing
> the pool->asyncjobs count and setting the voldef->building flag
> (sketched at the end of this mail), on the reasoning that nothing else
> should modify the pool's volume list while we create/build a new
> volume - yet the refresh thread just ignores that. The asyncjobs
> counter was added in much simpler times, in commit id '2cd9b2d8'.
>
> And this brings up a possible 3rd alternative. Perhaps the real
> problem is that virStoragePoolObjClearVols and refreshPool processing
> are overly invasive, freeing the entire volume list and rebuilding it
> rather than doing some sort of more intelligent processing to "add"
> new volumes found in the pool that are not in the list, "remove" ones
> no longer in the pool that are in the list, and "update" those that
> are still in the list. That to me is outside the scope of this change,
> but a noble goal at some point in time.
>
> John
>
>> Jan
>>
>>> ==21309==    by 0x562035B: virStorageVolCreateXML
>>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>>> ...
>>> ==21309== Address 0x2590a720 is 64 bytes inside a block of size 336
>>> free'd
>>> ==21309==    at 0x4C2F2BB: free
>>> ==21309==    by 0x54CB9FA: virFree
>>> ==21309==    by 0x55BC800: virStorageVolDefFree
>>> ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
>>> ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
>>> ...
>>> ==21309== Block was alloc'd at
>>> ==21309==    at 0x4C300A5: calloc
>>> ==21309==    by 0x54CB483: virAlloc
>>> ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
>>> ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
>>> ==21309==    by 0x55BE5A4: virStorageVolDefParse
>>> ==21309==    by 0x153DDFF1: storageVolCreateXML
>>> ==21309==    by 0x562035B: virStorageVolCreateXML
>>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>>> ...
>>>
>>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx>
>>> ---
>>>  src/storage/storage_driver.c | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
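
For reference, the create-side pattern discussed above - bumping
pool->asyncjobs and setting voldef->building before dropping the pool
lock around the long-running buildVol - has roughly this shape in
storageVolCreateXML. This is a paraphrased sketch; error handling and
the shallow copy of the volume def passed to buildVol are elided.

    if (backend->buildVol) {
        /* Mark the build in progress so other code can tell a
         * creation is in flight, then drop the pool lock for the
         * long-running allocation. */
        pool->asyncjobs++;
        voldef->building = true;
        virStoragePoolObjUnlock(pool);

        buildret = backend->buildVol(conn, pool, buildvoldef, flags);

        /* Reacquire the pool lock before touching voldef again.  The
         * pointer is only safe here if nothing freed the pool's
         * volume list in the meantime - which is exactly what the
         * unguarded refresh thread could do. */
        storageDriverLock();
        virStoragePoolObjLock(pool);
        storageDriverUnlock();

        voldef->building = false;
        pool->asyncjobs--;
    }

Anything that frees or rebuilds the pool's volume list therefore has to
honor asyncjobs, which is what the guard in the refresh thread adds.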