On 8/8/20 7:01 AM, Hao Wang wrote:
From: Hao Wang <wanghao232@xxxxxxxxxx>
Subject: [PATCH] NVRAM: check NVRAM file size and recover from template

A corrupted nvram file (e.g. caused by last unsuccessful creation due to
insufficient memory) can lead to boot or migration failure. Check the size
of the existed nvram file when qemuPrepareNVRAM, and re-create if the
existed one is unhealthy.

Signed-off-by: Hao Wang <wanghao232@xxxxxxxxxx>
---
 src/qemu/qemu_process.c | 54 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 126fabf5ef..42060bb36c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4376,6 +4376,48 @@ qemuProcessUpdateCPU(virQEMUDriverPtr driver,
 }
 
 
+static bool
+qemuIsNvramFileHealthy(virQEMUDriverConfigPtr cfg,
+                       virDomainLoaderDefPtr loader)
+{
+    const char *masterNvramPath;
+    off_t nvramSize;
+    off_t masterSize;
+
+    masterNvramPath = loader->templt;
+    if (!loader->templt) {
+        size_t i;
+        for (i = 0; i < cfg->nfirmwares; i++) {
+            if (STREQ(cfg->firmwares[i]->name, loader->path)) {
+                masterNvramPath = cfg->firmwares[i]->nvram;
+                break;
+            }
+        }
+    }
+
+    if (!masterNvramPath) {
+        VIR_WARN("no nvram template is found; assume the nvram file is healthy");
+        return true;
+    }
The issue I'm seeing here is that this code is duplicated in the body of qemuPrepareNVRAM .....
+
+    if ((nvramSize = virFileLength(loader->nvram, -1)) < 0 ||
+        (masterSize = virFileLength(masterNvramPath, -1)) < 0) {
+        virReportSystemError(errno,
+                             _("unable to get the size of '%s' or '%s'"),
+                             loader->nvram, masterNvramPath);
+        return false;
+    }
+
+    if (nvramSize != masterSize) {
+        VIR_WARN("the size(%zd) of the nvram file is not equal to that of the template %s",
+                 nvramSize, masterNvramPath);
+        return false;
+    }
+
+    return true;
+}
+
+
 static int
 qemuPrepareNVRAM(virQEMUDriverConfigPtr cfg,
                  virDomainObjPtr vm)
@@ -4388,9 +4430,19 @@ qemuPrepareNVRAM(virQEMUDriverConfigPtr cfg,
     const char *master_nvram_path;
     ssize_t r;
 
-    if (!loader || !loader->nvram || virFileExists(loader->nvram))
+    if (!loader || !loader->nvram)
         return 0;
 
+    if (virFileExists(loader->nvram)) {
+        if (qemuIsNvramFileHealthy(cfg, loader))
+            return 0;
+
+        ignore_value(virFileRemove(loader->nvram, -1, -1));
+        VIR_WARN("the nvram file %s exists but may be corrupted! "
+                 "Remove it and try to copy a new one from template.",
+                 loader->nvram);
+    }
+
     master_nvram_path = loader->templt;
     if (!loader->templt) {
         size_t i;
^ right here. And not only that: the code is being duplicated and the two
copies do different things. The code you're adding in
qemuIsNvramFileHealthy() checks for !masterNvramPath, throws a VIR_WARN()
and returns 'true'. The code in the body of qemuPrepareNVRAM() aborts and
returns -1 in the same condition:

    master_nvram_path = loader->templt;
    if (!loader->templt) {
        size_t i;
        for (i = 0; i < cfg->nfirmwares; i++) {
            if (STREQ(cfg->firmwares[i]->name, loader->path)) {
                master_nvram_path = cfg->firmwares[i]->nvram;
                break;
            }
        }
    }

    if (!master_nvram_path) {
        virReportError(VIR_ERR_OPERATION_FAILED,
                       _("unable to find any master var store for "
                         "loader: %s"), loader->path);
        goto cleanup;
    }

What will end up happening, if the template path is NULL, is that your code
will emit a VIR_WARN() claiming the file is fine, and then the rest of
qemuPrepareNVRAM() will error out claiming it is not.

To preserve the existing behavior and avoid repetition, my suggestion is to
move this code block as is into qemuIsNvramFileHealthy(), with the same
error message and returning 'false'. The remainder of qemuPrepareNVRAM()
depends upon master_nvram_path not being NULL, so you either embed this
assumption in your new function or change the body of qemuPrepareNVRAM().
I believe the former is what we would want here.

Thanks,

DHB
-- 2.23.0
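
P.S. To make the suggestion above a bit more concrete, here is a rough, standalone sketch of the shape I have in mind. It is not libvirt code: the structs, find_master_nvram() and nvram_file_is_healthy() are hypothetical stand-ins, fprintf() stands in for virReportError(), and stat() stands in for virFileLength(). The only point is that the template lookup lives in one place and that a missing template is treated as an error (returning 'false'), not as a healthy file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the loader definition and the firmware list in
 * the driver config; only the fields discussed in this thread are modelled. */
struct firmware { const char *name; const char *nvram; };
struct config { const struct firmware *firmwares; size_t nfirmwares; };
struct loader { const char *path; const char *templt; const char *nvram; };

/* The one place that resolves the NVRAM template. Reports an error and
 * returns NULL when no template can be found, as qemuPrepareNVRAM() does. */
static const char *
find_master_nvram(const struct config *cfg, const struct loader *loader)
{
    const char *master = loader->templt;

    if (!master) {
        size_t i;
        for (i = 0; i < cfg->nfirmwares; i++) {
            if (strcmp(cfg->firmwares[i].name, loader->path) == 0) {
                master = cfg->firmwares[i].nvram;
                break;
            }
        }
    }

    if (!master) {
        /* Stand-in for virReportError(VIR_ERR_OPERATION_FAILED, ...). */
        fprintf(stderr, "unable to find any master var store for loader: %s\n",
                loader->path);
    }
    return master;
}

/* Returns false, never true, when the template is missing or when either
 * file size cannot be read; true only if both sizes match. */
static bool
nvram_file_is_healthy(const struct config *cfg, const struct loader *loader)
{
    const char *master = find_master_nvram(cfg, loader);
    struct stat nv;
    struct stat tmpl;

    if (!master)
        return false;

    if (stat(loader->nvram, &nv) < 0 || stat(master, &tmpl) < 0) {
        perror("unable to get the size of the nvram file or its template");
        return false;
    }

    return nv.st_size == tmpl.st_size;
}
```

With this shape the caller cannot see "healthy" and "no template" at the same time, which is the contradiction in the current patch. Whether the lookup ends up inside the health check itself or in a small helper like the one above is a matter of taste.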