On 9/23/22 15:06, Martin Kletzander wrote: > On Mon, Sep 12, 2022 at 03:46:41PM +0200, Michal Privoznik wrote: >> When reconnecting to a running QEMU process, we construct the >> per-domain path in all hugetlbfs mounts. This is a relict from >> the past (v3.4.0-100-g5b24d25062) where we switched to a >> per-domain path and we want to create those paths when libvirtd >> restarts on upgrade. >> >> And with namespaces enabled there is one corner case where the >> path is not created. In fact an error is reported and the >> reconnect fails. Ideally, all mount events are propagated into >> the QEMU's namespace. And they probably are, except when the >> target path does not exist inside the namespace. Now, it's pretty >> common for users to mount hugetlbfs under /dev (e.g. >> /dev/hugepages), but if domain is started without hugepages (or >> more specifically - private hugetlbfs path wasn't created on >> domain startup), then the reconnect code tries to create it. >> But it fails to do so, well, it fails to set seclabels on the >> path because, because the path does not exist in the private > > s/because, // at least =) > >> namespace. And it doesn't exist because we specifically create >> only a subset of all possible /dev nodes. Therefore, the mount >> event, whilst propagated, is not successful and hence the >> filesystem is not mounted. We have to do it ourselves. >> >> If hugetlbfs is mount anywhere else there's no problem and this >> is effectively a dead code. >> >> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2123196 >> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >> --- >> docs/kbase/qemu-passthrough-security.rst | 6 ------ >> src/qemu/qemu_process.c | 3 +++ >> 2 files changed, 3 insertions(+), 6 deletions(-) >> >> diff --git a/docs/kbase/qemu-passthrough-security.rst >> b/docs/kbase/qemu-passthrough-security.rst >> index 106c3cc5b9..ef10d8af9b 100644 >> --- a/docs/kbase/qemu-passthrough-security.rst >> +++ b/docs/kbase/qemu-passthrough-security.rst >> @@ -172,9 +172,3 @@ command before any guest is started: >> :: >> >> # mount --make-rshared / >> - >> -Another requirement for dynamic mount point propagation is to not place >> -``hugetlbfs`` mount points under ``/dev`` because these won't be >> propagated as >> -corresponding directories do not exist in the private namespace. Or >> just use >> -``memfd`` memory backend instead which does not require ``hugetlbfs`` >> mount >> -points. >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index cbfdd3bda5..b05ad059c3 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -3976,6 +3976,9 @@ >> qemuProcessBuildDestroyMemoryPathsImpl(virQEMUDriver *driver, >> return -1; >> } >> >> + if (qemuDomainNamespaceSetupPath(vm, path, NULL) < 0) >> + return -1; >> + > > What's not visible here is that this branch is accessible only if the > domain has any hugepage backing configured. And it is only created on > startup and reconnect. That means if you start a VM without any > hugepages mounted, then mount them somewhere under /dev and restart the > daemon (which was always required) then attaching a hugepage-backed > dimm, for example, will work without namespaces, but will fail with > namespaces because we skipped the creation on restart. I think just > recreating the structure unconditionally would provide more benefit than > doubly isolating qemu from hugepages (once with permissions and once > with namespaces). Actually, the path is created also on memory hotplug: src/qemu/qemu_hotplug.c=2231=qemuDomainAttachMemory(virQEMUDriver *driver, -- src/qemu/qemu_hotplug.c-2275- src/qemu/qemu_hotplug.c:2276: if (qemuProcessBuildDestroyMemoryPaths(driver, vm, mem, true) < 0) src/qemu/qemu_hotplug.c-2277- goto cleanup; So the per-domain dir should be created in all three cases (startup, reconnect & hotplug). > > Since this patch solves the issue of us killing perfectly fine qemu I > think it's fine if that's a follow'up patch: > > Reviewed-by: Martin Kletzander <mkletzan@xxxxxxxxxx> Pushed, thanks. Michal