On Mon, Feb 24, 2020 at 02:34:16PM +0100, Peter Krempa wrote: > On Wed, Feb 19, 2020 at 13:12:53 -0600, Eric Blake wrote: > > [adding qemu] > > Adding Daniel as he objected to qemu-img. > > > > > On 2/19/20 12:57 PM, Peter Krempa wrote: > > [...] > > > > Additionally I think that we could just get rid of the copy of the image > > > detection copy in libvirt and replace it by invocation of qemu-img. That > > > way we could avoid any discrepancies in the detection process in the > > > first place. > > > > Now there's an interesting thought. Since data corruption occurs when there > > is disagreement about which mode to use, getting libvirt out of the probing > > business by deferring all decisions to qemu-img info is a smart move - if > > qemu says an image probes as qcow2 (in an environment where probing is > > safe), then libvirt passing an explicit qcow2 to qemu for guest usage (in an > > environment where probing is not safe) will at least see the same > > guest-visible data. Less code to maintain in libvirt, and no chance for a > > mismatch between the two projects on which format a probe should result in. > > I raised the use of qemu-img to Daniel and he disagreed with use of > qemu-img in libvirt for doing the probing on multiple reasons: > - qemu-img instantiates many data structures relevant to the format so > it has a huge attack surface This was the most critical reason why we have this code in libvirt in the first place. NB, we need to be sure we are comparing the same things between libvirt and QEMU when we discuss "probing". What we're talking about by probing in libvirt context is - Detect the image format - Detect the image virtual size - Detect the image physical size - Detect the image backing store location - Detect the image backing store format - Detect the image encryption usage In QEMU 'format probing' (as impl by bdrv_probe in QEMU's block layer) only covers the very first point, 'detect the image format'. All the other information can only be acquired by opening the image (bdrv_open in QEMU's block layer). The issue is that bdrv_open does waaaaaay more than we desire here, because it is serving the broader purpose of allowing QEMU to actually use the image. qcow2 is probably the worst case example, as it has to parse the image and setup data structures for l1, l2 tables, refcount tables, snapshots, and initialize the encryption layer if present. It is known that this code is vulnerable to maliciously created qcow2 images. This resulted in OpenStack being vulnerable to CVE-2015-5162 https://bugs.launchpad.net/ossa/+bug/1449062 It isn't possible to do anything to avoid this risk if you are invoking qemu-img on untrustworthy images. The best you can do is to mitigate the effects by placing memory/CPU ulimits on the qemu-img process. Determining these limits then introduces a new problem, as you have to pick a limit which is low enough to avoid DoS, while large enough to allow all valid usage. Since mitigating CVE-2015-5162 OpenStack has faced this problem with users reporting that the limits it set were breaking valid usage, as so had to increase the limits, which increases the DoS impact. Then there's also the pain that OSP suffered when QEMU introduced mandatory locking which broke all existing usage of 'qemu-img info' when a VM was running. Of course when you launch QEMU later, it is susceptible to the DoS in the system emulator, but this is mitigated by fact that upfront probing is going to reject some malicious images. If some bad images do get past, then it will be dealt with by the mgmt apps normal monitoring of a running VM resource usage and/or cgroups limits. The libvirt probing code is designed to do the minimal work needed to get the information we require. Of course there may be bugs in libvirt's code, but it is much more straightforward for us to analyse & understand risks, as most of the problematic code that QEMU has simply doesn't exist in libvirt. > - performance of spawning extra processes would be way worse Yes, this was the second motivation for having this code in libvirt originally. The QEMU VM startup case wasn't the target, but rather storage pools code. When we start a storage pool with 100's of images, the time to spawn 100's of instances of qemu-img adds up very quickly. Even if qemu-img had exactly the same minimalist code as libvirt's current probing logic it would still be worse. The overhead of process startup vs the time spent probing the image is a poor ratio, such that process startup/exec time dominates. The third reason why libvirt has this code is because historically the error reporting from qemu-img has been quite unhelpful - many errors just end up being a generic EINVAL error message. Things have improved over time, but error reporting is always a challenge when spawning external commands to do work. The fourth reason why libvirt has this image file detection code is that it is used by non-QEMU drivers in libvirt, mostly notably the storage pool driver, and we didn't wish to force people to install qemu-img on hosts which were not running the QEMU virt driver. I don't think this reason is especially a technical show stopper, since these days all distros allow you to install qemu-img, without pulling in the rest of QEMU. In fact the storage pool driver RPM depends on qemu-img explicitly since we dropped support for the Xen tools for image creation a while ago. There is scope for something to replace the current libvirt probing code, but spawning 'qemu-img info' is certainly not that something. Libvirt (and apps above libvirt in general) would really benefit from having a library that they can use for readonly querying of information about disk images. Of course that library can't just spawn qemu-img otherwise that defeats the point of using a library. Unfortunately QEMU's block layer can't practically serve this role because its GPLv2-only licensing is too restrictive for some apps needs. It would have to be something conceptually similar in complexity to what libvirt's current probing code does, so that we can have good confidence in its behaviour in the face of malicious input. > I'll reiterate the historical state of the problem because I think it's > important: > > Pre-blockdev: > - we internally assumed that if the image format of an backing image > was not present in the overlay it is 'raw' > - this influenced security labelling but not actually how qemu viewed > or probed the file. If it was qcow2 probed as qcow2 qemu opened it > as qcow2 possibly even including the backing file if selinux or > other mechanism didn't prevent it. > > post-blockdev: > - the assumption of 'raw' would now be expressed into the qemu > configuration. This assumption turned into data corruption since we > no longer allowed qemu to probe the format and forced it as raw. > - fix was to always require the format to be recorded in the overlay > - this made users unhappy who neglected to record the format into the > overlay when creating it manually So the key problem we have is that with -blockdev we are always explicitly telling QEMU what the backing file is for every image. Can we fix this to have the exact same behaviour as before by *not* telling QEMU anything about the backing file when using -blockdev, if there is no well defined backing format present. ie, use -blockdev, but let QEMU probe just as it did in non-blockdev days. Would there be any downsides to this that did not already exist in the non-blockdev days ? I don't think we can solve the regressions in behaviour of backing files by doing probing of the backing files in libvirt, because that only works for the case where libvirt can actually open the file. ie a local file on disk. We don't have logic for opening backing files on RBD, GlusterFS, iSCSI, HTTP, SSH, etc, and nor do we want todo that. So to me it looks like the only viable option is to not specify the backing file info to QEMU at all. > Now this adds an interresting dimension to this problem. If libvirt > forces the users to specify the image format, and the users don't know > it they will probe. So we are basically making this a problem of > somebody else. [2] As you can see in that patch, it uses 'qemu-img' > anyways and also additionally actually allows the chain to continue > deeper! [3] Yeah, this is a really bad situation given the difficulty in safely using qemu-img, without also breaking valid usage. We don't want to push this off to apps > This boils down to whether we want to accept some possibility of image > corruption in trade for avoiding regression of behaviour in the secure > cases as well as management apps and users not having to re-invent when > probing an image is actually safe. I feel like the risk of image corruption is pretty minor. Our probing handles all normal cases the same way as QEMU and newly introduced image formats are rare. > > Finally I think we should either decide to fix it in this release, or > stick with the error message forever. Fixing it later will not make > much sense as many users already fixed their scripts and we'd just add > back the trade-off of possible image corruption. > > Peter > > [1] If e.g. the security subsystem of the host didn't forbid the use of > the backing file such a qcow2 qemu would happily open it. > > [2] https://www.redhat.com/archives/libguestfs/2020-February/msg00013.html > > [3] As implemented in [2] the backing image is not checked whether it > has a backing file or not but the format is probed, which way result > into accessing the backing chain of the probed image. > > Prior to this detection, it would be prevented by sVirt or alternatively > also by libvit itself in -blockdev mode when this patch would be > accepted. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|