> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx] > Sent: Thursday, August 23, 2018 11:47 AM > > On Wed, 22 Aug 2018 02:30:12 +0000 > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx] > > > Sent: Wednesday, August 22, 2018 10:08 AM > > > > > > On Wed, 22 Aug 2018 01:27:05 +0000 > > > "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote: > > > > > > > > From: Wang, Zhi A > > > > > Sent: Wednesday, August 22, 2018 2:43 AM > > > > > > > > > > > > Are there any suggestions how we can deal with security issues? > > > > > > Allowing userspace to provide a data stream representing the > internal > > > > > > state of a virtual device model living within the kernel seems > > > > > > troublesome. If we need to trust the data stream, do we need to > > > > > > somehow make the operation more privileged than what a vfio > user > > > > > might > > > > > > have otherwise? Does the data stream need to be somehow > signed > > > and > > > > > how > > > > > > might we do that? How can we build in protection against an > > > untrusted > > > > > > restore image? Thanks, > > > > > > > > imo it is not necessary. restoring mdev state should be handled as if > > > > guest is programming the mdev. > > > > > > To me this suggests that a state save/restore is just an algorithm > > > executed by userspace using the existing vfio device accesses. This is > > > not at all what we've been discussing for migration. I believe the > > > > not algorithm by userspace. It's kernel driver to apply the audit when > > receiving opaque state data. > > And a kernel driver receiving and processing opaque state date from a > user doesn't raise security concerns for you? opaque is from userspace p.o.v. kernel driver understands the actual format and thus can audit when restoring the state. > > > > interface we've been hashing out exposes opaque device state through > a > > > vfio region. We therefore must assume that that opaque data contains > > > not only device state, but also emulation state, similar to what we see > > > for any QEMU device. Not only is there internal emulation state, but > > > we have no guarantee that the device state goes through the same > > > auditing as it does through the vfio interface. Since this device and > > > emulation state live inside the kernel and not just within the user's > > > own process, a malicious user can do far more than shoot themselves. > It > > > would be one thing devices were IOMMU isolated, but they're not, > > > they're isolated through vendor and device specific mechanism, and for > > > all we know the parameters of that isolation are included in the > > > restore state. I don't see how we can say this is not an issue. > > > > I didn't quite get this. My understanding is that isolation configuration > > is completed when a mdev is created on DEST machine given a type > > definition. The state image contains just runtime data reflecting what > > guest driver does on SRC machine. Restoring such state shouldn't > > change the isolation policy. > > Let's invent an example where the mdev vendor driver has a set of > pinned pages which are the current working set for the device at the > time of migration. Information about that pinning might be included in > the opaque migration state. If a malicious user discovers this, they > can potentially also craft a modified state which can exploit the host > kernel isolation. pinned pages may be not a good example. the pin knowledge could be reconstructed when restoring the state (e.g. in GVT-g pinning is triggered by shadowing GPU page table which has to be recreated on DEST). > > > > > Then all the audits/security checks > > > > enforced in normal emulation path should still apply. vendor driver > > > > may choose to audit every state restore operation one-by-one, and > > > > do it altoghter at a synchronization point (e.g. when the mdev is re- > > > > scheduled, similar to what we did before VMENTRY). > > > > > > Giving the vendor driver the choice of whether to be secure or not is > > > exactly what I'm trying to propose we spend some time thinking about. > > > For instance, what if instead of allowing the user to load device state > > > through a region, the kernel could side load it using sometime similar > > > to the firmware loading path. The user could be provided with a file > > > name token that they push through the vfio interface to trigger the > > > state loading from a location with proper file level ACLs such that the > > > image can be considered trusted. Unfortunately the collateral is that > > > libvirt would need to become the secure delivery entity, somehow > > > stripping this section of the migration stream into a file and > > > providing a token for the user to ask the kernel to load it. What are > > > some other options? Could save/restore be done simply as an > > > algorithmic script matched to stack of data, as I read into your first > > > statement above? I have doubts that we can achieve the internal state > > > we need, or maybe even the performance we need using such a process. > > > Thanks, > > > > > > > for GVT-g I think we invoke common functions as used in emulation path > > to recover vGPU state, e.g. gtt rw handler, etc. Zhi can correct me if > > I'm wrong. > > One example of migration state being restored in a secure manner does > not prove that such an interface is universally secure or a good idea. > > > Can you elaborate the difference between device state and emulation > > state which you mentioned earlier? We may need look at some concrete > > example to understand the actual problem here. > > See my example above, or imagine that the migration state information > includes any sort of index field where a user might be able to modify > the index and trick the driver into inserting malicious code elsewhere > in the host kernel stack. It's a security nightmare waiting to > happen. Thanks, > one thing which I'm not sure is why this becomes a new concern but not applied to existing VM state maintained by kernel, e.g. various vCPU states... Thanks Kevin -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list