On Mon, 04/20 19:36, Michael S. Tsirkin wrote:
> On Fri, Apr 17, 2015 at 03:59:15PM +0800, Fam Zheng wrote:
> > Currently, virtio code chooses to kill QEMU if the guest passes any invalid
> > data with vring.
> > That has drawbacks such as losing unsaved data (e.g. when
> > guest user is writing a very long email), or possible denial of service in
> > a nested vm use case where virtio device is passed through.
> >
> > virtio-1 has introduced a new status bit "NEEDS RESET" which could be used to
> > improve this by communicating the error state between virtio devices and
> > drivers. The device notifies guest upon setting the bit, then the guest driver
> > should detect this bit and report to userspace, or recover the device by
> > resetting it.
>
> Unfortunately, virtio 1 spec does not have a conformance statement
> that requires driver to recover. We merely have a non-normative looking
> text:
> 	Note: For example, the driver can’t assume requests in flight
> 	will be completed if DEVICE_NEEDS_RESET is set, nor can it assume that
> 	they have not been completed. A good implementation will try to recover
> 	by issuing a reset.
>
> Implementing this reset for all devices in a race-free manner might also
> be far from trivial. I think we'd need a feature bit for this.
> OTOH as long as we make this a new feature, would an ability to
> reset a single VQ be a better match for what you are trying to
> achieve?

I think that is too complicated as a recovery measure; a device-level reset
is better for getting to a deterministic state, at least.

>
> > This series makes necessary changes in virtio core code, based on which
> > virtio-blk is converted. Other devices now keep the existing behavior by
> > passing in "error_abort". They will be converted in following series. The Linux
> > driver part will also be worked on.
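The device-side behaviour described in the cover letter can be sketched in C as follows. This is a minimal sketch under stated assumptions: `struct toy_virtio_dev`, `toy_vring_error()`, `toy_notify_config()`, `toy_process_queue()` and `toy_reset()` are hypothetical names, not QEMU's API. The status bit values (DRIVER_OK = 4, DEVICE_NEEDS_RESET = 64) follow the virtio 1.0 spec, and, as I read the spec, the configuration change notification is only required when DRIVER_OK is already set. The `fatal` flag stands in for the "error_abort" path that unconverted devices keep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Device status bits as defined in the virtio 1.0 spec. */
#define VIRTIO_STATUS_DRIVER_OK          4
#define VIRTIO_STATUS_DEVICE_NEEDS_RESET 64

/* Hypothetical, heavily simplified device model (not QEMU's VirtIODevice). */
struct toy_virtio_dev {
    uint8_t status;   /* device status field as seen by the driver */
    bool broken;      /* when set, the device stops processing its queues */
    int config_irqs;  /* number of configuration change notifications sent */
};

/* Stand-in for injecting a configuration change interrupt into the guest. */
static void toy_notify_config(struct toy_virtio_dev *dev)
{
    dev->config_irqs++;
}

/*
 * Called when the guest passes invalid vring data. fatal == true keeps the
 * old kill-QEMU behaviour (compare the "error_abort" path of unconverted
 * devices); otherwise the device enters the NEEDS RESET state.
 */
static void toy_vring_error(struct toy_virtio_dev *dev, bool fatal,
                            const char *msg)
{
    assert(dev != NULL);
    fprintf(stderr, "virtio: %s\n", msg);
    if (fatal) {
        abort();
    }
    dev->broken = true;
    dev->status |= VIRTIO_STATUS_DEVICE_NEEDS_RESET;
    /* Notify only a driver that has already set DRIVER_OK. */
    if (dev->status & VIRTIO_STATUS_DRIVER_OK) {
        toy_notify_config(dev);
    }
}

/* Returns false (no work done) while the device waits for a reset;
 * a real device would pop and handle requests here otherwise. */
static bool toy_process_queue(struct toy_virtio_dev *dev)
{
    return !dev->broken;
}

/* Driver-initiated reset: writing 0 to the status clears the error state. */
static void toy_reset(struct toy_virtio_dev *dev)
{
    dev->status = 0;
    dev->broken = false;
}
```

A guest driver that notices DEVICE_NEEDS_RESET would then reset the device and go through initialization again; the sketch only models the device half of that exchange.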
> >
> > One concern with this behavior change is that it's now harder to notice the
> > actual driver bug that caused the error, as the guest continues to run. To
> > address that, we could probably add a new error action option to virtio
> > devices, similar to the "read/write werror" in block layer, so the vm could be
> > paused and the management will get an event in QMP like pvpanic. This work can
> > be done on top.
>
> At the architectural level, that's only one concern. Others would be
> - workloads such as openstack handle guest crash better than
> a guest that's e.g. slow because of a memory leak

What memory leak are you referring to?

> - it's easier for guests to probe host for security issues
> if guest isn't killed
> - guest can flood host log with guest-triggered errors

We can still abort() if the guest triggers errors too quickly.

Fam
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization