On Fri, Aug 25, 2017 at 10:33:57AM +0200, Pierre Morel wrote:
> On 24/08/2017 23:23, Michael S. Tsirkin wrote:
> > On Thu, Aug 24, 2017 at 07:42:07PM +0200, Pierre Morel wrote:
> > > On 24/08/2017 16:19, Michael S. Tsirkin wrote:
> > > > On Wed, Aug 23, 2017 at 06:33:02PM +0200, Pierre Morel wrote:
> > > > > Resetting a device can sometimes fail, even a virtual device.
> > > > > If the device is not reset after a while, the driver should
> > > > > abandon the retries.
> > > > > This is the change proposed for the modern virtio_pci.
> > > > >
> > > > > More generally, when this happens, the virtio driver can set the
> > > > > VIRTIO_CONFIG_S_FAILED status flag to notify the caller.
> > > > >
> > > > > The virtio core can test if the reset was successful by testing
> > > > > this flag after a reset.
> > > > >
> > > > > This behavior is backward compatible with existing drivers.
> > > > > This behavior seems to me compatible with the Virtio-1.0
> > > > > specification, Chapter 2.1, Device Status Field.
> > > > > There I definitely need your opinion: Is it right?
> > > > >
> > > > > This patch also leads to another question:
> > > > > do we care if a device provided by the hypervisor is buggy?
> > > > >
> > > > > Signed-off-by: Pierre Morel <pmorel@xxxxxxxxxxxxxxxxxx>
> > > > >
> > > > So I think this is not the best place to start to add error recovery.
> > > >
> > > I agree, there cannot be any error recovery there.
> > > If reset does not work we can give up on the device until the next
> > > reset of the hypervisor.
> >
> > On probe, yes. But failures are more likely to trigger at other times.
>
> OK, what about:
> - On probe, if reset fails, the probe fails.
>
> - On freeze and remove: we cannot free resources which are shared
>   with the device, at least the queues.
>   ... we can only signal the error and give up on the device.
>
> >
> > > > It should be much more common to have a situation where device gets
> > > > broken while it's being used.
> > > > Spec has a NEEDS_RESET flag for this.
> > >
> > > Yes the device side can set this flag, but it is another problem, it
> > > is supposing that:
> > > - the transport, device side, still works.
> > > - it is able to detect that the device needs a reset
> > > - a reset is effective
> >
> > Right. OTOH in this case there's more we can do.
>
> Yes, I did not find a single test of this flag (NEEDS_RESET),
> even though QEMU sets it quite often (through virtio_error()).
>
> The decision to reset the device must come from the driver.
> The protocol to reset the device is device/driver specific... lotta work
>
> Shouldn't it be separate from the "reset failed" problem?
>
>
> Regards,
>
> Pierre
>

I just don't think we can do a lot about a failed reset without risk of
breaking some working config. So I would start with NEEDS_RESET, and
maybe some reset failures will be fixable as a side effect. Yes, it's a
lot of work. For example, we need to validate device input; we can't
rely on it to be consistent.

--
MST
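
For illustration only, the bounded reset discussed in this thread could be
sketched roughly as below. This is not the patch under discussion: toy_dev,
toy_reset(), toy_needs_reset() and the poll budget are hypothetical names
for a user-space model of the status field. Only the two status bit values
are taken from the Virtio 1.0 specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Status bits from the virtio 1.0 spec, section 2.1 (Device Status Field). */
#define VIRTIO_CONFIG_S_NEEDS_RESET 0x40
#define VIRTIO_CONFIG_S_FAILED      0x80

/* Hypothetical user-space model of a device, not a kernel structure. */
struct toy_dev {
	uint8_t status;     /* device status field */
	bool reset_pending; /* a reset (status write of 0) was requested */
	int stale_reads;    /* reads still returning the old status; < 0: never resets */
};

/* Writing 0 requests a reset; any other value replaces the status. */
void toy_write_status(struct toy_dev *d, uint8_t s)
{
	if (s == 0)
		d->reset_pending = true;
	else
		d->status = s;
}

/* Reading the status lets a pending reset make progress. */
uint8_t toy_read_status(struct toy_dev *d)
{
	if (d->reset_pending && d->stale_reads >= 0) {
		if (d->stale_reads == 0) {
			d->status = 0;
			d->reset_pending = false;
		} else {
			d->stale_reads--;
		}
	}
	return d->status;
}

/*
 * Request a reset and poll the status at most max_polls times.
 * On timeout, set FAILED so the caller can test it, and give up.
 */
bool toy_reset(struct toy_dev *d, int max_polls)
{
	assert(max_polls > 0);
	toy_write_status(d, 0);
	for (int i = 0; i < max_polls; i++) {
		if (toy_read_status(d) == 0)
			return true;
		/* a real driver would sleep between polls */
	}
	toy_write_status(d, d->status | VIRTIO_CONFIG_S_FAILED);
	return false;
}

/* Driver-side check of the flag the device sets when it is broken. */
bool toy_needs_reset(struct toy_dev *d)
{
	return (toy_read_status(d) & VIRTIO_CONFIG_S_NEEDS_RESET) != 0;
}
```

In this model, a device with stale_reads = 3 and a budget of 10 polls resets
successfully and its status reads 0. A device with stale_reads = -1 never
resets, so toy_reset() returns false and leaves VIRTIO_CONFIG_S_FAILED set,
which is the condition the core would test after a reset.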