Re: [PATCH] vfio/pci: Support error recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 13, 2016 at 09:27:59AM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 18:12:34 +0200
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> 
> > On Mon, Dec 12, 2016 at 08:39:48PM -0700, Alex Williamson wrote:
> > > On Tue, 13 Dec 2016 05:15:13 +0200
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > >   
> > > > On Mon, Dec 12, 2016 at 03:43:13PM -0700, Alex Williamson wrote:  
> > > > > > So just don't do it then. Topology must match between host and guest,
> > > > > > except maybe for the case of devices with host driver  (e.g. PF)
> > > > > > which we might be able to synchronize against.    
> > > > > 
> > > > > We're talking about host kernel level handling here.  The host kernel
> > > > > cannot defer the link reset to the user under the assumption that the
> > > > > user is handling the devices in a very specific way.  The moment we do
> > > > > that, we've lost.    
> > > > 
> > > > The way is same as baremetal though, so why not?  
> > > 
> > > How do we know this?  What if the user is dpdk?  The kernel is
> > > responsible for maintaining the integrity of the system and devices,
> > > not the user.
> > >   
> > > > And if user doesn't do what's expected, we can
> > > > do the full link reset on close.  
> > > 
> > > That's exactly my point, if we're talking about multiple devices,
> > > there's no guarantee that the close() for each is simultaneous.  If one
> > > function is released before the other we cannot do a bus reset.  If
> > > that device is then opened by another user before its sibling is
> > > released, then we once again cannot perform a link reset.  I don't
> > > think it would be reasonable to mark the released device quarantined
> > > until the sibling is released, that would be a terrible user experience.  
> > 
> > Not sure why you find it so terrible, and I don't think there's another way.
> 
> If we can't do it without regressing the support we currently have,
> let's not do it at all.

Why would we regress?  As long as there are no unrecoverable errors,
there's no need to change behaviour at all.

Alex, do you have a picture of how error recovery can work in your mind?
Your answers seem to imply you do, and these patches don't implement
this correctly.  I'm not sure about others, but I for one am unable to
piece it together from the comments you provide.  If yes, could you
maybe do a short writeup of an architecture you would be comfortable
with?

Thanks,

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux