On 12/06/2016 11:35 PM, Alex Williamson wrote: > On Tue, 6 Dec 2016 18:46:04 +0800 > Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote: > >> On 12/06/2016 12:59 PM, Alex Williamson wrote: >>> On Tue, 6 Dec 2016 05:55:28 +0200 >>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: >>> >>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote: >>>>> If you're going to take the lead for these AER patches, I would >>>>> certainly suggest that understanding the reasoning behind the bus reset >>>>> behavior is a central aspect to this series. This effort has dragged >>>>> out for nearly two years and I apologize, but I don't really have a lot >>>>> of patience for rehashing some of these issues if you're not going to >>>>> read the previous discussions or consult with your colleagues to >>>>> understand how we got to this point. If you want to challenge some of >>>>> the design points, that's great, it could use some new eyes, but please >>>>> understand how we got here first. >>>> >>>> Well I'm guessing Cao jin here isn't the only one not >>>> willing to plough through all historical versions of the patchset >>>> just to figure out the motivation for some code. >>>> >>>> Including a summary of a high level architecture couldn't hurt. >>>> >>>> Any chance of writing such? Alternatively, we can try to build it as >>>> part of this thread. Shouldn't be hard as it seems somewhat >>>> straight-forward on the surface: >>>> >>>> - detect link error on the host, don't reset link as we would normally do >>> >>> This is actually a new approach that I'm not sure I agree with. By >>> skipping the host directed link reset, vfio is taking responsibility >>> for doing this, but then we just assume the user will do it. I have >>> issues with this. >>> >>> The previous approach was to use the error detected notifier to block >>> access to the device, allowing the host to perform the link reset. A >>> subsequent notification in the AER process released the user access >>> which allowed the user AER process to proceed. This did result in both >>> a host directed and a guest directed link reset, but other than >>> coordinating the blocking of the user process during host reset, that >>> hasn't been brought up as an issue previously. >>> >> >> Tests on previous versions didn't bring up issues as I find, I think >> that is because we didn't test it completely. As I know, before August >> of this year, we didn't have cable connected to NIC, let alone >> connecting NIC to gateway. > > Lack of testing has been a significant issue throughout the development > of this series. > >> Even if I fixed the guest oops issue in igb driver that Alex found in >> v9, v9 still cannot work in my test. And in my test, disable link >> reset(in host) in aer core for vfio-pci is the most significant step to >> get my test passed. > > But is it the correct step? I'm not convinced. Why did blocking guest > access not work? How do you plan to manage vfio taking the > responsibility to perform a bus reset when you don't know whether QEMU > is the user of the device or whether the user supports AER recovery? > >>>> - report link error to guest >>>> - detect link reset request from guest >>>> - reset link on host >>>> >>>> Since link reset will reset all devices behind it, for this to work we >>>> need same set of devices behind the link in host and guest. Enforcing >>>> this would be nice to have. >>> >>> This is a pretty significant complication and I think it's a >>> requirement. This is the heart of why we have an AER vfio-pci device >>> option and why we require that QEMU should fail to initialize the >>> device if AER is enabled in an incompatible configuration. If a user >>> cannot be sure that AER is enabled on a device, it's pointless to >>> bother implementing it, though honestly I question the value of it in >>> the VM altogether given configuration requirements (ie. are users >>> going to accept the reason that all the ports of a device need to be >>> assigned to a single guest for guest-based AER recovery when they were >>> previously able to assign each port to a separate VM?). >>> >>>> - as link now might end up in bad state, reset >>>> it when device is unassigned >>> >>> This is also a new aspect for the approach here, previously we allowed >>> the host directed link reset so we could assume the device was >>> sufficiently recovered. In the proposal here, the AER core skips any >>> devices bound to vfio-pci, but vfio can't guarantee that we can do a >>> bus reset on them. PCI bus isolation is not accounted for in DMA >>> isolation, which is the basis for IOMMU groups. A bus can host >>> multiple IOMMU groups, each of which may have a different user. Only >>> in a very specific configuration can vfio do a bus reset. >>> >>>> Any details I missed? >>> >>> AIUI, the critical feature is that the guest needs to be able to reset >>> the device link, all the other design elements play out from the >>> logical expansion of that feature. It means that a guest bus reset >>> needs to translate to a host bus reset, which means that all of the >>> affected host devices need to be accounted for and those that are >>> assigned to the guest need to be affected in the guest in the same >>> way. QEMU must enforce this configuration or else a user cannot know >>> the result of a AER fault, ie. will it cause a VMSTOP condition or is >>> the fault forwarded to the guest. The required configuration >>> restrictions are quite involved, therefore we can't simply require this >>> of all configurations, so a vfio-pci device option is added. The >>> coordination of the host directed reset with the guest directed reset >>> was only a complications discovered within the last few revisions of >>> the series. As noted above, the previous solution to this was to >>> attempt to block access to the device while the host reset proceeds. >>> >>> Clearly it's a little disconcerting if we throw all of that away and >>> simply assume that an FLR is sufficient to reset the device when it >>> seems like link issues might be a nontrivial source of AER faults. If >>> FLR is sufficient, then why does the core AER handling code in the >>> kernel not simply do this? Thanks, >>> >> >> I agree with the points here. Now I understand why translate a guest >> link reset to host link reset[*], and FLR shouldn't be equivalent to >> link reset, but the PITY is, we can't trigger a real fatal error to see >> if a FLR is sufficient to reset the device. > > In this case, it's not a matter of finding a scenario where it works. > Using FLR to recover from a fatal error would need to be supported by > verbiage in a specification. I'm not interested in cases where it > happens to work on single device for a certain type of error. The link > reset is the bare metal mechanism for recovering from a fatal error and > it should be our mechanism as well unless there's spec wording to > support another approach. Thanks, > Ok, I understand your points here, will drag that dropped patch back and test. -- Sincerely, Cao jin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html