On 12/08/2016 10:46 PM, Cao jin wrote: > > > On 12/06/2016 11:35 PM, Alex Williamson wrote: >> On Tue, 6 Dec 2016 18:46:04 +0800 >> Cao jin <caoj.fnst@xxxxxxxxxxxxxx> wrote: >> >>> On 12/06/2016 12:59 PM, Alex Williamson wrote: >>>> On Tue, 6 Dec 2016 05:55:28 +0200 >>>> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote: >>>> >>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote: >>>>>> If you're going to take the lead for these AER patches, I would >>>>>> certainly suggest that understanding the reasoning behind the bus reset >>>>>> behavior is a central aspect to this series. This effort has dragged >>>>>> out for nearly two years and I apologize, but I don't really have a lot >>>>>> of patience for rehashing some of these issues if you're not going to >>>>>> read the previous discussions or consult with your colleagues to >>>>>> understand how we got to this point. If you want to challenge some of >>>>>> the design points, that's great, it could use some new eyes, but please >>>>>> understand how we got here first. >>>>> >>>>> Well I'm guessing Cao jin here isn't the only one not >>>>> willing to plough through all historical versions of the patchset >>>>> just to figure out the motivation for some code. >>>>> >>>>> Including a summary of a high level architecture couldn't hurt. >>>>> >>>>> Any chance of writing such? Alternatively, we can try to build it as >>>>> part of this thread. Shouldn't be hard as it seems somewhat >>>>> straight-forward on the surface: >>>>> >>>>> - detect link error on the host, don't reset link as we would normally do >>>> >>>> This is actually a new approach that I'm not sure I agree with. By >>>> skipping the host directed link reset, vfio is taking responsibility >>>> for doing this, but then we just assume the user will do it. I have >>>> issues with this. >>>> >>>> The previous approach was to use the error detected notifier to block >>>> access to the device, allowing the host to perform the link reset. A >>>> subsequent notification in the AER process released the user access >>>> which allowed the user AER process to proceed. This did result in both >>>> a host directed and a guest directed link reset, but other than >>>> coordinating the blocking of the user process during host reset, that >>>> hasn't been brought up as an issue previously. >>>> >>> >>> Tests on previous versions didn't bring up issues as I find, I think >>> that is because we didn't test it completely. As I know, before August >>> of this year, we didn't have cable connected to NIC, let alone >>> connecting NIC to gateway. >> >> Lack of testing has been a significant issue throughout the development >> of this series. >> >>> Even if I fixed the guest oops issue in igb driver that Alex found in >>> v9, v9 still cannot work in my test. And in my test, disable link >>> reset(in host) in aer core for vfio-pci is the most significant step to >>> get my test passed. >> >> But is it the correct step? I'm not convinced. Why did blocking guest >> access not work? How do you plan to manage vfio taking the >> responsibility to perform a bus reset when you don't know whether QEMU >> is the user of the device or whether the user supports AER recovery? >> > > Maybe currently we don't have enough proof to prove the correctness, but > I think I did find some facts to prove that link reset in host is a big > trouble, and can answer part of questions above. > > 1st, some thoughts: > In pci-error-recovery.txt and do_recovery() of kernel tree, we can see, > a recovery consists of several steps(callbacks), link reset is one of > them, and except link reset, the others are seems kind of device > specific. In our case, both host & guest will do recovery, I think the > host recovery actually is some kind of fake recovery, see vfio-pci > driver's error_detected & resume callback, they don't do anything > special, mainly signal error to user, but the link reset in host "fake > reset" does some serious work, in other words, I think host does the > recovery incompletely, so I was thinking, why not just drop incompletely > host recovery(drop link reset) for vfio-pci, and let the guest take care > of the whole serious recovery. This is part of the reason of why my > version looks like this. But yes, I admit the issue Alex mentioned, > vfio can't guarantee that user will do a bus reset, this is an issue I > will keep looking for a solution. > > 2nd, some facts and analyzation from test: > In host, the relationship between time and behviour in each component > roughly looks as following: > > + HW + host kernel + qemu + guest kernel + > | |(error recovery)| | | > | | | | | > | | vfio-pci's | | | > | | error_detected | | | > | | + | | | > | | | | | | > | | | error notify | | | > | | | via eventfd | | | > | | +---------------> +----------+ | | > | | | +vfio_err_ | | | > | | | |notifier_ | | | > | +---- +<---+link reset | |handler | | | > | | HW | | | | | | | > | | | | | | | | > | |r | | vfio-pci's | |pass aer | | | > | |e.. | | resume | |to guest | | | > | |s. | | (blocking end) | | | | | > | |e | | | | *2* | | | > | |t | | | +----+-----+ | | > | |i | | | | | | > | |n | | | +--------> +----------+ | > | |g | | | | | guest | | > | | | | | | | recovery | | > | | | | | | | process | | > | | | | | | |(include | | > | | | | | | |register | | > | | *1* | | | | |access) | | > | | | | | | | | | > | | | | | | | *3* | | > | | | | +----------+ | > Time | | | | | > | | | | | > | | | | | > | | | | | > v | | | | > > Now let me try to answer: Why did blocking guest access not work? > Some important factor: > 1. host recovery doesn't do anything special except error notifying, so > it may be executed very fast. > 2. Hardware resetting time is not sure, from pcie spec 6.6.1, guessing > it need many ms, pretty long? > > some facts found in v9(block config write, not read, during host > recovery) test: > 1. reading uncor error register in vfio_err_notifier_handler sometimes > returns correct value, sometimes return invalid value(All F's) > I forget another facts: 2. the igb bug[*] our patch v9 triggered also is the same kind things. error_detected() of igb in guest read config space, and return invalid all F's, then igb NULLed its hardware address, then oops shows. I think it is also a concurrent issue as described below. [*] http://patchwork.ozlabs.org/patch/692171 > So, I am thinking, if host blocking on host end early, and *2*, *3* is > parallel with *1*, the way used in v9 to blocking guest access, may not > work. > -- Sincerely, Cao jin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html