(2012/10/12 2:28), Khalid Aziz wrote:
> On Thu, 2012-10-11 at 15:16 +0900, Takao Indoh wrote:
>> (2012/10/11 5:08), Khalid Aziz wrote:
> .....
>>>> +static void __init do_reset(u8 bus, u8 slot, u8 func)
>>>> +{
>>>> +	u16 ctrl;
>>>> +
>>>> +	printk(KERN_INFO "pci 0000:%02x:%02x.%d reset\n", bus, slot, func);
>>>> +
>>>> +	/* Assert Secondary Bus Reset */
>>>> +	ctrl = read_pci_config_16(bus, slot, func, PCI_BRIDGE_CONTROL);
>>>> +	ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>>>> +	write_pci_config_16(bus, slot, func, PCI_BRIDGE_CONTROL, ctrl);
>>>> +
>>>> +	pci_udelay(5000);
>>>> +
>>>> +	/* De-assert Secondary Bus Reset */
>>>> +	ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>>>> +	write_pci_config_16(bus, slot, func, PCI_BRIDGE_CONTROL, ctrl);
>>>> +
>>>> +	pci_udelay(500000);
>>>
>>> This is 0.5 second. This will add up quickly on larger servers with
>>> multiple busses. Is 0.5 second required by the spec?
>>> aer_do_secondary_bus_reset() holds PCI_BRIDGE_CTL_BUS_RESET for 2 ms and
>>> then waits another 200 ms after de-asserting it. Still long, but less
>>> than half of the delay in the above code.
>>
>> According to the PCIe spec, they should be at least 1 ms and 100 ms. The
>> reason they are 2 ms and 200 ms in aer_do_secondary_bus_reset() is, I
>> think, a kind of safety margin.
>>
>> At first these delays were 2 ms and 200 ms, the same as in
>> aer_do_secondary_bus_reset(), but I found that on a certain machine that
>> was not enough. I think this problem occurs because native_io_delay() is
>> not precise, so I extended the delays to give more margin.
>>
>> If this delay should be shortened, I have two solutions.
>>
>> 1)
>> Call early_reset_pcie_devices() after calibrate_delay() so that we can
>> use a precise mdelay. Personally I don't want to do this because I think
>> DMA should be stopped asap.
>>
>> 2)
>> Make it tunable. For example, extend "reset_devices":
>>   reset_devices=reset_delay=500
>> or something like that. By default, 200 ms is enough as far as I tested.
>> But if it is not enough, we can adjust the delay time using this option.
>>
>> Any other idea?
>>
>
> I don't like the idea of asking the end user to determine how many msec
> of delay they would need on their system for a reset. If we have to go
> that route, I would rather have a default of 200 msec and then add a
> kernel option like "reset_devices=reset_delay=long" which changes the
> delay to 500 msec.
>
> Here is another idea. The code you currently have does:
>
>	for (each device) {
>		save config registers
>		reset
>		wait for 500 ms
>		restore config registers
>	}
>
> You need to wait for 500 ms because you cannot access config registers
> until the reset is complete. So how about this - why not just save the
> config registers, reset each device, and then wait only once for 500 ms
> before restoring the config registers on all devices? Here is what this
> would look like:
>
>	for (each device) {
>		save config registers
>		reset
>	}
>	wait 500 ms
>	for (each device) {
>		restore config registers
>	}
>
> This is obviously more complex and requires you to allocate more space
> for saving state, but it has the benefit of minimizing boot-up delay and
> making the delay constant irrespective of the number of switches/bridges
> on the system.
>
> Does this sound reasonable?

Yep, that looks like a good idea. I'll update my patch based on it.

Thanks,
Takao Indoh

>
>>>
>>> We have been seeing problems with the kexec/kdump kernel for quite some
>>> time that are related to I/O devices not being quiesced before kexec. I
>>> had added code to clear the Bus Master bit to help stop runaway DMAs,
>>> which helped in many cases, but obviously not all. If resetting
>>> downstream ports helps stop runaway I/O from PCIe devices, I am all for
>>> this approach. This patch still doesn't do anything for old PCI
>>> devices, though.
>>
>> This patch does not reset PCI devices because of the VGA problem. As I
>> wrote in the patch description, the monitor blacks out when the VGA
>> controller is reset.
>> So when a VGA device is found we need to skip it. In the case of PCIe,
>> we can decide per device whether to do a bus reset, but we cannot in
>> the case of PCI.
>
> OK. My concern is that there are plenty of older servers out there that
> potentially fail kexec/kdump often because of this, and we are not
> solving the problem for them. If we can only solve it for PCIe for now,
> I am OK with starting there.
>
> --
> Khalid
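For reference, the save/reset/wait-once/restore scheme agreed on above can be sketched in userspace C. Config-space access and the delay are stubbed out here (the real code would use read_pci_config_16()/write_pci_config_16() and pci_udelay()); the array names and device count are illustrative, not from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_BRIDGE_CTL_BUS_RESET 0x40
#define NDEV 3

/* Stub config space: one Bridge Control register per bridge. */
static uint16_t bridge_ctl[NDEV];
static uint16_t saved_ctl[NDEV];

/* Stand-in for pci_udelay(); does nothing in this sketch. */
static void wait_ms(unsigned int ms) { (void)ms; }

static void reset_all_bridges(void)
{
	int i;

	/* Save config state and assert Secondary Bus Reset on every bridge. */
	for (i = 0; i < NDEV; i++) {
		saved_ctl[i] = bridge_ctl[i];
		bridge_ctl[i] |= PCI_BRIDGE_CTL_BUS_RESET;
	}
	wait_ms(2);

	/* De-assert the reset on every bridge. */
	for (i = 0; i < NDEV; i++)
		bridge_ctl[i] &= ~PCI_BRIDGE_CTL_BUS_RESET;

	/* One shared wait, instead of one per bridge. */
	wait_ms(500);

	/* Now all bridges are accessible again; restore saved state. */
	for (i = 0; i < NDEV; i++)
		bridge_ctl[i] = saved_ctl[i];
}
```

The point of the restructuring is visible in the single wait_ms(500): the boot-time cost no longer scales with the number of bridges.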
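If the tunable-delay route were ever taken, the proposed reset_devices=reset_delay=<msec> string could be parsed roughly as below. The helper name, the 200 ms default, and the parsing details are assumptions for illustration only, not part of any patch:

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define DEFAULT_RESET_DELAY_MS 200

/* Hypothetical parser: pull the msec value out of a
 * "reset_devices=reset_delay=<msec>" style boot option,
 * falling back to the default when absent or malformed. */
static int parse_reset_delay(const char *arg)
{
	const char *p;
	long val;

	if (!arg || !(p = strstr(arg, "reset_delay=")))
		return DEFAULT_RESET_DELAY_MS;

	val = strtol(p + strlen("reset_delay="), NULL, 10);
	return (val > 0) ? (int)val : DEFAULT_RESET_DELAY_MS;
}
```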
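The VGA skip discussed above boils down to a class-code check: a device whose base class/subclass is 0x0300 (VGA-compatible display controller, per the PCI spec's 24-bit base-class/subclass/prog-if layout) must not be reset, or the console blanks. A minimal sketch, with an illustrative helper name:

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CLASS_DISPLAY_VGA 0x0300

/* Return nonzero if the 24-bit class code identifies a
 * VGA-compatible display controller; the programming
 * interface byte is ignored in the comparison. */
static int is_vga_device(uint32_t class_code)
{
	return (class_code >> 8) == PCI_CLASS_DISPLAY_VGA;
}
```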