Re: [PATCH RFC 1/4] PCI: hotplug: Add parameter to put devices to reset during rescan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 18, 2018 at 6:55 AM, Sergey Miroshnichenko
<s.miroshnichenko@xxxxxxxxx> wrote:
> Hello Sam,
>
> On 9/17/18 8:28 AM, Sam Bobroff wrote:
>> Hi Sergey,
>>
>> On Fri, Sep 14, 2018 at 07:14:01PM +0300, Sergey Miroshnichenko wrote:
>>> Introduce a new command line option "pci=pcie_movable_bars" that indicates
>>> support of PCIe hotplug without prior reservation of memory regions by
>>> BIOS/bootloader.
>>>
>>> If a new PCIe device has been hot-plugged between two active ones, which
>>> have no (or not big enough) gap between their BARs, allocating new BARs
>>> requires to move BARs of the following working devices:
>>>
>>> 1)                   dev 4
>>>                        |
>>>                        v
>>> .. |  dev 3  |  dev 3  |  dev 5  |  dev 7  |
>>> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
>>>
>>> 2)                             dev 4
>>>                                  |
>>>                                  v
>>> .. |  dev 3  |  dev 3  | -->           --> |  dev 5  |  dev 7  |
>>> .. |  BAR 0  |  BAR 1  | -->           --> |  BAR 0  |  BAR 0  |
>>>
>>> 3)
>>>
>>> .. |  dev 3  |  dev 3  |  dev 4  |  dev 4  |  dev 5  |  dev 7  |
>>> .. |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 1  |  BAR 0  |  BAR 0  |
>>>
>>> Not only BARs, but also bridge windows can be updated during a PCIe rescan,
>>> threatening all memory transactions during this procedure, so the PCI
>>> subsystem will instruct the drivers to pause by calling the reset_prepare()
>>> and reset_done() callbacks.
>>>
>>> If a device may be affected by BAR movement, the BAR changes tracking must
>>> be implemented in its driver.
>>>
>>> Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@xxxxxxxxx>
>>> ---
>>>  .../admin-guide/kernel-parameters.txt         |  6 +++
>>>  drivers/pci/pci.c                             |  2 +
>>>  drivers/pci/probe.c                           | 43 +++++++++++++++++++
>>>  include/linux/pci.h                           |  1 +
>>>  4 files changed, 52 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>>> index 64a3bf54b974..f8132a709061 100644
>>> --- a/Documentation/admin-guide/kernel-parameters.txt
>>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>>> @@ -3311,6 +3311,12 @@
>>>                              bridges without forcing it upstream. Note:
>>>                              this removes isolation between devices and
>>>                              may put more devices in an IOMMU group.
>>> +            pcie_movable_bars       Arrange a space at runtime for BARs of
>>> +                            hotplugged PCIe devices - usable if bootloader
>>> +                            doesn't reserve memory regions for them. Freeing
>>> +                            a space may require moving BARs of active devices
>>> +                            to higher addresses, so device drivers will be
>>> +                            paused during rescan.
>>>
>>>      pcie_aspm=      [PCIE] Forcibly enable or disable PCIe Active State Power
>>>                      Management.
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 1835f3a7aa8d..5f07a59b5924 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -6105,6 +6105,8 @@ static int __init pci_setup(char *str)
>>>                              pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
>>>                      } else if (!strncmp(str, "disable_acs_redir=", 18)) {
>>>                              disable_acs_redir_param = str + 18;
>>> +                    } else if (!strncmp(str, "pcie_movable_bars", 17)) {
>>> +                            pci_add_flags(PCI_MOVABLE_BARS);
>>>                      } else {
>>>                              printk(KERN_ERR "PCI: Unknown option `%s'\n",
>>>                                              str);
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index 201f9e5ff55c..bdaafc48dc4c 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -3138,6 +3138,45 @@ unsigned int pci_rescan_bus_bridge_resize(struct pci_dev *bridge)
>>>      return max;
>>>  }
>>>
>>> +/*
>>> + * Put all devices of the bus and its children to reset
>>> + */
>>> +static void pci_bus_reset_prepare(struct pci_bus *bus)
>>> +{
>>> +    struct pci_dev *dev;
>>> +
>>> +    list_for_each_entry(dev, &bus->devices, bus_list) {
>>> +            struct pci_bus *child = dev->subordinate;
>>> +
>>> +            if (child) {
>>> +                    pci_bus_reset_prepare(child);
>>> +            } else if (dev->driver &&
>>> +                       dev->driver->err_handler &&
>>> +                       dev->driver->err_handler->reset_prepare) {
>>> +                    dev->driver->err_handler->reset_prepare(dev);
>>> +            }
>>
>> What about devices with drivers that don't have reset_prepare()?  It
>> looks like it will just reconfigure them anyway. Is that right?
>
> It is possible that unprepared driver without these hooks will get BARs
> moved, I should put a warning message there. There three ways we can see
> to make this safe:>

It might make more sense to allow drivers to opt-in to this rather
than assuming any driver with the error handlers supports re-assigning
BARs on the fly. As far as I know we restore the previous BAR values a
part of the EEH recovery process when there's a driver bound, but I
could be wrong.

>  - add the reset_prepare()/reset_done() hooks to *every* PCIe driver;
>  - refuse BAR movement if at least one unprepared driver has been
> encountered during rescan;
>  - reduce the number of drivers which can be affected to some
> controllable value and prepare them on demand.

Is there any reason we couldn't just unbind the unaware drivers and
re-bind them afterwards? That might be useful when hotplugging NVMe
racks since you wouldn't want a switch management driver (or similar)
to prevent re-assignments if they're required.

> Applying the second proposal as a major restriction seems fairly
> reasonable, but for our particular setups and use-cases it is probably
> too stiff:
>  - we've noticed that devices connected directly to the root bridge
> don't get moved BARs, and this covers our x86_64 servers: we only
> insert/remove devices into "second-level" and "lower" bridges there, but
> not root;
>
>  - on PowerNV we have system devices (network interfaces, USB hub, etc.)
> grouped into dedicated domain, with all other domains ready for hotplug,
> and only these domains can be rescanned.

Are you doing anything to enforce this or just relying on people not
re-scanning the system device bus?

> With our scenarios currently reduced to these two, we can live with just
> a few drivers "prepared" for now: NVME and few Ethernet adapters, this
> gives us a possibility to use this feature before "converting" *all* the
> drivers, and even have the NVidia cards running on a closed proprietary
> driver.
>
> Should we make this behavior adjustable with something like
> "pcie_movable_bars=safe" and "pcie_movable_bars=always" ?
>
> Thanks!
>
> Best regards,
> Serge
>
>>> +    }
>>> +}
>>> +
>>> +/*
>>> + * Complete the reset of all devices for the bus and its children
>>> + */
>>> +static void pci_bus_reset_done(struct pci_bus *bus)
>>> +{
>>> +    struct pci_dev *dev;
>>> +
>>> +    list_for_each_entry(dev, &bus->devices, bus_list) {
>>> +            struct pci_bus *child = dev->subordinate;
>>> +
>>> +            if (child) {
>>> +                    pci_bus_reset_done(child);
>>> +            } else if (dev->driver && dev->driver->err_handler &&
>>> +                       dev->driver->err_handler->reset_done) {
>>> +                    dev->driver->err_handler->reset_done(dev);
>>> +            }
>>> +    }
>>> +}
>>> +
>>>  /**
>>>   * pci_rescan_bus - Scan a PCI bus for devices
>>>   * @bus: PCI bus to scan
>>> @@ -3151,8 +3190,12 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
>>>  {
>>>      unsigned int max;
>>>
>>> +    if (pci_has_flag(PCI_MOVABLE_BARS))
>>> +            pci_bus_reset_prepare(bus);
>>>      max = pci_scan_child_bus(bus);
>>>      pci_assign_unassigned_bus_resources(bus);
>>> +    if (pci_has_flag(PCI_MOVABLE_BARS))
>>> +            pci_bus_reset_done(bus);
>>>      pci_bus_add_devices(bus);
>>>
>>>      return max;
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index 6925828f9f25..a8cb1a367c34 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -847,6 +847,7 @@ enum {
>>>      PCI_ENABLE_PROC_DOMAINS = 0x00000010,   /* Enable domains in /proc */
>>>      PCI_COMPAT_DOMAIN_0     = 0x00000020,   /* ... except domain 0 */
>>>      PCI_SCAN_ALL_PCIE_DEVS  = 0x00000040,   /* Scan all, not just dev 0 */
>>> +    PCI_MOVABLE_BARS        = 0x00000080,   /* Runtime BAR reassign after hotplug */
>>>  };
>>>
>>>  /* These external functions are only available when PCI support is enabled */
>>> --
>>> 2.17.1
>>>



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux