Hi Igor, On 12/13/23 08:54, Michael S. Tsirkin wrote: > On Wed, Dec 13, 2023 at 05:49:39PM +0100, Igor Mammedov wrote: >> On Wed, Dec 13, 2023 at 2:08 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: >>> >>> On Wed, Dec 13, 2023 at 1:36 AM Igor Mammedov <imammedo@xxxxxxxxxx> wrote: >>>> >>>> previous commit ("PCI: acpiphp: enable slot only if it hasn't been enabled already" >>>> introduced a workaround to avoid a race between SCSI_SCAN_ASYNC job and >>>> bridge reconfiguration in case of single HBA hotplug. >>>> However in virt environment it's possible to pause machine hotplug several >>>> HBAs and let machine run. That can hit the same race when 2nd hotplugged >>>> HBA will start re-configuring bridge. >>>> Do the same thing as SHPC and throttle down hotplug of 2nd and up >>>> devices within single hotplug event. >>>> >>>> Signed-off-by: Igor Mammedov <imammedo@xxxxxxxxxx> >>>> --- >>>> drivers/pci/hotplug/acpiphp_glue.c | 6 ++++++ >>>> 1 file changed, 6 insertions(+) >>>> >>>> diff --git a/drivers/pci/hotplug/acpiphp_glue.c b/drivers/pci/hotplug/acpiphp_glue.c >>>> index 6b11609927d6..30bca2086b24 100644 >>>> --- a/drivers/pci/hotplug/acpiphp_glue.c >>>> +++ b/drivers/pci/hotplug/acpiphp_glue.c >>>> @@ -37,6 +37,7 @@ >>>> #include <linux/mutex.h> >>>> #include <linux/slab.h> >>>> #include <linux/acpi.h> >>>> +#include <linux/delay.h> >>>> >>>> #include "../pci.h" >>>> #include "acpiphp.h" >>>> @@ -700,6 +701,7 @@ static void trim_stale_devices(struct pci_dev *dev) >>>> static void acpiphp_check_bridge(struct acpiphp_bridge *bridge) >>>> { >>>> struct acpiphp_slot *slot; >>>> + int nr_hp_slots = 0; >>>> >>>> /* Bail out if the bridge is going away. */ >>>> if (bridge->is_going_away) >>>> @@ -723,6 +725,10 @@ static void acpiphp_check_bridge(struct acpiphp_bridge *bridge) >>>> >>>> /* configure all functions */ >>>> if (slot->flags != SLOT_ENABLED) { >>>> + if (nr_hp_slots) >>>> + msleep(1000); >>> >>> Why is 1000 considered the most suitable number here? Any chance to >>> define a symbol for it? >> >> Timeout was borrowed from SHPC hotplug workflow where it apparently >> makes race harder to reproduce. >> (though it's not excuse to add more timeouts elsewhere) >> >>> And won't this affect the cases when the race in question is not a concern? >> >> In practice it's not likely, since even in virt scenario hypervisor won't >> stop VM to hotplug device (which beats whole purpose of hotplug). >> >> But in case of a very slow VM (overcommit case) it's possible for >> several HBA's to be hotplugged by the time acpiphp gets a chance >> to handle the 1st hotplug event. SHPC is more or less 'safe' with its >> 1sec delay. >> >>> Also, adding arbitrary timeouts is not the most robust way of >>> addressing race conditions IMV. Wouldn't it be better to add some >>> proper synchronization between the pieces of code that can race with >>> each other? >> >> I don't like it either, it's a stop gap measure to hide regression on >> short notice, >> which I can fixup without much risk in short time left, before folks >> leave on holidays. >> It's fine to drop the patch as chances of this happening are small. >> [1/2] should cover reported cases. >> >> Since it's RFC, I basically ask for opinions on a proper way to fix >> SCSI_ASYNC_SCAN >> running wild while the hotplug is in progress (and maybe SCSI is not >> the only user that >> schedules async job from device probe). > > Of course not. And things don't have to be scheduled from probe right? > Can be triggered by an interrupt or userspace activity. I agree with Michael. TBH, I am curious if the two patches can workaround/resolve the issue. Would you mind helping explain if to run enable_slot() for a new PCI device can impact the other PCI devices existing on the bridge? E.g.,: 1. Attach several virtio-scsi or virtio-net on the same bridge. 2. Trigger workload for those PCI devices. They may do mmio write to kick the doorbell (to trigger KVM/QEMU ioeventfd) very frequently. 3. Now hot-add an extra PCI device. Since the slot is never enabled, it enables the slot via enable_slot(). Can I assume the last enable_slot() will temporarily re-configure the bridge window so that all other PCI devices' mmio will lose effect at that time point? Since drivers always kick the doorbell conditionally, they may hang forever. As I have reported, we used to have the similar issue. PCI: Probe bridge window attributes once at enumeration-time https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=51c48b310183ab6ba5419edfc6a8de889cc04521 Therefore, can I assume the issue is not because to re-enable an already-enabled slot, but to touch the bridge window for more than once? Thank you very much! Dongli Zhang > >> So adding synchronisation and testing >> would take time (not something I'd do this late in the cycle). >> >> So far I'm thinking about adding rw mutex to bridge with the PCI >> hotplug subsystem >> being a writer while scsi scan jobs would be readers and wait till hotplug code >> says it's safe to proceed. >> I plan to work in this direction and give it some testing, unless >> someone has a better idea. > >>> >>>> + >>>> + ++nr_hp_slots; >>>> enable_slot(slot, true); >>>> } >>>> } else { >>>> -- >>> >