On 22/10/2021 00:57, Bjorn Helgaas wrote: > On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote: >> On 21/10/2021 02:50, Bjorn Helgaas wrote: >>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote: >>>> On 20/10/2021 03:23, Bjorn Helgaas wrote: >>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote: > >>>>>> This patch will try to handle this case in a more common way >>>>>> instead of the original 'arch' specific solution, which will be >>>>>> beneficial to all the APEI-dependent platforms after that. >>>>> >>>>> This actually doesn't say anything about what the patch does or >>>>> how it works. It says "handles this case in a more common way" >>>>> but with no details. >>>> >>>> Good suggestion, I'll give more details about that... >>>> >>>>> The EINJ table contains "injection instructions" that can read >>>>> or write "register regions" described by generic address >>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and >>>>> __einj_error_trigger() requests those register regions with >>>>> request_mem_region() or request_region() before executing the >>>>> injections instructions. >>>>> >>>>> IIUC, this patch basically says "if this region is part of the >>>>> MCFG area, we don't need to reserve it." That leads to the >>>>> questions of why we need to reserve *any* of the areas >>>> >>>> AFAIK, the MCFG area is reserved since the ECAM module will >>>> provide a generic Kernel Programming Interfaces(KPI), e.g, >>>> pci_generic_config_read(...), so all the drivers are allowed to >>>> access the pci config space only by those KPIs in a consistent >>>> and safe way, direct raw access will break the rule. Correct me >>>> if I am missing sth. >>>> >>>>> and why it's safe to simply skip reserving regions that are part >>>>> of the MCFG area. >>>> >>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error >>>> injection tolerance level") before to address this issue, the >>>> entire commit log as below: >>>> >>>> Some BIOSes utilize PCI MMCFG space read/write opertion to trigger >>>> specific errors. EINJ will report errors as below when hitting such >>>> cases: >>>> >>>> APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers >>>> >>>> It is because on x86 platform ACPI based PCI MMCFG logic has >>>> reserved all MMCFG spaces so that EINJ can't reserve it again. >>>> We already trust the ACPI/APEI code when using the EINJ interface >>>> so it is not a big leap to also trust it to access the right >>>> MMCFG addresses. Skip address checking to allow the access. >>> >>> I'm not really convinced by that justification because I don't >>> think the issue here is *trust*. If all we care about is trust, >>> and we trust the ACPI/APEI code, why do we need to reserve >>> anything at all when executing EINJ actions? >>> >>> I think the resource reservation issue is about coordinating >>> multiple users of the address space. A driver reserves the MMIO >>> address space of a device it controls so no other driver can >>> reserve it at the same time and cause conflicts. >>> >>> I'm not really convinced by this mutual exclusion argument either, >>> because I haven't yet seen a situation where we say "EINJ needs a >>> resource that's already in use by somebody else, so we can't use >>> EINJ." When conflicts arise, the response is always "we'll just >>> stop reserving this conflicting resource but use it anyway." >>> >>> I think the only real value in apei_resources_request() is a >>> little bit of documentation in /proc/iomem. For ERST and EINJ, >>> even that only lasts for the tiny period when we're actually >>> executing an action. >>> >>> So convince me there's a reason why we shouldn't just remove >>> apei_resources_request() completely :) >> >> I have to confess that currently I have no strong evidence/reason to >> convince you that it's absolute safe to remove >> apei_resources_request(), probably in some conditions it *does* >> require to follow the mutual exclusion usage model. The ECAM/MCFG >> maybe a special case not like other normal device driver, since all >> its MCFG space has been reserved during the initialization. Anyway, >> it's another topic and good point well worth discussing in the >> future. > > This is missing the point. It's not the MCFG reservation during > initialization that would make this safe. What would make it safe is > the fact that ECAM does not require mutual exclusion. > > When the hardware implements ECAM correctly, PCI config accesses do > not require locking because a config access requires a single MMIO > load or store. > I don't quite understand here, we're talking about apei_resources_request() which is a mechanism to void resource conflict,"request_mem_region() tells the kernel that your driver is going to use this range of I/O addresses, which will prevent other drivers to make any overlapping call to the same region through request_mem_region()", but according to the context of 'a single MMIO load or store', are you talking about something like the mutex lock primitive? > Many non-ECAM config accessors *do* require locking because they use > several register accesses, e.g., the 0xCF8/0xCFC address/data pairs > used by pci_conf1_read(). If EINJ actions used these, we would have > to enforce mutual exclusion between EINJ config accesses and those > done by other drivers. I take a look at the pci_conf1_read() function, there's only a pair of raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the mutual exclusion you mentioned, seems it's not related to the apei_resources_request() we're talking about... > > Some ARM64 platforms do not implement ECAM correctly, e.g., > tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus() > sets an RTDID register before the MMIO load/store. Platforms like > this *do* require mutual exclusion between an EINJ config access and > other config accesses. What's the mutual exclusion for those quirk functions(tegra194 and xgene)? *mutual* is not applied for single side. I can see neither locking nor request_mem_region() in those bus map functions. > > These platforms are supported via quirks in pci_mcfg.c, so they will > have resources in the pci_mcfg_list, and if we just ignore all the > MCFG resources in apei_resources_request(), there will be nothing to > prevent ordinary driver config accesses from being corrupted by EINJ > accesses. > > I think in general, is probably *is* safe to remove MCFG resources > from the APEI reservations, but it would be better if we had some way > to prevent EINJ from using MCFG on platforms like tegra194 and xgene. Just as I mentioned, since there's no mutual exclusion applied for the tegra194 and xgene(correct me if I am wrong), putting their MCFG resources into the APEI reservation(so the apei_resources_request() applied) does nothing Thanks, Xuesong > >> From the patch set itself, I don't think it's a nice idea to make a >> dramatic change regarding the apei_resources_request() part, I >> suggest to keep the original rationale untouched and based on that >> to fix the real issue at hand in a more generic way. > > There *was* no original rationale. The whole point of this > conversation is to figure out what the real rationale is. > >>>> Except that the above explanation, IMO the EINJ is only a RAS >>>> debug framework, in this code path, sometimes we need to acesss >>>> the address within the MCFG space directly to trigger kind of HW >>>> error, which behavior does not like the normal device driver's, >>>> in this case some possible unsafe operations (bypass the ecam >>>> ops) can be mitigated because the touched device will generate >>>> some HW errors and the RAS handling part will preempt its >>>> corresponding drivers to fix/log the HW error, that's my >>>> understanding about that. >>> >>>>>> Signed-off-by: Xuesong Chen <xuesong.chen@xxxxxxxxxxxxxxxxx> >>>>>> Reported-by: kernel test robot <lkp@xxxxxxxxx> >>>>>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx> >>>>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx> >>>>>> Cc: James Morse <james.morse@xxxxxxx> >>>>>> Cc: Will Deacon <will@xxxxxxxxxx> >>>>>> Cc: Rafael. J. Wysocki <rafael@xxxxxxxxxx> >>>>>> Cc: Tony Luck <tony.luck@xxxxxxxxx> >>>>>> Cc: Tomasz Nowicki <tn@xxxxxxxxxxxx> >>>>>> --- >>>>>> arch/x86/pci/mmconfig-shared.c | 28 -------------------------- >>>>>> drivers/acpi/apei/apei-base.c | 45 ++++++++++++++++++++++++++++-------------- >>>>>> 2 files changed, 30 insertions(+), 43 deletions(-) >>>>>> >>>>>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c >>>>>> index 0b961fe6..12f7d96 100644 >>>>>> --- a/arch/x86/pci/mmconfig-shared.c >>>>>> +++ b/arch/x86/pci/mmconfig-shared.c >>>>>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header) >>>>>> return 0; >>>>>> } >>>>>> >>>>>> -#ifdef CONFIG_ACPI_APEI >>>>>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size, >>>>>> - void *data), void *data); >>>>>> - >>>>>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size, >>>>>> - void *data), void *data) >>>>>> -{ >>>>>> - struct pci_mmcfg_region *cfg; >>>>>> - int rc; >>>>>> - >>>>>> - if (list_empty(&pci_mmcfg_list)) >>>>>> - return 0; >>>>>> - >>>>>> - list_for_each_entry(cfg, &pci_mmcfg_list, list) { >>>>>> - rc = func(cfg->res.start, resource_size(&cfg->res), data); >>>>>> - if (rc) >>>>>> - return rc; >>>>>> - } >>>>>> - >>>>>> - return 0; >>>>>> -} >>>>>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region) >>>>>> -#else >>>>>> -#define set_apei_filter() >>>>>> -#endif >>>>>> - >>>>>> static void __init __pci_mmcfg_init(int early) >>>>>> { >>>>>> pci_mmcfg_reject_broken(early); >>>>>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void) >>>>>> else >>>>>> acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg); >>>>>> __pci_mmcfg_init(1); >>>>>> - >>>>>> - set_apei_filter(); >>>>>> } >>>>>> } >>>>>> >>>>>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c >>>>>> index c7fdb12..daae75a 100644 >>>>>> --- a/drivers/acpi/apei/apei-base.c >>>>>> +++ b/drivers/acpi/apei/apei-base.c >>>>>> @@ -21,6 +21,7 @@ >>>>>> #include <linux/kernel.h> >>>>>> #include <linux/module.h> >>>>>> #include <linux/init.h> >>>>>> +#include <linux/pci.h> >>>>>> #include <linux/acpi.h> >>>>>> #include <linux/slab.h> >>>>>> #include <linux/io.h> >>>>>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources) >>>>>> return acpi_nvs_for_each_region(apei_get_res_callback, resources); >>>>>> } >>>>>> >>>>>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size, >>>>>> - void *data), void *data); >>>>>> -static int apei_get_arch_resources(struct apei_resources *resources) >>>>>> +#ifdef CONFIG_PCI >>>>>> +extern struct list_head pci_mmcfg_list; >>>>>> +static int apei_filter_mcfg_addr(struct apei_resources *res, >>>>>> + struct apei_resources *mcfg_res) >>>>>> +{ >>>>>> + int rc = 0; >>>>>> + struct pci_mmcfg_region *cfg; >>>>>> + >>>>>> + if (list_empty(&pci_mmcfg_list)) >>>>>> + return 0; >>>>>> + >>>>>> + apei_resources_init(mcfg_res); >>>>>> + list_for_each_entry(cfg, &pci_mmcfg_list, list) { >>>>>> + rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res)); >>>>>> + if (rc) >>>>>> + return rc; >>>>>> + } >>>>>> >>>>>> + /* filter the mcfg resource from current APEI's */ >>>>>> + return apei_resources_sub(res, mcfg_res); >>>>>> +} >>>>>> +#else >>>>>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res, >>>>>> + struct apei_resources *mcfg_res) >>>>>> { >>>>>> - return arch_apei_filter_addr(apei_get_res_callback, resources); >>>>>> + return 0; >>>>>> } >>>>>> +#endif >>>>>> >>>>>> /* >>>>>> * IO memory/port resource management mechanism is used to check >>>>>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources, >>>>>> if (rc) >>>>>> goto nvs_res_fini; >>>>>> >>>>>> - if (arch_apei_filter_addr) { >>>>>> - apei_resources_init(&arch_res); >>>>>> - rc = apei_get_arch_resources(&arch_res); >>>>>> - if (rc) >>>>>> - goto arch_res_fini; >>>>>> - rc = apei_resources_sub(resources, &arch_res); >>>>>> - if (rc) >>>>>> - goto arch_res_fini; >>>>>> - } >>>>>> + rc = apei_filter_mcfg_addr(resources, &arch_res); >>>>>> + if (rc) >>>>>> + goto arch_res_fini; >>>>>> >>>>>> rc = -EINVAL; >>>>>> list_for_each_entry(res, &resources->iomem, list) { >>>>>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources, >>>>>> release_mem_region(res->start, res->end - res->start); >>>>>> } >>>>>> arch_res_fini: >>>>>> - if (arch_apei_filter_addr) >>>>>> - apei_resources_fini(&arch_res); >>>>>> + apei_resources_fini(&arch_res); >>>>>> nvs_res_fini: >>>>>> apei_resources_fini(&nvs_resources); >>>>>> return rc; >>>>>> -- >>>>>> 1.8.3.1 >>>>>>