Re: [PATCH v3 2/2] ACPI: APEI: Filter the PCI MCFG address with an arch-agnostic method

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 22/10/2021 00:57, Bjorn Helgaas wrote:
> On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote:
>> On 21/10/2021 02:50, Bjorn Helgaas wrote:
>>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote:
>>>> On 20/10/2021 03:23, Bjorn Helgaas wrote:
>>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote:
> 
>>>>>> This patch will try to handle this case in a more common way
>>>>>> instead of the original 'arch' specific solution, which will be
>>>>>> beneficial to all the APEI-dependent platforms after that.
>>>>>
>>>>> This actually doesn't say anything about what the patch does or
>>>>> how it works.  It says "handles this case in a more common way"
>>>>> but with no details.
>>>>
>>>> Good suggestion, I'll give more details about that...
>>>>
>>>>> The EINJ table contains "injection instructions" that can read
>>>>> or write "register regions" described by generic address
>>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and
>>>>> __einj_error_trigger() requests those register regions with
>>>>> request_mem_region() or request_region() before executing the
>>>>> injections instructions.
>>>>>
>>>>> IIUC, this patch basically says "if this region is part of the
>>>>> MCFG area, we don't need to reserve it." That leads to the
>>>>> questions of why we need to reserve *any* of the areas
>>>>
>>>> AFAIK, the MCFG area is reserved since the ECAM module will
>>>> provide a generic Kernel Programming Interfaces(KPI), e.g,
>>>> pci_generic_config_read(...), so all the drivers are allowed to
>>>> access the pci config space only by those KPIs in a consistent
>>>> and safe way, direct raw access will break the rule.  Correct me
>>>> if I am missing sth.
>>>>
>>>>> and why it's safe to simply skip reserving regions that are part
>>>>> of the MCFG area.
>>>>
>>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error
>>>> injection tolerance level") before to address this issue, the
>>>> entire commit log as below:
>>>>
>>>>     Some BIOSes utilize PCI MMCFG space read/write opertion to trigger
>>>>     specific errors. EINJ will report errors as below when hitting such
>>>>     cases:
>>>>     
>>>>     APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers
>>>>     
>>>>     It is because on x86 platform ACPI based PCI MMCFG logic has
>>>>     reserved all MMCFG spaces so that EINJ can't reserve it again.
>>>>     We already trust the ACPI/APEI code when using the EINJ interface
>>>>     so it is not a big leap to also trust it to access the right
>>>>     MMCFG addresses. Skip address checking to allow the access.
>>>
>>> I'm not really convinced by that justification because I don't
>>> think the issue here is *trust*.  If all we care about is trust,
>>> and we trust the ACPI/APEI code, why do we need to reserve
>>> anything at all when executing EINJ actions?
>>>
>>> I think the resource reservation issue is about coordinating
>>> multiple users of the address space.  A driver reserves the MMIO
>>> address space of a device it controls so no other driver can
>>> reserve it at the same time and cause conflicts.
>>>
>>> I'm not really convinced by this mutual exclusion argument either,
>>> because I haven't yet seen a situation where we say "EINJ needs a
>>> resource that's already in use by somebody else, so we can't use
>>> EINJ."  When conflicts arise, the response is always "we'll just
>>> stop reserving this conflicting resource but use it anyway."
>>>
>>> I think the only real value in apei_resources_request() is a
>>> little bit of documentation in /proc/iomem.  For ERST and EINJ,
>>> even that only lasts for the tiny period when we're actually
>>> executing an action.
>>>
>>> So convince me there's a reason why we shouldn't just remove
>>> apei_resources_request() completely :)
>>
>> I have to confess that currently I have no strong evidence/reason to
>> convince you that it's absolute safe to remove
>> apei_resources_request(),  probably in some conditions it *does*
>> require to follow the mutual exclusion usage model.  The ECAM/MCFG
>> maybe a special case not like other normal device driver, since all
>> its MCFG space has been reserved during the initialization. Anyway,
>> it's another topic and good point well worth discussing in the
>> future.
> 
> This is missing the point.  It's not the MCFG reservation during
> initialization that would make this safe.  What would make it safe is
> the fact that ECAM does not require mutual exclusion.
> 
> When the hardware implements ECAM correctly, PCI config accesses do
> not require locking because a config access requires a single MMIO
> load or store.
> 
I don't quite understand here, we're talking about apei_resources_request()
which is a mechanism to void resource conflict,"request_mem_region() tells the
kernel that your driver is going to use this range of I/O addresses, which
will prevent other drivers to make any overlapping call to the same region
through request_mem_region()", but according to the context of 'a single MMIO
load or store', are you talking about something like the mutex lock primitive?

> Many non-ECAM config accessors *do* require locking because they use
> several register accesses, e.g., the 0xCF8/0xCFC address/data pairs
> used by pci_conf1_read().  If EINJ actions used these, we would have
> to enforce mutual exclusion between EINJ config accesses and those
> done by other drivers.

I take a look at the pci_conf1_read() function, there's only a pair of
raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the
mutual exclusion you mentioned, seems it's not related to the
apei_resources_request() we're talking about... 

> 
> Some ARM64 platforms do not implement ECAM correctly, e.g.,
> tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus()
> sets an RTDID register before the MMIO load/store.  Platforms like
> this *do* require mutual exclusion between an EINJ config access and
> other config accesses.

What's the mutual exclusion for those quirk functions(tegra194 and xgene)?
*mutual* is not applied for single side. I can see neither locking nor
request_mem_region() in those bus map functions. 
> 
> These platforms are supported via quirks in pci_mcfg.c, so they will
> have resources in the pci_mcfg_list, and if we just ignore all the
> MCFG resources in apei_resources_request(), there will be nothing to
> prevent ordinary driver config accesses from being corrupted by EINJ
> accesses.
> 
> I think in general, is probably *is* safe to remove MCFG resources
> from the APEI reservations, but it would be better if we had some way
> to prevent EINJ from using MCFG on platforms like tegra194 and xgene.

Just as I mentioned, since there's no mutual exclusion applied for the
tegra194 and xgene(correct me if I am wrong), putting their MCFG resources
into the APEI reservation(so the apei_resources_request() applied) does nothing 

Thanks,
Xuesong
> 
>> From the patch set itself, I don't think it's a nice idea to make a
>> dramatic change regarding the apei_resources_request() part, I
>> suggest to keep the original rationale untouched and based on that
>> to fix the real issue at hand in a more generic way.
> 
> There *was* no original rationale.  The whole point of this
> conversation is to figure out what the real rationale is.
> 
>>>> Except that the above explanation, IMO the EINJ is only a RAS
>>>> debug framework, in this code path, sometimes we need to acesss
>>>> the address within the MCFG space directly to trigger kind of HW
>>>> error, which behavior does not like the normal device driver's,
>>>> in this case some possible unsafe operations (bypass the ecam
>>>> ops) can be mitigated because the touched device will generate
>>>> some HW errors and the RAS handling part will preempt its
>>>> corresponding drivers to fix/log the HW error, that's my
>>>> understanding about that.
>>>
>>>>>> Signed-off-by: Xuesong Chen <xuesong.chen@xxxxxxxxxxxxxxxxx>
>>>>>> Reported-by: kernel test robot <lkp@xxxxxxxxx>
>>>>>> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@xxxxxxx>
>>>>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>>>>>> Cc: James Morse <james.morse@xxxxxxx>
>>>>>> Cc: Will Deacon <will@xxxxxxxxxx>
>>>>>> Cc: Rafael. J. Wysocki <rafael@xxxxxxxxxx>
>>>>>> Cc: Tony Luck <tony.luck@xxxxxxxxx>
>>>>>> Cc: Tomasz Nowicki <tn@xxxxxxxxxxxx>
>>>>>> ---
>>>>>>  arch/x86/pci/mmconfig-shared.c | 28 --------------------------
>>>>>>  drivers/acpi/apei/apei-base.c  | 45 ++++++++++++++++++++++++++++--------------
>>>>>>  2 files changed, 30 insertions(+), 43 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
>>>>>> index 0b961fe6..12f7d96 100644
>>>>>> --- a/arch/x86/pci/mmconfig-shared.c
>>>>>> +++ b/arch/x86/pci/mmconfig-shared.c
>>>>>> @@ -605,32 +605,6 @@ static int __init pci_parse_mcfg(struct acpi_table_header *header)
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>> -#ifdef CONFIG_ACPI_APEI
>>>>>> -extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>>>>>> -				     void *data), void *data);
>>>>>> -
>>>>>> -static int pci_mmcfg_for_each_region(int (*func)(__u64 start, __u64 size,
>>>>>> -				     void *data), void *data)
>>>>>> -{
>>>>>> -	struct pci_mmcfg_region *cfg;
>>>>>> -	int rc;
>>>>>> -
>>>>>> -	if (list_empty(&pci_mmcfg_list))
>>>>>> -		return 0;
>>>>>> -
>>>>>> -	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>>>>>> -		rc = func(cfg->res.start, resource_size(&cfg->res), data);
>>>>>> -		if (rc)
>>>>>> -			return rc;
>>>>>> -	}
>>>>>> -
>>>>>> -	return 0;
>>>>>> -}
>>>>>> -#define set_apei_filter() (arch_apei_filter_addr = pci_mmcfg_for_each_region)
>>>>>> -#else
>>>>>> -#define set_apei_filter()
>>>>>> -#endif
>>>>>> -
>>>>>>  static void __init __pci_mmcfg_init(int early)
>>>>>>  {
>>>>>>  	pci_mmcfg_reject_broken(early);
>>>>>> @@ -665,8 +639,6 @@ void __init pci_mmcfg_early_init(void)
>>>>>>  		else
>>>>>>  			acpi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
>>>>>>  		__pci_mmcfg_init(1);
>>>>>> -
>>>>>> -		set_apei_filter();
>>>>>>  	}
>>>>>>  }
>>>>>>  
>>>>>> diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
>>>>>> index c7fdb12..daae75a 100644
>>>>>> --- a/drivers/acpi/apei/apei-base.c
>>>>>> +++ b/drivers/acpi/apei/apei-base.c
>>>>>> @@ -21,6 +21,7 @@
>>>>>>  #include <linux/kernel.h>
>>>>>>  #include <linux/module.h>
>>>>>>  #include <linux/init.h>
>>>>>> +#include <linux/pci.h>
>>>>>>  #include <linux/acpi.h>
>>>>>>  #include <linux/slab.h>
>>>>>>  #include <linux/io.h>
>>>>>> @@ -448,13 +449,34 @@ static int apei_get_nvs_resources(struct apei_resources *resources)
>>>>>>  	return acpi_nvs_for_each_region(apei_get_res_callback, resources);
>>>>>>  }
>>>>>>  
>>>>>> -int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
>>>>>> -				     void *data), void *data);
>>>>>> -static int apei_get_arch_resources(struct apei_resources *resources)
>>>>>> +#ifdef CONFIG_PCI
>>>>>> +extern struct list_head pci_mmcfg_list;
>>>>>> +static int apei_filter_mcfg_addr(struct apei_resources *res,
>>>>>> +			struct apei_resources *mcfg_res)
>>>>>> +{
>>>>>> +	int rc = 0;
>>>>>> +	struct pci_mmcfg_region *cfg;
>>>>>> +
>>>>>> +	if (list_empty(&pci_mmcfg_list))
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	apei_resources_init(mcfg_res);
>>>>>> +	list_for_each_entry(cfg, &pci_mmcfg_list, list) {
>>>>>> +		rc = apei_res_add(&mcfg_res->iomem, cfg->res.start, resource_size(&cfg->res));
>>>>>> +		if (rc)
>>>>>> +			return rc;
>>>>>> +	}
>>>>>>  
>>>>>> +	/* filter the mcfg resource from current APEI's */
>>>>>> +	return apei_resources_sub(res, mcfg_res);
>>>>>> +}
>>>>>> +#else
>>>>>> +static inline int apei_filter_mcfg_addr(struct apei_resources *res,
>>>>>> +			struct apei_resources *mcfg_res)
>>>>>>  {
>>>>>> -	return arch_apei_filter_addr(apei_get_res_callback, resources);
>>>>>> +	return 0;
>>>>>>  }
>>>>>> +#endif
>>>>>>  
>>>>>>  /*
>>>>>>   * IO memory/port resource management mechanism is used to check
>>>>>> @@ -486,15 +508,9 @@ int apei_resources_request(struct apei_resources *resources,
>>>>>>  	if (rc)
>>>>>>  		goto nvs_res_fini;
>>>>>>  
>>>>>> -	if (arch_apei_filter_addr) {
>>>>>> -		apei_resources_init(&arch_res);
>>>>>> -		rc = apei_get_arch_resources(&arch_res);
>>>>>> -		if (rc)
>>>>>> -			goto arch_res_fini;
>>>>>> -		rc = apei_resources_sub(resources, &arch_res);
>>>>>> -		if (rc)
>>>>>> -			goto arch_res_fini;
>>>>>> -	}
>>>>>> +	rc = apei_filter_mcfg_addr(resources, &arch_res);
>>>>>> +	if (rc)
>>>>>> +		goto arch_res_fini;
>>>>>>  
>>>>>>  	rc = -EINVAL;
>>>>>>  	list_for_each_entry(res, &resources->iomem, list) {
>>>>>> @@ -544,8 +560,7 @@ int apei_resources_request(struct apei_resources *resources,
>>>>>>  		release_mem_region(res->start, res->end - res->start);
>>>>>>  	}
>>>>>>  arch_res_fini:
>>>>>> -	if (arch_apei_filter_addr)
>>>>>> -		apei_resources_fini(&arch_res);
>>>>>> +	apei_resources_fini(&arch_res);
>>>>>>  nvs_res_fini:
>>>>>>  	apei_resources_fini(&nvs_resources);
>>>>>>  	return rc;
>>>>>> -- 
>>>>>> 1.8.3.1
>>>>>>



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux