On Fri, Oct 22, 2021 at 05:52:15PM +0800, Xuesong Chen wrote: > On 22/10/2021 00:57, Bjorn Helgaas wrote: > > On Thu, Oct 21, 2021 at 11:46:40PM +0800, Xuesong Chen wrote: > >> On 21/10/2021 02:50, Bjorn Helgaas wrote: > >>> On Wed, Oct 20, 2021 at 11:16:38AM +0800, Xuesong Chen wrote: > >>>> On 20/10/2021 03:23, Bjorn Helgaas wrote: > >>>>> On Tue, Oct 19, 2021 at 12:50:33PM +0800, Xuesong Chen wrote: > > > >>>>>> This patch will try to handle this case in a more common way > >>>>>> instead of the original 'arch' specific solution, which will be > >>>>>> beneficial to all the APEI-dependent platforms after that. > >>>>> > >>>>> This actually doesn't say anything about what the patch does or > >>>>> how it works. It says "handles this case in a more common way" > >>>>> but with no details. > >>>> > >>>> Good suggestion, I'll give more details about that... > >>>> > >>>>> The EINJ table contains "injection instructions" that can read > >>>>> or write "register regions" described by generic address > >>>>> structures (see ACPI v6.3, sec 18.6.2 and 18.6.3), and > >>>>> __einj_error_trigger() requests those register regions with > >>>>> request_mem_region() or request_region() before executing the > >>>>> injections instructions. > >>>>> > >>>>> IIUC, this patch basically says "if this region is part of the > >>>>> MCFG area, we don't need to reserve it." That leads to the > >>>>> questions of why we need to reserve *any* of the areas > >>>> > >>>> AFAIK, the MCFG area is reserved since the ECAM module will > >>>> provide a generic Kernel Programming Interfaces(KPI), e.g, > >>>> pci_generic_config_read(...), so all the drivers are allowed to > >>>> access the pci config space only by those KPIs in a consistent > >>>> and safe way, direct raw access will break the rule. Correct me > >>>> if I am missing sth. > >>>> > >>>>> and why it's safe to simply skip reserving regions that are part > >>>>> of the MCFG area. > >>>> > >>>> Actual there is a commit d91525eb8ee6("ACPI, EINJ: Enhance error > >>>> injection tolerance level") before to address this issue, the > >>>> entire commit log as below: > >>>> > >>>> Some BIOSes utilize PCI MMCFG space read/write opertion to trigger > >>>> specific errors. EINJ will report errors as below when hitting such > >>>> cases: > >>>> > >>>> APEI: Can not request [mem 0x83f990a0-0x83f990a3] for APEI EINJ Trigger registers > >>>> > >>>> It is because on x86 platform ACPI based PCI MMCFG logic has > >>>> reserved all MMCFG spaces so that EINJ can't reserve it again. > >>>> We already trust the ACPI/APEI code when using the EINJ interface > >>>> so it is not a big leap to also trust it to access the right > >>>> MMCFG addresses. Skip address checking to allow the access. > >>> > >>> I'm not really convinced by that justification because I don't > >>> think the issue here is *trust*. If all we care about is trust, > >>> and we trust the ACPI/APEI code, why do we need to reserve > >>> anything at all when executing EINJ actions? > >>> > >>> I think the resource reservation issue is about coordinating > >>> multiple users of the address space. A driver reserves the MMIO > >>> address space of a device it controls so no other driver can > >>> reserve it at the same time and cause conflicts. > >>> > >>> I'm not really convinced by this mutual exclusion argument either, > >>> because I haven't yet seen a situation where we say "EINJ needs a > >>> resource that's already in use by somebody else, so we can't use > >>> EINJ." When conflicts arise, the response is always "we'll just > >>> stop reserving this conflicting resource but use it anyway." > >>> > >>> I think the only real value in apei_resources_request() is a > >>> little bit of documentation in /proc/iomem. For ERST and EINJ, > >>> even that only lasts for the tiny period when we're actually > >>> executing an action. > >>> > >>> So convince me there's a reason why we shouldn't just remove > >>> apei_resources_request() completely :) > >> > >> I have to confess that currently I have no strong evidence/reason to > >> convince you that it's absolute safe to remove > >> apei_resources_request(), probably in some conditions it *does* > >> require to follow the mutual exclusion usage model. The ECAM/MCFG > >> maybe a special case not like other normal device driver, since all > >> its MCFG space has been reserved during the initialization. Anyway, > >> it's another topic and good point well worth discussing in the > >> future. > > > > This is missing the point. It's not the MCFG reservation during > > initialization that would make this safe. What would make it safe is > > the fact that ECAM does not require mutual exclusion. > > > > When the hardware implements ECAM correctly, PCI config accesses do > > not require locking because a config access requires a single MMIO > > load or store. > > I don't quite understand here, we're talking about > apei_resources_request() which is a mechanism to void resource > conflict,"request_mem_region() tells the kernel that your driver is > going to use this range of I/O addresses, which will prevent other > drivers to make any overlapping call to the same region through > request_mem_region()", but according to the context of 'a single > MMIO load or store', are you talking about something like the mutex > lock primitive? My point was that when ECAM is implemented correctly, a CPU does a single MMIO load to do a PCI config read and a single MMIO store to do a PCI config write. In that case there no need for any locking, so there's no need for APEI to reserve those resources. This is what d91525eb8ee6 ("ACPI, EINJ: Enhance error injection tolerance level") does. That code change makes sense, but the commit log does not -- it has nothing to do with trusting the ACPI/APEI code; it's just that no matter what the EINJ actions do with the MCFG regions, they cannot interfere with other drivers. > > Many non-ECAM config accessors *do* require locking because they use > > several register accesses, e.g., the 0xCF8/0xCFC address/data pairs > > used by pci_conf1_read(). If EINJ actions used these, we would have > > to enforce mutual exclusion between EINJ config accesses and those > > done by other drivers. > > I take a look at the pci_conf1_read() function, there's only a pair of > raw_spin_lock_irqsave() and raw_spin_unlock_irqrestore(), if that's the > mutual exclusion you mentioned, seems it's not related to the > apei_resources_request() we're talking about... This was an example of a case where EINJ mutual exclusion *would* be required. I do not expect EINJ actions to use the 0xCF8/0xCFC registers because there is no mechanism to coordinate that with the OS use of the same registers. > > Some ARM64 platforms do not implement ECAM correctly, e.g., > > tegra194_map_bus() programs an outbound ATU and xgene_pcie_map_bus() > > sets an RTDID register before the MMIO load/store. Platforms like > > this *do* require mutual exclusion between an EINJ config access and > > other config accesses. > > What's the mutual exclusion for those quirk functions (tegra194 and > xgene)? *mutual* is not applied for single side. I can see neither > locking nor request_mem_region() in those bus map functions. These currently depend on the pci_lock. See PCI_OP_READ() in drivers/pci/access.c. EINJ actions cannot acquire the pci_lock, so EINJ actions cannot safely use ECAM space on those platforms. > > These platforms are supported via quirks in pci_mcfg.c, so they will > > have resources in the pci_mcfg_list, and if we just ignore all the > > MCFG resources in apei_resources_request(), there will be nothing to > > prevent ordinary driver config accesses from being corrupted by EINJ > > accesses. > > > > I think in general, is probably *is* safe to remove MCFG resources > > from the APEI reservations, but it would be better if we had some way > > to prevent EINJ from using MCFG on platforms like tegra194 and xgene. > > Just as I mentioned, since there's no mutual exclusion applied for > the tegra194 and xgene (correct me if I am wrong), putting their MCFG > resources into the APEI reservation (so the apei_resources_request() > applied) does nothing I think apei_resources_request() should continue to reserve MCFG areas on tegra194 and xgene, but it does not need to reserve them on other ARM64 platforms. Bjorn