RE: [PATCHv4 next 0/3] Limiting pci access


Hi Bjorn,

You're welcome, and likewise: the PCIe spec will only benefit from a better understanding of the constraints and concerns of the operating systems that use PCIe, so thank you as well for your feedback from the OS and PCIe driver side.

I know of two ways being discussed for handling this generically, neither of which requires a priori knowledge of what the register value should be.  Both have tradeoffs.  It may be that neither is acceptable from an OS standpoint, or both are.  I'm interested in your perspective, and if neither is acceptable, it would be great to get your thoughts on whether anything could be added to the PCIe Base Spec to give the OS an acceptable way to handle these types of errors.

1. Check Vendor ID Register
If all 1's are returned on a config read, read the Vendor ID register.  If the Vendor ID also returns all 1's (which it will if the device has been removed), then we know with certainty that it is an error.  On removal events, we see this check working in practice.

How the check above can fail: it is possible (though we've not been able to replicate this failure in practice) that a device is removed, causing all 1's to be returned for the data, and that by the time the error-checking code in the config read API reads the Vendor ID, a new device has been inserted and the Vendor ID read returns a valid value.  (The config read API would need to be really slow, or get suspended or otherwise stalled for a long time.)  An operating system could probably add some coordination between this error-checking code in the config space API and the operating system's PCIe hot-plug driver (perhaps using dev->is_removed or the like as a proxy) to cover this corner case and make it bullet-proof, but I'm not sure that's necessary.

This mechanism would work for PCI and PCIe.

OSes could choose not to handle this in the core config space APIs and instead require every software entity that may access a device's config space (PCI bus driver, device driver, systems management software, diagnostic utilities, etc.) to perform this check itself (which is what we do today).

2. Use the Root Port Programmed IO (RP PIO) mechanism
The PCIe-Spec-architected way is to use the RP PIO extensions added in the PCIe 3.1 specification (see sections 6.2.10.3 and 7.31) as part of the Enhanced Downstream Port Containment ECN.  This ECN added status bits in root ports for the various errors (Unsupported Request, Completer Abort, Completion Timeout) that can occur due to a failed non-posted request (Configuration Space access, MMIO access, IO access), TLP Header Logs to determine which device is associated with the failure, and a synchronous exception mechanism to notify the operating system.  This mechanism was intended to cover all corner cases for these types of failures.

This feature won't be available in products for some time, and because it is optional, not all root ports will implement it.

This feature is PCIe-only, so it won't work for conventional PCI.  Is this a deal-breaker?

Unlike option 1, this mechanism most likely cannot be used by anyone but the OS, so this work cannot be pushed to the caller.  There would be no way to coordinate ownership of the RP PIO registers among the many software entities that may access a device's config space (PCI bus driver, device driver, systems management software, diagnostic utilities, etc.).

Thanks,
Austin


-----Original Message-----
From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx] 
Sent: Wednesday, January 25, 2017 3:17 PM
To: Bolen, Austin <Austin_Bolen@xxxxxxxx>
Cc: keith.busch@xxxxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; lukas@xxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; bhelgaas@xxxxxxxxxx; wzhang@xxxxxx
Subject: Re: [PATCHv4 next 0/3] Limiting pci access

Hi Austin,

Thanks a lot for bringing some more PCIe spec expertise.  I wish we had more insight into what the spec authors were thinking because I'm sure we often miss the point and cause face-palms on the committee.

On Wed, Jan 25, 2017 at 12:44:34AM +0000, Austin.Bolen@xxxxxxxx wrote:
> To add to what Keith sent, I wanted to throw in my 2 cents on this 
> issue.  Full disclosure - I'm not a Linux developer and so my thoughts 
> on this are more from the PCIe Base Specification point of view and 
> specifically, hardening the spec to better support PCIe hot-plug for 
> storage use cases (NVMe) and providing better guidance in the PCIe 
> spec on how system software can interact with the PCIe subsystem to 
> better determine/handle the sort of errors that arise in PCIe removal 
> scenarios.
> 
> From Dell EMC side, we would definitely like to see the config read 
> routines in all operating environments return an error status code any 
> time data is synthesized by the root port - but especially for 
> failures due to device removal.  Today, if bad data is synthesized by 
> a root port due to a failed config read, the config read routines in 
> various operating environments do not return an error.  We, and 
> others, have found cases where the caller of these config routines 
> will consume the bad data since the API did not return an error due to 
> this type of failure.  Most folks have been patching up every caller 
> to check if data is synthesized after every config read.

How exactly do the callers do this?  By checking for all ones data?  It's legal for a config register to contain all ones, of course, so checking for all ones by itself is not definitive.  A driver does have additional knowledge about what registers exist and what their legal contents are, but of course, the config accessor in the core does not.

> This is not very efficient as there are many entities that read config 
> space (device driver, bus driver, systems management
> software, diagnostic software, etc.).   This patch, or similar,
> is very beneficial because it localizes the error checking for 
> synthesized data to a single location in the core config read APIs 
> rather than distributing this code to every caller.

This sounds like you mean checking for synthesized data in the PCI core.  How can we figure that out?  Is there something in the root port that can tell us the data was synthesized?

Is there a way to do this check for both conventional PCI and PCIe, since we use the same config accessors for PCI and PCIe?

We support many host bridges and most of the time (x86 platforms with ACPI) we have no bridge-specific support in Linux, so anything we do in the core would have to be pretty generic.  I'm sure this is all obvious, so I apologize for restating it.  I'm just not sure what the concrete details should look like.

> The callers benefit by only requiring a simple check of the return 
> status code from the API (which many are already doing).

Very roughly, I'd guess about 10% of current Linux callers check the status code:

  $ git grep "pci_read_config" | wc -l
  2922
  $ git grep "pci_read_config" | egrep "if \(pci| = pci" | wc -l
  219

But I certainly agree that the current API is hard for callers to deal with.

> Having the check for a removed device before reading config space will 
> result in correctly returning an error for most failures of this 
> nature.  There is a race condition even with this patch, however.  If 
> the device is present when the check occurs, but is removed just after 
> that and before the config read goes out, the config routine can still 
> return bad data without an error status return code.  To eliminate 
> this race condition where bad data is returned, I think that config 
> read routines will need to have a check *after* reading the data in 
> addition to the check *before* reading.  The check after reading would 
> need to determine if the data was synthesized and return an error
> status if so.
> 
> I'm interested in hearing what others think on returning an error for 
> this case in the core config read APIs or any alternative ways to more 
> gracefully handle these types of errors.

> 
> Thanks,
> Austin