Re: [PATCHv4 next 0/3] Limiting pci access

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Wed, 25 Jan 2017 15:17:18 -0600

Hi Austin,

Thanks a lot for bringing some more PCIe spec expertise.  I wish
we had more insight into what the spec authors were thinking
because I'm sure we often miss the point and cause face-palms on
the committee.

On Wed, Jan 25, 2017 at 12:44:34AM +0000, Austin.Bolen@xxxxxxxx wrote:
> To add to what Keith sent, I wanted to throw in my 2 cents on
> this issue.  Full disclosure - I'm not a Linux developer and so
> my thoughts on this are more from the PCIe Base Specification
> point of view and specifically, hardening the spec to better
> support PCIe hot-plug for storage use cases (NVMe) and
> providing better guidance in the PCIe spec on how system
> software can interact with the PCIe subsystem to better
> determine/handle the sort of errors that arise in PCIe removal
> scenarios.
> 
> From Dell EMC side, we would definitely like to see the config
> read routines in all operating environments return an error
> status code any time data is synthesized by the root port - but
> especially for failures due to device removal.  Today, if bad
> data is synthesized by a root port due to a failed config read,
> the config read routines in various operating environments do
> not return an error.  We, and others, have found cases where
> the caller of these config routines will consume the bad data
> since the API did not return an error due to this type of
> failure.  Most folks have been patching up every caller to
> check if data is synthesized after every config read.

How exactly do the callers do this?  By checking for all ones
data?  It's legal for a config register to contain all ones, of
course, so checking for all ones by itself is not definitive.  A
driver does have additional knowledge about what registers exist
and what their legal contents are, but of course, the config
accessor in the core does not.

> This is not very efficient as there are many entities that read
> config space (device driver, bus driver, systems management
> software, diagnostic software, etc.).   This patch, or similar,
> is very beneficial because it localizes the error checking for
> synthesized data to a single location in the core config read
> APIs rather than distributing this code to every caller.

This sounds like you mean checking for synthesized data in the
PCI core.  How can we figure that out?  Is there something in the
root port that can tell us the data was synthesized?

Is there a way to do this check for both conventional PCI and
PCIe, since we use the same config accessors for PCI and PCIe?

We support many host bridges and most of the time (x86 platforms
with ACPI) we have no bridge-specific support in Linux, so
anything we do in the core would have to be pretty generic.  I'm
sure this is all obvious, so I apologize for restating it.  I'm
just not sure what the concrete details should look like.

> The callers benefit by only requiring a simple check of the
> return status code from the API (which many are already doing).

Very roughly, I'd guess about 10% of current Linux callers check
the status code:

  $ git grep "pci_read_config" | wc -l
  2922
  $ git grep "pci_read_config" | egrep "if \(pci| = pci" | wc -l
  219

But I certainly agree that the current API is hard for callers to
deal with.

> Having the check for a removed device before reading config
> space will result in correctly returning an error for most
> failures of this nature.  There is a race condition even with
> this patch, however.  If the device is present when the check
> occurs, but is removed just after that and before the config
> read goes out, the config routine can still return bad data
> without an error status return code.  To eliminate this race
> condition where bad data is returned, I think that config read
> routines will need to have a check *after* reading the data in
> addition to the check *before* reading.  The check after
> reading would need to determine if the data was synthesized and
> return and error status if so.
> 
> I'm interested in hearing what others think on returning an
> error for this case in the core config read APIs or any
> alternative ways to more gracefully handle these types of
> errors.

> 
> Thanks,
> Austin
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html