Re: [PATCH V2] PCI: dwc ep: cache config until DBI regs available

On 1/8/2019 5:35 PM, Kishon Vijay Abraham I wrote:
Hi Stephen,

On 04/01/19 1:32 PM, Kishon Vijay Abraham I wrote:
Hi Stephen,

On 02/01/19 10:04 PM, Stephen Warren wrote:
On 12/19/18 7:37 AM, Kishon Vijay Abraham I wrote:
Hi,

On 14/12/18 10:31 PM, Stephen Warren wrote:
On 12/11/18 10:23 AM, Stephen Warren wrote:
On 12/10/18 9:36 PM, Kishon Vijay Abraham I wrote:
Hi,

On 27/11/18 4:39 AM, Stephen Warren wrote:
From: Stephen Warren <swarren@xxxxxxxxxx>

Some implementations of the DWC PCIe endpoint controller do not allow
access to DBI registers until the attached host has started REFCLK,
released PERST, and the endpoint driver has initialized clocking of the
DBI registers based on that. One such system is NVIDIA's T194 SoC. The
PCIe endpoint subsystem and DWC driver currently don't work on such
hardware, since they assume that all endpoint configuration can happen
at any arbitrary time.

Enhance the DWC endpoint driver to support such systems by caching all
endpoint configuration in software, and only writing the configuration
to hardware once the hardware is ready. This is implemented by splitting
each endpoint controller op into two functions: the first simply
records/caches the desired configuration whenever the associated function
driver calls it, and optionally calls the second; the second actually
programs the configuration into hardware, and may be called either by the
first function, or later once it's known that the DBI registers are
available.
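
Roughly, each op pair ends up looking like the sketch below (sketch only;
the cached_hdr field stands in for whatever shape the cached state actually
takes, and only two header fields are shown):

static void dw_pcie_ep_write_header_regs(struct dw_pcie_ep *ep)
{
    struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
    struct pci_epf_header *hdr = &ep->cached_hdr;  /* assumed cached copy */

    dw_pcie_dbi_ro_wr_en(pci);
    dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid);
    dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid);
    /* ... remaining header fields ... */
    dw_pcie_dbi_ro_wr_dis(pci);
}

static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
                                   struct pci_epf_header *hdr)
{
    struct dw_pcie_ep *ep = epc_get_drvdata(epc);

    /* Always record the requested configuration ... */
    ep->cached_hdr = *hdr;

    /* ... and only touch DBI if the registers are usable right now. */
    if (!ep->hw_regs_not_available)
        dw_pcie_ep_write_header_regs(ep);

    return 0;
}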

diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c

+void dw_pcie_set_regs_available(struct dw_pcie *pci)
+{

When will this function be invoked? Does the wrapper get an interrupt when
refclk is enabled where this function will be invoked?

Yes, there's an IRQ from the HW that indicates when PEXRST is released. I
don't recall right now if this IRQ is something that exists for all DWC
instantiations, or is Tegra-specific.
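
Roughly, the wrapper's IRQ handler would do something like this (a sketch
only; the Tegra endpoint driver isn't posted yet, so the tegra_* names are
made up):

static irqreturn_t tegra_pcie_ep_pexrst_irq(int irq, void *arg)
{
    struct tegra_pcie_ep *tegra = arg;         /* hypothetical wrapper struct */

    /* Enable the clocks that feed the DBI register space ... */
    tegra_pcie_ep_enable_dbi_clocks(tegra);    /* hypothetical helper */

    /* ... then let the DWC core flush the cached EP configuration. */
    dw_pcie_set_regs_available(&tegra->pci);

    return IRQ_HANDLED;
}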

+    struct dw_pcie_ep *ep = &(pci->ep);
+    int i;
+
+    ep->hw_regs_not_available = false;

This can race with epc_ops.

+
+    dw_pcie_ep_write_header_regs(ep);
+    for_each_set_bit(i, ep->ib_window_map, ep->num_ib_windows) {
+        dw_pcie_prog_inbound_atu(pci, i,
+            ep->cached_inbound_atus[i].bar,
+            ep->cached_inbound_atus[i].cpu_addr,
+            ep->cached_inbound_atus[i].as_type);

Depending on the context in which this function is invoked, programming
inbound/outbound ATU can also race with EPC ops.
+        dw_pcie_ep_set_bar_regs(ep, 0, ep->cached_inbound_atus[i].bar);
+    }
+    for_each_set_bit(i, ep->ob_window_map, ep->num_ob_windows)
+        dw_pcie_prog_outbound_atu(pci, i, PCIE_ATU_TYPE_MEM,
+            ep->cached_outbound_atus[i].addr,
+            ep->cached_outbound_atus[i].pci_addr,
+            ep->cached_outbound_atus[i].size);
+    dw_pcie_dbi_ro_wr_en(pci);
+    dw_pcie_writew_dbi(pci, PCI_MSI_FLAGS, ep->cached_msi_flags);
+    dw_pcie_writew_dbi(pci, PCI_MSIX_FLAGS, ep->cached_msix_flags);
+    dw_pcie_dbi_ro_wr_dis(pci);

IMHO we should add a new epc op, ->epc_init(), which indicates whether the
EPC is ready to be initialized or not. Only if epc_init() indicates it's
ready to be initialized should the endpoint function driver go ahead with
further initialization. Otherwise it should wait for a notification from
the EPC indicating when it's ready to be initialized.
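
Something like the following (just a sketch; the ->init member and the
pci_epc_init() wrapper don't exist yet, while the write_header prototype
follows pci-epc.h):

struct pci_epc_ops {
    int (*init)(struct pci_epc *epc);    /* proposed: returns an agreed error
                                          * code until REFCLK/PERST# allow
                                          * DBI access */
    int (*write_header)(struct pci_epc *epc, u8 func_no,
                        struct pci_epf_header *hdr);
    /* ... existing ops: set_bar, map_addr, set_msi, raise_irq, start, stop ... */
};

/* Proposed core wrapper called by function drivers before any pci_epc_* use. */
int pci_epc_init(struct pci_epc *epc)
{
    if (!epc->ops->init)
        return 0;
    return epc->ops->init(epc);
}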

(Did you mean epf op or epc op?)

I'm not sure exactly how that would work; do you want the DWC core driver
or the endpoint subsystem to poll that epc op to find out when the HW is
ready to be initialized? Or do you envisage the controller driver still
calling dw_pcie_set_regs_available() (possibly renamed), which in turn calls
->epc_init() for some reason?

If you don't want to cache the endpoint configuration, perhaps you want:

a) The endpoint function driver doesn't proactively call the endpoint
controller functions to configure the endpoint.

b) When endpoint HW is ready, the relevant driver calls pci_epc_ready() (or
whatever name), which lets the core know the HW can be configured. Perhaps
this schedules a work queue item to implement locking to avoid the races you
mentioned.

c) Endpoint core calls pci_epf_init() which calls epf op ->init() (a rough
sketch of this flow follows below).
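
A rough sketch of b) and c) (every name below is a placeholder from this
discussion, nothing here exists in the EPC core today):

/* b) Controller driver calls this from its "HW is ready" path, e.g. the
 * PEXRST IRQ handler; defer the actual programming to a workqueue so the
 * EPC locking can be taken there.
 */
void pci_epc_ready(struct pci_epc *epc)
{
    queue_work(system_wq, &epc->ready_work);   /* hypothetical member */
}

/* c) Worker walks the functions bound to this EPC and initializes them. */
static void pci_epc_ready_worker(struct work_struct *work)
{
    struct pci_epc *epc = container_of(work, struct pci_epc, ready_work);
    struct pci_epf *epf;

    list_for_each_entry(epf, &epc->pci_epf, list)
        pci_epf_init(epf);    /* hypothetical: invokes the function driver's init */
}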

One gotcha with this approach, which the caching approach helps avoid:

Once PEXRST is released, the system must respond to PCIe enumeration requests
within 50ms. Thus, SW must very quickly respond to the IRQ indicating PEXRST
release and program the endpoint configuration into HW. By caching the
configuration in the DWC driver and immediately/synchronously applying it in
the PEXRST IRQ handler, we reduce the number of steps and amount of code
taken to program the HW, so it should get done pretty quickly. If instead we
call back into the endpoint function driver's ->init() op, we run the risk of
that op doing other stuff besides just calling the endpoint HW configuration
APIs (e.g. perhaps the function driver defers memory buffer allocation or
IOVA programming to that ->init function), which in turn makes it much less
likely the 50ms requirement will be met. Perhaps we can solve this by naming
the op well and providing lots of comments, but my guess is that endpoint
function authors won't notice that...

Kishon,

Do you have any further details exactly how you'd prefer this to work? Does the
approach I describe in points a/b/c above sound like what you want? Thanks.

Agree with your PERST comment.

What I have in mind is that we add a new epc_init() op. I feel there are more
uses for it (e.g. I have an internal patch which uses epc_init to initialize
DMA; hopefully I'll post it soon).
If you look at pci_epf_test, pci_epf_test_bind() is where the function driver
actually starts to write to HW (i.e. using pci_epc_*).
So before the endpoint function invokes pci_epc_write_header(), it should
invoke epc_init(). Only if that succeeds should it go ahead with other
initialization.
If epc_init_* fails, we can have a particular error value to indicate that the
controller is waiting for the clock from the host (so that we don't return an
error from ->bind()). Once the controller receives the clock, it can send an
atomic notification to the endpoint function driver to indicate it is ready to
be initialized. (An atomic notification makes it easy to handle multi-function
endpoint devices.)
The endpoint function can then initialize the controller.
I think that, except for pci_epf_test_alloc_space(), all the other functions
called in pci_epf_test_bind() configure the HW. pci_epf_test_alloc_space()
could be moved to pci_epf_test_probe() so there are no expensive operations to
be done once the controller is ready to be initialized.
I have epc_init() and the atomic notification part already implemented, and I'm
planning to post it before next week. Once that is merged, we might have to
reorder functions in the pci_epf_test driver, and you would have to return the
correct error value from epc_init() if the clock is not there.
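
For example, pci_epf_test_bind() might end up looking something like this
(the error value and the notifier details are placeholders until the patches
are posted):

static int pci_epf_test_bind(struct pci_epf *epf)
{
    struct pci_epc *epc = epf->epc;
    int ret;

    ret = pci_epc_init(epc);            /* proposed API */
    if (ret == -EPROBE_DEFER) {         /* placeholder for the agreed
                                         * "waiting for host clock" code */
        /* Register for the atomic notification here; the notifier
         * callback would then do the pci_epc_* programming below.
         */
        return 0;
    } else if (ret) {
        return ret;
    }

    ret = pci_epc_write_header(epc, epf->func_no, epf->header);
    if (ret)
        return ret;

    /* set_bar()/set_msi()/etc. as in the current pci_epf_test_bind() */
    return 0;
}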

Kishon, did you manage to post the patches that implement epc_init()? If so, a
link would be appreciated. Thanks.

I haven't posted the patches yet. Sorry for the delay. Give me some more time
please (till next week).

I have posted a set of cleanups for EPC features [1], introducing
epc_get_features(). Some of the things I initially thought should be in
epc_init actually fit in epc_get_features. However, I still believe that for
your use case we should introduce ->epc_init().

Hi Kishon,
Do you have a Work-In-Progress patch set for ->epc_init()?
If not, I would like to start working on that.

Thanks,
Vidya Sagar


Thanks
Kishon

[1] -> https://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1891393.html

Thanks
Kishon





