Re: [PATCH 3/3] tile pci: enable IOMMU to support DMA for legacy devices

Sorry for the slow reply to your feedback; I had to coordinate with our
primary PCI developer (in another timezone) and we both had various
unrelated fires to fight along the way.

I've appended the patch that corrects all the issues you reported. Bjorn,
I'm assuming that it's appropriate for me to push this change through the
tile tree (along with all the infrastructural changes to support the
TILE-Gx TRIO shim that implements PCIe for our chip) rather than breaking
it out to push it through the pci tree; does that sound correct to you?

On 6/22/2012 7:24 AM, Bjorn Helgaas wrote:
> On Fri, Jun 15, 2012 at 1:23 PM, Chris Metcalf <cmetcalf@xxxxxxxxxx> wrote:
>> This change uses the TRIO IOMMU to map the PCI DMA space and physical
>> memory at different addresses.  We also now use the dma_mapping_ops
>> to provide support for non-PCI DMA, PCIe DMA (64-bit) and legacy PCI
>> DMA (32-bit).  We use the kernel's software I/O TLB framework
>> (i.e. bounce buffers) for the legacy 32-bit PCI device support since
>> there are a limited number of TLB entries in the IOMMU and it is
>> non-trivial to handle indexing, searching, matching, etc.  For 32-bit
>> devices the performance impact of bounce buffers should not be a concern.
>>
>>
>> +extern void
>> +pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
>> +                       struct resource *res);
>> +
>> +extern void
>> +pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
>> +                       struct pci_bus_region *region);
> These extern declarations look like leftovers that shouldn't be needed.

Thanks. Removed.

>> +/* PCI I/O space support is not implemented. */
>> +static struct resource pci_ioport_resource = {
>> +       .name   = "PCI IO",
>> +       .start  = 0,
>> +       .end    = 0,
>> +       .flags  = IORESOURCE_IO,
>> +};
> You don't need to define pci_ioport_resource at all if you don't
> support I/O space.

We have some internal changes to support I/O space, but for now I've gone
ahead and removed pci_ioport_resource.

>> +               /*
>> +                * The PCI memory resource is located above the PA space.
>> +                * The memory range for the PCI root bus should not overlap
>> +                * with the physical RAM
>> +                */
>> +               pci_add_resource_offset(&resources, &iomem_resource,
>> +                                       1ULL << CHIP_PA_WIDTH());
> This says that your entire physical address space (currently
> 0x0-0xffffffff_ffffffff) is routed to the PCI bus, which is not true.
> I think what you want here is pci_iomem_resource, but I'm not sure
> that's set up correctly.  It should contain the CPU physical address
> that are routed to the PCI bus.  Since you mention an offset, the PCI
> bus addresses will "CPU physical address - offset".

Yes, we've changed it to use pci_iomem_resource. On TILE-Gx, there are two
types of CPU physical addresses: physical RAM addresses and MMIO addresses.
The MMIO address has the MMIO attribute in the page table. So, the physical
address spaces for the RAM and the PCI are completely separate. Instead, we
have the following relationship: PCI bus address = PCI resource address -
offset, where the PCI resource addresses are defined by pci_iomem_resource
and they are never generated by the CPU.

> I don't understand the CHIP_PA_WIDTH() usage -- that seems to be the
> physical address width, but you define TILE_PCI_MEM_END as "((1ULL <<
> CHIP_PA_WIDTH()) + TILE_PCI_BAR_WINDOW_TOP)", which would mean the CPU
> could never generate that address.

Exactly. The CPU-generated physical addresses for the PCI space, i.e. the
MMIO addresses, have an address format that is defined by the RC
controller. They go to the RC controller directly, because the page table
entry also encodes the RC controller’s location on the chip.

> I might understand this better if you could give a concrete example of
> the CPU address range and the corresponding PCI bus address range.
> For example, I have a box where CPU physical address range [mem
> 0xf0000000000-0xf007edfffff] is routed to PCI bus address range
> [0x80000000-0xfedfffff].  In this case, the struct resource contains
> 0xf0000000000-0xf007edfffff, and the offset is 0xf0000000000 -
> 0x80000000 or 0xeff80000000.

The TILE-Gx chip’s CHIP_PA_WIDTH is 40 bits. In the following example, the
system has 32GB RAM installed, with 16GB in each of the 2 memory
controllers. For the first mvsas device, its PCI memory resource is
[0x100c0000000, 0x100c003ffff]; the corresponding PCI bus address range is
[0xc0000000, 0xc003ffff] after subtracting the offset of (1ul << 40). The
low 32 bits of the aforementioned PCI MMIO address contain the PCI bus
address.

# cat /proc/iomem
00000000-3fbffffff : System RAM
00000000-007eeb1f : Kernel code
00860000-00af6e4b : Kernel data
4000000000-43ffffffff : System RAM
100c0000000-100c003ffff : mvsas
100c0040000-100c005ffff : mvsas
100c0200000-100c0203fff : sky2
100c0300000-100c0303fff : sata_sil24
100c0304000-100c030407f : sata_sil24
100c0400000-100c0403fff : sky2

Note that in the above example, the 2 mvsas devices are in a different PCI
domain from the other 4 devices.
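
To make the arithmetic concrete, here is a minimal standalone sketch of the
resource-to-bus translation described above. It is not code from the patch:
the names example_mem_offset, example_res_to_bus() and example_bus_to_res()
are hypothetical, and the 40-bit width is hard-coded to match this example
rather than taken from CHIP_PA_WIDTH().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 40-bit PA width, matching the TILE-Gx example above. */
#define EXAMPLE_PA_WIDTH 40

/* Offset between PCI resource addresses and PCI bus addresses. */
static const uint64_t example_mem_offset = 1ULL << EXAMPLE_PA_WIDTH;

/* Hypothetical helper: PCI resource address -> PCI bus address. */
static uint64_t example_res_to_bus(uint64_t res_addr)
{
	/* PCI resource addresses always sit above the PA space. */
	assert(res_addr >= example_mem_offset);
	return res_addr - example_mem_offset;
}

/* Hypothetical helper: PCI bus address -> PCI resource address. */
static uint64_t example_bus_to_res(uint64_t bus_addr)
{
	return bus_addr + example_mem_offset;
}
```

With the first mvsas BAR from the listing, example_res_to_bus(0x100c0000000)
gives 0xc0000000, and example_bus_to_res() reverses it.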

> The comments at TILE_PCI_MEM_MAP_BASE_OFFSET suggest that you have two
> MMIO regions (one for bus addresses <4GB), so there should be two
> resources on the list here.

There is a single MMIO region, defined by the corresponding resource
pci_iomem_resource. TILE_PCI_MEM_MAP_BASE_OFFSET is used in the context
of inbound access only, i.e. for DMA access. Yes, there are two inbound
windows. The first is [1ULL << CHIP_PA_WIDTH(), 1ULL << (CHIP_PA_WIDTH() + 1)],
used by devices that can generate 64-bit DMA addresses. The HW IOMMU
derives the real RAM address by subtracting 1ULL << CHIP_PA_WIDTH()
from the DMA address. The second inbound window is [0, 3GB] with direct
mapping, used by 32-bit devices, where 3GB = 4GB - MMIO_region.
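
As a rough illustration of the two inbound windows (again, not code from the
patch), the mapping from a DMA address to a RAM address could be modeled as
below. The function example_dma_to_ram() and the EXAMPLE_* macros are
hypothetical names, the 40-bit width is assumed from the example above, and
I treat the upper bound of each window as exclusive for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 40-bit PA width, as on TILE-Gx. */
#define EXAMPLE_PA_WIDTH   40
/* 64-bit inbound window: [1 << 40, 1 << 41), offset-mapped by the IOMMU. */
#define EXAMPLE_WIN64_BASE (1ULL << EXAMPLE_PA_WIDTH)
#define EXAMPLE_WIN64_END  (1ULL << (EXAMPLE_PA_WIDTH + 1))
/* 32-bit inbound window: [0, 3GB), direct-mapped. */
#define EXAMPLE_WIN32_END  (3ULL << 30)

/*
 * Hypothetical model of inbound (DMA) address translation.
 * Returns 0 and stores the RAM address on success, or -1 if the
 * DMA address falls in neither inbound window.
 */
static int example_dma_to_ram(uint64_t dma_addr, uint64_t *ram_addr)
{
	assert(ram_addr != NULL);

	if (dma_addr >= EXAMPLE_WIN64_BASE && dma_addr < EXAMPLE_WIN64_END) {
		/* 64-bit capable device: subtract the window base. */
		*ram_addr = dma_addr - EXAMPLE_WIN64_BASE;
		return 0;
	}
	if (dma_addr < EXAMPLE_WIN32_END) {
		/* Legacy 32-bit device: DMA address equals RAM address. */
		*ram_addr = dma_addr;
		return 0;
	}
	/* E.g. [3GB, 4GB) is the MMIO region, not an inbound window. */
	return -1;
}
```

In this model a 32-bit device can only reach the low 3GB of RAM directly,
which is why the bounce buffers are needed for buffers above that.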

> The list should also include a bus number resource describing the bus
> numbers claimed by the host bridge.  Since you don't have that, we'll
> default to [bus 00-ff], but that's wrong if you have more than one
> host bridge.

Fixed.
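
The idea is that each host bridge gets its own contiguous, non-overlapping
bus number range. A simplified standalone sketch of that bookkeeping follows;
it is only an illustration, not the patch itself. The struct
example_controller and example_assign_busnos() are hypothetical, and the
bus_counts[] array stands in for what the real bus scan discovers behind
each bridge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct pci_controller. */
struct example_controller {
	int first_busno;	/* root bus number of this host bridge */
	int last_busno;		/* highest bus number below this bridge */
};

/*
 * Hand out consecutive bus number ranges to n controllers.
 * bus_counts[i] is the number of buses (root bus included) found
 * behind controller i; in the kernel this comes from the bus scan.
 * Returns the first bus number that is still unused.
 */
static int example_assign_busnos(struct example_controller *c,
				 const int *bus_counts, size_t n)
{
	int next_busno = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(bus_counts[i] >= 1);
		c[i].first_busno = next_busno;
		c[i].last_busno = next_busno + bus_counts[i] - 1;
		next_busno = c[i].last_busno + 1;
	}
	return next_busno;
}

/* The directly attached device sits on the bus right after the root bus. */
static int example_is_directly_attached(const struct example_controller *c,
					int busnum)
{
	return busnum == c->first_busno + 1;
}
```

With two bridges that have 2 and 3 buses behind them, this yields
[bus 00-01] and [bus 02-04], so the second bridge's directly attached
device is on bus 3 rather than bus 1.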

> In fact, since it appears that you *do* have multiple host bridges,
> the "resources" list should be constructed so it contains the bus
> number and MMIO apertures for each bridge, which should be
> non-overlapping.

We use the same pci_iomem_resource for different domains or host bridges,
but the MMIO apertures for each bridge do not overlap because
non-overlapping resource ranges are allocated for each domain.

>>  void __devinit pcibios_fixup_bus(struct pci_bus *bus)
>>  {
>> -       /* Nothing needs to be done. */
>> +       struct pci_dev *dev = bus->self;
>> +
>> +       if (!dev) {
>> +               /* This is the root bus. */
>> +               bus->resource[0] = &pci_ioport_resource;
>> +               bus->resource[1] = &pci_iomem_resource;
>> +       }
> Please don't add this.  I'm in the process of removing
> pcibios_fixup_bus() altogether.  Instead, you should put
> pci_iomem_resource on a resources list and use pci_scan_root_bus().

I removed the contents of pcibios_fixup_bus(), but am leaving the no-op
function in for now, until after the 3.6 merge.

>>  /*
>> - * We reserve all resources above 4GB so that PCI won't try to put
>> + * On Pro, we reserve all resources above 4GB so that PCI won't try to put
>>  * mappings above 4GB; the standard allows that for some devices but
>>  * the probing code trunates values to 32 bits.
> I think this comment about probing code truncating values is out of
> date.  Or if it's not, please point me to it so we can fix it :)

Yes, it's out of date; fixed.

>> @@ -1588,7 +1585,7 @@ static int __init request_standard_resources(void)
>>        enum { CODE_DELTA = MEM_SV_INTRPT - PAGE_OFFSET };
>>
>>        iomem_resource.end = -1LL;
> This patch isn't touching iomem_resource, but iomem_resource.end
> *should* be set to the highest physical address your CPU can generate,
> which is probably smaller than this.

This is not necessarily true. It is true on x86, where the PA space is
shared by the RAM and the PCI. On TILE-Gx, iomem_resource covers all
resources of type IORESOURCE_MEM, which include the RAM resource and the
PCI resource. On the other hand, setting it here is not necessary because
it is set to -1 in iomem_resource’s definition in kernel/resource.c.

The change follows.

commit d52776fade4dadf0b034d101f0cd4ce4f8d2f48f
Author: Chris Metcalf <cmetcalf@xxxxxxxxxx>
Date:   Sun Jul 1 14:42:49 2012 -0400

    tile: updates to pci root complex from community feedback

diff --git a/arch/tile/include/asm/pci.h b/arch/tile/include/asm/pci.h
index 553b7ff..93a1f14 100644
--- a/arch/tile/include/asm/pci.h
+++ b/arch/tile/include/asm/pci.h
@@ -161,6 +161,7 @@ struct pci_controller {

        uint64_t mem_offset;    /* cpu->bus memory mapping offset. */

+       int first_busno;
        int last_busno;

        struct pci_ops *ops;
@@ -179,14 +180,6 @@ extern gxio_trio_context_t trio_contexts[TILEGX_NUM_TRIO];

 extern void pci_iounmap(struct pci_dev *dev, void __iomem *);

-extern void
-pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
-                       struct resource *res);
-
-extern void
-pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
-                       struct pci_bus_region *region);
-
 /*
  * The PCI address space does not equal the physical memory address
  * space (we have an IOMMU). The IDE and SCSI device layers use this
diff --git a/arch/tile/kernel/pci_gx.c b/arch/tile/kernel/pci_gx.c
index 27f7ab0..56a3c97 100644
--- a/arch/tile/kernel/pci_gx.c
+++ b/arch/tile/kernel/pci_gx.c
@@ -96,14 +96,6 @@ static struct pci_ops tile_cfg_ops;
 /* Mask of CPUs that should receive PCIe interrupts. */
 static struct cpumask intr_cpus_map;

-/* PCI I/O space support is not implemented. */
-static struct resource pci_ioport_resource = {
-       .name   = "PCI IO",
-       .start  = 0,
-       .end    = 0,
-       .flags  = IORESOURCE_IO,
-};
-
 static struct resource pci_iomem_resource = {
        .name   = "PCI mem",
        .start  = TILE_PCI_MEM_START,
@@ -588,6 +580,7 @@ int __init pcibios_init(void)
 {
        resource_size_t offset;
        LIST_HEAD(resources);
+       int next_busno;
        int i;

        tile_pci_init();
@@ -628,7 +621,7 @@ int __init pcibios_init(void)
        msleep(250);

        /* Scan all of the recorded PCI controllers.  */
-       for (i = 0; i < num_rc_controllers; i++) {
+       for (next_busno = 0, i = 0; i < num_rc_controllers; i++) {
                struct pci_controller *controller = &pci_controllers[i];
                gxio_trio_context_t *trio_context = controller->trio;
                TRIO_PCIE_INTFC_PORT_CONFIG_t port_config;
@@ -843,13 +836,14 @@ int __init pcibios_init(void)
                 * The memory range for the PCI root bus should not overlap
                 * with the physical RAM
                 */
-               pci_add_resource_offset(&resources, &iomem_resource,
+               pci_add_resource_offset(&resources, &pci_iomem_resource,
                                        1ULL << CHIP_PA_WIDTH());

-               bus = pci_scan_root_bus(NULL, 0, controller->ops,
+               controller->first_busno = next_busno;
+               bus = pci_scan_root_bus(NULL, next_busno, controller->ops,
                                        controller, &resources);
                controller->root_bus = bus;
-               controller->last_busno = bus->subordinate;
+               next_busno = bus->subordinate + 1;

        }

@@ -1011,20 +1005,9 @@ alloc_mem_map_failed:
 }
 subsys_initcall(pcibios_init);

-/*
- * PCI scan code calls the arch specific pcibios_fixup_bus() each time it scans
- * a new bridge. Called after each bus is probed, but before its children are
- * examined.
- */
+/* Note: to be deleted after Linux 3.6 merge. */
 void __devinit pcibios_fixup_bus(struct pci_bus *bus)
 {
-       struct pci_dev *dev = bus->self;
-
-       if (!dev) {
-               /* This is the root bus. */
-               bus->resource[0] = &pci_ioport_resource;
-               bus->resource[1] = &pci_iomem_resource;
-       }
 }

 /*
@@ -1172,11 +1155,11 @@ static int __devinit tile_cfg_read(struct pci_bus *bus,
        void *mmio_addr;

        /*
-        * Map all accesses to the local device (bus == 0) into the
+        * Map all accesses to the local device on root bus into the
         * MMIO space of the MAC. Accesses to the downstream devices
         * go to the PIO space.
         */
-       if (busnum == 0) {
+       if (pci_is_root_bus(bus)) {
                if (device == 0) {
                        /*
                         * This is the internal downstream P2P bridge,
@@ -1205,11 +1188,11 @@ static int __devinit tile_cfg_read(struct pci_bus *bus,
        }

        /*
-        * Accesses to the directly attached device (bus == 1) have to be
+        * Accesses to the directly attached device have to be
         * sent as type-0 configs.
         */

-       if (busnum == 1) {
+       if (busnum == (controller->first_busno + 1)) {
                /*
                 * There is only one device off of our built-in P2P bridge.
                 */
@@ -1303,11 +1286,11 @@ static int __devinit tile_cfg_write(struct pci_bus *bus,
        u8 val_8 = (u8)val;

        /*
-        * Map all accesses to the local device (bus == 0) into the
+        * Map all accesses to the local device on root bus into the
         * MMIO space of the MAC. Accesses to the downstream devices
         * go to the PIO space.
         */
-       if (busnum == 0) {
+       if (pci_is_root_bus(bus)) {
                if (device == 0) {
                        /*
                         * This is the internal downstream P2P bridge,
@@ -1336,11 +1319,11 @@ static int __devinit tile_cfg_write(struct pci_bus *bus,
        }

        /*
-        * Accesses to the directly attached device (bus == 1) have to be
+        * Accesses to the directly attached device have to be
         * sent as type-0 configs.
         */

-       if (busnum == 1) {
+       if (busnum == (controller->first_busno + 1)) {
                /*
                 * There is only one device off of our built-in P2P bridge.
                 */
diff --git a/arch/tile/kernel/setup.c b/arch/tile/kernel/setup.c
index 2b8b689..ea930ba 100644
--- a/arch/tile/kernel/setup.c
+++ b/arch/tile/kernel/setup.c
@@ -1536,8 +1536,7 @@ static struct resource code_resource = {

 /*
  * On Pro, we reserve all resources above 4GB so that PCI won't try to put
- * mappings above 4GB; the standard allows that for some devices but
- * the probing code trunates values to 32 bits.
+ * mappings above 4GB.
  */
 #if defined(CONFIG_PCI) && !defined(__tilegx__)
 static struct resource* __init

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


