* Mike Travis (travis@xxxxxxx) wrote:
> Chris Wright wrote:
> >OK, I was actually interested in the !pt case.  But this is useful
> >still.  The iova lookup being distinct from the identity_mapping() case.
>
> I can get that as well, but having every device using maps caused its
> own set of problems (hundreds of dma maps).  Here's a list of devices
> on the system under test.  You can see that even 'minor' glitches can
> get magnified when there are so many...

Yeah, I was focused on the overhead of actually mapping/unmapping an
address in the non-pt case.

> Blade  Location    NASID  PCI Address    X Display  Device
> ----------------------------------------------------------------------
>     0  r001i01b00      0  0000:01:00.0       -      Intel 82576 Gigabit Network Connection
>     .  .               .  0000:01:00.1       -      Intel 82576 Gigabit Network Connection
>     .  .               .  0000:04:00.0       -      LSI SAS1064ET Fusion-MPT SAS
>     .  .               .  0000:05:00.0       -      Matrox MGA G200e
>     2  r001i01b02      4  0001:02:00.0       -      Mellanox MT26428 InfiniBand
>     3  r001i01b03      6  0002:02:00.0       -      Mellanox MT26428 InfiniBand
>     4  r001i01b04      8  0003:02:00.0       -      Mellanox MT26428 InfiniBand
>    11  r001i01b11     22  0007:02:00.0       -      Mellanox MT26428 InfiniBand
>    13  r001i01b13     26  0008:02:00.0       -      Mellanox MT26428 InfiniBand
>    15  r001i01b15     30  0009:07:00.0      :0.0    nVidia GF100 [Tesla S2050]
>     .  .               .  0009:08:00.0      :1.1    nVidia GF100 [Tesla S2050]
>    18  r001i23b02     36  000b:02:00.0       -      Mellanox MT26428 InfiniBand
>    20  r001i23b04     40  000c:01:00.0       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  000c:01:00.1       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  000c:04:00.0       -      Mellanox MT26428 InfiniBand
>    23  r001i23b07     46  000d:07:00.0       -      nVidia GF100 [Tesla S2050]
>     .  .               .  000d:08:00.0       -      nVidia GF100 [Tesla S2050]
>    25  r001i23b09     50  000e:01:00.0       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  000e:01:00.1       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  000e:04:00.0       -      Mellanox MT26428 InfiniBand
>    26  r001i23b10     52  000f:02:00.0       -      Mellanox MT26428 InfiniBand
>    27  r001i23b11     54  0010:02:00.0       -      Mellanox MT26428 InfiniBand
>    29  r001i23b13     58  0011:02:00.0       -      Mellanox MT26428 InfiniBand
>    31  r001i23b15     62  0012:02:00.0       -      Mellanox MT26428 InfiniBand
>    34  r002i01b02     68  0013:01:00.0       -      Mellanox MT26428 InfiniBand
>    35  r002i01b03     70  0014:02:00.0       -      Mellanox MT26428 InfiniBand
>    36  r002i01b04     72  0015:01:00.0       -      Mellanox MT26428 InfiniBand
>    41  r002i01b09     82  0018:07:00.0       -      nVidia GF100 [Tesla S2050]
>     .  .               .  0018:08:00.0       -      nVidia GF100 [Tesla S2050]
>    43  r002i01b11     86  0019:01:00.0       -      Mellanox MT26428 InfiniBand
>    45  r002i01b13     90  001a:01:00.0       -      Mellanox MT26428 InfiniBand
>    48  r002i23b00     96  001c:07:00.0       -      nVidia GF100 [Tesla S2050]
>     .  .               .  001c:08:00.0       -      nVidia GF100 [Tesla S2050]
>    50  r002i23b02    100  001d:02:00.0       -      Mellanox MT26428 InfiniBand
>    52  r002i23b04    104  001e:01:00.0       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  001e:01:00.1       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  001e:04:00.0       -      Mellanox MT26428 InfiniBand
>    57  r002i23b09    114  0020:01:00.0       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  0020:01:00.1       -      Intel 82599EB 10-Gigabit Network Connection
>     .  .               .  0020:04:00.0       -      Mellanox MT26428 InfiniBand
>    58  r002i23b10    116  0021:02:00.0       -      Mellanox MT26428 InfiniBand
>    59  r002i23b11    118  0022:02:00.0       -      Mellanox MT26428 InfiniBand
>    61  r002i23b13    122  0023:02:00.0       -      Mellanox MT26428 InfiniBand
>    63  r002i23b15    126  0024:02:00.0       -      Mellanox MT26428 InfiniBand
>
>
> >>uv48-sys was receiving and uv-debug sending.
> >>ksoftirqd/640 was running at approx. 100% cpu utilization.
> >>I had pinned the nttcp process on uv48-sys to cpu 64.
> >>
> >># Samples: 1255641
> >>#
> >># Overhead        Command  Shared Object  Symbol
> >># ........  .............  .............  ......
> >>#
> >>    50.27%  ksoftirqd/640  [kernel]       [k] _spin_lock
> >>    27.43%  ksoftirqd/640  [kernel]       [k] iommu_no_mapping
>
> >>...
> >>     0.48%  ksoftirqd/640  [kernel]       [k] iommu_should_identity_map
> >>     0.45%  ksoftirqd/640  [kernel]       [k] ixgbe_alloc_rx_buffers   [ixgbe]
>
> >Note, ixgbe has had rx dma mapping issues (that's why I wondered what
> >was causing the massive slowdown under !pt mode).
>
> I think since this profile run, the network guys updated the ixgbe
> driver with a later version.  (I don't know the outcome of that test.)

OK.  The ixgbe fix I was thinking of has been in since 2.6.34: 43634e82
(ixgbe: Fix DMA mapping/unmapping issues when HWRSC is enabled on IOMMU
enabled kernels).

> ><snip>
> >>I tracked this time down to identity_mapping() in this loop:
> >>
> >>	list_for_each_entry(info, &si_domain->devices, link)
> >>		if (info->dev == pdev)
> >>			return 1;
> >>
> >>I didn't get the exact count, but there were approx 11,000 PCI devices
> >>on this system.  And this function was called for every page request
> >>in each DMA request.
>
> >Right, so this is the list traversal (and wow, a lot of PCI devices).
>
> Most of the PCI devices were the 45 on each of 256 Nehalem sockets.
> Also, there's a ton of bridges as well.
>
> >Did you try a smarter data structure?  (While there's room for another
> >bit in pci_dev, the bit is more about iommu implementation details than
> >anything at the pci level).
> >
> >Or the domain_dev_info is cached in the archdata of the device struct.
> >You should be able to just reference that directly.
> >
> >Didn't think it through completely, but perhaps something as simple as:
> >
> >	return pdev->dev.archdata.iommu == si_domain;
>
> I can try this, thanks!

Err, I guess that'd be info = archdata.iommu; info->domain == si_domain
(and it probably needs some sanity checking against things like
DUMMY_DEVICE_DOMAIN_INFO).  But you get the idea.

thanks,
-chris
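
To make that concrete, here is a rough, untested sketch of how
identity_mapping() could use the pointer cached in dev.archdata.iommu
instead of walking si_domain->devices, assuming archdata.iommu holds
either NULL, DUMMY_DEVICE_DOMAIN_INFO, or a valid
struct device_domain_info pointer once the device has been attached to
a domain:

	static int identity_mapping(struct pci_dev *pdev)
	{
		struct device_domain_info *info;

		/*
		 * Use the per-device info cached in archdata at attach
		 * time rather than traversing the si_domain device list
		 * on every map/unmap.
		 */
		info = pdev->dev.archdata.iommu;
		if (info && info != DUMMY_DEVICE_DOMAIN_INFO)
			return info->domain == si_domain;

		return 0;
	}

The DUMMY_DEVICE_DOMAIN_INFO test covers the sanity-checking caveat
above; everything else is a constant-time pointer compare, which should
replace the O(n) walk over the ~11,000 devices on this system.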