What do these commands report about the NUMA and non-uniform IO
topology on the test system?

    numactl --hardware
    lspci -t

> -----Original Message-----
> From: Jeff Moyer [mailto:jmoyer@xxxxxxxxxx]
> Sent: Monday, 12 November, 2012 3:27 PM
> To: Bart Van Assche
> Cc: Elliott, Robert (Server Storage); linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: [patch,v2 00/10] make I/O path allocations more numa-friendly
>
> Bart Van Assche <bvanassche@xxxxxxx> writes:
>
> > On 11/09/12 21:46, Jeff Moyer wrote:
> >>> On 11/06/12 16:41, Elliott, Robert (Server Storage) wrote:
> >>>> It's certainly better to tie them all to one node than let them be
> >>>> randomly scattered across nodes; your 6% observation may simply be
> >>>> from that.
> >>>>
> >>>> How do you think these compare, though (for structures that are per-IO)?
> >>>> - tying the structures to the node hosting the storage device
> >>>> - tying the structures to the node running the application
> >>
> >> This is a great question, thanks for asking it!  I went ahead and
> >> modified the megaraid_sas driver to take a module parameter that
> >> specifies on which node to allocate the scsi_host data structure (and
> >> all other structures on top that are tied to that).  I then booted the
> >> system 4 times, specifying a different node each time.  Here are the
> >> results as compared to a vanilla kernel:
> >>
> [snip]
> > Which NUMA node was processing the megaraid_sas interrupts in these
> > tests?  Was irqbalance running during these tests, or were interrupts
> > manually pinned to a specific CPU core?
>
> irqbalance was indeed running, so I can't say for sure which node the
> irq was pinned to during my tests (I didn't record that information).
>
> I re-ran the tests, this time turning off irqbalance (well, I set it to
> one-shot) and pinning the irq to the node running the benchmark.  In
> this configuration, I saw no regressions in performance.
>
> As a reminder:
>
> >> The first number is the percent gain (or loss) w.r.t. the vanilla
> >> kernel.  The second number is the standard deviation as a percent of
> >> the bandwidth.  So, when data structures are tied to node 0, we see an
> >> increase in performance for nodes 0-3.  However, on node 3, which is
> >> the node the megaraid_sas controller is attached to, we see no gain in
> >> performance, and we see an increase in the run-to-run variation.  The
> >> standard deviation for the vanilla kernel was 1% across all nodes.
>
> Here are the updated numbers:
>
> data structures tied to node 0
>
> application tied to:
>   node 0:  0 +/- 4%
>   node 1:  9 +/- 1%
>   node 2: 10 +/- 2%
>   node 3:  0 +/- 2%
>
> data structures tied to node 1
>
> application tied to:
>   node 0:  5 +/- 2%
>   node 1:  6 +/- 8%
>   node 2: 10 +/- 1%
>   node 3:  0 +/- 3%
>
> data structures tied to node 2
>
> application tied to:
>   node 0:  6 +/- 2%
>   node 1:  9 +/- 2%
>   node 2:  7 +/- 6%
>   node 3:  0 +/- 3%
>
> data structures tied to node 3
>
> application tied to:
>   node 0:  0 +/- 4%
>   node 1: 10 +/- 2%
>   node 2: 11 +/- 1%
>   node 3:  0 +/- 5%
>
> Now, the above is apples to oranges, since the vanilla kernel was run
> without any tuning of irqs.  So, I went ahead and booted with
> numa_node_parm=-1, which is the same as vanilla, and re-ran the tests.
>
> When we compare a vanilla kernel with and without irq binding, we get
> this:
>
>   node 0: 0 +/- 3%
>   node 1: 9 +/- 1%
>   node 2: 8 +/- 3%
>   node 3: 0 +/- 1%
>
> As you can see, binding irqs helps nodes 1 and 2 quite substantially.
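(Aside, for anyone reproducing the by-hand pinning: it amounts to writing
the node's cpu mask into /proc/irq/<N>/smp_affinity.  A minimal sketch is
below; the irq number and mask in the usage line are placeholders, not
values from this system -- they would normally come from /proc/interrupts
and the numactl --hardware output.)

/*
 * set_irq_affinity.c - minimal sketch, not taken from the test setup above.
 * Writes a hex cpu mask into /proc/irq/<irq>/smp_affinity.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64];
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <irq> <hex-cpu-mask>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", argv[1]);

	f = fopen(path, "w");		/* needs root */
	if (!f) {
		perror(path);
		return 1;
	}

	/* e.g. a mask of "f0" restricts the irq to CPUs 4-7 */
	if (fprintf(f, "%s\n", argv[2]) < 0 || fclose(f) == EOF) {
		perror(path);
		return 1;
	}
	return 0;
}

Run as root, e.g. ./set_irq_affinity 86 f0 -- again, both numbers are
illustrative only.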
> What this boils down to, when you compare a patched kernel with the
> vanilla kernel, where both tie irqs to the node hosting the
> application, is a net gain of zero, but an increase in standard
> deviation.
>
> Let me try to make that more readable: the patch set does not appear to
> help at all with my benchmark configuration.  ;-)  One other conclusion
> I can draw from this data is that irqbalance could do a better job.
>
> An interesting (to me) tidbit about this hardware is that, while it has
> 4 NUMA nodes, it only has 2 sockets.  Based on the numbers above, I'd
> guess that nodes 0 and 3 are in the same socket, and likewise 1 and 2.
>
> Cheers,
> Jeff
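For reference, the numa_node_parm knob described above comes down to
handing an explicit node to the allocator; the real change plumbs the
chosen node into the scsi_host allocation and the structures built on top
of it.  A rough sketch of just that technique follows -- it is not the
actual megaraid_sas patch, and all names here are made up.

/*
 * Illustrative only -- not the actual megaraid_sas/SCSI midlayer change.
 * A module parameter picks the NUMA node for a per-adapter allocation,
 * with -1 (NUMA_NO_NODE) meaning "no preference".
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/numa.h>

static int numa_node_parm = NUMA_NO_NODE;
module_param(numa_node_parm, int, 0444);
MODULE_PARM_DESC(numa_node_parm,
		 "NUMA node for per-adapter allocations (-1 = no preference)");

struct example_adapter {
	/* per-adapter state (queues, command pools, ...) would live here */
	unsigned long scratch[512];
};

static struct example_adapter *adapter;

static int __init example_init(void)
{
	/*
	 * kzalloc_node() prefers the requested node but falls back to other
	 * nodes if it has no free memory, so NUMA_NO_NODE is always safe.
	 * A real driver would also validate the parameter (node_online()).
	 */
	adapter = kzalloc_node(sizeof(*adapter), GFP_KERNEL, numa_node_parm);
	if (!adapter)
		return -ENOMEM;

	pr_info("example: adapter structure requested on node %d\n",
		numa_node_parm);
	return 0;
}

static void __exit example_exit(void)
{
	kfree(adapter);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

Loading this with numa_node_parm=3 versus numa_node_parm=-1 (the vanilla
behaviour) is essentially the experiment Jeff describes above.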