What do these commands report about the NUMA and non-uniform IO
topology on the test system?

    numactl --hardware
    lspci -t

> -----Original Message-----
> From: Jeff Moyer [mailto:jmoyer@xxxxxxxxxx]
> Sent: Monday, 12 November, 2012 3:27 PM
> To: Bart Van Assche
> Cc: Elliott, Robert (Server Storage); linux-scsi@xxxxxxxxxxxxxxx
> Subject: Re: [patch,v2 00/10] make I/O path allocations more numa-friendly
>
> Bart Van Assche <bvanassche@xxxxxxx> writes:
>
> > On 11/09/12 21:46, Jeff Moyer wrote:
> >>> On 11/06/12 16:41, Elliott, Robert (Server Storage) wrote:
> >>>> It's certainly better to tie them all to one node than let them be
> >>>> randomly scattered across nodes; your 6% observation may simply be
> >>>> from that.
> >>>>
> >>>> How do you think these compare, though (for structures that are per-IO)?
> >>>> - tying the structures to the node hosting the storage device
> >>>> - tying the structures to the node running the application
> >>
> >> This is a great question, thanks for asking it!  I went ahead and
> >> modified the megaraid_sas driver to take a module parameter that
> >> specifies on which node to allocate the scsi_host data structure (and
> >> all other structures on top that are tied to that).  I then booted the
> >> system 4 times, specifying a different node each time.  Here are the
> >> results as compared to a vanilla kernel:
> >>
> [snip]
> > Which NUMA node was processing the megaraid_sas interrupts in these
> > tests?  Was irqbalance running during these tests, or were interrupts
> > manually pinned to a specific CPU core?
>
> irqbalance was indeed running, so I can't say for sure which node the
> irq was pinned to during my tests (I didn't record that information).
>
> I re-ran the tests, this time turning off irqbalance (well, I set it to
> one-shot) and pinning the irq to the node running the benchmark.  In
> this configuration, I saw no regressions in performance.
>
> As a reminder:
>
> >> The first number is the percent gain (or loss) w.r.t. the vanilla
> >> kernel.  The second number is the standard deviation as a percent of
> >> the bandwidth.  So, when data structures are tied to node 0, we see an
> >> increase in performance for nodes 0-3.  However, on node 3, which is
> >> the node the megaraid_sas controller is attached to, we see no gain in
> >> performance, and we see an increase in the run-to-run variation.  The
> >> standard deviation for the vanilla kernel was 1% across all nodes.
>
> Here are the updated numbers:
>
> data structures tied to node 0
>
> application tied to:
>   node 0:  0 +/- 4%
>   node 1:  9 +/- 1%
>   node 2: 10 +/- 2%
>   node 3:  0 +/- 2%
>
> data structures tied to node 1
>
> application tied to:
>   node 0:  5 +/- 2%
>   node 1:  6 +/- 8%
>   node 2: 10 +/- 1%
>   node 3:  0 +/- 3%
>
> data structures tied to node 2
>
> application tied to:
>   node 0:  6 +/- 2%
>   node 1:  9 +/- 2%
>   node 2:  7 +/- 6%
>   node 3:  0 +/- 3%
>
> data structures tied to node 3
>
> application tied to:
>   node 0:  0 +/- 4%
>   node 1: 10 +/- 2%
>   node 2: 11 +/- 1%
>   node 3:  0 +/- 5%
>
> Now, the above is apples to oranges, since the vanilla kernel was run
> without any tuning of irqs.  So, I went ahead and booted with
> numa_node_parm=-1, which is the same as vanilla, and re-ran the tests.
>
> When we compare a vanilla kernel with and without irq binding, we get
> this:
>
>   node 0: 0 +/- 3%
>   node 1: 9 +/- 1%
>   node 2: 8 +/- 3%
>   node 3: 0 +/- 1%
>
> As you can see, binding irqs helps nodes 1 and 2 quite substantially.
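(Aside, for anyone reproducing the by-hand pinning: it amounts to writing
the node's cpu mask into /proc/irq/<N>/smp_affinity.  A minimal sketch is
below; the irq number and mask in the usage line are placeholders, not
values from this system -- they would normally come from /proc/interrupts
and the numactl --hardware output.)

/*
 * set_irq_affinity.c - minimal sketch, not taken from the test setup above.
 * Writes a hex cpu mask into /proc/irq/<irq>/smp_affinity.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	char path[64];
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <irq> <hex-cpu-mask>\n", argv[0]);
		return 1;
	}

	snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", argv[1]);

	f = fopen(path, "w");		/* needs root */
	if (!f) {
		perror(path);
		return 1;
	}

	/* e.g. a mask of "f0" restricts the irq to CPUs 4-7 */
	if (fprintf(f, "%s\n", argv[2]) < 0 || fclose(f) == EOF) {
		perror(path);
		return 1;
	}
	return 0;
}

Run as root, e.g. ./set_irq_affinity 86 f0 -- again, both numbers are
illustrative only.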
> What this boils down to, when you compare a patched kernel with the
> vanilla kernel, where both tie irqs to the node hosting the
> application, is a net gain of zero, but an increase in standard
> deviation.
>
> Let me try to make that more readable: the patch set does not appear to
> help at all with my benchmark configuration.  ;-)  One other conclusion
> I can draw from this data is that irqbalance could do a better job.
>
> An interesting (to me) tidbit about this hardware is that, while it has
> 4 NUMA nodes, it only has 2 sockets.  Based on the numbers above, I'd
> guess that nodes 0 and 3 are in the same socket, and likewise 1 and 2.
>
> Cheers,
> Jeff
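For reference, the numa_node_parm knob described above comes down to
handing an explicit node to the allocator; the real change plumbs the
chosen node into the scsi_host allocation and the structures built on top
of it.  A rough sketch of just that technique follows -- it is not the
actual megaraid_sas patch, and all names here are made up.

/*
 * Illustrative only -- not the actual megaraid_sas/SCSI midlayer change.
 * A module parameter picks the NUMA node for a per-adapter allocation,
 * with -1 (NUMA_NO_NODE) meaning "no preference".
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/numa.h>

static int numa_node_parm = NUMA_NO_NODE;
module_param(numa_node_parm, int, 0444);
MODULE_PARM_DESC(numa_node_parm,
		 "NUMA node for per-adapter allocations (-1 = no preference)");

struct example_adapter {
	/* per-adapter state (queues, command pools, ...) would live here */
	unsigned long scratch[512];
};

static struct example_adapter *adapter;

static int __init example_init(void)
{
	/*
	 * kzalloc_node() prefers the requested node but falls back to other
	 * nodes if it has no free memory, so NUMA_NO_NODE is always safe.
	 * A real driver would also validate the parameter (node_online()).
	 */
	adapter = kzalloc_node(sizeof(*adapter), GFP_KERNEL, numa_node_parm);
	if (!adapter)
		return -ENOMEM;

	pr_info("example: adapter structure requested on node %d\n",
		numa_node_parm);
	return 0;
}

static void __exit example_exit(void)
{
	kfree(adapter);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

Loading this with numa_node_parm=3 versus numa_node_parm=-1 (the vanilla
behaviour) is essentially the experiment Jeff describes above.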