Re: [PATCH v1 1/3] sparc64: NG4 memset/memcpy 32 bits overflow

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 28 Feb 2017 10:59:14 -0800

On Tue, Feb 28, 2017 at 10:56:57AM -0500, Pasha Tatashin wrote:
> Also, for consideration, machines are getting bigger, and 2G is becoming
> very small compared to the memory sizes, so some algorithms can become
> inefficient when they have to artificially limit memcpy()s to 2G chunks.

... what algorithms are deemed "inefficient" when they take a break every
2 billion bytes to, ohidon'tknow, check to see that a higher priority
process doesn't want the CPU?
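
As a rough illustration of what "taking a break" between chunks could look like, here is a minimal user-space sketch. The helper name `chunked_memcpy` and its `chunk` parameter are hypothetical and are not part of the patch under discussion. `sched_yield()` stands in for the rescheduling check; in-kernel code would more likely call `cond_resched()` at that point.

```c
#include <sched.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes in pieces of at most chunk bytes,
 * offering the CPU to other runnable tasks between pieces.
 * chunk must be non-zero. Returns the number of breaks taken.
 */
static size_t chunked_memcpy(void *dst, const void *src, size_t len,
                             size_t chunk)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t breaks = 0;

    while (len > 0) {
        size_t n = len < chunk ? len : chunk;

        memcpy(d, s, n);
        d += n;
        s += n;
        len -= n;

        /* Only pause if there is more work left to do. */
        if (len > 0) {
            sched_yield();
            breaks++;
        }
    }
    return breaks;
}
```

With a chunk of 2GB (for example `(size_t)1 << 31` on a 64-bit build), no single `memcpy()` call ever sees a length that does not fit in 32 bits.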

> X6-8 scales up to 6T:
> http://www.oracle.com/technetwork/database/exadata/exadata-x6-8-ds-2968796.pdf
> 
> SPARC M7-16 scales up to 16T:
> http://www.oracle.com/us/products/servers-storage/sparc-m7-16-ds-2687045.pdf
> 
> 2G is just 0.012% of the total memory size on M7-16.

Right, so suppose you're copying half the memory to the other half of
memory.  Let's suppose it takes a hundred extra instructions every 2GB to
check that nobody else wants the CPU and dive back into the memcpy code.
That's 800,000 additional instructions.  Which even on a SPARC CPU is
going to execute in less than 0.001 second.  CPU memory bandwidth is
on the order of 100GB/s, so the overall memcpy is going to take about
160 seconds.
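
The estimate above can be reproduced with a few lines of arithmetic. This is a sketch only: the function names are made up for illustration, and it assumes the figures correspond to about 16T of total bytes moved, a 2GB chunk, 100 instructions per break, and 100GB/s of bandwidth.

```c
#include <stdint.h>

/* Extra instructions spent on breaks: one break per full chunk. */
static uint64_t extra_instructions(uint64_t total_bytes,
                                   uint64_t chunk_bytes,
                                   uint64_t instr_per_break)
{
    return (total_bytes / chunk_bytes) * instr_per_break;
}

/* Time for the copy itself at a given sustained bandwidth. */
static double copy_seconds(uint64_t total_bytes, double bytes_per_second)
{
    return (double)total_bytes / bytes_per_second;
}
```

Under those assumptions, `extra_instructions(16ULL << 40, 2ULL << 30, 100)` gives 819,200 instructions (8192 chunks of 2GB), and `copy_seconds(16000000000000ULL, 100e9)` gives 160 seconds, so the per-chunk overhead is many orders of magnitude smaller than the copy time.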

You'd have far more joy dividing the work up into 2GB chunks and
distributing the work to N CPU packages (... not hardware threads
...) than you would trying to save a millisecond by allowing the CPU to
copy more than 2GB at a time.
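
One way to picture the split across N workers is a helper that hands each worker a contiguous run of whole chunks. This is a hypothetical sketch, not code from any kernel tree: `struct copy_range` and `worker_range` are invented names, `chunk` and `nworkers` must be non-zero, and actually placing each worker on its own CPU package (NUMA placement, thread pinning) is left out.

```c
#include <stddef.h>

/* Byte range of the buffer assigned to one worker. */
struct copy_range {
    size_t offset;
    size_t length;
};

/*
 * Split total bytes into chunk-sized pieces and give worker idx
 * (0 <= idx < nworkers) a contiguous run of them. The first
 * (nchunks % nworkers) workers receive one extra chunk.
 */
static struct copy_range worker_range(size_t total, size_t chunk,
                                      size_t nworkers, size_t idx)
{
    size_t nchunks = (total + chunk - 1) / chunk;   /* round up */
    size_t base = nchunks / nworkers;
    size_t extra = nchunks % nworkers;
    size_t first = idx * base + (idx < extra ? idx : extra);
    size_t count = base + (idx < extra ? 1 : 0);
    size_t end = (first + count) * chunk;
    struct copy_range r;

    r.offset = first * chunk;
    /* The last chunk may be partial, and idle workers get length 0. */
    if (r.offset > total)
        r.offset = total;
    if (end > total)
        end = total;
    r.length = end - r.offset;
    return r;
}
```

Each worker would then copy only its own range, so the ranges never overlap and together cover the whole buffer.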