On Tue, Feb 28, 2017 at 10:56:57AM -0500, Pasha Tatashin wrote:
> Also, for consideration, machines are getting bigger, and 2G is becoming
> very small compared to the memory sizes, so some algorithms can become
> inefficient when they have to artificially limit memcpy()s to 2G chunks.

... what algorithms are deemed "inefficient" when they take a break every
2 billion bytes to, ohidon'tknow, check to see that a higher priority
process doesn't want the CPU?

> X6-8 scales up to 6T:
> http://www.oracle.com/technetwork/database/exadata/exadata-x6-8-ds-2968796.pdf
>
> SPARC M7-16 scales up to 16T:
> http://www.oracle.com/us/products/servers-storage/sparc-m7-16-ds-2687045.pdf
>
> 2G is just 0.012% of the total memory size on M7-16.

Right, so suppose you're copying half the memory to the other half of
memory. Let's suppose it takes a hundred extra instructions every 2GB to
check that nobody else wants the CPU and dive back into the memcpy code.
That's 800,000 additional instructions. Which even on a SPARC CPU is
going to execute in less than 0.001 second. CPU memory bandwidth is on
the order of 100GB/s, so the overall memcpy is going to take about 160
seconds.

You'd have far more joy dividing the work up into 2GB chunks and
distributing the work to N CPU packages (... not hardware threads ...)
than you would trying to save a millisecond by allowing the CPU to copy
more than 2GB at a time.
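For anyone who wants to check the back-of-the-envelope numbers above, here is a minimal sketch in Python. It rests on several assumptions that are not in the original mail: decimal units (1 GB = 10^9 bytes), the full 16T counted as the amount of memory traffic (which is what appears to reproduce the 800,000 and 160 second figures), and a deliberately pessimistic, hypothetical rate of 10^9 instructions per second. The function name `chunking_overhead` is made up for illustration.

```python
GB = 10**9   # decimal units, an assumption chosen to match the round figures
TB = 10**12


def chunking_overhead(total_bytes, chunk_bytes, instr_per_break,
                      instr_per_sec, bytes_per_sec):
    """Estimate the cost of pausing a large copy once per chunk.

    Returns (breaks, extra_instructions, overhead_seconds, copy_seconds).
    """
    # Ceiling division: a partial final chunk still costs one break.
    breaks = -(-total_bytes // chunk_bytes)
    extra_instr = breaks * instr_per_break
    overhead_s = extra_instr / instr_per_sec
    copy_s = total_bytes / bytes_per_sec
    return breaks, extra_instr, overhead_s, copy_s


breaks, extra_instr, overhead_s, copy_s = chunking_overhead(
    total_bytes=16 * TB,        # M7-16 figure, treated as total traffic
    chunk_bytes=2 * GB,         # the 2G limit under discussion
    instr_per_break=100,        # "a hundred extra instructions"
    instr_per_sec=10**9,        # hypothetical, pessimistic instruction rate
    bytes_per_sec=100 * GB,     # "on the order of 100GB/s"
)

print(f"breaks:             {breaks}")
print(f"extra instructions: {extra_instr}")
print(f"overhead:           {overhead_s:.4f} s")
print(f"copy time:          {copy_s:.1f} s")
print(f"overhead fraction:  {overhead_s / copy_s:.1e}")
```

Under those assumptions the breaks account for roughly five millionths of the total copy time, so the chunk limit itself is not where the time goes.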