On 02/19/2019 06:26 PM, Matthew Wilcox wrote:
> On Tue, Feb 19, 2019 at 01:12:07PM +0530, Anshuman Khandual wrote:
>> But the location of this temp page matters as well because you would like to
>> saturate the inter node interface. It needs to be either of the nodes where
>> the source or destination page belongs. Any other node would generate two
>> internode copy process which is not what you intend here I guess.
> That makes no sense. It should be allocated on the local node of the CPU
> performing the copy. If the CPU is in node A, the destination is in node B
> and the source is in node C, then you're doing 4k worth of reads from node C,
> 4k worth of reads from node B, 4k worth of writes to node C followed by
> 4k worth of writes to node B. Eventually the 4k of dirty cachelines on
> node A will be written back from cache to the local memory (... or not,
> if that page gets reused for some other purpose first).
>
> If you allocate the page on node B or node C, that's an extra 4k of writes
> to be sent across the inter-node link.

That's right, there will be an extra remote write. My assumption was that
the CPU performing the copy belongs to either node B or node C.