On Mon, Jul 24, 2023 at 04:56:27PM +0200, Jesper Dangaard Brouer wrote:

> These massive throughput numbers are important, because they *exceed*
> the physical host RAM/DIMM memory speeds.

That's right, this HW is all designed to use the high memory bandwidth
of the parallel GPUs. The CPU must not be involved in the data movement.

If you look at the reference block diagrams for a DGX-H100 you can see
that each GPU is directly wired to a 400 Gb/s NIC, and there is even
software that allows the GPU to directly operate its attached NIC
interface.

Each of the 8 GPUs in the block diagram has 800 Gb/s full duplex RDMA,
and 7200 Gb/s full duplex on the NVLink interconnect directly connected
to it. The GPU is the center of all the interconnect in these systems.

Jason
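
For a rough sense of scale, the per-GPU figures quoted above can be
added up per node. This is only a back-of-the-envelope sketch: it takes
the 8 GPUs, 800 Gb/s RDMA and 7200 Gb/s NVLink numbers from the message
at face value, sums them naively (so it ignores topology and any double
counting of shared links), and the function names are made up for
illustration. The host DRAM bandwidth is deliberately left as a
caller-supplied value, since no such number is given in this thread.

```python
# Per-GPU figures as quoted in the message (Gb/s, full duplex).
GPUS_PER_NODE = 8
RDMA_GBPS_PER_GPU = 800
NVLINK_GBPS_PER_GPU = 7200


def gbps_to_gbytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


def node_totals():
    """Naive per-node sums of the quoted per-GPU figures, in Gb/s."""
    return {
        "rdma_gbps": GPUS_PER_NODE * RDMA_GBPS_PER_GPU,
        "nvlink_gbps": GPUS_PER_NODE * NVLINK_GBPS_PER_GPU,
    }


def exceeds_host_dram(total_gbps, host_dram_gbytes_per_s):
    """True if the traffic could not be staged through host DRAM.

    host_dram_gbytes_per_s is an assumption the caller must supply for
    their own platform; it is not a figure from this thread.
    """
    return gbps_to_gbytes_per_s(total_gbps) > host_dram_gbytes_per_s


if __name__ == "__main__":
    for name, gbps in node_totals().items():
        print(f"{name}: {gbps} Gb/s = {gbps_to_gbytes_per_s(gbps):.0f} GB/s")
```

Feeding a measured host memory bandwidth for a given platform into
exceeds_host_dram() shows whether bouncing that traffic through host
RAM is even possible on it.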