On Mon, Dec 21, 2020 at 11:25:22AM -0500, Liang Li wrote:
> Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
> =====================================================
> QEMU use 4K pages, THP is off
>                   round1      round2      round3
> w/o this patch:    23.5s       24.7s       24.6s
> w/ this patch:     10.2s       10.3s       11.2s
>
> QEMU use 4K pages, THP is on
>                   round1      round2      round3
> w/o this patch:    17.9s       14.8s       14.9s
> w/ this patch:      1.9s        1.8s        1.9s
> =====================================================

The cost of zeroing pages has to be paid somewhere.  You've successfully
moved it out of this path that you can measure.  So now you've put it
somewhere that you're not measuring.  Why is this a win?

> Speed up kernel routine
> =======================
> This can’t be guaranteed because we don’t pre zero out all the free pages,
> but is true for most case. It can help to speed up some important system
> call just like fork, which will allocate zero pages for building page
> table. And speed up the process of page fault, especially for huge page
> fault. The POC of Hugetlb free page pre zero out has been done.

Try kernbench with and without your patch.