On Sat, Feb 11, 2023 at 01:20:10AM +0800, Chih-En Lin wrote:
> On Fri, Feb 10, 2023 at 11:21:16AM -0500, Pasha Tatashin wrote:
> > > > > Currently, copy-on-write is only used for the mapped memory; the child
> > > > > process still needs to copy the entire page table from the parent
> > > > > process during forking. The parent process might take a lot of time and
> > > > > memory to copy the page table when the parent has a big page table
> > > > > allocated. For example, the memory usage of a process after forking with
> > > > > 1 GB mapped memory is as follows:
> > > >
> > > > For some reason, I was not able to reproduce performance improvements
> > > > with a simple fork() performance measurement program. The results that
> > > > I saw are the following:
> > > >
> > > > Base:
> > > > Fork latency per gigabyte: 0.004416 seconds
> > > > Fork latency per gigabyte: 0.004382 seconds
> > > > Fork latency per gigabyte: 0.004442 seconds
> > > > COW kernel:
> > > > Fork latency per gigabyte: 0.004524 seconds
> > > > Fork latency per gigabyte: 0.004764 seconds
> > > > Fork latency per gigabyte: 0.004547 seconds
> > > >
> > > > AMD EPYC 7B12 64-Core Processor
> > > > Base:
> > > > Fork latency per gigabyte: 0.003923 seconds
> > > > Fork latency per gigabyte: 0.003909 seconds
> > > > Fork latency per gigabyte: 0.003955 seconds
> > > > COW kernel:
> > > > Fork latency per gigabyte: 0.004221 seconds
> > > > Fork latency per gigabyte: 0.003882 seconds
> > > > Fork latency per gigabyte: 0.003854 seconds
> > > >
> > > > Given, that page table for child is not copied, I was expecting the
> > > > performance to be better with COW kernel, and also not to depend on
> > > > the size of the parent.
> > >
> > > Yes, the child won't duplicate the page table, but fork will still
> > > traverse all the page table entries to do the accounting.
> > > And, since this patch expends the COW to the PTE table level, it's not
> > > the mapped page (page table entry) grained anymore, so we have to
> > > guarantee that all the mapped page is available to do COW mapping in
> > > the such page table.
> > > This kind of checking also costs some time.
> > > As a result, since the accounting and the checking, the COW PTE fork
> > > still depends on the size of the parent so the improvement might not
> > > be significant.
> >
> > The current version of the series does not provide any performance
> > improvements for fork(). I would recommend removing claims from the
> > cover letter about better fork() performance, as this may be
> > misleading for those looking for a way to speed up forking. In my
>
> From v3 to v4, I changed the implementation of the COW fork() part to do

Sorry, it's "RFC v2 to v3".

> the accounting and checking. At the time, I also removed most of the
> descriptions about the better fork() performance. Maybe it's not enough
> and still has some misleading. I will fix this in the next version.
> Thanks.

Thanks,
Chih-En Lin