On 17/04/2023 16:44, David Hildenbrand wrote:
>>>>>
>>>>> So what should be safe is replacing all sub-pages of a folio that are marked
>>>>> "maybe shared" by a new folio under PT lock. However, I wonder if it's really
>>>>> worth the complexity. For THP we were happy so far to *not* optimize this,
>>>>> implying that maybe we shouldn't worry about optimizing the fork() case for
>>>>> now that heavily.
>>>>
>>>> I don't have the exact numbers to hand, but I'm pretty sure I remember enabling
>>>> large copies was contributing a measurable amount to the performance
>>>> improvement. (Certainly, the zero-page copy case, is definitely a big
>>>> contributer). I don't have access to the HW at the moment but can rerun later
>>>> with and without to double check.
>>>
>>> In which test exactly? Some micro-benchmark?
>>
>> The kernel compile benchmark that I quoted numbers for in the cover letter. I
>> have some trace points (not part of the submitted series) that tell me how many
>> mappings of each order we get for each code path. I'm pretty sure I remember all
>> of these 4 code paths contributing non-negligible amounts.
>
> Interesting! It would be great to see if there is an actual difference after
> patch #10 was applied without the other COW replacement.
>

I'll aim to get some formal numbers when I next have access to the HW.