On Fri, Nov 24, 2023 at 11:07:51AM -0500, Peter Xu wrote:
> On Fri, Nov 24, 2023 at 09:06:01AM +0000, Ryan Roberts wrote:
> > I don't have any micro-benchmarks for GUP though, if that's your question. Is
> > there an easy-to-use test I can run to get some numbers? I'd be happy to try it out.
>
> Thanks Ryan. Then nothing is needed to be tested if gup is not yet touched
> from your side, afaict. I'll see whether I can provide some rough numbers
> instead in the next post (I'll probably only be able to test it in a VM,
> though, but hopefully that should still reflect mostly the truth).

An update: I finished a round of 64K cont_pte tests. In the slow gup micro
benchmark I see a ~15% performance degradation with this patchset applied,
on a VM on top of an Apple M1.

Frankly, that's even less than I expected. Consider not only how slow gup on
THP used to be, but also that this benchmark is a tight loop over slow gup,
which shouldn't happen in normal cases: "present" ptes normally go through
fast-gup, while !present ones trigger a fault first. I assume that's why
nobody cared about slow gup for THP before.

I think adding cont_pte support shouldn't be very hard, but it would mean
making the cont_pte idea global just for arm64 and riscv Svnapot.

The current plan is to put that performance number only in my commit
message, as I don't expect any real workload to regress because of it.
Maybe a global cont_pte API will be needed at some point, but I don't feel
strongly that this use case needs one yet.

Please feel free to raise any concerns otherwise.

Thanks,

-- 
Peter Xu