On Fri, Jan 31, 2025 at 12:47:24PM -0500, Liam R. Howlett wrote: > * Davidlohr Bueso <dave@xxxxxxxxxxxx> [250131 12:31]: > > On Fri, 31 Jan 2025, Lorenzo Stoakes wrote: > > > > > On Thu, Jan 16, 2025 at 05:30:58PM -0800, SeongJae Park wrote: > > > > Optimize redundant mmap lock operations from process_madvise() by > > > > directly doing the mmap locking first, and then the remaining works for > > > > all ranges in the loop. > > > > > > > > Signed-off-by: SeongJae Park <sj@xxxxxxxxxx> > > > > > > I wonder if this might increase lock contention because now all of the > > > vector operations will hold the relevant mm lock without releasing after > > > each operation? > > > > That was exactly my concern. While afaict the numbers presented in v1 > > are quite nice, this is ultimately a micro-benchmark, where no other > > unrelated threads are impacted by these new hold times. > > Indeed, I was also concerned about this scenario. > > But this method does have the added advantage of keeping the vma space > in the same state as it was expected during the initial call - although > the race does still exist on looking vs acting on the data. This would > just remove the intermediate changes. > > > > > > Probably it's ok given limited size of iov, but maybe in future we'd want > > > to set a limit on the ranges before we drop/reacquire lock? > > > > imo, this should best be done in the same patch/series. Maybe extend > > the benchmark to use IOV_MAX and find a sweet spot? > > Are you worried this is over-engineering for a problem that may never be > an issue, or is there a particular usecase you have in mind? > > It is probably worth investigating, and maybe a potential usecase would > help with the targeted sweet spot? > > Lorenzo already explained that it is not an issue at the moment. I think this is good discussion to have as I think we will be expanding the limit from UIO_FASTIOV to higher value (maybe UIO_MAXIOV) soon in the followup. At the moment, my gut feeling is that batch size of regions given to the syscall will depend on the advise parameter. For example, For MADV_[NO]HUGEPAGE which is a simple flag [re]set, a single write lock and possible a single tree traversal will be better than multiple write lock/unlock operations even for large batch size. Anyways we will need some experimental data to show that. JFYI SJ is planning to work on two improvements: (1) single tree traversal for all the given regions and (2) TLB flush batching for MADV_DONTNEED[_LOCKED] and MADV_FREE. Thanks everyone for your time and feedback.