On Thu, Jan 18, 2024 at 10:59 PM Zach O'Keefe <zokeefe@xxxxxxxxxx> wrote: > > On Thu, Jan 18, 2024 at 5:43 AM Michal Hocko <mhocko@xxxxxxxx> wrote: > > > > Dang, forgot to cc linux-api... > > > > On Thu 18-01-24 14:40:19, Michal Hocko wrote: > > > On Thu 18-01-24 20:03:46, Lance Yang wrote: > > > [...] > > > > > > before we discuss the semantic, let's focus on the usecase. > > > > > > > Use Cases > > > > > > > > An immediate user of this new functionality is the Go runtime heap allocator > > > > that manages memory in hugepage-sized chunks. In the past, whether it was a > > > > newly allocated chunk through mmap() or a reused chunk released by > > > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with > > > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3] > > > > respectively. However, both approaches resulted in performance issues; for > > > > both scenarios, there could be entries into direct reclaim and/or compaction, > > > > leading to unpredictable stalls[4]. Now, the allocator can confidently use > > > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages. > > Aside: The thought was a MADV_F_COLLAPSE_LIGHT _flag_; so it'd be > process_madvise(..., MADV_COLLAPSE, MADV_F_COLLAPSE_LIGHT) I apologize for the misunderstanding. I will provide the correct implementation in version 3. BR, Lance > > > > IIUC the primary reason is the cost of the huge page allocation which > > > can be really high if the memory is heavily fragmented and it is called > > > synchronously from the process directly, correct? Can that be worked > > > around by process_madvise and performing the operation from a different > > > context? Are there any other reasons to have a different mode? > > > > > > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE - > > > e.g. non blocking one to make sure that the caller doesn't really block > > > on resource contention (be it locks or memory availability) because that > > > matches our non-blocking interface in other areas but having a LIGHT > > > operation sounds really vague and the exact semantic would be > > > implementation specific and might change over time. Non-blocking has a > > > clear semantic but it is not really clear whether that is what you > > > really need/want. > > IIUC, usecase from Go is unbounded latency due to sync compaction in a > context where the latency is unacceptable. Working w/ them to > understand how things can be improved -- it's possible the changes can > occur entirely on their side, w/o any additional kernel support. > > The non-blocking case awkwardly sits between MADV_COLLAPSE today, and > khugepaged; esp when common case is that the allocation can probably > be satisfied in fast path. > > The suggestion for something like "LIGHT" was intentionally vague > because it could allow for other optimizations / changes down the > line, as you point out. I think that might be a win, vs tying to a > specific optimization (e.g. like a MADV_F_COLLAPSE_NODEFRAG). But I > could be alone on that front, given the design of > /sys/kernel/mm/transparent_hugepage. > > But circling back, I agree w/ you that the first order of business is to > iron out a real usecase. As of right now, it's not clear something > like this is required or helpful. > > Thanks, > Zach > > > > > > > > [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77 > > > > [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a > > > > [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af > > > > [4] https://github.com/golang/go/issues/63334 > > > > > > > > [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@xxxxxxxxx/ > > > -- > > > Michal Hocko > > > SUSE Labs > > > > -- > > Michal Hocko > > SUSE Labs