On Fri, Jan 19, 2024 at 8:51 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Fri 19-01-24 10:03:05, Lance Yang wrote:
> > Hey Michal,
> >
> > Thanks for taking the time to review!
> >
> > On Thu, Jan 18, 2024 at 9:40 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > >
> > > On Thu 18-01-24 20:03:46, Lance Yang wrote:
> > > [...]
> > >
> > > before we discuss the semantic, let's focus on the usecase.
> > >
> > > > Use Cases
> > > >
> > > > An immediate user of this new functionality is the Go runtime heap allocator
> > > > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > > > newly allocated chunk through mmap() or a reused chunk released by
> > > > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > > > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > > > respectively. However, both approaches resulted in performance issues; for
> > > > both scenarios, there could be entries into direct reclaim and/or compaction,
> > > > leading to unpredictable stalls[4]. Now, the allocator can confidently use
> > > > process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.
> > >
> > > IIUC the primary reason is the cost of the huge page allocation which
> > > can be really high if the memory is heavily fragmented and it is called
> > > synchronously from the process directly, correct? Can that be worked
> >
> > Yes, that's correct.
> >
> > > around by process_madvise and performing the operation from a different
> > > context? Are there any other reasons to have a different mode?
> >
> > In latency-sensitive scenarios, some applications aim to enhance performance
> > by utilizing huge pages as much as possible. At the same time, in case of
> > allocation failure, they prefer a quick return without triggering direct memory
> > reclamation and compaction.
>
> Could you elaborate some more on why?
>
> > > I mean I can think of a more relaxed (opportunistic) MADV_COLLAPSE -
> > > e.g. non blocking one to make sure that the caller doesn't really block
> > > on resource contention (be it locks or memory availability) because that
> > > matches our non-blocking interface in other areas but having a LIGHT
> > > operation sounds really vague and the exact semantic would be
> > > implementation specific and might change over time. Non-blocking has a
> > > clear semantic but it is not really clear whether that is what you
> > > really need/want.
> >
> > Could you provide me with some suggestions regarding the naming of a
> > more relaxed (opportunistic) MADV_COLLAPSE?
>
> Naming is not all that important at this stage (it could be
> MADV_COLLAPSE_NOBLOCK for example). The primary question is whether
> non-blocking in general is the desired behavior or the implementation
> should try but not too hard.

Hey Michal,

Thanks for your suggestion!

It seems that "the implementation should try but not too hard" aligns
well with my desired behavior. Non-blocking in general is also a great
idea. Perhaps in the future, we can add a MADV_F_COLLAPSE_NOBLOCK flag
for scenarios where latency is extremely critical.

Thanks again,
Lance

>
> --
> Michal Hocko
> SUSE Labs
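
For reference, the synchronous pattern from the use case quoted above
can be sketched roughly as below. This is only an illustration under
stated assumptions, not the Go runtime's actual code: the helper names
chunk_alloc() and chunk_try_collapse() are hypothetical, the 2 MiB chunk
size assumes the common x86-64 huge page size, and the fallback value
for MADV_COLLAPSE is the one from the Linux 6.1+ uapi headers. It uses
only the existing madvise(MADV_COLLAPSE) call, which may enter direct
reclaim and/or compaction; the MADV_F_COLLAPSE_LIGHT and
MADV_F_COLLAPSE_NOBLOCK flags discussed in this thread are proposals and
are deliberately not used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25 /* value from Linux 6.1+ uapi headers */
#endif

/* Assumption: 2 MiB huge pages (the x86-64 default PMD size). */
#define CHUNK_SIZE (2UL << 20)

/*
 * Hypothetical helper: reserve one CHUNK_SIZE-aligned anonymous chunk.
 * Returns NULL if mmap() fails.
 */
static void *chunk_alloc(void)
{
    /* Over-allocate so an aligned chunk is guaranteed to fit, then trim. */
    size_t len = 2 * CHUNK_SIZE;
    char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uintptr_t start = ((uintptr_t)raw + CHUNK_SIZE - 1) & ~(CHUNK_SIZE - 1);
    char *chunk = (char *)start;
    char *end = chunk + CHUNK_SIZE;
    assert(end <= raw + len);

    /* Unmap the unaligned head and the unused tail. */
    if (chunk > raw)
        munmap(raw, (size_t)(chunk - raw));
    if (end < raw + len)
        munmap(end, (size_t)(raw + len - end));
    return chunk;
}

/*
 * Hypothetical helper: synchronously ask the kernel to back the chunk
 * with a huge page. Returns 0 on success or the errno value on failure
 * (e.g. EINVAL on kernels without MADV_COLLAPSE). Failure is treated as
 * non-fatal: the chunk simply stays backed by base pages.
 */
static int chunk_try_collapse(void *chunk)
{
    if (madvise(chunk, CHUNK_SIZE, MADV_COLLAPSE) == 0)
        return 0;
    return errno;
}
```

A caller would touch the chunk and then call chunk_try_collapse() on it,
ignoring a non-zero return. The stall discussed in this thread would
happen inside that madvise() call, in the caller's own context.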