Barry Song <21cnbao@xxxxxxxxx> writes:

> On Mon, Jun 24, 2024 at 3:44 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>>
>> Barry Song <21cnbao@xxxxxxxxx> writes:
>>
>> > On Fri, Jun 21, 2024 at 9:24 PM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>> >>
>> >> Barry Song <21cnbao@xxxxxxxxx> writes:
>> >>
>> >> > On Fri, Jun 21, 2024 at 7:25 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>> >> >>
>> >> >> On 20/06/2024 12:34, David Hildenbrand wrote:
>> >> >> > On 20.06.24 11:04, Ryan Roberts wrote:
>> >> >> >> On 20/06/2024 01:26, Barry Song wrote:
>> >> >> >>> From: Barry Song <v-songbaohua@xxxxxxxx>
>> >> >> >>>
>> >> >> >>> Both Ryan and Chris have been utilizing the small test program to aid
>> >> >> >>> in debugging and identifying issues with swap entry allocation. While
>> >> >> >>> a real or intricate workload might be more suitable for assessing the
>> >> >> >>> correctness and effectiveness of the swap allocation policy, a small
>> >> >> >>> test program presents a simpler means of understanding the problem and
>> >> >> >>> initially verifying the improvements being made.
>> >> >> >>>
>> >> >> >>> Let's endeavor to integrate it into the self-test suite. Although it
>> >> >> >>> presently only accommodates 64KB and 4KB, I'm optimistic that we can
>> >> >> >>> expand its capabilities to support multiple sizes and simulate more
>> >> >> >>> complex systems in the future as required.
>> >> >> >>
>> >> >> >> I'll try to summarize the thread with Huang Ying by suggesting this test program
>> >> >> >> is "necessary but not sufficient" to exhaustively test the mTHP swap-out path.
>> >> >> >> I've certainly found it useful and think it would be a valuable addition to the
>> >> >> >> tree.
>> >> >> >>
>> >> >> >> That said, I'm not convinced it is a selftest; IMO a selftest should provide a
>> >> >> >> clear pass/fail result against some criteria and must be able to be run
>> >> >> >> automatically by (e.g.) a CI system.
>> >> >> >
>> >> >> > Likely we should then consider moving other such performance-related thingies
>> >> >> > out of the selftests?
>> >> >>
>> >> >> Yes, that would get my vote. But of the 4 tests you mentioned that use
>> >> >> clock_gettime(), it looks like transhuge-stress is the only one that doesn't
>> >> >> have a pass/fail result, so is probably the only candidate for moving.
>> >> >>
>> >> >> The others either use the times as a timeout and determine failure if the
>> >> >> action didn't occur within the timeout (e.g. ksm_tests.c) or use it to add some
>> >> >> supplemental performance information to an otherwise functionality-oriented test.
>> >> >
>> >> > Thank you very much, Ryan. I think you've found a better home for this
>> >> > tool. I will send v2, relocating it to tools/mm and adding a function to
>> >> > swap in either the whole mTHPs or a portion of mTHPs by "-a" (aligned swapin).
>> >> >
>> >> > So basically, we will have
>> >> >
>> >> > 1. Use MADV_PAGEOUT for rapid swap-out, putting the swap allocation code under
>> >> > heavy exercise in a short time.
>> >> >
>> >> > 2. Use MADV_DONTNEED to simulate the behavior of libc and Java heap in freeing
>> >> > memory, as well as for munmap, app exits, or OOM killer scenarios. This ensures
>> >> > new mTHP is always generated, released or swapped out, similar to the behavior
>> >> > on a PC or Android phone where many applications are frequently started and
>> >> > terminated.
>> >>
>> >> MADV_DONTNEED on 64KB of memory followed by memset() simulates exactly the
>> >> large folio swap-in, which hasn't been merged upstream. I don't think
>> >> that such a trick is a good idea.
>> >
>> > I disagree. This is how userspace heaps can manage memory
>> > deallocation.
>>
>> Sorry, I don't understand how. Can you show some examples? Such as
>> strace log with 64KB aligned MADV_DONTNEED?
>
> In Java heap and memory allocators such as jemalloc and Scudo, memory is freed
> using the MADV_DONTNEED flag when either free() is called or garbage collection
> occurs. In Android, the Java heap is freed in chunks aligned to 64KB
> or larger.

Originally, I heard that MADV_FREE is used by jemalloc. Now, I know
that they use MADV_DONTNEED too. Thanks! Although I still doubt that the
libc/java allocators will free pages in exact 64KB sizes (IIUC, they
should free pages in much larger chunks), I agree that MADV_DONTNEED is
a way to create fragmentation in swap devices.

> In Scudo and jemalloc, there is a configuration option to set the
> management granularity. This granularity is set to match the mTHP size
> (though the default value is 16KB in the latest Android if we don't run
> mTHP). Otherwise, you could end up with millions of partial unmap
> operations, which would severely degrade the performance of mTHP.
>
> Imagine libc/Java functioning like a slab allocator. When kfree() is
> called, some pages may become completely unoccupied and can be returned
> to the buddy allocator. In userspace, memory is given back to the kernel
> in a similar manner, typically using MADV_DONTNEED. Therefore,
> MADV_DONTNEED is the most common memory reclamation behavior in Android,
> coming with free(), delete or GC.
>
> Imagine a system with extensive malloc, free, new, and delete
> operations, where objects are constantly being created and destroyed.
>
> On the other hand, whether libc/Java use MADV_DONTNEED to free memory is not
> crucial, although they do. We need a method to simulate the lifecycle
> of applications (exiting and starting anew) on PCs or Android phones.
> It doesn't matter if you use MADV_DONTNEED or munmap to achieve this.
>
> It is important to note that mTHP currently operates on a one-shot
> basis (after swap-out, you never get them back as mTHP, as we don't
> support large folio swap-in).
> For the test program, we need a method to generate new mTHPs
> continuously. Without this, after the initial iterations, we would be
> left with only small folios, rendering the entire test program
> *pointless*.

I understand the requirements for new mTHPs.

>>
>> > Additionally, in the event of an application exit, munmap, or OOM killer, the
>> > amount of freed memory can be much larger than 64KB. The primary purpose
>> > of using MADV_DONTNEED is to release anonymous memory and generate
>> > new mTHP so that the iteration can continue. Otherwise, the test program
>> > becomes entirely pointless, as we only have large folios at the beginning.
>> > That is exactly why Chris has failed to find his bugs by using other small
>> > programs.
>>
>> Although I still don't understand how 64KB aligned MADV_DONTNEED is used
>> for libc/java heap or munmap in a practical way. After more thought, I
>> think 64KB aligned MADV_DONTNEED can simulate the fragmentation effect
>> of process exits to some degree if 64KB folios in these processes are
>> swapped out without splitting. If you have no other practical use
>> cases, I suggest making it explicit with comments in the program.
>>

[snip]

--
Best Regards,
Huang, Ying
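
For readers following the thread, the iteration being discussed can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the test program from the patch: map_aligned(),
run_iteration() and CHUNK_SIZE are hypothetical names, the fallback value
for MADV_PAGEOUT is the generic uapi one, and the sketch does not check
whether the kernel really backs the region with 64KB folios (that depends
on the system's mTHP configuration). MADV_PAGEOUT is treated as best
effort, since it needs Linux 5.4+ and a configured swap device to have
any effect.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* assumption: generic uapi value, Linux 5.4+ */
#endif

#define CHUNK_SIZE ((size_t)64 * 1024) /* hypothetical: one 64KB mTHP-sized chunk */

/* Map len bytes of anonymous memory whose start is 64KB aligned.
 * Over-maps by one chunk, then trims the unaligned head and the tail. */
static void *map_aligned(size_t len)
{
	size_t total = len + CHUNK_SIZE;
	char *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	uintptr_t start = ((uintptr_t)raw + CHUNK_SIZE - 1) &
			  ~((uintptr_t)CHUNK_SIZE - 1);
	size_t head = start - (uintptr_t)raw;
	size_t tail = total - head - len;

	if (head)
		munmap(raw, head);
	if (tail)
		munmap((char *)start + len, tail);
	return (void *)start;
}

/* One iteration: fault the region in, ask the kernel to swap it out,
 * then drop it in 64KB-aligned chunks so that the next iteration faults
 * in fresh folios. Returns 0 on success, -1 if MADV_DONTNEED fails. */
static int run_iteration(char *buf, size_t len)
{
	assert(len % CHUNK_SIZE == 0);

	memset(buf, 0x5a, len);                /* fault in pages */
	(void)madvise(buf, len, MADV_PAGEOUT); /* best effort swap-out */

	for (size_t off = 0; off < len; off += CHUNK_SIZE) {
		if (madvise(buf + off, CHUNK_SIZE, MADV_DONTNEED) != 0)
			return -1;
	}
	return 0;
}
```

A caller would obtain a buffer with map_aligned() and invoke
run_iteration() in a loop. Freeing a whole 64KB chunk at a time is what
lets each pass start again from newly faulted memory instead of from the
order-0 pages left behind by an earlier swap-in.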