On Wed, Jul 10, 2024 at 6:47 PM zhiguojiang <justinjiang@xxxxxxxx> wrote:
>
>
>
> On 2024/7/10 12:44, Barry Song wrote:
> >
> > On Wed, Jul 10, 2024 at 4:04 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >> On 10.07.24 06:02, Barry Song wrote:
> >>> On Wed, Jul 10, 2024 at 3:59 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
> >>>> On 10.07.24 05:32, Barry Song wrote:
> >>>>> On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>>>>> On Tue, 9 Jul 2024 20:31:15 +0800 Zhiguo Jiang <justinjiang@xxxxxxxx> wrote:
> >>>>>>
> >>>>>>> Releasing a non-shared anonymous folio mapped solely by an exiting
> >>>>>>> process may go through two flows: 1) the anonymous folio is first
> >>>>>>> swapped out to swap space and transformed into a swp_entry in
> >>>>>>> shrink_folio_list; 2) the swp_entry is then released in the process
> >>>>>>> exit flow. This results in a high CPU load for releasing a
> >>>>>>> non-shared anonymous folio mapped solely by an exiting process.
> >>>>>>>
> >>>>>>> When low system memory coincides with an exiting process, this is
> >>>>>>> likely to happen, because the non-shared anonymous folio mapped
> >>>>>>> solely by the exiting process may be reclaimed by shrink_folio_list.
> >>>>>>>
> >>>>>>> With this patch, shrink skips the non-shared anonymous folio mapped
> >>>>>>> solely by an exiting process, and the folio is instead released
> >>>>>>> directly in the process exit flow, which saves swap-out time and
> >>>>>>> alleviates the load of process exit.
> >>>>>> It would be helpful to provide some before-and-after runtime
> >>>>>> measurements, please. It's a performance optimization so please let's
> >>>>>> see what effect it has.
> >>>>> Hi Andrew,
> >>>>>
> >>>>> This was something I was curious about too, so I created a small test program
> >>>>> that allocates and continuously writes to 256MB of memory. Using QEMU, I set
> >>>>> up a small machine with only 300MB of RAM to trigger kswapd.
> >>>>>
> >>>>> qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
> >>>>>         -smp cpus=4 -cpu max \
> >>>>>         -m 300M -kernel arch/arm64/boot/Image
> >>>>>
> >>>>> The test program will be randomly terminated by its subprocess to trigger
> >>>>> the use case of this patch.
> >>>>>
> >>>>> #include <stdio.h>
> >>>>> #include <stdlib.h>
> >>>>> #include <unistd.h>
> >>>>> #include <string.h>
> >>>>> #include <sys/types.h>
> >>>>> #include <sys/wait.h>
> >>>>> #include <time.h>
> >>>>> #include <signal.h>
> >>>>>
> >>>>> #define MEMORY_SIZE (256 * 1024 * 1024)
> >>>>>
> >>>>> unsigned char *memory;
> >>>>>
> >>>>> void allocate_and_write_memory()
> >>>>> {
> >>>>>         memory = (unsigned char *)malloc(MEMORY_SIZE);
> >>>>>         if (memory == NULL) {
> >>>>>                 perror("malloc");
> >>>>>                 exit(EXIT_FAILURE);
> >>>>>         }
> >>>>>
> >>>>>         while (1)
> >>>>>                 memset(memory, 0x11, MEMORY_SIZE);
> >>>>> }
> >>>>>
> >>>>> int main()
> >>>>> {
> >>>>>         pid_t pid;
> >>>>>         srand(time(NULL));
> >>>>>
> >>>>>         pid = fork();
> >>>>>
> >>>>>         if (pid < 0) {
> >>>>>                 perror("fork");
> >>>>>                 exit(EXIT_FAILURE);
> >>>>>         }
> >>>>>
> >>>>>         if (pid == 0) {
> >>>>>                 int delay = (rand() % 10000) + 10000;
> >>>>>                 usleep(delay * 1000);
> >>>>>
> >>>>>                 /* kill the parent while it is busy swapping */
> >>>>>                 kill(getppid(), SIGKILL);
> >>>>>                 _exit(0);
> >>>>>         } else {
> >>>>>                 allocate_and_write_memory();
> >>>>>
> >>>>>                 wait(NULL);
> >>>>>
> >>>>>                 free(memory);
> >>>>>         }
> >>>>>
> >>>>>         return 0;
> >>>>> }
> >>>>>
> >>>>> I tracked the number of folios that could be redundantly
> >>>>> swapped out by adding a simple counter as shown below:
> >>>>>
> >>>>> @@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
> >>>>>                      check_stable_address_space(vma->vm_mm)) &&
> >>>>>                      folio_test_swapbacked(folio) &&
> >>>>>                      !folio_likely_mapped_shared(folio)) {
> >>>>> +                    static long i, size;
> >>>>> +                    size += folio_size(folio);
> >>>>> +                    pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
> >>>>>                      pra->referenced = -1;
> >>>>>                      page_vma_mapped_walk_done(&pvmw);
> >>>>>                      return false;
> >>>>>
> >>>>>
> >>>>> This is what I have observed:
> >>>>>
> >>>>> / # /home/barry/develop/linux/skip_swap_out_test
> >>>>> [ 82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
> >>>>> [ 82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
> >>>>> [ 82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
> >>>>> [ 82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
> >>>>> [ 82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
> >>>>> [ 82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
> >>>>> ...
> >>>>> [ 88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
> >>>>> [ 88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
> >>>>> [ 89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480
> >>>>>
> >>>>> I observed that this patch effectively skipped 6114 folios (either 4KB or 64KB
> >>>>> mTHP), potentially reducing the swap-out by up to 92MB (97,300,480 bytes) during
> >>>>> the process exit.
> >>>>>
> >>>>> Despite the numerous mistakes Zhiguo made in sending this patch, it is still
> >>>>> quite valuable. Please consider pulling his v9 into the mm tree for testing.
> >>>> BTW, we dropped the folio_test_anon() check, but what about shmem? They
> >>>> also do __folio_set_swapbacked()?
> >>> My point is that the purpose is skipping redundant swap-out; if shmem is
> >>> singly mapped, it could also be skipped.
> >> But they won't necessarily get *freed* when unmapping them. They might
> >> just continue living in tmpfs, where some other process might just map
> >> them later?
> >>
> > You're correct. I overlooked this aspect, focusing on swap and thinking of shmem
> > solely in terms of swap.
> >
> >> IMHO, there is a big difference here between anon and shmem. (well,
> >> anon_shmem would actually be different :) )
> > Even though anon_shmem behaves similarly to anonymous memory when
> > releasing memory, it doesn't seem worth the added complexity?
> >
> > So unfortunately it seems Zhiguo still needs a v10 to bring folio_test_anon()
> > back? Sorry for my bad, Zhiguo.
> If the folio_test_anon(folio) && folio_test_swapbacked(folio) condition is
> used, does that mean the folio is definitely anonymous rather than shmem?
> And does folio_likely_mapped_shared() then need to be removed?

No, shared memory (shmem) isn't necessarily shared, and private anonymous
memory isn't necessarily unshared. There is no direct relationship between
them. In the case of a fork, your private anonymous folio can be shared by
two or more processes before CoW.

> >
> >> --
> >> Cheers,
> >>
> >> David / dhildenb
> >>
> > Thanks
> > Barry
> Thanks
> Zhiguo
>
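[Editor's note: to make the fork/CoW point above concrete, here is a minimal
userspace sketch. It is purely illustrative and not part of the patch or the
thread; the program and its helper name pagemap_entry() are hypothetical. It
shows that a MAP_PRIVATE anonymous page remains backed by the same physical
page in parent and child until one of them writes to it, by comparing the
frame recorded in /proc/self/pagemap for the same virtual address in both
processes. The PFN bits (bits 0-54 of each entry) are only exposed to
privileged users, so run it as root to see matching frame numbers.]

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the /proc/self/pagemap entry for a virtual address. Bits 0-54 hold
 * the PFN, but only when the caller has CAP_SYS_ADMIN; otherwise they read
 * back as zero. */
static uint64_t pagemap_entry(void *addr)
{
	uint64_t entry = 0;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset = ((uintptr_t)addr / page_size) * sizeof(uint64_t);
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0) {
		perror("open pagemap");
		exit(EXIT_FAILURE);
	}
	if (pread(fd, &entry, sizeof(entry), offset) != sizeof(entry)) {
		perror("pread pagemap");
		exit(EXIT_FAILURE);
	}
	close(fd);
	return entry;
}

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	/* Private anonymous mapping, the same kind of memory the patch targets. */
	char *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	pid_t pid;

	if (p == MAP_FAILED) {
		perror("mmap");
		exit(EXIT_FAILURE);
	}
	memset(p, 0x11, page_size);	/* fault the page in before forking */

	pid = fork();
	if (pid < 0) {
		perror("fork");
		exit(EXIT_FAILURE);
	}
	if (pid == 0) {
		/* The child only reads, so no CoW copy has been made yet and
		 * the parent's physical page is still mapped here as well. */
		printf("child  pfn: 0x%llx\n",
		       (unsigned long long)(pagemap_entry(p) & ((1ULL << 55) - 1)));
		_exit(0);
	}
	printf("parent pfn: 0x%llx\n",
	       (unsigned long long)(pagemap_entry(p) & ((1ULL << 55) - 1)));
	wait(NULL);
	return 0;
}

Run as root, both lines print the same frame number; once either side writes
to the page, CoW gives it a private copy and the numbers diverge. This is
why !folio_likely_mapped_shared() still matters even for plain anonymous
memory, and why folio_test_anon() alone cannot stand in for it.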