On 2024/7/10 15:11, Barry Song wrote:
On Wed, Jul 10, 2024 at 6:47 PM zhiguojiang <justinjiang@xxxxxxxx> wrote:
On 2024/7/10 12:44, Barry Song wrote:
On Wed, Jul 10, 2024 at 4:04 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 10.07.24 06:02, Barry Song wrote:
On Wed, Jul 10, 2024 at 3:59 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 10.07.24 05:32, Barry Song wrote:
On Wed, Jul 10, 2024 at 9:23 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Tue, 9 Jul 2024 20:31:15 +0800 Zhiguo Jiang <justinjiang@xxxxxxxx> wrote:
Releasing a non-shared anonymous folio mapped solely by an exiting
process may go through two flows: 1) the anonymous folio is first
swapped out to swap space and transformed into a swp_entry in
shrink_folio_list; 2) the swp_entry is then released in the process
exit flow. This results in high CPU load when releasing a non-shared
anonymous folio mapped solely by an exiting process.
This is likely to happen when system memory is low while a process is
exiting, because the non-shared anonymous folio mapped solely by the
exiting process may be reclaimed by shrink_folio_list.
With this patch, shrink skips the non-shared anonymous folio mapped
solely by an exiting process, so the folio is released directly in the
process exit flow, which saves swap-out time and reduces the load of
process exit.
It would be helpful to provide some before-and-after runtime
measurements, please. It's a performance optimization so please let's
see what effect it has.
Hi Andrew,
This was something I was curious about too, so I created a small test program
that allocates and continuously writes to 256MB of memory. Using QEMU, I set
up a small machine with only 300MB of RAM to trigger kswapd.
qemu-system-aarch64 -M virt,gic-version=3,mte=off -nographic \
-smp cpus=4 -cpu max \
-m 300M -kernel arch/arm64/boot/Image
The test program will be randomly terminated by its subprocess to trigger
the use case of this patch.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <signal.h>

#define MEMORY_SIZE (256 * 1024 * 1024)

unsigned char *memory;

/* Allocate 256MB and dirty it forever so that kswapd keeps reclaiming it. */
void allocate_and_write_memory()
{
	memory = (unsigned char *)malloc(MEMORY_SIZE);
	if (memory == NULL) {
		perror("malloc");
		exit(EXIT_FAILURE);
	}
	while (1)
		memset(memory, 0x11, MEMORY_SIZE);
}

int main()
{
	pid_t pid;

	srand(time(NULL));
	pid = fork();
	if (pid < 0) {
		perror("fork");
		exit(EXIT_FAILURE);
	}
	if (pid == 0) {
		/* child: sleep for a random 10 to 20 seconds */
		int delay = (rand() % 10000) + 10000;
		usleep(delay * 1000);
		/* kill parent when it is busy on swapping */
		kill(getppid(), SIGKILL);
		_exit(0);
	} else {
		allocate_and_write_memory();
		/* not reached: the loop above only ends when SIGKILL arrives */
		wait(NULL);
		free(memory);
	}
	return 0;
}
I tracked the number of folios that could be redundantly
swapped out by adding a simple counter as shown below:
@@ -879,6 +880,9 @@ static bool folio_referenced_one(struct folio *folio,
check_stable_address_space(vma->vm_mm)) &&
folio_test_swapbacked(folio) &&
!folio_likely_mapped_shared(folio)) {
+ static long i, size;
+ size += folio_size(folio);
+ pr_err("index: %ld skipped folio:%lx total size:%ld\n", i++, (unsigned long)folio, size);
pra->referenced = -1;
page_vma_mapped_walk_done(&pvmw);
return false;
This is what I have observed:
/ # /home/barry/develop/linux/skip_swap_out_test
[ 82.925645] index: 0 skipped folio:fffffdffc0425400 total size:65536
[ 82.925960] index: 1 skipped folio:fffffdffc0425800 total size:131072
[ 82.927524] index: 2 skipped folio:fffffdffc0425c00 total size:196608
[ 82.928649] index: 3 skipped folio:fffffdffc0426000 total size:262144
[ 82.929383] index: 4 skipped folio:fffffdffc0426400 total size:327680
[ 82.929995] index: 5 skipped folio:fffffdffc0426800 total size:393216
...
[ 88.469130] index: 6112 skipped folio:fffffdffc0390080 total size:97230848
[ 88.469966] index: 6113 skipped folio:fffffdffc038d000 total size:97296384
[ 89.023414] index: 6114 skipped folio:fffffdffc0366cc0 total size:97300480
I observed that this patch effectively skipped 6115 folios (index 0 through
6114, each either 4KB or 64KB mTHP), potentially reducing swap-out by up to
about 92MB (97,300,480 bytes) during process exit.
Despite the numerous mistakes Zhiguo made in sending this patch, it is still
quite valuable. Please consider pulling his v9 into the mm tree for testing.
BTW, we dropped the folio_test_anon() check, but what about shmem? They
also do __folio_set_swapbacked()?
My point is that the purpose is to skip redundant swap-out; if shmem folios
are singly mapped, they could also be skipped.
But they won't necessarily get *freed* when unmapping them. They might
just continue living in tmpfs, where some other process might just map
them later?
You're correct. I overlooked this aspect, focusing on swap and thinking of shmem
solely in terms of swap.
IMHO, there is a big difference here between anon and shmem. (well,
anon_shmem would actually be different :) )
Even though anon_shmem behaves similarly to anonymous memory when
releasing memory, it doesn't seem worth the added complexity?
So unfortunately it seems Zhiguo still needs a v10 to bring folio_test_anon()
back? Sorry, my bad, Zhiguo.
If the condition folio_test_anon(folio) && folio_test_swapbacked(folio) is
used, does that mean the folio is definitely anonymous rather than shmem?
If so, does folio_likely_mapped_shared() need to be removed?
No, shared memory (shmem) isn't necessarily shared, and private anonymous
memory isn't necessarily unshared. There is no direct relationship between
them.
In the case of a fork, your private anonymous folio can be shared by
two or more processes before CoW.
Hi,
I have added the folio_test_anon(folio) condition in v10.
Thanks
--
Cheers,
David / dhildenb
Thanks
Barry
Thanks
Zhiguo