On 20.01.22 21:09, David Hildenbrand wrote: > On 20.01.22 21:07, Matthew Wilcox wrote: >> On Thu, Jan 20, 2022 at 08:55:12PM +0100, David Hildenbrand wrote: >>>>>> David, does any of it regards the lru_cache_add() reference issue that I >>>>>> mentioned? [1] >> >>> +++ b/mm/memory.c >>> @@ -3291,19 +3291,28 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) >>> if (PageAnon(vmf->page)) { >>> struct page *page = vmf->page; >>> >>> - /* PageKsm() doesn't necessarily raise the page refcount */ >>> - if (PageKsm(page) || page_count(page) != 1) >>> + /* >>> + * PageKsm() doesn't necessarily raise the page refcount. >>> + * >>> + * These checks are racy as long as we haven't locked the page; >>> + * they are a pure optimization to avoid trying to lock the page >>> + * and trying to free the swap cache when there is little hope >>> + * it will actually result in a refcount of 1. >>> + */ >>> + if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page)) >>> goto copy; >>> if (!trylock_page(page)) >>> goto copy; >>> - if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) { >>> + if (PageSwapCache(page)) >>> + try_to_free_swap(page); >>> + if (PageKsm(page) || page_count(page) != 1) { >>> unlock_page(page); >>> goto copy; >>> } >>> /* >>> - * Ok, we've got the only map reference, and the only >>> - * page count reference, and the page is locked, >>> - * it's dark out, and we're wearing sunglasses. Hit it. >>> + * Ok, we've got the only page reference from our mapping >>> + * and the page is locked, it's dark out, and we're wearing >>> + * sunglasses. Hit it. >>> */ >>> unlock_page(page); >>> wp_page_reuse(vmf); >>> >>> >>> I added some vmstats that monitor various paths. After one run of >>> ./forceswap 2 1000000 1 >>> I'm left with a rough delta (including some noise) of >>> anon_wp_copy_count 1799 >>> anon_wp_copy_count_early 1 >>> anon_wp_copy_lock 983396 >>> anon_wp_reuse 0 >>> >>> The relevant part of your reproducer is >>> >>> for (i = 0; i < nops; i++) { >>> if (madvise((void *)p, PAGE_SIZE * npages, MADV_PAGEOUT)) { >>> perror("madvise"); >>> exit(-1); >>> } >>> >>> for (j = 0; j < npages; j++) { >>> c = p[j * PAGE_SIZE]; >>> c++; >>> time -= rdtscp(); >>> p[j * PAGE_SIZE] = c; >>> time += rdtscp(); >>> } >>> } >>> >>> For this specific reproducer at least, the page lock seems to be the thingy that prohibits >>> reuse if I interpret the numbers correctly. We pass the initial page_count() check. >>> >>> Haven't looked into the details, and I would be curious how that performs with actual >>> workloads, if we can reproduce similar behavior. >> >> I don't see how that patch addresses the lru issue. Wouldn't we need >> something like ... >> >> if (!PageLRU(page)) >> lru_add_drain_all(); >> lru_add_drain_all() takes a mutex ... best we can do I guess is drain the local CPU using lru_add_drain(). I'll go play with it and see what breaks :) -- Thanks, David / dhildenb