The patch titled Subject: fs: fix leaked psi pressure state has been added to the -mm mm-hotfixes-unstable branch. Its filename is fs-fix-leaked-psi-pressure-state.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/fs-fix-leaked-psi-pressure-state.patch This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Johannes Weiner <hannes@xxxxxxxxxxx> Subject: fs: fix leaked psi pressure state Date: Thu, 3 Nov 2022 17:34:31 -0400 When psi annotations were added to to btrfs compression reads, the psi state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was faulty. A pressure state, once entered, is never left. This results in incorrectly elevated pressure, which triggers OOM kills. pflags record the *previous* memstall state when we enter a new one. The code tried to initialize pflags to 1, and then optimize the leave call when we either didn't enter a memstall, or were already inside a nested stall. However, there can be multiple PageWorkingset pages in the bio, at which point it's that path itself that enters repeatedly and overwrites pflags. This causes us to miss the exit. Enter the stall only once if needed, then unwind correctly. erofs has the same problem, fix that up too. And move the memstall exit past submit_bio() to restore submit accounting originally added by b8e24a9300b0 ("block: annotate refault stalls from IO submission"). Link: https://lkml.kernel.org/r/Y2UHRqthNUwuIQGS@xxxxxxxxxxx Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads") Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space") Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer") Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@xxxxxxxxxxxxx/ Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> Reported-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx> Tested-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx> Cc: Chao Yu <chao@xxxxxxxxxx> Cc: Chris Mason <clm@xxxxxx> Cc: Christoph Hellwig <hch@xxxxxx> Cc: David Sterba <dsterba@xxxxxxxx> Cc: Gao Xiang <xiang@xxxxxxxxxx> Cc: Jens Axboe <axboe@xxxxxxxxx> Cc: Josef Bacik <josef@xxxxxxxxxxxxxx> Cc: Suren Baghdasaryan <surenb@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- fs/btrfs/compression.c | 14 ++++++++------ fs/erofs/zdata.c | 18 +++++++++++------- 2 files changed, 19 insertions(+), 13 deletions(-) --- a/fs/btrfs/compression.c~fs-fix-leaked-psi-pressure-state +++ a/fs/btrfs/compression.c @@ -512,7 +512,7 @@ static u64 bio_end_offset(struct bio *bi static noinline int add_ra_bio_pages(struct inode *inode, u64 compressed_end, struct compressed_bio *cb, - unsigned long *pflags) + int *memstall, unsigned long *pflags) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); unsigned long end_index; @@ -581,8 +581,10 @@ static noinline int add_ra_bio_pages(str continue; } - if (PageWorkingset(page)) + if (!*memstall && PageWorkingset(page)) { psi_memstall_enter(pflags); + *memstall = 1; + } ret = set_page_extent_mapped(page); if (ret < 0) { @@ -670,8 +672,8 @@ void btrfs_submit_compressed_read(struct u64 em_len; u64 em_start; struct extent_map *em; - /* Initialize to 1 to make skip psi_memstall_leave unless needed */ - unsigned long pflags = 1; + unsigned long pflags; + int memstall = 0; blk_status_t ret; int ret2; int i; @@ -727,7 +729,7 @@ void btrfs_submit_compressed_read(struct goto fail; } - add_ra_bio_pages(inode, em_start + em_len, cb, &pflags); + add_ra_bio_pages(inode, em_start + em_len, cb, &memstall, &pflags); /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; @@ -807,7 +809,7 @@ void btrfs_submit_compressed_read(struct } } - if (!pflags) + if (memstall) psi_memstall_leave(&pflags); if (refcount_dec_and_test(&cb->pending_ios)) --- a/fs/erofs/zdata.c~fs-fix-leaked-psi-pressure-state +++ a/fs/erofs/zdata.c @@ -1412,8 +1412,8 @@ static void z_erofs_submit_queue(struct struct block_device *last_bdev; unsigned int nr_bios = 0; struct bio *bio = NULL; - /* initialize to 1 to make skip psi_memstall_leave unless needed */ - unsigned long pflags = 1; + unsigned long pflags; + int memstall = 0; bi_private = jobqueueset_init(sb, q, fgq, force_fg); qtail[JQ_BYPASS] = &q[JQ_BYPASS]->head; @@ -1463,14 +1463,18 @@ static void z_erofs_submit_queue(struct if (bio && (cur != last_index + 1 || last_bdev != mdev.m_bdev)) { submit_bio_retry: - if (!pflags) - psi_memstall_leave(&pflags); submit_bio(bio); + if (memstall) { + psi_memstall_leave(&pflags); + memstall = 0; + } bio = NULL; } - if (unlikely(PageWorkingset(page))) + if (unlikely(PageWorkingset(page)) && !memstall) { psi_memstall_enter(&pflags); + memstall = 1; + } if (!bio) { bio = bio_alloc(mdev.m_bdev, BIO_MAX_VECS, @@ -1500,9 +1504,9 @@ submit_bio_retry: } while (owned_head != Z_EROFS_PCLUSTER_TAIL); if (bio) { - if (!pflags) - psi_memstall_leave(&pflags); submit_bio(bio); + if (memstall) + psi_memstall_leave(&pflags); } /* _ Patches currently in -mm which might be from hannes@xxxxxxxxxxx are fs-fix-leaked-psi-pressure-state.patch mm-vmscan-make-rotations-a-secondary-factor-in-balancing-anon-vs-file.patch mm-vmscan-split-khugepaged-stats-from-direct-reclaim-stats.patch mm-vmscan-fix-extreme-overreclaim-and-swap-floods.patch