On Thu, Nov 03, 2022 at 11:46:52AM +0100, Thorsten Leemhuis wrote: > Hi Christoph! > > On 15.09.22 11:41, Christoph Hellwig wrote: > > btrfs compressed reads try to always read the entire compressed chunk, > > even if only a subset is requested. Currently this is covered by the > > magic PSI accounting underneath submit_bio, but that is about to go > > away. Instead add manual psi_memstall_{enter,leave} annotations. > > > > Note that for readahead this really should be using readahead_expand, > > but the additionals reads are also done for plain ->read_folio where > > readahead_expand can't work, so this overall logic is left as-is for > > now. > > It seems this patch makes systemd-oomd overreact on my day-to-day > machine and aggressively kill applications. I'm not the only one that > noticed such a behavior with 6.1 pre-releases: > https://bugzilla.redhat.com/show_bug.cgi?id=2133829 > https://bugzilla.redhat.com/show_bug.cgi?id=2134971 > > I think I have a pretty reliable way to trigger the issue that involves > starting the apps that I normally use and a VM that I occasionally use, > which up to now never resulted in such a behaviour. > > On master as of today (8e5423e991e8) I can trigger the problem within a > minute or two. But I fail to trigger it with v6.0.6 or when I revert > 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads"). > And yes, I use btrfs with compression for / and /home/. > > See [1] for a log msg from systemd-oomd. > > Note, I had some trouble with bisecting[2]. This series looked > suspicious, so I removed it completely ontop of master and the problem > went away. Then I tried reverting only 4088a47e78f9 which helped, too. > Let me know if you want me to try another combination or need more data. Oh, I think I see the bug. We can leak pressure state from the bio submission, which causes the task to permanently drive up pressure. Can you try this patch? >From 499e5cab7b39fc4c90a0f96e33cdc03274b316fd Mon Sep 17 00:00:00 2001 From: Johannes Weiner <hannes@xxxxxxxxxxx> Date: Thu, 3 Nov 2022 17:34:31 -0400 Subject: [PATCH] fs: btrfs: fix leaked psi pressure state When psi annotations were added to to btrfs compression reads, the psi state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was faulty. The task can remain in a stall state after the read. This results in incorrectly elevated pressure, which triggers OOM kills. pflags record the *previous* memstall state when we enter a new one. The code tried to initialize pflags to 1, and then optimize the leave call when we either didn't enter a memstall, or were already inside a nested stall. However, there can be multiple PageWorkingset pages in the bio, at which point it's that path itself that re-enters the state and overwrites pflags. This causes us to miss the exit. Enter the stall only once if needed, then unwind correctly. Reported-by: Thorsten Leemhuis <linux@xxxxxxxxxxxxx> Fixes: 4088a47e78f9 btrfs: add manual PSI accounting for compressed reads Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx> --- fs/btrfs/compression.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index f1f051ad3147..e6635fe70067 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -512,7 +512,7 @@ static u64 bio_end_offset(struct bio *bio) static noinline int add_ra_bio_pages(struct inode *inode, u64 compressed_end, struct compressed_bio *cb, - unsigned long *pflags) + int *memstall, unsigned long *pflags) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); unsigned long end_index; @@ -581,8 +581,10 @@ static noinline int add_ra_bio_pages(struct inode *inode, continue; } - if (PageWorkingset(page)) + if (!*memstall && PageWorkingset(page)) { psi_memstall_enter(pflags); + *memstall = 1; + } ret = set_page_extent_mapped(page); if (ret < 0) { @@ -670,8 +672,8 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, u64 em_len; u64 em_start; struct extent_map *em; - /* Initialize to 1 to make skip psi_memstall_leave unless needed */ - unsigned long pflags = 1; + unsigned long pflags; + int memstall = 0; blk_status_t ret; int ret2; int i; @@ -727,7 +729,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, goto fail; } - add_ra_bio_pages(inode, em_start + em_len, cb, &pflags); + add_ra_bio_pages(inode, em_start + em_len, cb, &memstall, &pflags); /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; @@ -807,7 +809,7 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, } } - if (!pflags) + if (memstall) psi_memstall_leave(&pflags); if (refcount_dec_and_test(&cb->pending_ios)) -- 2.38.1