Re: [PATCH 0/2] mm: skip memcg for certain address space

Qu Wenruo <wqu@xxxxxxxx> · Thu, 18 Jul 2024 18:20:47 +0930

On 2024/7/18 17:58, Vlastimil Babka (SUSE) wrote:
On 7/18/24 9:52 AM, Qu Wenruo wrote:

On 2024/7/18 16:47, Vlastimil Babka (SUSE) wrote:
On 7/18/24 12:38 AM, Qu Wenruo wrote:
[...]
Another question: I only see this hang with larger folios (order 2 vs
the old order 0) when adding to the same address space.

Does the folio order have anything to do with the problem, or does a
higher order just make it more likely?

I didn't spot anything in the memcg charge path that would depend on the
order directly, hm. Also what kernel version was showing these soft lockups?

The previous rc kernel. IIRC it's v6.10-rc6.

But that needs extra btrfs patches; otherwise btrfs is still only doing
order-0 allocations and then adding the order-0 folios into the filemap.

The extra patch just directs btrfs to allocate an order-2 folio (matching
the default 16K nodesize), then attach the folio to the metadata filemap.

Plus extra code to handle corner cases like different folio sizes etc.
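
To make the "order 2 matches 16K nodesize" arithmetic concrete, here is a
minimal userspace sketch. It assumes a 4K page size (implied by the numbers
above, not stated), and folio_order_for_nodesize() is a hypothetical helper
written for illustration, not a btrfs or mm function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a kernel or btrfs function): return the
 * smallest folio order whose size covers one metadata node.
 * A folio of order n spans (page_size << n) bytes.
 */
static unsigned int folio_order_for_nodesize(size_t nodesize, size_t page_size)
{
	unsigned int order = 0;

	assert(page_size > 0);
	/* Grow the order until the folio is at least as large as the node. */
	while ((page_size << order) < nodesize)
		order++;
	return order;
}
```

With a 16K nodesize and 4K pages this gives order 2 (4 pages per folio); with
a nodesize equal to the page size it stays at order 0.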

Hm right, but the same code is triggered for high-order folios (at least for
user mappable page cache) today by some filesystems AFAIK, so we should be
seeing such lockups already? The btrfs case might be special in that it's for
the internal node as you explain, but that makes no difference for
filemap_add_folio(), right? Or is it the only user with GFP_NOFS? Also is
that passed as gfp directly or are there some extra scoped gfp restrictions
involved? (memalloc_..._save()).

I'm not sure about other fses, but that hang case is very
metadata heavy, and ALL folios in that btree inode filemap are order
2, since we always allocate the order 2 folios using GFP_NOFAIL and
attach them to the filemap using GFP_NOFAIL too.

Not sure if other fses can hit such a situation.

[...]
If I understand it correctly, we have implemented the release_folio()
callback, which does the btrfs metadata checks to determine if we can
release the current folio, and avoids releasing folios that are still under
IO etc.

I see, thanks. Sounds like the handling might be suboptimal: the folio
will appear inactive because there are no references that
folio_check_references() can detect, unless there are some
folio_mark_accessed() calls involved (I see some FGP_ACCESSED in btrfs so
maybe that's fine enough). So reclaim could consider it often, only to be
stopped by release_folio failing.
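
The concern above can be pictured with a toy userspace model. This is only a
sketch under loud assumptions: struct toy_folio, toy_release_folio() and
toy_reclaim_scan() are made-up names, and the logic is a drastic
simplification of what reclaim and a release_folio() callback do, not the
kernel or btrfs code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a metadata folio; field names are illustrative only. */
struct toy_folio {
	bool referenced;	/* set by an "accessed" mark, cleared by a scan */
	bool under_io;		/* release must be refused while this is set */
	bool released;		/* set once the release callback succeeded */
};

/* Models a release_folio()-style callback: refuse while IO is in flight. */
static bool toy_release_folio(struct toy_folio *f)
{
	if (f->under_io)
		return false;
	f->released = true;
	return true;
}

/*
 * Models one reclaim pass over the folio. Returns true if the folio was
 * freed. A referenced folio is skipped once and its flag is cleared; an
 * unreferenced one is handed to the release callback, and every refusal
 * is counted in *wasted.
 */
static bool toy_reclaim_scan(struct toy_folio *f, unsigned int *wasted)
{
	assert(f && wasted);
	if (f->referenced) {
		f->referenced = false;
		return false;
	}
	if (!toy_release_folio(f)) {
		(*wasted)++;
		return false;
	}
	return true;
}
```

In this model a folio that is never marked referenced but stays under IO is
handed to the callback on every scan, and each of those scans is wasted work.
Marking it referenced makes the next scan skip it without calling the callback.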

For the page accessed part, btrfs handles it with the
mark_extent_buffer_accessed() call, which is made every time we try to
grab an extent buffer structure (the structure used to represent a
metadata block inside btrfs).

So the accessed flag part should be fine I guess?

Thanks,
Qu

(sorry if the questions seem noob, I'm not that familiar with the page
cache side of mm)

No worries at all, I'm also a newbie on the whole mm part.

Thanks,
Qu