Re: [PATCH 0/2] mm: skip memcg for certain address space

"Vlastimil Babka (SUSE)" <vbabka@xxxxxxxxxx> · Thu, 18 Jul 2024 11:19:46 +0200

On 7/18/24 10:50 AM, Qu Wenruo wrote:
> 
> 
> On 2024/7/18 17:58, Vlastimil Babka (SUSE) wrote:
>> On 7/18/24 9:52 AM, Qu Wenruo wrote:
>>>
>>> The previous rc kernel. IIRC it's v6.10-rc6.
>>>
>>> But that needs extra btrfs patches; otherwise btrfs still only does
>>> order-0 allocations and then adds the order-0 folio into the filemap.
>>>
>>> The extra patch just directs btrfs to allocate an order-2 folio (matching
>>> the default 16K nodesize), then attach the folio to the metadata filemap.
>>>
>>> Plus extra code handling corner cases like different folio sizes, etc.
>> 
>> Hm right, but the same code is triggered for high-order folios (at least for
>> user mappable page cache) today by some filesystems AFAIK, so we should be
>> seeing such lockups already? The btrfs case might be special in that it's
>> for the internal node as you explain, but that makes no difference for
>> filemap_add_folio(), right? Or is it the only user with GFP_NOFS? Also, is
>> that passed as gfp directly, or are there some extra scoped gfp restrictions
>> involved (memalloc_..._save())?
> 
> I'm not sure about other fses, but that hang case is very metadata
> heavy, and ALL folios in that btree inode filemap are order 2, since
> we're always allocating those folios using __GFP_NOFAIL, and attaching
> each folio into the filemap using __GFP_NOFAIL too.
>
> Not sure if other fses can have such a situation.

Doh right of course, the __GFP_NOFAIL is the special part compared to the
usual page cache usage.
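
To make the scoped gfp question above concrete, here is a minimal userspace
sketch of how I understand a memalloc_nofs_save() scope to combine with the
gfp mask a caller passes in. All toy_* names and bit values are hypothetical
stand-ins, not the kernel's definitions, and this is not the actual btrfs
call path. It only shows that the scope strips the FS bit while a
NOFAIL-style bit passes through untouched.

```c
#include <assert.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
#define TOY_GFP_IO     0x1u
#define TOY_GFP_FS     0x2u
#define TOY_GFP_NOFAIL 0x4u

/* Toy stand-ins for the per-task PF_MEMALLOC_NOIO / PF_MEMALLOC_NOFS bits. */
#define TOY_PF_NOIO 0x1u
#define TOY_PF_NOFS 0x2u

static unsigned int toy_task_flags;	/* models current->flags */

/* Enter a NOFS scope; returns the previous NOFS bit so scopes can nest. */
static inline unsigned int toy_memalloc_nofs_save(void)
{
	unsigned int old = toy_task_flags & TOY_PF_NOFS;

	toy_task_flags |= TOY_PF_NOFS;
	return old;
}

/* Leave a NOFS scope, putting back whatever the matching save returned. */
static inline void toy_memalloc_nofs_restore(unsigned int old)
{
	assert((old & ~TOY_PF_NOFS) == 0);
	toy_task_flags = (toy_task_flags & ~TOY_PF_NOFS) | old;
}

/* Apply the scope to a caller-supplied mask; only IO/FS bits are touched. */
static inline unsigned int toy_current_gfp_context(unsigned int gfp)
{
	if (toy_task_flags & TOY_PF_NOIO)
		gfp &= ~(TOY_GFP_IO | TOY_GFP_FS);
	else if (toy_task_flags & TOY_PF_NOFS)
		gfp &= ~TOY_GFP_FS;
	return gfp;
}
```

So in this model, a caller passing TOY_GFP_IO | TOY_GFP_FS | TOY_GFP_NOFAIL
inside a NOFS scope ends up with TOY_GFP_IO | TOY_GFP_NOFAIL: the scope
restricts what reclaim may do, but the "must not fail" part is unaffected.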

> [...]
>>> If I understand it correctly, we have implemented the release_folio()
>>> callback, which does the btrfs metadata checks to determine if we can
>>> release the current folio, and avoids releasing folios that are still
>>> under IO etc.
>> 
>> I see, thanks. Sounds like there might be some potentially suboptimal
>> handling here: the folio will appear inactive because there are no
>> references that folio_check_references() can detect, unless there are some
>> folio_mark_accessed() calls involved (I see some FGP_ACCESSED in btrfs, so
>> maybe that's fine enough). Reclaim could then consider it often, only to be
>> stopped by release_folio failing.
> 
> For the page accessed part, btrfs handles it with a
> mark_extent_buffer_accessed() call, which is made every time we try to
> grab an extent buffer structure (the structure used to represent a
> metadata block inside btrfs).
>
> So the accessed flag part should be fine I guess?

Sounds good then, thanks!

> Thanks,
> Qu
>> 
>>>>
>>>> (sorry if the questions seem noob, I'm not that familiar with the page
>>>> cache side of mm)
>>>
>>> No worry at all, I'm also a newbie on the whole mm part.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>>> Thanks,
>>>>> Qu
>>>>
>>