On 7/18/24 10:50 AM, Qu Wenruo wrote:
>
> On 2024/7/18 17:58, Vlastimil Babka (SUSE) wrote:
>> On 7/18/24 9:52 AM, Qu Wenruo wrote:
>>>
>>> The previous rc kernel. IIRC it's v6.10-rc6.
>>>
>>> But that needs extra btrfs patches, or btrfs is still only doing the
>>> order-0 allocation, then adding the order-0 folio into the filemap.
>>>
>>> The extra patch just directs btrfs to allocate an order 2 folio (matching
>>> the default 16K nodesize), then attach the folio to the metadata filemap.
>>>
>>> With extra code handling corner cases like different folio sizes etc.
>>
>> Hm right, but the same code is triggered for high-order folios (at least for
>> user mappable page cache) today by some filesystems AFAIK, so we should be
>> seeing such lockups already? The btrfs case might be special in that it's
>> for the internal node as you explain, but that makes no difference for
>> filemap_add_folio(), right? Or is it the only user with GFP_NOFS? Also, is
>> that passed as gfp directly or are there some extra scoped gfp restrictions
>> involved? (memalloc_..._save()).
>
> I'm not sure about other fses, but for that hang case, it's very
> metadata heavy, and ALL folios for that btree inode filemap are order
> 2, since we're always allocating those folios using GFP_NOFAIL, and
> attaching that folio into the filemap using GFP_NOFAIL too.
>
> Not sure if other fses can have such a situation.

Doh, right of course, the __GFP_NOFAIL is the special part compared to the
usual page cache usage.

> [...]
>>> If I understand it correctly, we have implemented the release_folio()
>>> callback, which does the btrfs metadata checks to determine if we can
>>> release the current folio, and avoids releasing folios that are still
>>> under IO etc.
>>
>> I see, thanks. Sounds like there might be some suboptimal
>> handling, in that the folio will appear inactive because there are no
>> references that folio_check_references() can detect, unless there are some
>> folio_mark_accessed() calls involved (I see some FGP_ACCESSED in btrfs so
>> maybe that's fine enough), so reclaim could consider it often, only to be
>> stopped by release_folio failing.
>
> For the page accessed part, btrfs handles it with the
> mark_extent_buffer_accessed() call, and it's called every time we try to
> grab an extent buffer structure (the structure used to represent a
> metadata block inside btrfs).
>
> So the accessed flag part should be fine I guess?

Sounds good then, thanks!

> Thanks,
> Qu
>>
>>>>
>>>> (sorry if the questions seem noob, I'm not that familiar with the page
>>>> cache side of mm)
>>>
>>> No worries at all, I'm also a newbie on the whole mm part.
>>>
>>> Thanks,
>>> Qu
>>>
>>>>
>>>>> Thanks,
>>>>> Qu