On Wed, Apr 24, 2024 at 11:20:28AM -0400, Peter Xu wrote: > On Fri, Apr 05, 2024 at 04:32:23PM +0100, Matthew Wilcox (Oracle) wrote: > > free_unref_folios() can now handle non-hugetlb large folios, so keep > > normal large folios in the batch. hugetlb folios still need to be > > handled specially. I believe that folios freed using put_pages_list() > > cannot be accounted to a memcg (or the small folios would trip the "page > > still charged to cgroup" warning), but put an assertion in to check that. > > There's such user, iommu uses put_pages_list() to free IOMMU pgtables, and > they can be memcg accounted; since 2023 iommu_map switched to use > GFP_KERNEL_ACCOUNT. > > I hit below panic when testing my local branch over mm-everthing when > running some VFIO workloads. > > For this specific vfio use case, see 160912fc3d4a ("vfio/type1: account > iommu allocations"). > > I think we should remove the VM_BUG_ON_FOLIO() line, as the memcg will then > be properly taken care of later in free_pages_prepare(). Fixup attached at > the end that will fix this crash for me. Yes, I think you're right. I was concerned about the deferred split list / memcg charge problem, but (a) page table pages can't ever be on the deferred split list, (b) just passing them through to free_unref_folios() works fine. The problem was that folios_put_refs() was uncharging a batch before passing them to free_unref_folios(). That does bring up the question though ... should we be uncharging these folios as a batch for better performance? Do you have a workload which frees a lot of page tables? Presumably an exit would do that. If so, does adding a call to mem_cgroup_uncharge_folios() before calling free_unref_folios() improve performance in any noticable way? In the meantime, this patch: Reviewed-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> although I think Andrew will just fold it into "mm: free non-hugetlb large folios in a batch" Andrew, if you do do that, please also edit out the last couple of sentences from the commit message: free_unref_folios() can now handle non-hugetlb large folios, so keep normal large folios in the batch. hugetlb folios still need to be handled - specially. I believe that folios freed using put_pages_list() cannot be - accounted to a memcg (or the small folios would trip the "page still - charged to cgroup" warning), but put an assertion in to check that. + specially.